# ingestion
  • a

    alert-beach-77662

    11/30/2021, 9:43 AM
    hi
  • h

    high-hospital-85984

    11/30/2021, 1:33 PM
    It seems like the MongoDB source prints the credentials in the logs along with the stack trace
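    A minimal sketch of one way to keep credentials out of log output, assuming the secret is embedded in a connection URI. This is not DataHub's actual fix; `redact_uri` is a hypothetical helper and the example URI is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_uri(uri: str, mask: str = "***") -> str:
    """Return the URI with any user[:password]@ part replaced by a mask."""
    parts = urlsplit(uri)
    # Everything before the last "@" in the network location is user info.
    _userinfo, at, hostinfo = parts.netloc.rpartition("@")
    if not at:
        return uri  # no credentials embedded, nothing to hide
    return urlunsplit(parts._replace(netloc=f"{mask}@{hostinfo}"))


# Hypothetical connection string, for illustration only.
print(redact_uri("mongodb://admin:s3cret@mongo.example.com:27017/db?authSource=admin"))
```

    Applying a helper like this before a URI is formatted into an exception message or log record keeps the host and database visible for debugging while hiding the username and password.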
  • a

    abundant-policeman-24127

    11/30/2021, 5:12 PM
    Hello everyone, thanks for the help so far in #all-things-deployment. We successfully deployed DataHub! 🙂 I'd like to ask a question about ingestion/lineage. We hooked up lineage to Airflow and it has been working flawlessly so far. Some of our tasks fetch data from a REST API and save it in a database, with the API as the inlet and the database as the outlet. I can declare the outlet with a Dataset() object, but how would I go about the API (inlet)? Should I declare it as a dataset, or is there a more specific keyword that will mark it as an API in the UI?
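    One possible approach, assuming there is no dedicated API entity type, is to model the API as a dataset on a custom platform. The sketch below only shows the shape of a dataset URN; the platform name `rest_api` and the dataset name are hypothetical, and the `datahub` package has its own URN helpers that would normally be used instead.

```python
def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN string from a platform, a dataset name and an environment."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Hypothetical inlet (the REST API) and outlet (the database table).
api_inlet = make_dataset_urn("rest_api", "orders_api.v1.orders")
db_outlet = make_dataset_urn("postgres", "shop.public.orders")
print(api_inlet)
print(db_outlet)
```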
  • f

    fancy-sundown-164

    11/30/2021, 5:25 PM
    Has anyone found a faster way to run unit tests? Running testQuick (../gradlew :metadata-ingestion:testQuick) with a modified build.gradle file to run only the tests I want takes 3-4 minutes, but probably 99% of this is just loading dependencies. Is there some way to cache this step between runs?
  • w

    wooden-gpu-7761

    12/02/2021, 2:59 AM
    https://datahubspace.slack.com/archives/CUMUWQU66/p1637644851019500 Shedding light on this again and would be great if I could receive feedback! 🙏
  • r

    rich-policeman-92383

    12/02/2021, 10:34 AM
    Hi, while ingesting metadata from an Oracle database we are getting the error below.
    dhubv0812/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1215, in _fetchone_impl
    return self.cursor.fetchone()
    
    DatabaseError: (cx_Oracle.DatabaseError) DPI-1037: column at array position 0 fetched with error 1406
  • o

    orange-flag-48535

    12/02/2021, 10:37 AM
    How do you create a folder-like hierarchy when ingesting into DataHub, like the demo page does with prod > bigquery > bigquery-public-data? https://demo.datahubproject.io/browse/dataset/prod/bigquery/bigquery-public-data
  • b

    billions-receptionist-60247

    12/02/2021, 12:31 PM
    Hi, I'm new to DataHub. I want to know how metadata is extracted from a source (e.g. Hive).
  • b

    brief-lizard-77958

    12/02/2021, 3:03 PM
    Is it possible to add links to an entity's documentation by defining them in the JSON you ingest? Something like
    "chartUrl": "Wikipedia": "https://en.wikipedia.org/",
  • r

    red-pizza-28006

    12/02/2021, 3:58 PM
    We use Confluent Kafka as our central message bus. Can anyone share an example of their Kafka ingestion? I am not able to get it working, and I'm pretty sure it has something to do with how I am passing the credentials.
    source:
      type: "kafka"
      config:
        connection:
          bootstrap: "bootstrap:9092"
          consumer_config:
            sasl.username: "youthoughtiwasreal"
            sasl.password: "icantsharethat"
            security.protocol: "sasl_plaintext"
          schema_registry_url: "https://sr:8081"
          schema_registry_config:
            basic.auth.user.info: nonono:istillcannotshare
    
    sink:
      type: "datahub-rest"
      config:
        server: "http://gmsendpoint"
  • m

    millions-notebook-72121

    12/02/2021, 5:04 PM
    Hi all! I am playing around with the Business Glossary function - I can ingest the glossary fine, but was wondering if we can link glossary terms to other entities we've already ingested, like datasets? The RFC mentions that, and in the UI I can see a "Related entities" tab, which I assume is for that, but I could not find anything related to this in the docs or in the examples. Is this implemented yet?
  • m

    mammoth-bear-12532

    12/02/2021, 7:20 PM
    Hi folks, we just released (pypi) 0.8.17.4 that has a few important fixes for mongo, snowflake and the delete cli (https://github.com/acryldata/datahub/releases/tag/v0.8.17.4) 🧵
  • p

    plain-farmer-27314

    12/03/2021, 4:31 PM
    Hi, we recently updated our datahub deployment to 0.8.17 and are getting the following error when trying to ingest anything:
    {'error': 'Unable to emit metadata to DataHub GMS',
                   'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
                            'message': 'org.apache.kafka.common.errors.SerializationException: Error serializing Avro message',
                            'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: '
                                          'org.apache.kafka.common.errors.SerializationException: Error serializing Avro message\n'
                                          '\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\n'
                                          '\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)\n'
                                          '\tat com.linkedin.metadata.resources.entity.AspectResource.ingestProposal(AspectResource.java:132)\n'
                                          '\tat sun.reflect.GeneratedMethodAccessor97.invoke(Unknown Source)\n'
                                          '\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n'
                                          '\tat java.lang.reflect.Method.invoke(Method.java:498)\n'
                                          '\tat com.linkedin.restli.internal.server.RestLiMethodInvoker.doInvoke(RestLiMethodInvoker.java:172)\n'
                                          '\tat com.linkedin.restli.internal.server.RestLiMethodInvoker.invoke(RestLiMethodInvoker.java:326)\n'
    Kind of a vague error, but I can provide more context if needed
  • b

    brief-wolf-70822

    12/03/2021, 7:09 PM
    Hey team! I'm also seeing this issue, where profiling of views seems to have issues. After profiling, all the column properties in Stats are "unknown". The only thing that comes through are the row and column counts. Is this a known issue?
  • s

    stocky-television-65849

    12/03/2021, 8:00 PM
    I am trying to connect to a remote AWS Redshift database from a local Docker setup using:
    source:
      type: redshift
      config:
        # Coordinates
        host_port: xxx:5439
        database: test
    
        # Credentials
        username: x
        password: x
    
        # Options
        options:
        include_views: True # whether to include views, defaults to True
        include_tables: True # whether to include tables, defaults to True
    
    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
    I got the error
    ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /config (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f906d430520>: Failed to establish a new connection: [Errno 61] Connection refused'))
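    "Connection refused" means nothing accepted a TCP connection on localhost:8080 from wherever the CLI ran. If the CLI runs inside a container, localhost refers to that container rather than the host. A minimal sketch of a reachability check that could be run from the same environment before ingesting; `can_connect` is a hypothetical helper, not part of the datahub CLI.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Host and port taken from the sink config above.
    print("GMS reachable:", can_connect("localhost", 8080))
```

    If this prints False, the sink's server value probably needs to point at an address where GMS is actually listening from the CLI's point of view.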
  • s

    stocky-television-65849

    12/03/2021, 8:05 PM
    the same yml running on ec2 got the error:
    OperationalError: (psycopg2.OperationalError) server certificate for "*.xxxxxxx.us-east-1.redshift.amazonaws.com" does not match host name "xxxxx"
  • d

    damp-minister-31834

    12/04/2021, 2:39 AM
    Hi all! Has DataHub been integrated with Azkaban? So far I have only seen the usage guide for integrating with Airflow. Does it support Azkaban now? If it does, where is the usage guide?
  • b

    best-balloon-56

    12/06/2021, 8:09 AM
    Does datahub support ingestion of metadata from avro schemas? Like confluent or hortonworks schema registry?
  • b

    brief-lizard-77958

    12/06/2021, 12:35 PM
    How do I define a user's password when I ingest a new user? I used to be able to log in with an ingested username and a random password in the old version of DataHub. Posting the JSON I currently ingest when adding a new user
  • s

    stocky-television-65849

    12/06/2021, 5:22 PM
    @big-carpet-38439 You mentioned in a YT video that the lineage supports redshift dataset now. Is it under the latest release?
  • r

    rhythmic-sundown-12093

    12/07/2021, 4:38 AM
    Hi, team, sqlalchemy does not support the data type: geometry
    lib/python3.8/site-packages/sqlalchemy/dialects/mysql/reflection.py:192: SAWarning: Did not recognize type 'geometry' of column 'GeoShape'
  • b

    billions-receptionist-60247

    12/07/2021, 8:18 AM
    Hi, I have installed the postgres source using pip3 install 'acryl-datahub[postgres]' but it is still showing as disabled
  • c

    colossal-easter-99672

    12/07/2021, 12:06 PM
    Hello team. Is there a way to add additional upstreamLineage info via MetadataChangeProposalWrapper? I mean adding to the existing info without overwriting it.
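    Assuming that emitting an aspect replaces the stored value as a whole, one workaround is read-merge-write: fetch the current upstreams, merge in the new ones, and emit the combined list. The sketch below covers only the merge step on plain dicts shaped like upstream entries (a `dataset` URN plus a `type`); fetching and emitting are left out, and `merge_upstreams` is a hypothetical helper.

```python
from typing import Dict, List

Upstream = Dict[str, str]


def merge_upstreams(existing: List[Upstream], additions: List[Upstream]) -> List[Upstream]:
    """Combine two upstream lists, keeping existing entries and skipping duplicates."""
    merged = {u["dataset"]: u for u in existing}
    for upstream in additions:
        # setdefault keeps the entry that is already stored for this dataset URN.
        merged.setdefault(upstream["dataset"], upstream)
    return list(merged.values())


# Hypothetical URNs, for illustration only.
current = [{"dataset": "urn:li:dataset:(urn:li:dataPlatform:hive,db.a,PROD)", "type": "TRANSFORMED"}]
new = [
    {"dataset": "urn:li:dataset:(urn:li:dataPlatform:hive,db.a,PROD)", "type": "COPY"},
    {"dataset": "urn:li:dataset:(urn:li:dataPlatform:hive,db.b,PROD)", "type": "TRANSFORMED"},
]
print(merge_upstreams(current, new))
```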
  • a

    agreeable-thailand-43234

    12/08/2021, 2:12 AM
    Hi guys! I'm trying to build the project as per https://datahubproject.io/docs/metadata-ingestion/developing/ but I got the following error:
    [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] * What went wrong:
    15:09:05.346 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] Execution failed for task ':metadata-ingestion:checkPythonVersion'.
    15:09:05.346 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] > Process 'command 'python3'' finished with non-zero exit value 134
    I can see that the task only checks that the installed Python version is >= 3.6
    task checkPythonVersion(type: Exec) {
      commandLine python_executable, '-c', 'import sys; assert sys.version_info >= (3, 6)'
    }
    I have installed Python 3.6 from the conda distribution… any idea?
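    A note on the exit value: by the common Unix convention, a status above 128 means the process was terminated by signal (status - 128), whereas a failed assert in Python normally exits with status 1. A minimal sketch for decoding such a status; `describe_exit_code` is a hypothetical helper.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe an exit status, decoding the 128 + signal-number convention."""
    if code > 128:
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            pass  # not a known signal number on this platform
    return f"exit status {code}"


print(describe_exit_code(134))
print(describe_exit_code(1))
```

    Under that convention, 134 suggests the python3 process itself aborted rather than the version assertion failing, which would point at the interpreter or its environment.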
  • o

    orange-flag-48535

    12/08/2021, 10:19 AM
    I see that Datahub has a NumberType, but there is no specific FloatType, DoubleType etc. Any plans to add those?
  • h

    happy-father-45304

    12/08/2021, 7:36 PM
    Hello all, I started evaluating DataHub for a project I'm working on. What is the simplest way to just register a file located in S3 as a dataset? I'm using the REST API and I tried a MetadataChangeProposal for a dataset entity with a DatasetProperties class with a uri set. It submits fine but I get a 500 error from the web app.
  • c

    cool-king-80482

    12/08/2021, 9:11 PM
    Is there a place I can learn more about the CLI commands? Specifically, I would like to learn what is invoked by the term "ingest" in this context: 'datahub ingest -c example.yml --dry-run'
  • o

    orange-flag-48535

    12/09/2021, 7:24 AM
    Does Datahub expose Ingestion related metrics? We want to keep an eye on how many Datasets, fields etc got ingested recently. And also as a way to look out for errors and suspicious patterns.
  • r

    rich-policeman-92383

    12/09/2021, 8:17 AM
    Can someone point me to the DAG of this Airflow task on the demo site?
  • m

    microscopic-elephant-47912

    12/09/2021, 11:37 PM
    Hi team, I created some custom properties for terms, but when I search for them, DataHub can't find the related term. Are custom properties not sent to Elasticsearch? If not, can this be enabled through configuration? Thanks a lot.