# ingestion

    fancy-fireman-15263

    04/27/2021, 7:04 PM
I get this output from datahub check plugins --verbose:
    ModuleNotFoundError("No module named 'pybigquery'")

    curved-magazine-23582

    04/28/2021, 12:37 PM
Hello, looking at the pipeline feature, what entities do I need to assemble and ingest to get it working? Is there a doc about that feature?
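
For context, the pipeline view is driven by DataFlow and DataJob entities, with a DataJobInputOutput aspect tying a job to its input and output datasets. Below is a minimal emitter sketch, not official docs; the server URL, orchestrator, and all names are placeholder assumptions.
```python
from datahub.emitter.mce_builder import (
    make_data_job_urn,
    make_dataset_urn,
)
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    DataJobInfoClass,
    DataJobInputOutputClass,
    DataJobSnapshotClass,
    MetadataChangeEventClass,
)

# A DataJob with an input and an output dataset; this is what makes the
# dataset >> data job >> dataset edges appear in the pipeline view.
job_urn = make_data_job_urn(orchestrator="airflow", flow_id="my_dag", job_id="my_task")
snapshot = DataJobSnapshotClass(
    urn=job_urn,
    aspects=[
        DataJobInfoClass(name="my_task", type="COMMAND"),
        DataJobInputOutputClass(
            inputDatasets=[make_dataset_urn("postgres", "db.schema.src", "PROD")],
            outputDatasets=[make_dataset_urn("postgres", "db.schema.dst", "PROD")],
        ),
    ],
)
DatahubRestEmitter("http://localhost:8080").emit_mce(
    MetadataChangeEventClass(proposedSnapshot=snapshot)
)
```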

    high-hospital-85984

    04/28/2021, 5:35 PM
@gray-shoe-75895 What would it take to include https://github.com/linkedin/datahub/tree/master/contrib/metadata-ingestion/python/looker as part of acryl-datahub? We’re interested in getting it into the main package (and willing to contribute).

    steep-pizza-15641

    04/28/2021, 5:49 PM
But if I use a recipe to publish a Postgres schema, the recipe uses urn:li:dataPlatform:postgresql and I do not get a nice Postgres icon in the GUI:
    c.l.m.k.MetadataAuditEventsProcessor - {com.linkedin.metadata.snapshot.DatasetSnapshot={urn=urn:li:dataset:(urn:li:dataPlatform:postgresql,myapp.public.source_table_c,PROD), aspects=[{com.linkedin.schema.SchemaMetadata={created={actor=urn:li:corpuser:etl, time=1619631529974}, platformSchema={com.linkedin.schema.MySqlDDL={tableSchema=}}, lastModified={actor=urn:li:corpuser:etl, time=1619631529974}, schemaName=myapp.public.source_table_c, fields=[{fieldPath=id, nullable=true, type={type={com.linkedin.schema.NumberType={}}}, nativeDataType=INTEGER(), recursive=false}, {fieldPath=col1, nullable=true, type={type={com.linkedin.schema.StringType={}}}, nativeDataType=VARCHAR(length=255), recursive=false}], version=0, platform=urn:li:dataPlatform:postgresql, hash=}}
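
The GUI picks logos by the data platform in the dataset urn, so the platform id has to be one the frontend knows about. A small sketch of building the urn with the plain postgres platform id, assuming that is the id the logo is registered under:
```python
from datahub.emitter.mce_builder import make_dataset_urn

# Assumption: the Postgres logo is keyed to "postgres", not "postgresql".
urn = make_dataset_urn(platform="postgres", name="myapp.public.source_table_c", env="PROD")
print(urn)  # urn:li:dataset:(urn:li:dataPlatform:postgres,myapp.public.source_table_c,PROD)
```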

    plain-waiter-52883

    04/28/2021, 8:51 PM
Hey, guys! I’m very happy to use the pipeline feature, but I still have one question. For example, in the screenshot I would like to have just [dataset >> data job >> dataset] and remove the [dataset >> dataset] edge. Can I remove unnecessary edges of the graph? Or do I not completely understand the concept of this feature? Thanks for the help!

    high-hospital-85984

    04/29/2021, 1:25 PM
    I managed to break our UI by assigning a “nameless” owner to a dataset. The error was only caught in the UI:
    Caused by: com.linkedin.data.template.TemplateOutputCastException: Invalid URN syntax: Urns with empty entityKey are not allowed. Urn: urn:li:corpuser:: urn:li:corpuser:
    Would be good to have some validation in the MCE.
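
Until such validation lands server-side, a client-side guard is easy to sketch. A hypothetical helper, not part of datahub:
```python
def require_nonempty_entity_key(urn: str) -> str:
    """Reject urns like 'urn:li:corpuser:' that have an empty entity key."""
    head, sep, entity_key = urn.rpartition(":")
    if not sep or not head.startswith("urn:li:") or not entity_key:
        raise ValueError(f"Urns with an empty entityKey are not allowed: {urn!r}")
    return urn

require_nonempty_entity_key("urn:li:corpuser:jdoe")  # ok
require_nonempty_entity_key("urn:li:corpuser:")      # raises ValueError
```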

    careful-insurance-60247

    04/29/2021, 2:47 PM
I'm trying to get the LDAP ingestion working for a POC of DataHub, but I get the following error:
    [ec2-user@ip-10-16-13-173 recipes]$ datahub ingest -c ./ldap_poc.yml
    [2021-04-29 04:50:55,724] INFO     {datahub.entrypoints:68} - Using config: {'source': {'type': 'ldap', 'config': {'ldap_server': '<ldap://dc.internal.test.com>', 'ldap_user': 'CN=datahub_ldap,OU=Generic Accounts,DC=internal,DC=test,DC=com', 'ldap_password': 'revmoved', 'base_dn': 'DC=internal,DC=test,DC=com', 'filter': '(objectClass=*)'}}, 'sink': {'type': 'datahub-rest', 'config': {'server': '<http://10.16.13.173:8080>'}}}
    Traceback (most recent call last):
      File "/usr/local/bin/datahub", line 8, in <module>
        sys.exit(datahub())
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/datahub/entrypoints.py", line 74, in ingest
        pipeline.run()
      File "/usr/local/lib/python3.7/site-packages/datahub/ingestion/run/pipeline.py", line 108, in run
        for wu in self.source.get_workunits():
      File "/usr/local/lib/python3.7/site-packages/datahub/ingestion/source/ldap.py", line 130, in get_workunits
        b"inetOrgPerson" in attrs["objectClass"]
    TypeError: list indices must be integers or slices, not str
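
The failing line suggests attrs is sometimes a list rather than a dict; python-ldap search results can include entries (e.g. search references) whose attribute payload is a list, and indexing those with attrs["objectClass"] raises exactly this TypeError. A hedged sketch of a local guard, a hypothetical helper rather than the upstream fix:
```python
def is_person_entry(attrs) -> bool:
    """Return True only for dict-shaped LDAP entries tagged inetOrgPerson."""
    if not isinstance(attrs, dict):
        return False  # referral / non-entry result; skip it
    return b"inetOrgPerson" in attrs.get("objectClass", [])

print(is_person_entry({"objectClass": [b"top", b"inetOrgPerson"]}))  # True
print(is_person_entry(["ldap://dc.internal.test.com/DC=internal"]))  # False
```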

    fast-leather-13054

    04/29/2021, 4:51 PM
Hi all, I am trying to import Airflow job data from a JSON dump using a Python script, and found that only the following classes are available:
    DataJobSnapshotClass
    DataJobInfoClass
    DataJobInputOutputClass
Could you please explain how you imported links to Airflow, or any custom properties, in your demo environment? Will the missing classes be available soon?
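
For the links and custom properties specifically, DataJobInfoClass already carries externalUrl and customProperties fields, so the three classes above may be all that is needed. A hedged sketch where every name and URL is a placeholder:
```python
from datahub.emitter.mce_builder import make_data_job_urn
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    DataJobInfoClass,
    DataJobSnapshotClass,
    MetadataChangeEventClass,
)

info = DataJobInfoClass(
    name="load_orders",
    type="COMMAND",  # job type string; "COMMAND" is what the Airflow examples use
    externalUrl="https://airflow.mycompany.com/tree?dag_id=orders_dag",
    customProperties={"owner_team": "data-eng", "schedule": "0 3 * * *"},
)
snapshot = DataJobSnapshotClass(
    urn=make_data_job_urn("airflow", "orders_dag", "load_orders"),
    aspects=[info],
)
DatahubRestEmitter("http://localhost:8080").emit_mce(
    MetadataChangeEventClass(proposedSnapshot=snapshot)
)
```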

    curved-sandwich-81699

    04/30/2021, 3:45 PM
Hello everyone! I have opened a PR allowing the ingestion of Looker views built from (SQL-based) derived tables: https://github.com/linkedin/datahub/pull/2478 It is working great for us (many of our SQL-based Looker views were not ingested before). It would be nice if someone here could test it and report back; hoping to get some momentum going on the Looker integration 🙂

    handsome-airplane-62628

    05/03/2021, 6:10 PM
Hi everyone, I have a question about source lineage. In dbt we track sources to our source DB in our data warehouse; however, ideally we'd like to add lineage prior to this DB. We use Stitch to move data from various sources into Snowflake, so I wanted to ask about documenting sources upstream of Snowflake:
• Is there any roadmap to add Stitch (or a workaround) to monitor this EL workflow?
• The lineage is fairly static (once set up we don't really modify it, so it would be possible to manage it manually in the absence of automated Stitch ingestion). Is it possible to set up a manual file? And is there any example of lineage parsing with a manual file? I looked here, which is very helpful, but wasn't 100% sure how the entities in the JSON file mapped to datasets/fields/lineage in DataHub.
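
In the absence of a Stitch source, lineage can also be curated by hand and emitted programmatically. A hedged sketch using the Python emitter; platform ids, table names, and the server URL are placeholder assumptions:
```python
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AuditStampClass,
    DatasetLineageTypeClass,
    DatasetSnapshotClass,
    MetadataChangeEventClass,
    UpstreamClass,
    UpstreamLineageClass,
)

# Declare the source system's table as an upstream of the Snowflake table.
upstream = UpstreamClass(
    dataset=make_dataset_urn("postgres", "app_db.public.orders", "PROD"),
    type=DatasetLineageTypeClass.COPY,  # Stitch does EL, i.e. a copy
    auditStamp=AuditStampClass(time=0, actor="urn:li:corpuser:etl"),
)
snapshot = DatasetSnapshotClass(
    urn=make_dataset_urn("snowflake", "raw.stitch.orders", "PROD"),
    aspects=[UpstreamLineageClass(upstreams=[upstream])],
)
DatahubRestEmitter("http://localhost:8080").emit_mce(
    MetadataChangeEventClass(proposedSnapshot=snapshot)
)
```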

    fast-leather-13054

    05/04/2021, 10:04 AM
Hi everyone, is there any way to add a custom logo to a custom dataset source type?

    thousands-tailor-5575

    05/04/2021, 12:15 PM
Hi everyone, I am trying to ingest data from dbt using the manifest and catalog JSON files. I'm getting KeyErrors while running
    dbtNode.columns = get_columns(catalog[dbtNode.dbt_name])
The last one was for a test that is in the manifest file. Any help, or similar issues? It looks like any key present in the catalog also has to be present in the manifest file?
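
A hedged local workaround (hypothetical helper, not the upstream fix): tolerate manifest nodes, such as tests, that have no catalog entry, instead of raising KeyError:
```python
def lookup_catalog_node(catalog: dict, dbt_name: str) -> dict:
    """Return the catalog entry for a manifest node, or an empty stub."""
    node = catalog.get(dbt_name)
    if node is None:
        print(f"warning: {dbt_name} is in the manifest but not the catalog; no columns")
        return {"columns": {}}
    return node

# usage, mirroring the failing line:
# dbtNode.columns = get_columns(lookup_catalog_node(catalog, dbtNode.dbt_name))
```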

    stale-jewelry-2440

    05/04/2021, 1:58 PM
Hi! I would like to add metadata from API calls to the catalog. Let's say I have a web service which gives me the data I need via REST calls, there is a Swagger page that describes the APIs, and I don't have direct access to the service database. As an example, one of the APIs can give me a list of user data via a GET call to https://test_web_service.com/api/user_data, and let's say the result of this call is a JSON containing name, address, and telephone number. I would like to see in the catalog a dataset like test_web_service.user_names, which contains name, address, and telephone number as fields. Has anyone out there already done something similar?
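
There is no REST/Swagger source in the ingestion framework as of this thread, but such a dataset can be emitted by hand. A hedged sketch, where the "api" platform id and every name are illustrative assumptions:
```python
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    DatasetSnapshotClass,
    MetadataChangeEventClass,
    OtherSchemaClass,
    SchemaFieldClass,
    SchemaFieldDataTypeClass,
    SchemaMetadataClass,
    StringTypeClass,
)

# One string field per attribute returned by the GET /api/user_data call.
fields = [
    SchemaFieldClass(
        fieldPath=name,
        type=SchemaFieldDataTypeClass(type=StringTypeClass()),
        nativeDataType="string",
    )
    for name in ("name", "address", "telephone_number")
]
schema = SchemaMetadataClass(
    schemaName="test_web_service.user_names",
    platform="urn:li:dataPlatform:api",  # assumed platform id; no logo is registered for it
    version=0,
    hash="",
    platformSchema=OtherSchemaClass(rawSchema=""),
    fields=fields,
)
snapshot = DatasetSnapshotClass(
    urn=make_dataset_urn("api", "test_web_service.user_names", "PROD"),
    aspects=[schema],
)
DatahubRestEmitter("http://localhost:8080").emit_mce(
    MetadataChangeEventClass(proposedSnapshot=snapshot)
)
```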

    high-hospital-85984

    05/04/2021, 6:52 PM
    What are the minimum permissions the user needs in order to run an ingestion job on a Postgres table?

    calm-sunset-28996

    05/05/2021, 9:51 AM
With the Airflow backend, as defined here, what happens if something fails? This could be quite a risk for our production jobs.

    fast-leather-13054

    05/05/2021, 2:45 PM
One more question: I'm trying to start the GMS service locally using the command ./gradlew :gms:war:run from the README, but it gets stuck. What did I do wrong?

    lively-sunset-25180

    05/06/2021, 2:55 AM
Hi there, I'm trying to ingest Kafka data from Confluent Cloud... I've got the connection to the Kafka broker working, but I'm not sure how to pass in the key and token for the schema registry, if I understand correctly... reading through this issue to find out more: https://github.com/linkedin/datahub/issues/1861
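
A hedged recipe sketch, assuming a version of acryl-datahub whose kafka source passes consumer_config and schema_registry_config through to the underlying Confluent clients (the linked issue tracks exactly this); keys, secrets, and URLs are placeholders:
```python
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "kafka",
            "config": {
                "connection": {
                    "bootstrap": "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",
                    # Broker auth (Confluent Cloud cluster API key/secret).
                    "consumer_config": {
                        "security.protocol": "SASL_SSL",
                        "sasl.mechanism": "PLAIN",
                        "sasl.username": "<cluster-api-key>",
                        "sasl.password": "<cluster-api-secret>",
                    },
                    "schema_registry_url": "https://psrc-xxxxx.us-east-2.aws.confluent.cloud",
                    # Schema registry auth (separate SR API key/secret).
                    "schema_registry_config": {
                        "basic.auth.user.info": "<sr-api-key>:<sr-api-secret>",
                    },
                },
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }
)
pipeline.run()
pipeline.raise_from_status()
```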

    stale-jewelry-2440

    05/07/2021, 12:17 PM
Hi again! I am using a local JSON file for ingestion, which contains the dataset metadata. How can I set the description of the datasets (the "no description" field in the screenshot)?
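
The dataset description lives in the DatasetProperties aspect, so adding a description there (in the JSON records, or emitted as below) should fill that slot. A hedged sketch with placeholder names:
```python
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    DatasetPropertiesClass,
    DatasetSnapshotClass,
    MetadataChangeEventClass,
)

snapshot = DatasetSnapshotClass(
    urn=make_dataset_urn("postgres", "mydb.public.users", "PROD"),
    aspects=[
        DatasetPropertiesClass(description="All registered users, one row per account.")
    ],
)
DatahubRestEmitter("http://localhost:8080").emit_mce(
    MetadataChangeEventClass(proposedSnapshot=snapshot)
)
```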

    icy-holiday-55016

    05/07/2021, 12:58 PM
Hi, I'm trying to ingest lineage data following the recently updated guide here: https://github.com/linkedin/datahub/tree/master/metadata-ingestion#using-datahubs-airflow-lineage-backend-recommended I've set up the hook in step 1 for the REST endpoint at http://localhost:8080 and added the backend config to my airflow.cfg file. Running the DAG in step 3 seems to have no effect in DataHub: https://github.com/linkedin/datahub/blob/master/metadata-ingestion/examples/airflow/lineage_backend_demo.py Using the DataHub Emitter works fine, though: https://github.com/linkedin/datahub/blob/master/metadata-ingestion/examples/airflow/lineage_backend_demo.py I see the first one is the recommended approach, which uses inlets and outlets. Do the targets of those inlets/outlets have to exist in DataHub already for them to work? Also, when using this approach, should I see any activity in the Airflow log indicating data has been sent to DataHub? I don't see anything, but I do when using the DataHub Emitter.
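
For reference, the inlet/outlet pattern from the linked demo looks roughly like this paraphrased sketch (Airflow 1.10-style dict form; newer Airflow versions take plain lists, and the table names are placeholders):
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datahub_provider.entities import Dataset

with DAG(
    "lineage_backend_sketch",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
) as dag:
    # The lineage backend reads these inlets/outlets when the task runs
    # and emits the corresponding lineage to DataHub.
    task = BashOperator(
        task_id="run_data_task",
        bash_command="echo 'This task generates tableC from tableA'",
        inlets={"datasets": [Dataset("snowflake", "mydb.schema.tableA")]},
        outlets={"datasets": [Dataset("snowflake", "mydb.schema.tableC")]},
    )
```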

    mysterious-lamp-73086

    05/09/2021, 12:13 PM
Hi, I'm trying to ingest lineage data following the recently updated guide, but I get an error in DataHub. Do you know what the problem is? Update: solved the problem by using the DAG from the demo.

    some-cricket-23089

    05/11/2021, 1:58 PM
Hi all, I was exploring DataHub relationships. If I have a MySQL database whose tables are normalized, can we manage the relationships between these tables? For example, table "A" has a foreign key which is a primary key in table "B". Is there any way I can manage this relationship between tables "A" and "B"?

    better-orange-49102

    05/12/2021, 1:34 AM
I'm not familiar with data lineage and wanted to explore and test how it works for PostgreSQL. How is a table derived from another one? Is it as simple as
    CREATE TABLE new_table
      AS (SELECT * FROM old_table);
I tried that, but no lineage was shown.

    fast-leather-13054

    05/12/2021, 11:26 AM
Hi all, I found an issue with Looker dashboard ingestion. Previously, the DashboardInfo class had only the dashboardUrl field to store a reference to the real Looker dashboard, and this reference showed up in the UI. Now this link is not present in the UI for my setup. I investigated and found that a new field, externalUrl, is available now, and when I set it, the link shows up in the UI. But it's not correct to have two fields for the same purpose. Also, when I removed the dashboardUrl field, the UI failed to render the page (Caused by: java.lang.NullPointerException: null), even though dashboardUrl is not a required field.

    icy-holiday-55016

    05/12/2021, 2:48 PM
Hi folks, I think I've found a bug with owner ingestion when using the Airflow backend. Using the default args in screenshot 1, the user 'airflow' doesn't exist in my system to begin with. After I run the DAG, the user is visible on the pipeline (screenshot 2). If I click on the user image, the UI breaks and I see the contents of screenshot 3 in the Firefox console. If I ingest a user via another means (for example, user 'Steve' in an MCE over HTTP), the user gets created OK and the UI behaves as expected. But if I then pass user 'Steve' in the DAG default args, I get the same behaviour as above (broken UI). Thanks!
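
For reference, a minimal sketch of the "other means" mentioned above, i.e. creating the user via an MCE; the display name is an assumption:
```python
from datahub.emitter.mce_builder import make_user_urn
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    CorpUserInfoClass,
    CorpUserSnapshotClass,
    MetadataChangeEventClass,
)

snapshot = CorpUserSnapshotClass(
    urn=make_user_urn("airflow"),
    aspects=[CorpUserInfoClass(active=True, displayName="Airflow")],
)
DatahubRestEmitter("http://localhost:8080").emit_mce(
    MetadataChangeEventClass(proposedSnapshot=snapshot)
)
```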

    brave-furniture-58468

    05/18/2021, 9:45 AM
Hi, I am new to DataHub and exploring it. Is there already a Java API, or at least the classes needed for MetadataChangeEvents, in a Maven artifact that I can use to publish new events to the API myself?
    m
    b
    +2
    • 5
    • 10
  • r

    rich-policeman-92383

    05/18/2021, 12:06 PM
Hello, how can we replace the env key with a value other than PROD? https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/sql_common.py#L51 I have tried setting it in the config section of the SQLAlchemy recipe, but I get the error: Invalid URN Parameter: no enum constant.
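
The env value has to be one of the FabricType enum constants (DEV, EI, PROD, CORP at the time of writing); anything else produces exactly that "no enum constant" error. A hedged recipe sketch with placeholder connection details:
```python
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "postgres",  # any SQLAlchemy-based source; placeholder choice
            "config": {
                "username": "user",
                "password": "pass",
                "host_port": "localhost:5432",
                "database": "mydb",
                "env": "DEV",  # must match a FabricType enum constant
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }
)
pipeline.run()
```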

    glamorous-kite-95510

    05/19/2021, 7:32 AM
Hi everyone, I am a newbie and I have a few questions:
1. How can I change the description of a specific table? Can you show me how to edit a description in detail, please? Where is the documentation about the push API? I don't want to change the description of all tables in the database.
2. When I delete a table in my local database and ingest the changed database again, I expect DataHub to update too, but it does not.
3. How is metadata stored underneath DataHub? What database is used for storage, if any?

    rich-policeman-92383

    05/20/2021, 11:10 AM
Hello people, I need help ingesting data with MongoDB as the source and datahub-rest as the sink. I get the error below:
Error: 3 validation errors for mongoDBConfig
env: extra field not permitted
enableSchemaInference: True not permitted
schemaSamplingSize: 1000 not permitted
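
Those three rejected keys suggest a datahub version whose mongodb source does not know them yet; a hedged sketch sticking to fields that source does accept, with placeholder connection details:
```python
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "mongodb",
            "config": {
                "connect_uri": "mongodb://localhost:27017",
                "username": "datahub",
                "password": "secret",
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }
)
pipeline.run()
```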

    glamorous-kite-95510

    05/24/2021, 1:37 AM
Hi, can I ingest metadata with a description that belongs to a specific dataset? I don’t want all datasets in the database to have the same description. Is that possible, and if so, how?

    enough-potato-17984

    05/25/2021, 9:33 AM
Hi, I have some questions about DataHub. How does DataHub get MySQL/Hive lineage information? Is there some hook-based processing like Atlas has? I can't find it in DataHub's code or docs.