# ingestion
  • steep-midnight-37232 (06/16/2022, 7:06 PM)
    Hi, I'm ingesting dbt metadata and I would like to know if it's possible to see the results of dbt tests in the DataHub UI. I'm able to see the definitions of the tests, but not the results. Thanks for the help!
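    For reference, recent CLI versions of the dbt source can also pick up test outcomes when pointed at dbt's run_results.json; a minimal recipe sketch (treat the test_results_path field name as an assumption to verify against the dbt source docs; sink omitted):

    source:
      type: dbt
      config:
        manifest_path: ./target/manifest.json
        catalog_path: ./target/catalog.json
        # run_results.json carries the test outcomes; without it only
        # the test definitions are ingested
        test_results_path: ./target/run_results.json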
  • stocky-energy-24880 (06/17/2022, 10:58 AM)
    Hello Team, stateful ingestion with transformers is not working. Please find the config below:
    # see https://datahubproject.io/docs/generated/ingestion/sources/postgres for complete documentation
    source:
      type: "postgres"
      config:
        host_port: localhost:54320
        database: test
        stateful_ingestion:
          enabled: True
        table_pattern:
          deny:
            - '.*company$'

    pipeline_name: "my_postgres_pipeline_2"

    transformers:
      - type: "simple_add_dataset_ownership"
        config:
          owner_urns:
            - "urn:li:corpGroup:mobilede@DataHub"

    datahub_api:
      server: "http://localhost:8080"

    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
    Here we are trying to soft-delete a table using the deny pattern. Per stateful ingestion, the soft-deleted item should no longer be displayed in the UI, but the soft-deleted table is still visible. Looking at the debug logs, we found that when a transformer is used, the deleted entity gets upserted again. Please see the logs below:

    [2022-06-17 11:35:20,773] DEBUG {datahub.emitter.rest_emitter:229} - Attempting to emit to DataHub GMS; using curl equivalent to: curl -X POST -H 'User-Agent: python-requests/2.27.1' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' --data '{"proposal": {"entityType": "dataset", "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:postgres,test.public.company,PROD)", "changeType": "UPSERT", "aspectName": "status", "aspect": {"value": "{\"removed\": true}", "contentType": "application/json"}, "systemMetadata": {"lastObserved": 1655458520711, "runId": "postgres-2022_06_17-11_35_18"}}}' 'http://localhost:8080/aspects?action=ingestProposal'
    [2022-06-17 11:35:20,794] INFO {datahub.ingestion.run.pipeline:84} - sink wrote workunit soft-delete-table-urn:li:dataset:(urn:li:dataPlatform:postgres,test.public.company,PROD)
    [2022-06-17 11:35:20,795] DEBUG {datahub.emitter.rest_emitter:229} - Attempting to emit to DataHub GMS; using curl equivalent to: curl -X POST -H 'User-Agent: python-requests/2.27.1' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' --data '{"proposal": {"entityType": "dataset", "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:postgres,test.public.company,PROD)", "changeType": "UPSERT", "aspectName": "ownership", "aspect": {"value": "{\"owners\": [{\"owner\": \"urn:li:corpGroup:mobilede@DataHub\", \"type\": \"DATAOWNER\"}], \"lastModified\": {\"time\": 0, \"actor\": \"urn:li:corpuser:unknown\"}}", "contentType": "application/json"}, "systemMetadata": {"lastObserved": 1655458520711, "runId": "postgres-2022_06_17-11_35_18"}}}' 'http://localhost:8080/aspects?action=ingestProposal'
    [2022-06-17 11:35:20,830] INFO {datahub.ingestion.run.pipeline:84} - sink wrote workunit txform-urn:li:dataPlatform:postgres-test.public.company-PROD-ownership

    Is this expected? I.e., is stateful ingestion not supported with transformers? Or is there a configuration for transformers to check for soft-deleted entities?
  • adventurous-apple-98365 (06/17/2022, 10:25 PM)
    Are there any plans to implement additional changeTypes for MetadataChangeProposals? Ideally I need PATCH (to support external metadata enrichment), but per the docs only UPSERT is supported.
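    For context, the supported UPSERT path via the Python SDK looks roughly like the sketch below; because the aspect value fully replaces what is stored, an external enricher currently has to read-modify-write the whole aspect, which is exactly what PATCH would avoid (urn and server address are placeholders):

    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

    # UPSERT fully replaces the stored aspect value.
    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=make_dataset_urn("postgres", "test.public.company", "PROD"),
        aspectName="datasetProperties",
        aspect=DatasetPropertiesClass(description="enriched externally"),
    )
    DatahubRestEmitter("http://localhost:8080").emit_mcp(mcp)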
  • better-orange-49102 (06/20/2022, 2:31 AM)
    Can I check on the purpose of the datahub telemetry command? What is the purpose of letting the CLI enable and disable it? Is it a global setting or a per-CLI-session setting?
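    For reference, the CLI exposes enable/disable subcommands; as far as I know the flag is stored client-side in the user's home directory, so it applies to every session of that install rather than per session (a sketch):

    datahub telemetry disable   # opt out of anonymous usage telemetry
    datahub telemetry enable    # opt back in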
  • lemon-zoo-63387 (06/20/2022, 10:55 AM)
    Hello everyone! If I want to save all metadata to the company's DB, how do I configure that? Thanks in advance for your help! 😁
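    If the goal is to land the ingested metadata in files you control (which could then be loaded into an internal DB), one reading of this is the file sink instead of datahub-rest; a minimal sketch with placeholder source settings:

    source:
      type: mysql
      config:
        host_port: localhost:3306
        database: mydb
        username: user
        password: pass
    sink:
      type: file
      config:
        filename: ./metadata_events.json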
  • numerous-diamond-76461 (06/20/2022, 11:30 AM)
    I get an error when ingesting data from the UI (http://localhost:9002), while ingesting from the CLI (datahub ingest -c example/psql.yml) or the SDK succeeds.
  • numerous-diamond-76461 (06/20/2022, 11:31 AM)
    Screenshot from 2022-06-20 18-24-44.png
  • numerous-diamond-76461 (06/20/2022, 11:32 AM)
    Error log is here: https://pastebin.com/6ij6xpEc
  • bulky-jackal-3422 (06/20/2022, 1:21 PM)
    Hi everyone, what would be your suggestion for getting a CSV with metadata for a source into DataHub? Does using the File source make the most sense here?
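    Besides the File source, one alternative is reading the CSV with the Python SDK and emitting aspects directly; a sketch assuming a hypothetical CSV with "urn" and "description" columns (file name and server are placeholders):

    import csv

    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

    emitter = DatahubRestEmitter("http://localhost:8080")

    # Hypothetical CSV layout: one row per dataset with "urn" and "description".
    with open("metadata.csv") as f:
        for row in csv.DictReader(f):
            emitter.emit_mcp(
                MetadataChangeProposalWrapper(
                    entityType="dataset",
                    changeType=ChangeTypeClass.UPSERT,
                    entityUrn=row["urn"],
                    aspectName="datasetProperties",
                    aspect=DatasetPropertiesClass(description=row["description"]),
                )
            )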
  • cool-actor-73767 (06/20/2022, 8:36 PM)
    Hello, I created an ingestion from the UI using transformers to set the owner of datasets, but the only accepted owner types are the deprecated ones (DEVELOPER, CONSUMER, PRODUCER, ...). When I try to pass "DATA_STEWARD", "TECHNICAL_OWNER", or "BUSINESS_OWNER", an error occurs. Why? Below is the relevant part of my YAML file:

    transformers:
      - type: simple_add_dataset_ownership
        config:
          owner_urns:
            - 'urn:li:corpuser:XXXX.XXXX'
          ownership_type: DATA_STEWARD
  • bulky-jackal-3422 (06/20/2022, 8:58 PM)
    When using the SchemaMetadataClass, what exactly should I be using as a platformSchema? The documentation isn't very clear to me: https://datahubproject.io/docs/graphql/unions#platformschema
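    For context, a minimal sketch of building a SchemaMetadataClass with the Python SDK; when no platform-native schema format applies, OtherSchemaClass is commonly used as the platformSchema (field names and values here are illustrative placeholders):

    from datahub.metadata.schema_classes import (
        OtherSchemaClass,
        SchemaFieldClass,
        SchemaFieldDataTypeClass,
        SchemaMetadataClass,
        StringTypeClass,
    )

    # platformSchema carries the platform-native schema representation;
    # OtherSchemaClass(rawSchema=...) is the catch-all when none of the
    # typed variants (MySqlDDL, KafkaSchema, ...) fit.
    schema = SchemaMetadataClass(
        schemaName="customers",
        platform="urn:li:dataPlatform:postgres",
        version=0,
        hash="",
        platformSchema=OtherSchemaClass(rawSchema=""),
        fields=[
            SchemaFieldClass(
                fieldPath="id",
                nativeDataType="varchar(50)",
                type=SchemaFieldDataTypeClass(type=StringTypeClass()),
            )
        ],
    )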
  • many-house-53659 (06/21/2022, 4:20 AM)
    Hello, I am new to DataHub. I currently plan to ingest metadata and lineage from Hive tables. A lot of our ETL is done via hive or beeline commands in the CLI. Is there any way to capture that lineage directly?
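    If nothing captures it automatically, lineage can also be emitted manually with the Python SDK; a minimal sketch, with the table names being hypothetical placeholders:

    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        ChangeTypeClass,
        DatasetLineageTypeClass,
        UpstreamClass,
        UpstreamLineageClass,
    )

    # Declare db.source_table as an upstream of db.target_table.
    lineage = UpstreamLineageClass(
        upstreams=[
            UpstreamClass(
                dataset=make_dataset_urn("hive", "db.source_table", "PROD"),
                type=DatasetLineageTypeClass.TRANSFORMED,
            )
        ]
    )
    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=make_dataset_urn("hive", "db.target_table", "PROD"),
        aspectName="upstreamLineage",
        aspect=lineage,
    )
    DatahubRestEmitter("http://localhost:8080").emit_mcp(mcp)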
  • many-house-53659 (06/21/2022, 6:00 AM)
    Another question: is there any way to save the lineage from the spark-submit approach to a file instead of sending it to the REST server?
  • nutritious-vegetable-81282 (06/21/2022, 7:20 AM)
    Hi there, I have a small question about using dataset properties within DataHub. Can I extract these properties in a transformer to populate, for instance, dataset glossary terms or owner fields? Thanks!
  • few-air-56117 (06/21/2022, 8:43 AM)
    Hi guys, is it possible to add owners to a dataset using the Python emitter?
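    For what it's worth, a minimal sketch of emitting an ownership aspect with the Python REST emitter (the user urn, dataset urn, and server address are placeholders):

    from datahub.emitter.mce_builder import make_dataset_urn, make_user_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        ChangeTypeClass,
        OwnerClass,
        OwnershipClass,
        OwnershipTypeClass,
    )

    # Build an ownership aspect and upsert it onto the dataset.
    ownership = OwnershipClass(
        owners=[
            OwnerClass(
                owner=make_user_urn("jdoe"),  # placeholder user
                type=OwnershipTypeClass.DATAOWNER,
            )
        ]
    )
    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=make_dataset_urn("postgres", "test.public.company", "PROD"),
        aspectName="ownership",
        aspect=ownership,
    )
    DatahubRestEmitter("http://localhost:8080").emit_mcp(mcp)

    Note that UPSERT replaces the whole aspect, so this overwrites any owners already on the dataset.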
  • acoustic-quill-54426 (06/21/2022, 12:54 PM)
    Howdy! After running the BigQuery profiler we ingested thousands of views created by Great Expectations, e.g. ge-temp-{uuid}. I found a related feature request, but that is about not showing the query in the original dataset. I believe this is a bug rather than a feature 😅 Do you want me to create an issue?
  • straight-refrigerator-31859 (06/21/2022, 3:44 PM)
    Hello community! Reaching out for help: I set up profiling of tables in Hive, but during execution it throws the following exception; it appears to reference a Great Expectations temp table. org.apache.hadoop.hive.ql.parse.SemanticException: Table not found ge_temp_0ce5542c
  • high-family-71209 (06/21/2022, 3:49 PM)
    Hi everyone, didn't we say we no longer need to specify the sink when we ingest from the CLI? Today I got an error and needed to specify the sink. Was this changed back? Thanks!
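    For reference, the sink can usually be omitted once the CLI has a default server configured; a sketch (exact behavior depends on the CLI version):

    datahub init                  # stores server/token defaults in ~/.datahubenv
    datahub ingest -c recipe.yml  # recipe may then omit the sink section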
  • delightful-barista-90363 (06/21/2022, 5:19 PM)
    Hello, I was wondering if it's possible to configure Spark (for profiling) to point to a hosted Spark server (e.g. hosted on Kubernetes) as opposed to requiring Spark to be installed locally? Maybe by creating a Spark session prior to the ingestion script being run?
  • loud-shampoo-64092 (06/21/2022, 7:59 PM)
    Hey guys, I'm trying to use the Metabase ingestion. Does someone have an example of the YAML file for it?
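    A minimal recipe sketch for the Metabase source (field names per the Metabase source docs; host and credentials are placeholders):

    source:
      type: metabase
      config:
        connect_uri: http://localhost:3000   # Metabase host
        username: metabase_user@example.com
        password: my_password

    sink:
      type: datahub-rest
      config:
        server: http://localhost:8080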
  • polite-application-51650 (06/22/2022, 8:28 AM)
    Hi Team, can somebody please tell me how many tables are created/used by DataHub for storing the metadata info when our ingestion source is BigQuery with a large number of datasets? @dazzling-judge-80093 @big-carpet-38439 @orange-night-91387
  • modern-monitor-81461 (06/22/2022, 12:22 PM)
    DBT roadmap items: Hi guys, I have two questions regarding the roadmap planned for dbt. I think there is some work planned for the dbt source in the near future, and I'd like to know if the following items are covered, or if they have even been raised before:
    1. I raised this earlier and never really got closure on it: could the dbt source meta_mapping be modified to support multiple matches (see this thread)? TL;DR, I would like to be able to do something like this (match this or that or this...):
    meta_mapping:
      data_tier:
        - match: "Bronze"
          operation: "add_term"
          config:
            term: "Bronze"
        - match: "Gold"
          operation: "add_term"
          config:
            term: "Gold"
        - match: "Silver"
          operation: "add_term"
          config:
            term: "Silver"
    The current implementation only keeps the last match and discards the previous ones (in this example, only Silver would be applied).
    2. The dbt model supports meta fields for columns (see docs), but the current code seems to only support meta information on the DBTNode (not on DBTColumn). I would like to be able to map terms to columns, not only to datasets. Was that ever considered?
  • sparse-barista-40860 (06/22/2022, 4:18 PM)
    https://gist.github.com/f6dca7da93d1549cfa8391c17fa70a77
  • sparse-barista-40860 (06/22/2022, 4:18 PM)
    Please help me with this error.
  • sparse-barista-40860 (06/22/2022, 4:18 PM)
    datahub ingest -c /root/datahub/metadata-ingestion/examples/demo_data/bigquery_covid19_to_file.dhub.yaml
  • sparse-barista-40860 (06/22/2022, 4:30 PM)
    cd datahub/
    pip install 'acryl-datahub[kafka]'
    
    datahub ingest -c /root/datahub/metadata-ingestion/examples/recipes/secured_kafka.dhub.yaml
  • sparse-barista-40860 (06/22/2022, 4:30 PM)
    error
  • sparse-barista-40860 (06/22/2022, 4:30 PM)
    to
  • sparse-barista-40860 (06/22/2022, 4:32 PM)
    cd datahub/
    pip install 'acryl-datahub[nifi]'
    
    datahub ingest -c metadata-ingestion/examples/recipes/nifi_to_datahub_rest.dhub.yaml