# ingestion
  • late-bear-87552 (01/27/2022, 5:05 AM)
    Can anyone tell me why the _datahub/metadata-ingestion/examples/recipes/bigquery_to_datahub.yml_ file has redshift as the type?
  • late-bear-87552 (01/27/2022, 5:11 AM)
    I tried running ingestion for BigQuery and got the result below on the console, but I don't see these datasets in the UI.
    [2022-01-27 10:53:32,306] INFO     {datahub.cli.ingest_cli:81} - Starting metadata ingestion
    [2022-01-27 10:53:35,470] INFO     {datahub.ingestion.source.sql.bigquery:224} - Built lineage map containing 0 entries.
    [2022-01-27 10:53:38,136] INFO     {datahub.ingestion.run.pipeline:79} - sink wrote workunit regal-reporter-240416.data_test.test_test_tesst
    [2022-01-27 10:53:41,299] INFO     {datahub.ingestion.run.pipeline:79} - sink wrote workunit regal-reporter-240416.test_datahub_big.test_datahub_big
    [2022-01-27 10:53:42,221] INFO     {datahub.cli.ingest_cli:83} - Finished metadata ingestion
    
    Source (bigquery) report:
    {'workunits_produced': 2,
     'workunit_ids': ['regal-reporter-240416.data_test.test_test_tesst', 'regal-reporter-240416.test_datahub_big.test_datahub_big'],
     'warnings': {},
     'failures': {},
     'tables_scanned': 2,
     'views_scanned': 0,
     'entities_profiled': 0,
     'filtered': [],
     'soft_deleted_stale_entities': [],
     'query_combiner': None}
    Sink (datahub-rest) report:
    {'records_written': 2,
     'warnings': [],
     'failures': [],
     'downstream_start_time': datetime.datetime(2022, 1, 27, 10, 53, 38, 113867),
     'downstream_end_time': datetime.datetime(2022, 1, 27, 10, 53, 41, 299106),
     'downstream_total_latency_in_seconds': 3.185239}
  • red-pizza-28006 (01/27/2022, 7:53 AM)
    Have a question: what would be the difference between data lake ingestion of S3 files vs. getting this from Glue?
  • few-air-56117 (01/27/2022, 8:21 AM)
    Hi everyone, I tried to ingest bigquery-usage and I get this error:
    failed to match table read event with job; try increasing '`query_log_delay` or `max_query_duration`'
    What does it mean?
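The error means a BigQuery table-read audit event arrived but its matching query job entry was not found within the source's buffering window. A hedged recipe sketch, written here as a Python dict, bumping the two settings the message names (the values are illustrative, not defaults):

```python
# Hypothetical bigquery-usage recipe fragment. The keys query_log_delay and
# max_query_duration come straight from the error message; the values below
# are examples to illustrate "increase them", not recommended defaults.
bigquery_usage_recipe = {
    "source": {
        "type": "bigquery-usage",
        "config": {
            # How many read events to buffer while waiting for the matching
            # query-job log entry to show up.
            "query_log_delay": 100,
            # Upper bound on how long-running a query can be and still have
            # its read events joined to it.
            "max_query_duration": 30,
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://localhost:8080"},
    },
}
```

If long-running queries are common in the project, `max_query_duration` is usually the knob to raise first.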
  • blue-boots-43993 (01/27/2022, 8:24 AM)
    Hi everyone, glad to be here. I have a couple of questions regarding my current setup. As you can see, I have Qlik imported and I am currently struggling to get dataset-to-chart lineage. I could use some help if anybody has tried the same with Qlik.
  • rich-policeman-92383 (01/27/2022, 5:20 PM)
    After profiling a Hive dataset, the mean, max, median, and other values are unknown. The pipeline finished with no errors. What could be the cause of this? Only row count and column count are reported correctly.
  • witty-butcher-82399 (01/27/2022, 7:33 PM)
    I'm testing stateful ingestion. Is it a correct assumption that datasets with removed=true for the status aspect do not appear in the UI?
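For reference, this is roughly the shape of the soft-delete marker and the visibility rule being asked about; `is_visible_in_ui` is a local helper for illustration, not a DataHub API:

```python
from typing import Optional

# The Status aspect that stateful ingestion writes when it soft-deletes an
# entity looks like this on the wire.
status_aspect = {"removed": True}

def is_visible_in_ui(status: Optional[dict]) -> bool:
    # Hypothetical helper: an entity with no Status aspect at all counts as
    # visible; only removed=true hides it.
    return not (status or {}).get("removed", False)
```

Soft-deleted entities are hidden from search and browse but are not physically deleted, so re-ingesting them flips the flag back.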
  • quaint-plastic-61962 (01/27/2022, 8:44 PM)
    Hi everyone. Is there a way to include percent signs as part of glossary terms (i.e. in the URN and browse paths)? I'm getting errors during ingestion, but if I escape them, the frontend shows the actual escape code and doesn't decode it back to a percent sign 😞
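A local experiment for the encoding half of the question, using only the standard library (whether the frontend decodes the URN back is exactly what the thread is asking; the URN layout below is a hypothetical example):

```python
from urllib.parse import quote, unquote

# Percent-encode a term name for embedding in a URN, then show that it
# round-trips locally. safe="" forces the space and "%" to be escaped too.
term = "Revenue %"
encoded = quote(term, safe="")          # "Revenue%20%25"
assert unquote(encoded) == term          # decoding recovers the original

urn = f"urn:li:glossaryTerm:{encoded}"   # hypothetical URN layout
```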
  • red-napkin-59945 (01/28/2022, 1:06 AM)
    Hey team, I found one issue while trying to understand how to ingest existing metadata into DataHub. The current code automatically uses "li" as the namespace in every URN. Maybe we need a configurable way to set the URN namespace? For example, a user entity might have a URN like "urn:**li**:corpUser:foo".
  • better-orange-49102 (01/28/2022, 1:09 AM)
    For datasets with custom platforms to show up as a new platform in the UI, I just need to ingest a platform, i.e. urn:li:dataPlatform:custom, right?
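A sketch of the URN pairing involved, with local stand-ins for the helpers that ship in DataHub's Python builder module ("custom" is a placeholder platform name):

```python
# Local stand-ins for DataHub's URN builders, to show how a custom platform
# URN and a dataset URN referencing it fit together.
def make_data_platform_urn(platform: str) -> str:
    return f"urn:li:dataPlatform:{platform}"

def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # Dataset URNs embed the platform URN as their first tuple member.
    return f"urn:li:dataset:({make_data_platform_urn(platform)},{name},{env})"
```

Any dataset emitted with `make_dataset_urn("custom", ...)` will then group under that platform in the UI.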
  • better-orange-49102 (01/28/2022, 2:57 AM)
    For the new platform instance change that is coming out, do all the URNs need to be migrated to the new format? Some of my datasets are created by specifying the MCEs manually, and I was about to ingest a new platform (csv) to group these datasets.
  • delightful-orange-22738 (01/28/2022, 12:23 PM)
    Hello! How can I create an hdfs platform? I saw this in the demo DataHub: https://demo.datahubproject.io/dataset/urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)/Schema?is_lineage_mode=false. I need the same thing as in the demo.
  • few-air-56117 (01/28/2022, 1:59 PM)
    Hi guys, how can I start multiple ingestions in parallel?
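Since each recipe runs as its own `datahub ingest` process, one hedged approach is simply to launch several processes at once; the recipe filenames here are placeholders and the `datahub` CLI is assumed to be installed and on PATH:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List

def ingest_command(recipe: str) -> List[str]:
    # One CLI invocation per recipe file.
    return ["datahub", "ingest", "-c", recipe]

def run_all(recipes: List[str]) -> List[int]:
    # Each ingestion blocks in its own subprocess, so threads are enough to
    # drive them concurrently; returns the exit code of each run.
    with ThreadPoolExecutor(max_workers=len(recipes)) as pool:
        return list(pool.map(
            lambda r: subprocess.run(ingest_command(r)).returncode,
            recipes,
        ))
```

Calling `run_all(["bigquery.yml", "redshift.yml"])` would then launch both ingestions side by side.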
  • broad-battery-31188 (01/28/2022, 5:18 PM)
    Hello team, is it possible to ingest multiple LookML projects using one recipe?
  • breezy-controller-54597 (01/31/2022, 2:18 AM)
    When I enter a description in the UI and then re-ingest, the description I entered is overwritten and disappears. Is it possible to prevent it from being overwritten by setting something up in the recipe?
  • billions-receptionist-60247 (01/31/2022, 8:07 AM)
    Hello team, how do I ingest only the changes in metadata to DataHub?
  • witty-butcher-82399 (01/31/2022, 11:43 AM)
    Hi! How would you model a process reading from a dataset and publishing metrics to DataDog or Prometheus? I was considering DataFlow+DataJob for the process and Dashboard+Chart for the destination. However, DataJob output is restricted exclusively to Datasets.
  • broad-tomato-45373 (01/31/2022, 12:49 PM)
    Hi guys, need your help on the points below:
    1. Does the redshift-usage module (<https://datahubproject.io/docs/metadata-ingestion/source_docs/redshift/#mixed>) also ingest query usage and stats for external tables that are queried via Redshift Spectrum (tables present in the Glue Data Catalog)?
    2. The UI is not showing the column usage bar. The current DataHub version is v0.8.23.
    Any help would be much appreciated.
  • lemon-hydrogen-83671 (01/31/2022, 4:22 PM)
    Hey guys, I'm playing around with lineage ingestion using the Python DatahubKafkaEmitter and noticed that when I emit two separate upstreamLineage events, the second replaces my previously set lineage. Does anyone know of a way to make this additive instead of a full replace? Thanks!
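For context on why this happens: upstreamLineage is an aspect, and emitting an aspect replaces it wholesale, so additive behaviour means read-modify-write. A sketch of the merge step as plain data; fetching the current aspect (e.g. over the REST API) is assumed and not shown, and each upstream is modeled here as a dict with a "dataset" URN key, mirroring the Upstream record's shape:

```python
from typing import Dict, List

def merge_upstreams(existing: List[Dict], new: List[Dict]) -> List[Dict]:
    # Union the new upstream entries into the existing list, keyed by the
    # upstream dataset URN, preserving the original order.
    seen = {u["dataset"] for u in existing}
    merged = list(existing)
    for u in new:
        if u["dataset"] not in seen:
            merged.append(u)
            seen.add(u["dataset"])
    return merged
```

The merged list would then be emitted as a single upstreamLineage aspect covering both the old and new edges.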
  • brief-apartment-60236 (01/31/2022, 5:45 PM)
    Hi all, this seems so basic, but somehow I am not able to find a way to upsert a property value on a table entity. All I need is to update or add a property key/value without disturbing any other existing properties. I am even fine with read all properties -> upsert -> write back. But how do I read all existing properties? Below is my sample code.
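A minimal read-modify-write sketch for that flow, as plain data: fetch the entity's existing customProperties map (not shown; e.g. via the REST API), overlay the new pairs, and write the whole map back:

```python
from typing import Dict, Optional

def upsert_properties(existing: Optional[Dict[str, str]],
                      updates: Dict[str, str]) -> Dict[str, str]:
    # Copy the current customProperties (treating a missing aspect as empty)
    # and overlay the new key/value pairs; untouched keys survive unchanged.
    merged = dict(existing or {})
    merged.update(updates)
    return merged
```

The merged map is what gets written back as the full customProperties value, since the aspect is replaced as a whole on write.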
  • calm-airplane-47634 (01/31/2022, 10:41 PM)
    How does DataHub deal with "column level access control" in BigQuery? For that matter, in other systems too?
  • better-orange-49102 (02/01/2022, 1:55 AM)
    Since UI-added tags now get overwritten by ingestion, I'm wondering whether it's better to add back the tags field in the DatasetProperties class, or to write an improved transformer class that queries DataHub for existing tags and inserts them into the GlobalTags class before ingestion. Personally, I liked the old DatasetProperties tags field: it introduces tags that cannot be removed from the UI. Edit: this probably applies to ownership as well.
  • orange-iron-51394 (02/01/2022, 9:00 AM)
    Hi everyone, I'm new here 🙂 I have a couple of questions about basic abilities: is there a way to preview data in the UI, such as videos, pictures, parquet files, etc.? And can I search for a specific entity in the system (not a column or schema, but a particular instance)? Thanks 🙏🏻
  • few-air-56117 (02/01/2022, 11:11 AM)
    Hi guys, I tried to ingest some BigQuery metadata but I get this strange error:
    {'error': 'Unable to emit metadata to DataHub GMS',
     'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
     'stackTrace': "com.linkedin.restli.server.RestLiServiceException [HTTP Status:400]: Parameters of method 'ingest' failed "
  • happy-island-35913 (02/01/2022, 2:29 PM)
    Hello all, is there a workaround for ingesting data from PowerBI? Or how can I ingest data from PowerBI?
  • silly-boots-14314 (02/01/2022, 2:33 PM)
    Hello, in the recent town hall you demoed dataset stats ingestion from SQL databases. The stats included number of rows, percentiles, etc. Does your REST API support adding stats for custom datasets?
  • gifted-queen-61023 (02/01/2022, 3:00 PM)
    Hey guys 👋 Hope you're doing fine. I have something I would like to point out. When ingesting tags and associating them using the "Global Tags" aspect, the TagProperties name has no use, since only the description and the urn identifier are displayed in the frontend.
    • Is this something that you are aware of?
    • Am I doing something wrong?
    Thanks in advance 🙌
  • witty-butcher-82399 (02/01/2022, 3:09 PM)
    Hi DataHub community! How are you modelling, e.g., a Kafka Streams application or a microservice reading from one Kafka topic and writing to another? For the topics, it is clear that I can use the Dataset entity. But what about the process in between? I find DataFlow+DataJob not fitting the purpose here. Related to this, I raised this feature request: https://feature-requests.datahubproject.io/b/Developer-Experience/p/entity-model-for-streaming-applications-or-microservices. Advice and suggestions are welcome. Thanks!
  • silly-laptop-71099 (02/01/2022, 3:51 PM)
    Hello. I am in the process of implementing a custom ingestion source for Fivetran. I have the metadata pulling from the REST API, but now I want to generate the data to make lineage work. For example, there are connectors that pull from a Mongo schema and push data to another destination. Advice and suggestions are welcome.
  • acceptable-lion-18875 (02/01/2022, 5:32 PM)
    Hi, I set up DataHub locally on a Windows PC with Docker to find out if DataHub might help us. I am struggling with file ingestion; I always get "FileNotFoundError: [Errno 2] No such file or directory". I tried a lot of possible combinations to specify the path to the file.