# ingestion

  • some-car-9623
    04/11/2023, 4:16 PM
    (image attachments: image.png, image.png)

  • purple-salesmen-12745
    04/11/2023, 6:24 PM
    I'm looking for one-on-one help / a freelancer for setting up ingestion.
    βœ… 1

  • wonderful-jordan-36532
    04/12/2023, 9:13 AM
    What's the role of the ActorUrn attached to an access token? Does it show up anywhere in the UI?

  • few-carpenter-93837
    04/12/2023, 10:06 AM
    Hi, can anyone confirm that they have successfully got the new project_patterns to work with the DataHub Tableau integration (using the allow/deny configurations in the recipe)? For example, with the following project (plus sub-folder in the project) in Tableau: 1. Folder (standard)/1.1 Folder2 (standard), I've tried the following patterns in the recipe file:
    # 1st pattern tried:
    project_pattern:
      allow: ["1. Folder (standard)/1.1 Folder2 (standard)"]
    # 2nd:
      allow: ["^1. Folder (standard)$"]
    # 3rd:
      allow: ['^1. Folder (standard)$', '^1.1 Folder2 (standard)$']
    Ingestion always fails with an error saying this object isn't in the allowed pattern list.
    βœ… 1
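
    A hedged guess at the failure, under the assumption that DataHub allow/deny patterns are regular expressions: the unescaped "." and "(...)" in the project name change the pattern's meaning, so "(standard)" matches the bare word standard rather than the literal text (standard). A quick local check of an escaped pattern; the same escaped string would go into the recipe's allow list:

    import re

    # The literal project path as it appears in Tableau.
    project_path = "1. Folder (standard)/1.1 Folder2 (standard)"
    # Metacharacters escaped so ".", "(" and ")" match literally.
    pattern = r"^1\. Folder \(standard\)/1\.1 Folder2 \(standard\)$"
    assert re.match(pattern, project_path) is not None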

  • quiet-television-68466
    04/12/2023, 11:13 AM
    Hello all! I am working on creating our Airflow integration. Currently trying to set up the plugin following these docs https://datahubproject.io/docs/lineage/airflow/, and the plugin shows up, but it's not recognising the datahub_rest_default connection that has been created. Additionally, there is no emitting step shown in the DAG logs. Am I missing something obvious?
    πŸ“– 1
    πŸ” 1

  • busy-ghost-93490
    04/12/2023, 12:07 PM
    I'm trying to create lineage to Hive tables and already have the PySpark code for them. How do I create the lineage, and what are the recommended steps?
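
    A minimal sketch of emitting a lineage edge by hand with the DataHub Python emitter, following the documented lineage-emitter pattern; the server URL and table names are placeholders:

    from datahub.emitter.mce_builder import make_dataset_urn, make_lineage_mce
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    emitter = DatahubRestEmitter("http://localhost:8080")

    # One upstream Hive table feeding one downstream Hive table.
    lineage_mce = make_lineage_mce(
        upstream_urns=[make_dataset_urn("hive", "db.source_table", env="PROD")],
        downstream_urn=make_dataset_urn("hive", "db.target_table", env="PROD"),
    )
    emitter.emit_mce(lineage_mce)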

  • adamant-sugar-28445
    04/13/2023, 4:37 AM
    Hi folks. Could anyone tell me where DataHub stores dataset lineage data?
    βœ… 1

  • bland-appointment-45659
    04/13/2023, 5:04 AM
    Team, we are issuing MCP requests to build lineage for our custom code (based on upstream/downstream). However, when an intermediate dataJob is removed within a dataFlow, the obsolete dataJob still shows in the UI. Any way to get this cleaned up?
    πŸ“– 1
    πŸ” 1

  • powerful-mechanic-83241
    04/13/2023, 11:03 AM
    Hi all, any workarounds to get DataHub working with dbt Cloud in the EMEA region (https://emea.dbt.com/)? I see the URL for the API is hardcoded, rather than allowing you to specify a URL like with other sources.
    πŸ“– 1
    πŸ” 1

  • kind-sunset-55628
    04/13/2023, 11:45 AM
    Hi team, we recently upgraded to DataHub version 0.10.0.6, and for Airflow ingestion it doesn't show the list of tasks inside a pipeline.
    πŸ” 1
    πŸ“– 1

  • important-tailor-54083
    04/13/2023, 7:16 PM
    Hi, I tried to create a Metabase ingestion with this config, but it always fails. Is it because I use a non-admin account? Is there a Metabase permission I need to grant to the non-admin account, or do I have to use an admin account for the ingestion? I don't want to ingest every dashboard and chart, only selected collections. Kindly advise. Thanks.
    source:
        type: metabase
        config:
            connect_uri: 'https://xxxxx'
            username: username
            password: password
    Error message:
    .....
    Traceback (most recent call last):
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/datahub/entrypoints.py", line 179, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/click/core.py", line 1055, in main
        rv = self.invoke(ctx)
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/click/core.py", line 760, in invoke
        return __callback(*args, **kwargs)
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 379, in wrapper
        raise e
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 334, in wrapper
        res = func(*args, **kwargs)
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
        return func(ctx, *args, **kwargs)
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 198, in run
        loop.run_until_complete(run_func_check_upgrade(pipeline))
      File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
        return future.result()
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 158, in run_func_check_upgrade
        ret = await the_one_future
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 149, in run_pipeline_async
        return await loop.run_in_executor(
      File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 140, in run_pipeline_to_completion
        raise e
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 132, in run_pipeline_to_completion
        pipeline.run()
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 339, in run
        for wu in itertools.islice(
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/datahub/ingestion/source/metabase.py", line 617, in get_workunits
        yield from self.emit_card_mces()
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/datahub/ingestion/source/metabase.py", line 304, in emit_card_mces
        chart_snapshot = self.construct_card_from_api_data(card_info)
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/datahub/ingestion/source/metabase.py", line 351, in construct_card_from_api_data
        datasource_urn = self.get_datasource_urn(card_details)
      File "/tmp/datahub/ingest/venv-metabase-0.10.1/lib/python3.10/site-packages/datahub/ingestion/source/metabase.py", line 440, in get_datasource_urn
        (
    ValueError: not enough values to unpack (expected 4, got 2)

  • quiet-smartphone-60119
    04/13/2023, 9:07 PM
    Hey y'all, a question about how Snowflake handles lineage. We're using a 'bulk' Snowflake source recipe targeting a particular set of schemas to bring tables, dataset stats, and lineage information into DataHub on an hourly basis, and then occasionally enriching that lineage with manual edits. (In this case, some tables are derived from an S3 file that is also in DataHub, and we wanted to showcase the connection.) However, we noticed recently that after a subsequent ingestion, the upstream lineage information for a particular augmented dataset goes missing. A bit more investigation showed that it wasn't so much missing as replaced, but replaced with lineage that won't be propagated via a Snowflake ingestion, since it's based on a temp table used in the process of generating the Snowflake table in question, which is no longer around to be ingested. So, questions from this:
    1. Unlike other lineage-based ingestions (e.g., file based), is there a plan / use case for others besides ourselves to preserve upstream lineage rather than replacing it?
    2. Still a bit new at working with lineage information: is anyone else hitting this issue, and/or does anyone know a way to inspect whether a logged lineage table relation was 'temporary' or not? (So far I've found nothing.)
    πŸ” 1
    πŸ“– 1

  • brave-france-7945
    04/13/2023, 10:58 PM
    Any tips on how to ingest Tableau into DataHub? I have been trying for two-plus weeks, but it keeps failing. I tried using the credentials of other colleagues, but no luck. I keep getting this as the reason for the failure:
    βœ… 1

  • brave-france-7945
    04/13/2023, 10:58 PM
    'failures': {'tableau-login': ['Unable to login (invalid/expired credentials or missing permissions): \n\n\t401001: Signin Error\n\t\tError signing in to Tableau Server']},

  • adamant-sugar-28445
    04/14/2023, 10:36 AM
    Hi team. I'm using the Spark agent to create data lineage for an HDFS input path. The Spark job takes the path and produces another HDFS path. I created a table for the latter and pushed the metadata to DataHub. The problem is that the lineage only shows two HDFS paths, but I want the Hive table icon to replace the output path. I intend to create a feature request for this, but for the time being, do you know any workaround?

  • quiet-rain-16785
    04/14/2023, 11:03 AM
    Hi guys, I am new to DataHub and want to explore some of its features. Can anyone please help me with how to connect Airflow to DataHub? Please provide documentation for it if possible.
    πŸ“– 1

  • glamorous-wire-83850
    04/14/2023, 2:19 PM
    Hi folks, is there any method to check ingestion status (via REST or the DB, it doesn't matter)?
    πŸ“– 1
    πŸ” 1

  • gorgeous-tent-62316
    04/14/2023, 8:09 PM
    We are using the csv-enricher. If we specify a dataset in the resource column that does not exist in the system, it gets created. Is this the expected behavior? This becomes a problem if we have a different case in the dataset urn, since a new dataset gets created (given the current default: https://datahubproject.io/docs/debugging/#im-seeing-exceptions-in-datahub-gms-container-like-caused-by-javalangillegalstateexception-duplicate-key-comlinkedinmetadataentityebeanebeanaspectv2dd26e011-what-do-i-do:~:text=We%27ve%20recently%20moved%20to%20deploying%20with%20a%20case%2Dsensitive%20collation%20(utf8mb4_bin)%20by%20default.)
    βœ… 1

  • some-car-9623
    04/14/2023, 9:08 PM
    Hello everyone, I am trying to ingest the dashboards and charts from Superset, and I am able to do that. I am using the recipe from the demo site. It created the platform as Superset and ingested the dashboards and charts. Is there any way to rename the platform, since we don't want to use "superset" as the platform name? How can I achieve that? Recipe used:

    source:
        type: superset
        config:
            # Coordinates
            connect_uri: http://localhost:8088
            # Credentials
            username: user
            password: pass
            provider: ldap
    sink:
        # sink configs

    Thanks, Geetha
    πŸ“– 1
    πŸ” 1

  • damp-lighter-99739
    04/17/2023, 7:49 AM
    Hi team, the DatasetProfileClass has a field called partitionSpec which gives information about the partition that was profiled. Is this field visible in the UI? If so, could someone tell me where? I used the Python SDK to emit this data, which succeeded, but I could not see it anywhere in the UI. Thanks.
    πŸ” 1
    βœ… 1
    πŸ“– 1
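
    For reference, a minimal sketch of emitting a profile with partitionSpec set, assuming the usual schema_classes constructors (the urn, row count, and partition string are placeholders); DatasetProfile is a timeseries aspect keyed by timestampMillis:

    import time

    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DatasetProfileClass, PartitionSpecClass

    profile = DatasetProfileClass(
        timestampMillis=int(time.time() * 1000),
        rowCount=1000,
        # Identify which partition this profile describes.
        partitionSpec=PartitionSpecClass(partition="dt=2023-04-17"),
    )
    DatahubRestEmitter("http://localhost:8080").emit_mcp(
        MetadataChangeProposalWrapper(
            entityUrn="urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)",
            aspect=profile,
        )
    )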

  • rapid-airport-61849
    04/17/2023, 8:55 AM
    β€œThis entity is not discoverable via search or lineage graph. Contact your DataHub admin for more information.” How could I recover from this?
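
    A hedged guess: that banner typically appears for soft-deleted entities. If that is the cause, re-emitting a Status aspect with removed=False should restore discoverability (the urn below is a placeholder):

    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import StatusClass

    # Flip the soft-delete flag back so the entity is searchable again.
    DatahubRestEmitter("http://localhost:8080").emit_mcp(
        MetadataChangeProposalWrapper(
            entityUrn="urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)",
            aspect=StatusClass(removed=False),
        )
    )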

  • agreeable-table-54007
    04/17/2023, 3:10 PM
    Hi guys, hope you are doing well. I am new to all this and can't find any tutorials or good explanations on the internet... I want to ingest data from CSV files or JSON files (Data Factory ARM, so JSON) into DataHub. I don't know how to do it with the UI or the CLI. Can anyone help? Also, what structure does the .yml need to have to ingest data from CSV files / JSON Data Factory ARM files? Please help. 😭 Thanks.
    πŸ“– 1
    βœ… 1
    πŸ” 1

  • gentle-arm-6777
    04/17/2023, 3:24 PM
    Hi, ALL of my ingestion runs are in pending status. Any clue?
    πŸ“– 1
    βœ… 1

  • ancient-policeman-73437
    04/17/2023, 3:31 PM
    Dear community, we are deploying version 0.10.2 and faced an issue with the connector to Looker. It now shows the error: ERROR: Ignored the following versions that require a different python version: 0.8.24.1; ERROR: No matching distribution found for acryl-datahub[datahub-kafka,datahub-rest,looker]==0.10.2. Could we fix it somehow? Thank you in advance!

  • dazzling-alarm-64985
    04/17/2023, 3:44 PM
    Hi, is it possible to define the urn id while creating Domains with the Python SDK? I can't see how to make it work using MetadataChangeProposalWrapper. It works fine using curl against the GraphQL API.
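
    A hedged sketch: domain urns have the form urn:li:domain:<id>, so the id can be chosen explicitly by setting entityUrn on the MCP (the names and server URL below are placeholders):

    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DomainPropertiesClass

    DatahubRestEmitter("http://localhost:8080").emit_mcp(
        MetadataChangeProposalWrapper(
            entityUrn="urn:li:domain:marketing",  # explicit, human-chosen id
            aspect=DomainPropertiesClass(name="Marketing", description="Marketing data domain"),
        )
    )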

  • loud-hospital-37195
    04/17/2023, 4:10 PM
    I am trying to create a connection to Snowflake for metadata ingestion and I get the following error. Does anyone know why?

    ~~ Execution Summary - RUN_INGEST ~~
    Execution finished with errors.
    {'exec_id': '394d3fb2-c607-4104-bc73-9bf0193cd9a5',
     'infos': ['2023-04-17 160305.547847 INFO: Starting execution for task with name=RUN_INGEST',
               "2023-04-17 160519.121871 INFO: Failed to execute 'datahub ingest'",
               '2023-04-17 160519.122047 INFO: Caught exception EXECUTING task_id=394d3fb2-c607-4104-bc73-9bf0193cd9a5, name=RUN_INGEST, '
               'stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
               '    task_event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
               '    return future.result()\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 231, in execute\n'
               '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
               "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
     'errors': []}
    πŸ” 1
    πŸ“– 1

  • straight-hairdresser-4506
    04/17/2023, 4:27 PM
    I am trying to get ingestion via GraphQL running, and I am struggling with building the necessary query. I am getting the following error:
    {
      "errors": [
        {
          "message": "Invalid Syntax : offending token '\": {\"' at line 9 column 28",
          "locations": [
            {
              "line": 9,
              "column": 28
            }
          ],
          "extensions": {
            "classification": "InvalidSyntax"
          }
        }
      ],
      "data": null,
      "extensions": {}
    }
    Here is what I cooked up with Python. I am guessing that the way I am filling recipe with a JSON string is causing the issue, but I don't have a real solution for how to include a JSON string in the query.

    import json

    query = """
    mutation {
      createIngestionSource(
        input: {
          type: "mssql"
          name: "some.name"
          description: "some description"
          config: { recipe: "$recipe", executorId: "default" }
        }
      )
    }"""

    recipe = {
        "source": {
            "type": "mssql",
            "config": {
                "host_port": "host.com",
                "database": "db",
                "username": "user",
                "password": "***",
                "schema_pattern": {"allow": ["schema"]},
            },
        }
    }

    query = query.replace("$recipe", json.dumps(recipe))
    πŸ” 1
    πŸ“– 1
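
    A hedged sketch of the usual fix: pass the recipe as a GraphQL variable instead of splicing JSON into the query string, so the inner quotes are escaped for you. The endpoint path, token, and the input type name UpdateIngestionSourceInput are assumptions to verify against your DataHub version.

    import json
    import requests

    recipe = {
        "source": {
            "type": "mssql",
            "config": {"host_port": "host.com", "database": "db"},
        }
    }

    query = """
    mutation createSource($input: UpdateIngestionSourceInput!) {
      createIngestionSource(input: $input)
    }
    """

    variables = {
        "input": {
            "type": "mssql",
            "name": "some.name",
            "description": "some description",
            # The recipe field expects a JSON string, hence json.dumps here.
            "config": {"recipe": json.dumps(recipe), "executorId": "default"},
        }
    }

    resp = requests.post(
        "http://localhost:8080/api/graphql",  # assumed GMS GraphQL endpoint
        json={"query": query, "variables": variables},
        headers={"Authorization": "Bearer <personal-access-token>"},
    )
    print(resp.json())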

  • clever-magician-79463
    04/17/2023, 4:45 PM
    Hi, I am trying to ingest metadata from Redshift, and I want to understand how the data is ingested: does it take a change delta, or does it scan all the tables each time the ingestion job runs? If there is any config I can use to control that, can anyone please point it out? I ask because when I try to ingest data, DataHub consumes all my cluster bandwidth and essentially chokes Redshift. This causes all other read queries to pile up, and ultimately the system crashes. If it reads all the data each time the ingestion queries run, it will be difficult to adopt DataHub for our data governance purposes. Please help with this query, as it is our only blocker for now. If anyone wants to connect, we can discuss over direct messages as well. Thanks in advance πŸ™‚

  • rich-state-73859
    04/17/2023, 9:28 PM
    Hi, I'm running datahub quickstart, version v0.10.2. The ingestion job runs successfully, but it shows No Metadata Found on the main page, and there are no results if I click dbt or athena under the platform section.
    βœ… 1
    πŸ“– 1
    πŸ” 1

  • important-area-90857
    04/18/2023, 3:04 AM
    Hi, does the latest version of DataHub support ingesting Apache Doris?
    βœ… 1