# getting-started

    green-diamond-58921

    04/20/2023, 7:46 AM
    Hi team. I am doing a POC for DataHub in my organization. I ran datahub docker quickstart but got this error: Bad Gateway. Unable to run quickstart - the following issues were detected: - quickstart.sh or dev.sh is not running.

    loud-librarian-93625

    04/20/2023, 12:56 PM
    Hi, I'm trying to ingest the sample data with
    datahub docker ingest-sample-data
    but I'm getting the error below. Windows 10, Python 3.10.0
    Traceback (most recent call last):
      File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\datahub\entrypoints.py", line 182, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
      File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
      File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1055, in main
        rv = self.invoke(ctx)
      File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 760, in invoke
        return __callback(*args, **kwargs)
      File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\datahub\telemetry\telemetry.py", line 379, in wrapper
        raise e
      File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\datahub\telemetry\telemetry.py", line 334, in wrapper
        res = func(*args, **kwargs)
      File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\datahub\cli\docker_cli.py", line 945, in ingest_sample_data
        pipeline.run()
      File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\datahub\ingestion\run\pipeline.py", line 359, in run
        for wu in itertools.islice(
      File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\datahub\utilities\source_helpers.py", line 115, in auto_workunit_reporter
        for wu in stream:
      File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\datahub\ingestion\source\file.py", line 215, in get_workunits_internal
        for i, obj in self.iterate_generic_file(f):
      File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\datahub\ingestion\source\file.py", line 342, in iterate_generic_file
        for i, obj in self._iterate_file(path):
      File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\datahub\ingestion\source\file.py", line 256, in _iterate_file
        raise ConfigurationError(f"Cannot read remote file {path}, error:{e}")
    datahub.configuration.common.ConfigurationError: Cannot read remote file C:\Users\XXXXXXX\AppData\Local\Temp\tmpsbsy2hpx.json, error:No connection adapters were found for 'C:\\Users\\XXXXXXX\\AppData\\Local\\Temp\\tmpsbsy2hpx.json'
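A likely explanation for the "No connection adapters" failure on Windows: the file source classified the local temp path as a *remote* URL. A minimal sketch of the probable mechanism (not the actual DataHub code; function names here are illustrative) is that `urllib.parse.urlparse` treats a drive letter like `C:` as a URL scheme:

```python
from urllib.parse import urlparse

def looks_remote(path: str) -> bool:
    # Naive check: any non-empty scheme means "remote".
    # On Windows, a drive letter like C: parses as scheme "c",
    # so local paths get misclassified as remote URLs.
    return bool(urlparse(path).scheme)

def looks_remote_fixed(path: str) -> bool:
    # Require a scheme longer than one character so drive
    # letters are not mistaken for URL schemes.
    return len(urlparse(path).scheme) > 1

print(looks_remote(r"C:\Users\me\AppData\Local\Temp\tmp.json"))        # True (bug)
print(looks_remote_fixed(r"C:\Users\me\AppData\Local\Temp\tmp.json"))  # False
print(looks_remote_fixed("https://example.com/data.json"))             # True
```

This class of bug is typically fixed in newer CLI versions, so upgrading `acryl-datahub` is worth trying first.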
    πŸ” 1
    βœ… 2
    πŸ“– 1
    l
    b
    +4
    • 7
    • 32
  • f

    faint-australia-24591

    04/20/2023, 3:24 PM
    Hi everyone. I was trying to create column lineage in DataHub like in this blogpost https://blog.datahubproject.io/its-here-say-hello-to-column-level-lineage-in-datahub-dfdeaaefa567 but I can't find any resources or documentation on how to do this.

    victorious-monkey-86128

    04/20/2023, 8:44 PM
    Hello! I don't know if this is the right channel but some pages on the DataHub docs on DataHub Documentation no longer exist! e.g. https://datahubproject.io/docs/api/tutorials/creating-users-and-groups/ and https://datahubproject.io/docs/api/tutorials/creating-datasets/. Is there a way to access these pages again? They were very helpful and I need to have access to them! Thanks!

    rich-greece-41339

    04/21/2023, 1:49 AM
    hey all, I am very new to DataHub and I am looking at it as a solution for a govt enterprise. I have been able to use the quickstart to get it running locally. How do I get on to the next steps of defining ingestion sources etc.? What I am really after is a template repository structure that I can extend and then eventually deploy into AWS.
    πŸ” 1
    πŸ“– 1
    l
    d
    • 3
    • 4
  • a

    able-action-29338

    04/21/2023, 12:20 PM
    Hi team, firstly great job on the DataHub tool. I am looking to introduce this within our company and have one question: regarding the Tableau integration, can Tableau Cloud also be integrated, or does it only apply to Tableau Server?

    hallowed-petabyte-25444

    04/21/2023, 1:27 PM
    Hi Team, I am currently exploring the features of the DataHub tool and I got to know that there is some limitation on classifying data (it only classifies data for Snowflake). May I know whether we can classify other databases as well (like PostgreSQL, MySQL, etc.)? Thanks in advance.
    πŸ” 1
    πŸ“– 1
    l
    w
    +3
    • 6
    • 7
  • s

    stale-teacher-93393

    04/24/2023, 7:26 AM
    Hi Team, I'm trying to install DataHub on an EC2 Ubuntu instance. I have Airflow and dbt on the EC2 and want to install DataHub there as well. Facing some issues following the guide here: https://datahubproject.io/docs/quickstart/
    datahub docker quickstart
    [2023-04-24 07:26:10,317] INFO     {datahub.cli.quickstart_versioning:144} - Saved quickstart config to /home/ubuntu/.datahub/quickstart/quickstart_version_mapping.yaml.
    [2023-04-24 07:26:10,318] INFO     {datahub.cli.docker_cli:643} - Using quickstart plan: composefile_git_ref='master' docker_tag='head'
    Docker doesn't seem to be running. Did you start it?
    πŸ” 1
    πŸ“– 1
    l
    d
    • 3
    • 2
  • p

    polite-motherboard-47818

    04/24/2023, 9:05 AM
    Hello folks, I'm interested in extracting all the metadata (schemas, tables and columns) my company has in DataHub. Is there any documentation I could check to do it?
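A starting point is DataHub's GraphQL search API (the same API the UI uses). A sketch of a query that pages through datasets and pulls their schema fields; the `query: "*"` wildcard and paging values are illustrative, and field names should be verified against your deployment's GraphQL schema:

```graphql
{
  search(input: { type: DATASET, query: "*", start: 0, count: 100 }) {
    start
    count
    total
    searchResults {
      entity {
        ... on Dataset {
          urn
          name
          schemaMetadata {
            fields {
              fieldPath
              nativeDataType
              description
            }
          }
        }
      }
    }
  }
}
```

Increment `start` by `count` and repeat until `start` exceeds `total` to walk the full catalog.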
    πŸ” 1
    πŸ“– 1
    l
    h
    m
    • 4
    • 4
  • b

    billions-butcher-90660

    04/24/2023, 11:03 AM
    Hello Folks! I'm a newbie. I'm trying to connect to SAP. My connection string looks like this: ASHOST=myserver.com USER=MYUSER SYSNR=00 CLIENT=800 LANG=EN PASSWD=MYPASSWORD

    proud-dusk-671

    04/24/2023, 4:44 PM
    Hi, The architecture of my data platform is such that data moves from Kafka to S3 to Snowflake. On Snowflake, we have written Airflow tasks that convert one table to another. I have ingested metadata from the three and it looks good. Now, I want to be able to view the lineage around this (Kafka -> S3 -> Snowflake table1 -> Airflow -> Snowflake table2 -> Airflow -> Snowflake table3). In this, do I have to manually add upstream and downstream for each of these platforms and draw the lineage or can it be done automatically somehow? PS. Our Snowflake is the Standard Edition and not the Enterprise Edition
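For stitching lineage across platforms that no automated connector covers, DataHub supports a file-based lineage source. A sketch of such a file, with placeholder dataset names; the exact schema should be checked against the "File Based Lineage" docs:

```yaml
# lineage.yml - hand-curated cross-platform lineage (sketch; names are
# placeholders, verify field names against the File Based Lineage docs)
version: 1
lineage:
  - entity:
      name: mydb.schema.table1
      type: dataset
      env: PROD
      platform: snowflake
    upstream:
      - entity:
          name: my-bucket/path/to/data
          type: dataset
          env: PROD
          platform: s3
  - entity:
      name: my-bucket/path/to/data
      type: dataset
      env: PROD
      platform: s3
    upstream:
      - entity:
          name: my_topic
          type: dataset
          env: PROD
          platform: kafka
```

Ingesting this file as a recipe source emits the Kafka -> S3 -> Snowflake edges without touching the per-platform metadata already ingested.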

    plain-analyst-30186

    04/25/2023, 4:51 AM
    Hi team. When executing the Elasticsearch curl command, only the ".ds-datahub_usage_event-000001" index is visible (command: curl --location --request GET 'http://localhost:9200/_cat/indices?v'). The other indexes are not found in Elasticsearch. How are the Elasticsearch indexes created? DataHub version is 0.10.0.

    billions-baker-82097

    04/25/2023, 5:24 AM
    Is there any way, for the 'openapi' source type, to ingest metadata for POST and other methods?

    early-area-1276

    04/25/2023, 6:03 AM
    Hey there 👋 I'm the DataHub Community Support bot. I'm here to help make sure the community can best support you with your request. Let's double-check a few things first: 1️⃣ There's a lot of good information on our docs site: www.datahubproject.io/docs. Have you searched there for a solution? 2️⃣ It's not uncommon that someone has run into your exact problem before in the community. Have you searched Slack for similar issues?

    freezing-sunset-28534

    04/25/2023, 6:11 AM
    Hi team, could you please share a detailed description of the physical data model of DataHub? We'd like to create or modify the metadata/lineage directly at the code level. Until now we've read the Python code and tried to create a new lineage, but it's not created correctly. We would very much appreciate it if you could guide us on how to create lineage via code from scratch.
    πŸ” 1
    πŸ“– 1
    l
    g
    +2
    • 5
    • 8
  • r

    rich-napkin-34055

    04/25/2023, 9:16 AM
    Hello, I have created a user account with the Reader role in DataHub, but my problem is that the user sees everything, even though I have created a policy so that they see only the allowed metadata. I need help please. Thanks in advance.
    πŸ” 1
    l
    d
    b
    • 4
    • 3
  • f

    full-beach-33961

    04/25/2023, 1:10 PM
    Hi friends, I was wondering if you could comment on these features, using the Python API:
    • Can I get a specific version of an entity? By default the latest is returned.
    • How can I relate two entities: is it via lineages?
    • Can I add custom fields during ingestion? For example, a field "sensitive" (Y/N, or T/F) to flag security-sensitive fields.

    gifted-market-81341

    04/25/2023, 2:05 PM
    Hello Everyone, we have an internal implementation of DataHub working on ingesting data from MSSQL and BigQuery; so far everything has been done through the UI. However, we are expanding our use of the tool a little bit more and are planning on using the Python Emitter SDK. I was wondering if someone can confirm my understanding of the points below:
    • The SDK can be used to create datasets, fields for those datasets, and set various attributes inside the dataset
    • The SDK can be used to create lineage between datasets, both upstream and downstream
    Thanks in advance and I appreciate your help 🙂
    πŸ” 1
    πŸ“– 1
    l
    d
    • 3
    • 3
  • a

    adamant-honey-44884

    04/25/2023, 3:08 PM
    Hello, we have set up DataHub in AWS as a POC so we can evaluate it using our own metadata. Other than some permissions issues that I am working on with my SRE team for other sources, I was able to get an MSSQL ingestion source set up to connect to one of our databases. It ran successfully a few times. I added configuration to deny some schemas from being included, which ran successfully the first time, but as I added more schemas it eventually failed, and now it fails consistently even with that configuration removed. This is done via the UI. DataHub version: v0.10.2. Configuration yaml:
    source:
        type: mssql
        config:
            host_port: 'some.database:1433'
            env: DEV
            database: mydatabase
            include_views: true
            include_tables: true
            profiling:
                enabled: false
            stateful_ingestion:
                enabled: true
            username: DataHub_App
            password: '${PASSWORD_DEV}'
    Error Log:
    [2023-04-25 14:58:07,151] INFO     {datahub.cli.ingest_cli:137} - Sink (datahub-rest) report:
    {'total_records_written': 1153,
     'records_written_per_second': 98,
     'warnings': [],
     'failures': [],
     'start_time': '2023-04-25 14:57:55.502886 (11.65 seconds ago)',
     'current_time': '2023-04-25 14:58:07.151328 (now)',
     'total_duration_in_seconds': 11.65,
     'gms_version': 'v0.10.2',
     'pending_requests': 0}
    [2023-04-25 14:58:07,339] ERROR    {datahub.entrypoints:195} - Command failed: 
    Traceback (most recent call last):
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/entrypoints.py", line 182, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/click/core.py", line 1055, in main
        rv = self.invoke(ctx)
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/click/core.py", line 760, in invoke
        return __callback(*args, **kwargs)
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 379, in wrapper
        raise e
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 334, in wrapper
        res = func(*args, **kwargs)
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
        return func(ctx, *args, **kwargs)
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 198, in run
        loop.run_until_complete(run_func_check_upgrade(pipeline))
      File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
        return future.result()
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 158, in run_func_check_upgrade
        ret = await the_one_future
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 149, in run_pipeline_async
        return await loop.run_in_executor(
      File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 140, in run_pipeline_to_completion
        raise e
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 132, in run_pipeline_to_completion
        pipeline.run()
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 359, in run
        for wu in itertools.islice(
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/utilities/source_helpers.py", line 104, in auto_stale_entity_removal
        yield from stale_entity_removal_handler.gen_removed_entity_workunits()
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/ingestion/source/state/stale_entity_removal_handler.py", line 267, in gen_removed_entity_workunits
        last_checkpoint: Optional[Checkpoint] = self.source.get_last_checkpoint(
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/ingestion/source/state/stateful_ingestion_base.py", line 320, in get_last_checkpoint
        self.last_checkpoints[job_id] = self._get_last_checkpoint(
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/ingestion/source/state/stateful_ingestion_base.py", line 295, in _get_last_checkpoint
        self.ingestion_checkpointing_state_provider.get_latest_checkpoint(
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/ingestion/source/state_provider/datahub_ingestion_checkpointing_provider.py", line 76, in get_latest_checkpoint
        ] = self.graph.get_latest_timeseries_value(
      File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/ingestion/graph/client.py", line 299, in get_latest_timeseries_value
        assert len(values) == 1
    AssertionError
    Thank you for the help.
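The `assert len(values) == 1` fires while reading back the stateful-ingestion checkpoint, so the checkpoint state appears to be the broken piece, not the MSSQL connection. A common workaround while debugging (at the cost of automatic stale-entity removal) is to turn stateful ingestion off in the recipe, the reverse of the setting shown above:

```yaml
source:
    type: mssql
    config:
        host_port: 'some.database:1433'
        env: DEV
        database: mydatabase
        include_views: true
        include_tables: true
        profiling:
            enabled: false
        stateful_ingestion:
            enabled: false   # disable checkpointing while debugging
        username: DataHub_App
        password: '${PASSWORD_DEV}'
```

Once runs succeed again, re-enabling stateful ingestion starts a fresh checkpoint.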

    best-yacht-69562

    04/26/2023, 7:40 PM
    Hey all! I'm exploring DataHub as a possible project to take on in my new role, and was wondering what data access DataHub gets when it's connected to a database? I didn't see this mentioned in the documentation anywhere. Is there no issue because it's hosted locally on your Docker image? Sorry if this is a silly question! If you could point me in the right direction that'd be super helpful!

    early-hydrogen-27542

    04/26/2023, 9:06 PM
    👋 everyone - how would I get a list of all possible privileges (e.g. EDIT_ENTITY_DOCS, EDIT_ENTITY_TAGS, etc.)? I'm looking for possible privileges, not just those that are configured. I'm specifically looking for the correct privileges to allow deletion of entity docs links and addition of entity queries.
    πŸ” 1
    πŸ“– 1
    βœ… 1
    l
    b
    b
    • 4
    • 7
  • h

    hundreds-accountant-40883

    04/27/2023, 1:56 AM
    Hello a total noob here, does DataHub support ingesting OpenLineage data format?

    red-telephone-12711

    04/27/2023, 6:37 AM
    Hello there! I'm going to try to enable and use the Stats tab for our data sources. Could you please tell me whether profiling is supported for source tables on Postgres, Greenplum and ClickHouse?
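For SQL sources that do support it, profiling (which feeds the Stats tab) is switched on per recipe, the same `profiling` block that appears in the MSSQL config earlier in this channel. A sketch for Postgres, with placeholder connection values; support for Greenplum and ClickHouse should be confirmed against each source's docs page:

```yaml
source:
  type: postgres
  config:
    host_port: 'myhost:5432'     # placeholder
    database: mydb               # placeholder
    username: my_user
    password: '${POSTGRES_PASSWORD}'
    profiling:
      enabled: true              # populates the Stats tab
```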

    shy-kitchen-7972

    04/28/2023, 8:03 AM
    Hi all, I'm running the following query to fetch schemafield information.
    {
      search(input: { type: DATASET, query: "mig6", start: 0, count: 100 }) {
        searchResults {
          entity {
            ... on Dataset {
              urn
              type
              schemaMetadata {
                version
                fields {
                  fieldPath
                  label
                  nullable
                  type
                  nativeDataType
                  description
                  glossaryTerms {
                    terms {
                      associatedUrn
                      term {
                        urn
                        properties {
                          description
                          sourceRef
                          sourceUrl
                          rawSchema
                        }
                      }
                    }
                  }
                  tags {
                    tags {
                      tag {
                        urn
                        properties {
                          name
                        }
                      }
                    }
                  }
                }
                primaryKeys
              }
            }
          }
        }
      }
    }
    I expect a list of terms associated with some fields but it does not return anything. In the OpenAPI I do receive the assigned glossary terms. Anyone else experiencing this issue? I get the same results in the demo environment. Not sure if I'm maybe not formatting the GraphQL query properly.

    proud-dusk-671

    04/28/2023, 4:03 PM
    Hi team, I have the following use-case that I am unable to gather any information on. In our system, data moves from Kafka topics to S3 and then to Snowflake. Can somebody tell us the method to draw lineage from Kafka to S3 and then to Snowflake tables? FYI, I was able to ingest metadata from these sources independently.
    πŸ” 1
    πŸ“– 1
    l
    m
    m
    • 4
    • 7
  • b

    brave-mouse-33819

    04/28/2023, 8:24 PM
    Hey everyone! I have a question about the UI of the Properties tab on the Dataset page. When we add a dataset that has siblings, the Properties tab shows the siblings' custom properties as well as the dataset's own.
    • Why does it do that? Are there particular benefits or side effects?
    • If users want to see a sibling's custom properties, they can simply visit that sibling's own Properties tab (i.e. inside "Composed Of").
    • If there is more than one sibling, which one should be shown in the Properties tab?
    Ref PR: https://github.com/datahub-project/datahub/pull/5390

    abundant-smartphone-50803

    05/01/2023, 1:44 AM
    Hi, I am trying to deploy DataHub on AWS and would like to use an AWS-managed Elasticsearch cluster. In the org I am working in, it's not allowed to authenticate to Elasticsearch via username and password, so I am trying to enable AWS auth, but it doesn't seem to be working.