# getting-started
  • quiet-jelly-11365

    01/25/2023, 10:30 AM
    Hi team, I am just getting started. Do we have REST APIs into DataHub that allow me to pull metadata from other code bases?
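    Yes: GMS serves a REST API (the same one the UI and CLI use), and the Python client wraps it. A minimal sketch for pulling a dataset's schema, assuming a recent acryl-datahub release; the GMS address and dataset urn below are hypothetical:
    Copy code
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
    from datahub.metadata.schema_classes import SchemaMetadataClass

    # Assumed GMS address; point this at your own deployment.
    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

    # Hypothetical dataset urn.
    urn = "urn:li:dataset:(urn:li:dataPlatform:mysql,db.orders,PROD)"

    # Fetch one aspect (here: the schema) over REST.
    schema = graph.get_aspect(entity_urn=urn, aspect_type=SchemaMetadataClass)
    if schema is not None:
        for field in schema.fields:
            print(field.fieldPath, field.nativeDataType)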
  • cuddly-journalist-42219

    01/25/2023, 1:53 PM
    Hi there. Just getting started, trying out the datahub docker quickstart, and getting the error “datahub-gms is running but not healthy”. I can add any helpful info in the thread.
  • aloof-iron-76856

    01/25/2023, 2:58 PM
    Hi, I'm trying to use datahub docker quickstart, and getting the error:
    Copy code
    Unable to run quickstart - the following issues were detected:
    - kafka-setup container is not present
    Windows 10; Rancher Desktop using WSL Ubuntu-22.04; Before switching to Rancher, I used Docker Desktop and had no problems. Can you suggest what the problem might be and how to solve it?
  • magnificent-exabyte-12941

    01/25/2023, 3:41 PM
    Hello! DataHub is a really great instrument and my team is delighted with it 🙂 I am exploring the possibility of adding "owners" to the fields of datasets (because different people may be responsible for the content of different columns of a dataset). Has someone already done this feature (in their fork), or are there plans to do it? Thank you!
  • victorious-school-96321

    01/26/2023, 2:08 PM
    Hello everyone!!! We intend to use DataHub to help us organize and democratize data in our organization. It seems to be a great tool!!! We have some projects related to Open Data, and we need to publish the metadata related to this data too (schema, documentation, glossary terms, etc.). On a first look at DataHub's resources, we didn't find a way to publish these resources for public access, without authentication. Are we missing something, or is there no way to do that? Thank you!
  • fierce-lunch-75215

    01/26/2023, 5:22 PM
    Hi everyone, I'm trying to deploy DataHub on Azure Kubernetes. I was able to do it, but at the end of an ingestion execution (done via the UI) I get this error:
    Copy code
    task_id=0e6e5365-85f3-4699-a50e-1e9957ad5640, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
      File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task
        task_event_loop.run_until_complete(task_future)
      File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
        return future.result()
      File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 168, in execute
        raise TaskError("Failed to execute 'datahub ingest'")
    acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
    Lineage is also not working, any ideas?
  • wonderful-notebook-20086

    01/26/2023, 11:44 PM
    Hi all - just want to take a straw poll… As our team begins to think about how we want to use DataHub, we're envisioning starting with an independent queryable store (likely Neptune, maybe Neo4j) becoming the backend for most of the lineage-related metadata, which would then push the metadata to DataHub. A quick Slack search makes it seem that there's more support for Neo4j than for Neptune (non-existent?). What would we be signing up for if we were committed to deploying to AWS and using native AWS services (Neptune + OpenSearch)?
  • delightful-pizza-28599

    01/27/2023, 1:39 PM
    Hello everyone. I don't understand the logic of DataHub ingestion processes from multiple servers. I have nearly 50 production servers with a pretty similar MySQL schema (database names and most of the table names are the same in different environments), and I'm gathering all data from these servers into a data lake. I want to create a data catalog on top of the ETL process, showing in the UI that 50 servers with the same database structure load data into one place. But when I made a demo, I saw that after loading metadata from 2 different servers with the same database structure (the only difference being the host IPs), everything was loaded into 1 metadata model. How can I adjust the properties of new ingestion processes to get 2 different metadata models with the same structure but different tags, based on host IPs for example?
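    The usual way to keep identically-named schemas from colliding is the source's platform_instance option, which namespaces the generated urns per server. A minimal sketch of one programmatic pipeline per server, assuming the MySQL source; hosts and credentials below are hypothetical:
    Copy code
    from datahub.ingestion.run.pipeline import Pipeline

    for host in ["10.0.0.1", "10.0.0.2"]:  # hypothetical server IPs
        pipeline = Pipeline.create(
            {
                "source": {
                    "type": "mysql",
                    "config": {
                        "host_port": f"{host}:3306",
                        "username": "datahub",  # hypothetical credentials
                        "password": "datahub",
                        # Namespaces the urns so each server gets its own model.
                        "platform_instance": f"server_{host.replace('.', '_')}",
                    },
                },
                "sink": {
                    "type": "datahub-rest",
                    "config": {"server": "http://localhost:8080"},  # assumed GMS address
                },
            }
        )
        pipeline.run()
        pipeline.raise_from_status()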
  • astonishing-twilight-39885

    01/27/2023, 6:11 PM
    Hello, trying to run datahub docker quickstart. Consistently getting:
    Copy code
    Unable to run quickstart - the following issues were detected:
    - kafka-setup container is not present
    Could you, please, help with that? Thanks!
  • bland-orange-13353

    01/27/2023, 6:11 PM
    If you’re having trouble with quickstart, please make sure you’re using the most up-to-date version of DataHub by following the steps in the quickstart deployment guide: https://datahubproject.io/docs/quickstart/#deploying-datahub. Specifically, ensure you’re up to date with the DataHub CLI:
    Copy code
    python3 -m pip install --upgrade pip wheel setuptools
    python3 -m pip install --upgrade acryl-datahub
    datahub version
  • cool-lifeguard-67046

    01/30/2023, 5:09 AM
    Hi team, I'm trying to create a sample demo for DataHub. I tried with the Docker guide but was not able to start the demo. Can someone please help me?
  • quaint-belgium-35390

    01/30/2023, 7:39 AM
    Hi all, I just deployed DataHub on a GCP VM using the docker-compose file. When I try to integrate Airflow (on a separate VM) with DataHub, I get this error about serialization:
    Copy code
    [2023-01-30, 13:47:35 WIB] {_plugin.py:147} INFO - Emitting Datahub Dataflow: DataFlow(
      urn=<datahub.utilities.urns.data_flow_urn.DataFlowUrn object at 0x7f6420f75890>,
      id='raw_partner.vw_transactions_partner', orchestrator='airflow', cluster='stg',
      name=None, description='Extract partner vw_transactions_partner from BigQuery\n\n',
      properties={
        '_access_control': 'None', '_default_view': "'tree'", 'catchup': 'False',
        'fileloc': "'/opt/airflow/dags/extraction/bq/dag_extraction_partner_vw_transactions_partner_generated.py'",
        'is_paused_upon_creation': 'None', 'start_date': "datetime.datetime(2020, 12, 31, 17, 0, tzinfo=Timezone('UTC'))",
        'tags': "['extraction', 'raw', 'partner']", 'timezone': "Timezone('Asia/Jakarta')"
        },
      url='<http://redacted_host_ip:8080/tree?dag_id=raw_partner.vw_transactions_partner>',
      tags={'partner', 'raw', 'extraction'}, owners={'airflow'}
      )
    [2023-01-30, 13:47:35 WIB] {_plugin.py:121} ERROR - Error sending metadata to datahub: (
      'Unable to emit metadata to DataHub GMS',
      {
        'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
        'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: org.apache.kafka.common.errors.SerializationException: Error serializing Avro message\n',
        'message': 'org.apache.kafka.common.errors.SerializationException: Error serializing Avro message',
        'status': 500}
    )
    Traceback (most recent call last):
      File "/home/airflow/.local/lib/python3.7/site-packages/datahub/emitter/rest_emitter.py", line 256, in _emit_generic
        response.raise_for_status()
      File "/home/airflow/.local/lib/python3.7/site-packages/requests/models.py", line 953, in raise_for_status
        raise HTTPError(http_error_msg, response=self)
    requests.exceptions.HTTPError: 500 Server Error: Server Error for url: <http://redacted_host:8080/aspects?action=ingestProposal>
    Please help, thanks. PS: DataHub version v0.9.6
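    One way to narrow this down, as a hedged sketch: emit a minimal dataFlowInfo aspect straight to GMS with the plain Python REST emitter. If this also fails with the Avro SerializationException, the problem sits on the GMS/Kafka/schema-registry side rather than in the Airflow plugin.
    Copy code
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DataFlowInfoClass

    # Point this at the same GMS the Airflow plugin uses.
    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    # Mirrors the DataFlow from the log above.
    flow_urn = builder.make_data_flow_urn(
        orchestrator="airflow",
        flow_id="raw_partner.vw_transactions_partner",
        cluster="stg",
    )
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityUrn=flow_urn,
            aspect=DataFlowInfoClass(name="raw_partner.vw_transactions_partner"),
        )
    )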
  • wonderful-spring-3326

    01/30/2023, 8:09 AM
    Hi all, I'm looking to configure DataHub on Kubernetes using the Helm chart, and I'm stuck on generating an access token for programmatic ingestion of metadata: I keep getting
    Token based authentication is currently disabled. Contact your DataHub administrator to enable this feature.
    I understand metadata_service_authentication should be enabled, and it is (in the Helm values.yaml at least, and redeployed too), so I'm not sure what I'm missing here. I've tried with the root user and with my own user (which has the admin role), so that should be fine too.
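    For reference, once token-based auth is actually enabled server-side, a personal access token generated in the UI is passed to programmatic clients as a bearer token. A minimal sketch, assuming the Python REST emitter and a hypothetical GMS address:
    Copy code
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    emitter = DatahubRestEmitter(
        gms_server="http://localhost:8080",  # assumed GMS address
        token="<personal-access-token>",     # generated in the UI once token auth works
    )
    # Raises if the server is unreachable or rejects the token.
    emitter.test_connection()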
  • fancy-evening-4658

    01/30/2023, 8:27 AM
    Hi all, I am trying to understand the data profiling capabilities of DataHub. From the documentation, I could find only this page: https://datahubproject.io/docs/metadata-ingestion/docs/dev_guides/sql_profiles. How do we trigger/enable SQL profiling? Can it be performed on a subset of the data in a table?
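    Profiling is switched on per source inside the ingestion recipe. A minimal sketch, assuming the MySQL source; the limit option (an assumption, check the profiling config reference for your source) caps how many rows get sampled:
    Copy code
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "mysql",  # hypothetical source
                "config": {
                    "host_port": "localhost:3306",
                    "username": "datahub",  # hypothetical credentials
                    "password": "datahub",
                    "profiling": {
                        "enabled": True,
                        "limit": 10000,  # assumed option: profile at most this many rows
                    },
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},  # assumed GMS address
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()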
  • elegant-salesmen-99143

    01/30/2023, 1:19 PM
    Hi. I want to write a transformer that adds a certain tag to Airflow pipelines whose names match a certain pattern (beginning with de_*). Can someone please help me, how do I write this expression for the transformer rule?
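    The built-in pattern transformers are dataset-oriented, so as a hedged alternative sketch, plain SDK code can tag matching DataFlow urns directly; the tag name, GMS address, and example urn below are hypothetical:
    Copy code
    import re

    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import GlobalTagsClass, TagAssociationClass

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")  # assumed GMS address

    # Hypothetical list; in practice, collect these from search or your ingestion run.
    flow_urns = ["urn:li:dataFlow:(airflow,de_daily_load,prod)"]

    for urn in flow_urns:
        # DataFlow urns look like urn:li:dataFlow:(airflow,<dag_id>,<cluster>).
        if re.search(r"urn:li:dataFlow:\(airflow,de_", urn):
            tags = GlobalTagsClass(
                tags=[TagAssociationClass(tag="urn:li:tag:data-engineering")]
            )
            # Note: emitting globalTags this way replaces any existing tags on the flow.
            emitter.emit(MetadataChangeProposalWrapper(entityUrn=urn, aspect=tags))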
  • fierce-agent-11572

    01/30/2023, 4:41 PM
    Hello, I need your help. I'd like to install DataHub in AWS ECS. Can you confirm that it works well in ECS, or do you propose installing it into an EKS cluster? If you agree with ECS, can you recommend good documentation for this installation? Thanks.
  • faint-ram-72591

    01/30/2023, 4:53 PM
    Hi folks, I was just wondering if it's possible to use DataHub as the catalog for Spark, like Hive Metastore and Glue do. If that's possible, could you point me to some docs about it?
  • victorious-school-96321

    01/30/2023, 5:26 PM
    Hello. I need to configure my DataHub instance to allow public access, like https://demo.datahubproject.io/. Can anyone help me with this?
  • bitter-quill-61460

    01/31/2023, 11:04 AM
    Hello. I deployed pods to AWS EKS per https://datahubproject.io/docs/deploy/aws. I can use DataHub normally with kubectl port-forward, but there is some issue when the load balancer is used. I can see the login page when using the load balancer, but after datahub/datahub credentials are entered, the following page is shown
  • purple-terabyte-64712

    01/31/2023, 2:22 PM
    Hello team, I'm trying to integrate DataHub with Airflow, but in my case both are official Helm chart installations, and Airflow is not running locally. How can I integrate them? Thanks.
  • faint-ram-72591

    01/31/2023, 2:41 PM
    Hi team, is ingesting Hudi metadata only possible using the push approach with the Hudi sync tool, or is it possible to extract it similarly to what we do with Delta tables?
  • bitter-translator-92563

    01/31/2023, 4:09 PM
    Hi all. I'm trying to find a way to collect and (optionally) visualise the data flows for the data sources described in DataHub. Glossary terms are the level of detail I'd like to get as a result. Data lineage in DataHub seems to be a good tool for the task, but it's not very clear whether I can use glossary terms in lineage in any way. It seems that lineage is applicable to physical assets only. Could anyone give me a tip for my case, please?
  • fresh-lion-91827

    01/31/2023, 4:34 PM
    Hi everyone 👋 - I've heard incredibly good things about DataHub, so I wanted to spin it up locally (M1, Ventura 13.0.1) and play around a bit. However, I'm having trouble getting quickstart to run with the latest version of docker-compose-without-neo4j-m1.quickstart.yml (link)
    Copy code
    [+] Running 11/11
     ⠿ Container elasticsearch              Running                                                                                                                                                         0.0s
     ⠿ Container zookeeper                  Running                                                                                                                                                         0.0s
     ⠿ Container elasticsearch-setup        Started                                                                                                                                                         0.8s
     ⠿ Container mysql                      Running                                                                                                                                                         0.0s
     ⠿ Container broker                     Running                                                                                                                                                         0.0s
     ⠿ Container datahub-gms                Running                                                                                                                                                         0.0s
     ⠿ Container schema-registry            Running                                                                                                                                                         0.0s
     ⠿ Container mysql-setup                Started                                                                                                                                                         0.7s
     ⠿ Container datahub-frontend-react     Running                                                                                                                                                         0.0s
     ⠿ Container kafka-setup                Started                                                                                                                                                         0.8s
     ⠿ Container datahub-datahub-actions-1  Running                                                                                                                                                         0.0s
    .............
    Unable to run quickstart - the following issues were detected:
    - datahub-gms is running but not healthy
    
    If you think something went wrong, please file an issue at <https://github.com/datahub-project/datahub/issues>
    or send a message in our Slack <https://slack.datahubproject.io/>
    Be sure to attach the logs from /var/folders/55/d2bjkrjd693_3qcwmz41sjhc0000gn/T/tmpqwxmvsoo.log
  • bland-orange-13353

    01/31/2023, 4:34 PM
    If you’re having trouble with quickstart, please make sure you’re using the most up-to-date version of DataHub by following the steps in the quickstart deployment guide: https://datahubproject.io/docs/quickstart/#deploying-datahub. Specifically, ensure you’re up to date with the DataHub CLI:
    Copy code
    python3 -m pip install --upgrade pip wheel setuptools
    python3 -m pip install --upgrade acryl-datahub
    datahub version
  • wonderful-spring-3326

    02/01/2023, 7:02 AM
    Hi all, I was wondering how to bootstrap DataHub automatically, as we're planning to start/stop/redeploy to different k8s clusters often at the moment (we're using the Helm chart). Specifically:
    • how to automatically have specific ingestions ready at startup
    • how to automatically have specific users/roles ready at startup (currently struggling with the right access rights in Azure to import users from Azure AD, so if this is covered (partially) by the above item, that would be awesome too)
    • how to back up/export everything (and load it all back when starting up automatically): e.g. can everything be exported using the file sink, including settings for DataHub itself? Alternatively, if all settings can be set automatically on boot, only exporting the metadata is fine of course.
    (If you'd rather have 3 separate threads for this, let me know and I'll split it; this can be the thread for the first one.)
  • orange-flag-48535

    02/01/2023, 9:38 AM
    Hi, I'm integrating a piece of software with DataHub. Specifically, I want to be able to launch DataHub as part of the integration tests for my project, and write to and read from DataHub. Is there a recommended way to do this? Does DataHub work with Testcontainers, for example?
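    A minimal sketch, assuming the Python testcontainers package and a quickstart compose file already downloaded locally (both assumptions; the path is hypothetical):
    Copy code
    import requests
    from testcontainers.compose import DockerCompose

    # Assumes ./datahub/ contains the quickstart compose file.
    compose = DockerCompose(
        "./datahub",
        compose_file_name="docker-compose-without-neo4j.quickstart.yml",
    )

    with compose:
        # datahub-gms exposes a /health endpoint on port 8080 in the quickstart.
        compose.wait_for("http://localhost:8080/health")
        assert requests.get("http://localhost:8080/health").status_code == 200
        # ... run read/write integration tests against GMS here ...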
  • witty-school-76151

    02/01/2023, 1:09 PM
    Hi, can anyone help me and point me to some documentation on how I can create a dataset composed of several tables, including fields and fine-grained lineage at field level, by using REST directly? Is there any JSON documentation available for the dataset entity? Thank you very much.
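    Datasets are built up aspect by aspect rather than from one JSON document. As a hedged sketch adapted from the documented Python emitter examples (which POST to the same /aspects?action=ingestProposal endpoint under the hood), this emits field-level (fine-grained) lineage; dataset and column names below are hypothetical:
    Copy code
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        DatasetLineageTypeClass,
        FineGrainedLineageClass,
        FineGrainedLineageDownstreamTypeClass,
        FineGrainedLineageUpstreamTypeClass,
        UpstreamClass,
        UpstreamLineageClass,
    )

    upstream = builder.make_dataset_urn("mysql", "db.orders")
    downstream = builder.make_dataset_urn("mysql", "db.orders_summary")

    # Column-level edge: orders.amount -> orders_summary.total_amount.
    fine_grained = FineGrainedLineageClass(
        upstreamType=FineGrainedLineageUpstreamTypeClass.FIELD_SET,
        upstreams=[builder.make_schema_field_urn(upstream, "amount")],
        downstreamType=FineGrainedLineageDownstreamTypeClass.FIELD,
        downstreams=[builder.make_schema_field_urn(downstream, "total_amount")],
    )

    lineage = UpstreamLineageClass(
        upstreams=[UpstreamClass(dataset=upstream, type=DatasetLineageTypeClass.TRANSFORMED)],
        fineGrainedLineages=[fine_grained],
    )

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")  # assumed GMS address
    emitter.emit(MetadataChangeProposalWrapper(entityUrn=downstream, aspect=lineage))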
  • ripe-eye-60209

    02/01/2023, 5:47 PM
    Given a container urn, how can related entities be retrieved using the Python SDK?
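    A minimal sketch, assuming a recent acryl-datahub release where DataHubGraph exposes get_related_entities, and assuming containment is modelled via the IsPartOf relationship:
    Copy code
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))  # assumed GMS address

    container_urn = "urn:li:container:0123456789abcdef"  # hypothetical container urn

    # Entities inside a container point at it with IsPartOf, so walk the
    # incoming edges.
    for related in graph.get_related_entities(
        entity_urn=container_urn,
        relationship_types=["IsPartOf"],
        direction=DataHubGraph.RelationshipDirection.INCOMING,
    ):
        print(related.urn)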
  • ripe-eye-60209

    02/01/2023, 5:49 PM
    e.g.,
  • famous-quill-82626

    02/02/2023, 12:06 AM
    Hi. We have recently installed DataHub (v0.9.0) via Kubernetes. We have also enabled SSO and can log in as either: • the admin "datahub" user • an authenticated SSO (AzureAD) user. However, neither of these users is able to see themselves under Users & Groups. Is this expected behaviour, or do I need to provide a Role/Privilege for this?