# getting-started
  • b

    better-orange-49102

    03/17/2023, 6:32 AM
    demo datahub site is down...
    ✅ 1
  • a

    astonishing-dusk-99990

    03/17/2023, 9:29 AM
Hello, I want to ask something related to OIDC in Kubernetes. Basically, when I read here, OIDC is set as false and we can enable it and set other variables for OIDC. Meanwhile, the documentation here says we can put it in `extraEnvs`. So my question is: if we want to use OIDC, do we need to enable everything, or can we choose only one approach? Let's say I want to use `extraEnvs` while the variable above stays set to false. Will that work?
    ✅ 1
  • m

    microscopic-leather-94537

    03/17/2023, 10:02 AM
hi folks! I am using DataHub, and I want to restore my DataHub information. I followed the commands and steps to create a backup.sql file. When I installed DataHub on a new system and used the command to restore that SQL backup file, I expected to get the same information and restored data, but I didn't. Has anyone done this, or can anyone help me out?
    ✅ 1
  • b

    brave-judge-32701

    03/20/2023, 11:05 AM
hi all, I found there is a file metadata-integration/java/spark-lineage/src/test/java/datahub/spark/TestCoalesceJobLineage.java. I want to know: what is `CoalesceJobLineage`?
    ✅ 1
  • e

    elegant-salesmen-99143

    03/20/2023, 1:53 PM
Is it just me, or is the documentation on GraphQL pretty confusing? For example, in the SearchAcrossEntitiesInput arguments (https://datahubproject.io/docs/graphql/inputObjects/#searchacrossentitiesinput), there is the argument `query`. What does it mean? What kinds of queries are possible? Some examples would be nice 🥹
    🩺 1
    ✅ 1
    👍 1
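For anyone else puzzling over this: `query` is the free-text search string, the same thing you would type into the UI search bar (`*` matches everything). A minimal sketch of calling `searchAcrossEntities` against a local quickstart instance; the `/api/graphql` endpoint and field names are taken from the docs page above, but treat the exact shape as an assumption to verify there:

```python
import json
import urllib.request

# Build the searchAcrossEntities payload. `query` is a free-text search
# string, e.g. "*" for everything or "fct_users" to match by name.
def build_search_payload(query: str, start: int = 0, count: int = 10) -> dict:
    gql = """
    query search($input: SearchAcrossEntitiesInput!) {
      searchAcrossEntities(input: $input) {
        total
        searchResults { entity { urn type } }
      }
    }
    """
    return {"query": gql, "variables": {"input": {"query": query, "start": start, "count": count}}}

def run_search(query: str, host: str = "http://localhost:9002") -> dict:
    # The frontend serves GraphQL at /api/graphql; add an Authorization
    # header with a personal access token if metadata auth is enabled.
    req = urllib.request.Request(
        f"{host}/api/graphql",
        data=json.dumps(build_search_payload(query)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(run_search("*"))
```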
  • p

    polite-lawyer-82760

    03/21/2023, 6:41 PM
Can someone please share the pricing for managed/hosted DataHub and its different modules?
    ✅ 1
  • c

    colossal-autumn-78301

    03/22/2023, 1:47 PM
Hi Team, my question is related to DataHub design (storage and querying): • Why did DataHub not choose ArangoDB, since it is multi-modal and supports many of the storage and querying capabilities DataHub needs? • Is it possible to auto-generate the GraphQL schema and the resolvers without updating them manually? For example, using tools like https://www.howtographql.com/graphql-java/11-alternative-approaches/? Any ideas? Hints?
    ✅ 1
  • s

    salmon-motherboard-58709

    03/23/2023, 9:24 AM
Hi everyone! I'm new to this, so keep that in mind! 😄 I have set up an Azure VM with Docker and K8s installed to run DataHub locally. Everything was going well until I tried to run `helm install datahub datahub/datahub`. My guess is that something in the network is blocking it, but I figured I'd post here to see if anyone has run into the same issue. This is the error I get:
Copy code
PS C:\Windows\system32> helm uninstall datahub
release "datahub" uninstalled
PS C:\Windows\system32> helm install datahub datahub/datahub --debug
install.go:194: [debug] Original chart version: ""
install.go:211: [debug] CHART PATH: C:\Users\ADMCHR~1\AppData\Local\Temp\helm\repository\datahub-0.2.160.tgz
client.go:477: [debug] Starting delete for "datahub-elasticsearch-setup-job" Job
client.go:133: [debug] creating 1 resource(s)
client.go:703: [debug] Watching for changes to Job datahub-elasticsearch-setup-job with timeout of 5m0s
client.go:731: [debug] Add/Modify event for datahub-elasticsearch-setup-job: ADDED
client.go:770: [debug] datahub-elasticsearch-setup-job: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:731: [debug] Add/Modify event for datahub-elasticsearch-setup-job: MODIFIED
client.go:770: [debug] datahub-elasticsearch-setup-job: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
(the Add/Modify events repeat while the job retries)
client.go:770: [debug] datahub-elasticsearch-setup-job: Jobs active: 1, jobs failed: 1, jobs succeeded: 0
client.go:770: [debug] datahub-elasticsearch-setup-job: Jobs active: 1, jobs failed: 2, jobs succeeded: 0
Error: INSTALLATION FAILED: failed pre-install: timed out waiting for the condition
helm.go:84: [debug] failed pre-install: timed out waiting for the condition
INSTALLATION FAILED
main.newInstallCmd.func2
        helm.sh/helm/v3/cmd/helm/install.go:141
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/cobra@v1.6.1/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/cobra@v1.6.1/command.go:1044
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/cobra@v1.6.1/command.go:968
main.main
        helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
        runtime/proc.go:250
runtime.goexit
        runtime/asm_amd64.s:1571
PS C:\Windows\system32>
  • i

    important-activity-29882

    03/23/2023, 1:50 PM
I think the demo website is down this morning! https://datahubproject.io/docs/demo
    plus1 1
    ✅ 1
  • r

    red-book-38331

    03/23/2023, 7:02 PM
Hi guys, I'm following the quickstart guide to run DataHub on my local laptop. Everything works fine, but when I access the UI, it seems the `datahub` user doesn't have the admin privileges to use UI ingestion or to edit his own profile. Am I missing something?
  • r

    red-book-38331

    03/23/2023, 10:08 PM
hi guys, my datahub-actions container seems stuck after emitting this log:
    Copy code
    [2023-03-23 22:02:54,369] INFO     {datahub_actions.cli.actions:76} - DataHub Actions version: unavailable (installed editable via git)
    [2023-03-23 22:02:54,584] WARNING  {datahub_actions.cli.actions:103} - Skipping pipeline datahub_slack_action as it is not enabled
    [2023-03-23 22:02:54,586] WARNING  {datahub_actions.cli.actions:103} - Skipping pipeline datahub_teams_action as it is not enabled
    [2023-03-23 22:02:54,587] INFO     {datahub_actions.cli.actions:119} - Action Pipeline with name 'ingestion_executor' is now running.
    No user action configurations found. Not starting user actions.
    Exception in thread Thread-1 (run_pipeline):
    Traceback (most recent call last):
      File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
        self.run()
      File "/usr/local/lib/python3.10/threading.py", line 953, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/local/lib/python3.10/site-packages/datahub_actions/pipeline/pipeline_manager.py", line 42, in run_pipeline
        pipeline.run()
      File "/usr/local/lib/python3.10/site-packages/datahub_actions/pipeline/pipeline.py", line 166, in run
        for enveloped_event in enveloped_events:
      File "/usr/local/lib/python3.10/site-packages/datahub_actions/plugin/source/kafka/kafka_event_source.py", line 174, in events
        yield from self.handle_mcl(msg)
      File "/usr/local/lib/python3.10/site-packages/datahub_actions/plugin/source/kafka/kafka_event_source.py", line 182, in handle_mcl
        metadata_change_log_event = build_metadata_change_log_event(msg)
      File "/usr/local/lib/python3.10/site-packages/datahub_actions/plugin/source/kafka/kafka_event_source.py", line 80, in build_metadata_change_log_event
        return MetadataChangeLogEvent.from_class(
      File "/usr/local/lib/python3.10/site-packages/datahub_actions/event/event_registry.py", line 34, in from_class
        instance = cls.construct({})
    AttributeError: type object 'MetadataChangeLogEvent' has no attribute 'construct'. Did you mean: '_construct'?
%4|1679609041.110|MAXPOLL|rdkafka#consumer-1| [thrd:main]: Application maximum poll interval (10000ms) exceeded by 150ms (adjust max.poll.interval.ms for long-running message processing): leaving group
Has anyone got this error before? Every ingestion is in a pending state. (I'm running on a Mac M1, with quickstart on Docker given 16 GB of RAM.)
    plus1 1
  • f

    freezing-account-90733

    03/24/2023, 5:03 AM
Hi. For a few of the datasets the table name is displayed as platform_instance.schema.tablename, while for others it is just the table name. Why do we see this, and is there a way to fix it?
  • n

    nutritious-policeman-35027

    03/24/2023, 2:31 PM
Hi. Is it possible to install DataHub on Windows without using a VM or Docker Compose?
  • b

    brief-bear-90340

    03/24/2023, 4:32 PM
hello team, I am trying to set up DataHub for BigQuery and am running into an issue setting up the connection:
    Copy code
DefaultCredentialsError: ('Failed to load service account credentials from /tmp/tmp53gmekv8', ValueError('Could not deserialize key data. The data may be in an incorrect format, it may be encrypted with an unsupported algorithm, or it may be an unsupported key type (e.g. EC curves with explicit parameters).', [<OpenSSLError(code=503841036, lib=60, reason=524556, reason_text=unsupported)>]))
    any help with this would be appreciated
  • c

    calm-judge-34623

    03/24/2023, 7:00 PM
* What went wrong: Execution failed for task ':li-utils:generateDataTemplate'.
'other' has different root
A quick search of The Google, and all I see is that the build doesn't play nicely with Windows. Anyone have any ideas that could keep me from reinventing the wheel?
  • r

    rough-lamp-22858

    03/26/2023, 12:55 PM
@here Hello guys. I get the following error when I go to DataHub's Analytics page: "Charts failed to load - Validation error (FieldUndefined@[analyticsChart/rows/cells/linkParams/searchParams/filters/value]): Field 'value' in type 'FacetFilter' is undefined". Any clue? Thank you.
  • b

    brave-judge-32701

    03/27/2023, 6:47 AM
https://datahubproject.io/docs/api/tutorials/adding-lineage/ Can I use the Java SDK to add lineage?
  • c

    colossal-pharmacist-88268

    03/27/2023, 7:56 AM
Hello. According to the documentation, there is an Airflow operator, DatahubEmitterOperator (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub_provider/example_dags/lineage_emission_dag.py). I'm comparing how that operator works with the integration through the Airflow Lineage Plugin. Am I right that you cannot build Dataset > Task lineage with that operator? I was trying to build simple Dataset > Task lineage using DatahubEmitterOperator with the code below:
```
step = DatahubEmitterOperator(
    task_id="emit_lineage",
    datahub_conn_id="datahub_rest_default",
    mces=[
        builder.make_lineage_mce(
            upstream_urns=[
                builder.make_dataset_urn(PLATFORM_POSTGRES, "dvd.public.film", DEV_ENV)
            ],
            lineage_type=DatasetLineageTypeClass.TRANSFORMED,
            downstream_urn=builder.make_data_job_urn(
                orchestrator="airflow",
                flow_id="build_lineage",
                job_id="job_id_test",
                cluster=DEV_ENV,
            ),
        )
    ],
)
```
The result was the error below:
java.net.URISyntaxException: Urn entity type should be 'dataset'.: urn:li:dataJob:(urn:li:dataFlow:(*,build_lineage,DEV),job_id_test)
The DataHub Airflow lineage plugin can build such lineage; however, DatahubEmitterOperator seems capable of building only Dataset > Dataset lineage. Is that true, or was I doing something wrong?
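A sketch of the workaround that usually applies here: instead of `make_lineage_mce` (which indeed expects a dataset as the downstream), emit a `dataJobInputOutput` aspect on the DataJob itself via the Python emitter. The emitter classes below come from the `acryl-datahub` package; treat the exact aspect and field names as assumptions to check against your installed version:

```python
def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # Mirrors the shape the builder helpers produce:
    # urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"

def make_data_job_urn(orchestrator: str, flow_id: str, job_id: str, cluster: str = "PROD") -> str:
    return f"urn:li:dataJob:(urn:li:dataFlow:({orchestrator},{flow_id},{cluster}),{job_id})"

def emit_job_input_lineage(gms_url: str, job_urn: str, input_urns: list) -> None:
    # Imports kept local so the urn helpers above work even without
    # acryl-datahub installed.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, DataJobInputOutputClass

    mcp = MetadataChangeProposalWrapper(
        entityType="dataJob",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=job_urn,
        aspectName="dataJobInputOutput",
        aspect=DataJobInputOutputClass(inputDatasets=input_urns, outputDatasets=[]),
    )
    DatahubRestEmitter(gms_url).emit_mcp(mcp)

if __name__ == "__main__":
    emit_job_input_lineage(
        "http://localhost:8080",
        make_data_job_urn("airflow", "build_lineage", "job_id_test", "DEV"),
        [make_dataset_urn("postgres", "dvd.public.film", "DEV")],
    )
```

This can be wrapped in a PythonOperator if it needs to run from the DAG itself.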
  • b

    brave-judge-32701

    03/28/2023, 8:30 AM
I successfully displayed the source table of Spark's CreateDataSourceTableAsSelectCommand in DataHub.
    ✔️ 1
    ✅ 1
    thanks bear 1
  • f

    fierce-agent-11572

    03/28/2023, 8:36 AM
Hello all, I have installed DataHub with the datahub docker command on an EC2 instance in a multi-availability-zone setup (3 AZs) behind a load balancer. When I open the users list page and refresh the browser, sometimes I get the users, but on another refresh I get an empty user list, so I must refresh many times to see the list. My question: is there any configuration I must do for DataHub to work well with multiple AZs? Thank you.
  • g

    glamorous-microphone-33484

    03/29/2023, 12:54 AM
Hi DataHub Team,
1. Will the DataHub ingest CLI (Python), DataHub UI ingestion, and the GMS REST API be governed by access policies (https://datahubproject.io/docs/authorization/policies/) or by the Apache Ranger plugin?
2. On a related question about the Apache Ranger plugin: are you able to provide screenshots/examples of how to configure metadata privileges via the Ranger UI? I was able to verify that platform privileges could be offloaded to Ranger, and DataHub synced those policies from Ranger correctly. However, that was not the case for metadata privileges; I was not able to get DataHub to apply the metadata-related policies from Ranger. Can the team verify that DataHub's integration with Ranger works for metadata privileges, so that we can offload them to Ranger?
3. What is the use of the env variable "REST_API_AUTHORIZATION_ENABLED"? It is not clearly documented in the project.
  • m

    millions-barista-69668

    03/29/2023, 8:14 AM
Hi guys, I'm trying to run DataHub locally on a Mac M1. When I run the docker-compose command, all the containers seem to build and run correctly. But when I log into the UI, it says "Failed to log in. An unexpected error occurred". Am I missing a setup step?
    plus1 2
  • a

    ambitious-room-6707

    03/29/2023, 9:51 AM
Hi team, during `datahub init`, is it possible to supply the `ca_certificate_path` like in a recipe file? I'm currently having issues using the datahub CLI due to cert errors.
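One thing worth trying in the meantime: `datahub init` itself may not accept `ca_certificate_path` (an assumption to verify), but the CLI's REST client is built on `requests`, which honors the standard `REQUESTS_CA_BUNDLE` environment variable. A sketch, with a hypothetical bundle path:

```python
import os
import subprocess

# Point requests-based clients (including the datahub CLI) at a custom
# CA bundle by setting REQUESTS_CA_BUNDLE in the child environment.
def datahub_env(ca_bundle_path: str) -> dict:
    env = dict(os.environ)
    env["REQUESTS_CA_BUNDLE"] = ca_bundle_path
    return env

if __name__ == "__main__":
    subprocess.run(
        ["datahub", "ingest", "-c", "recipe.yaml"],
        env=datahub_env("/etc/ssl/certs/my-internal-ca.pem"),  # hypothetical path
        check=True,
    )
```

Exporting the variable in the shell before running the CLI achieves the same thing.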
  • b

    busy-judge-73408

    03/29/2023, 3:12 PM
Hi Team, are there any example scripts or pointers for how I could view all changes to a dataset over time? Specifically, I want to be able to say: for Dataset X, the following schema changes were made in release Y (our product release). I see how I could use the timeline API and the datahub CLI to pull the history for a specific entity in the dataset, but not how I can pull the same for all entities in the dataset. I'm trying to avoid having to query all entities for the dataset, loop through them calling the timeline API, and then merge that back into a readable format. Let me know if you have done this or have any pointers. I couldn't find any views like this in the UI itself, but if I missed one, please let me know.
    ✅ 1
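If the per-entity loop turns out to be unavoidable, it at least scripts compactly: URL-encode each dataset urn and call the Timeline API per entity, then merge the JSON. The `/openapi/timeline/v1/{urn}` path and query parameters below are assumptions to verify against the Timeline API docs:

```python
import json
import urllib.parse
import urllib.request

# Build a Timeline API URL for one entity. The path and parameters here
# are assumptions; check them against the Timeline API documentation.
def timeline_url(gms: str, urn: str, category: str = "TECHNICAL_SCHEMA", start_ms: int = 0) -> str:
    encoded = urllib.parse.quote(urn, safe="")
    return f"{gms}/openapi/timeline/v1/{encoded}?categories={category}&start={start_ms}"

def fetch_timeline(gms: str, urn: str) -> list:
    with urllib.request.urlopen(timeline_url(gms, urn)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    gms = "http://localhost:8080"
    # In practice these urns would come from a search/scroll over the dataset's entities.
    dataset_urns = ["urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)"]
    report = {urn: fetch_timeline(gms, urn) for urn in dataset_urns}
    print(json.dumps(report, indent=2))
```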
  • a

    able-hair-50437

    03/29/2023, 4:30 PM
Hello all 🙂, I'm in the process of deploying DataHub using the acryl-data DataHub Helm chart: https://github.com/acryldata/datahub-helm/tree/master/charts/datahub/subcharts/datahub-gms. Unfortunately, the GMS service is not starting as expected. It goes through all pre-flight checks but never actually starts listening on the port (verified with netstat), and it gets killed after waiting for 5 minutes. I will post logs and details in the thread.
  • g

    great-winter-52851

    03/30/2023, 1:37 AM
Hi everybody, I'm working through the Quickstart guide (Ubuntu Server 22.10). I am getting the error `Docker doesn't seem to be running. Did you start it?` from running `python3 -m datahub docker quickstart`. I verified that Docker is running, and I am able to run `sudo docker run hello-world` successfully. I believe I installed all required dependencies (per the guide): python3 is version 3.10.7, and Docker Engine and Compose are installed as well. The command `python3 -m datahub version` does succeed. Any ideas on what I should investigate?
    ✅ 1
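The symptom here (plain `docker` fails, `sudo docker` works, and the datahub CLI can't see Docker) usually means the current user lacks permission on the Docker socket; `sudo usermod -aG docker $USER` followed by a re-login typically fixes it. A stdlib sketch of the underlying check, assuming the default socket path:

```python
import os

DOCKER_SOCK = "/var/run/docker.sock"

# Reproduce roughly what the CLI's "is Docker running?" probe depends on:
# the socket must exist and be read/writable by the current user.
def diagnose_docker_socket(path: str = DOCKER_SOCK) -> str:
    if not os.path.exists(path):
        return "socket missing: is the Docker daemon running?"
    if not os.access(path, os.R_OK | os.W_OK):
        return "socket exists but is not accessible: add your user to the 'docker' group"
    return "socket accessible"

if __name__ == "__main__":
    print(diagnose_docker_socket())
```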
  • h

    high-airport-34841

    03/30/2023, 5:03 AM
Hello, I know that DataHub collects (ingests) and manages metadata. Given that, are there functions that can be performed through machine learning? For example, a function to recommend and create tags for schemas via machine learning. If you have any related documents or content, please let me know.
  • a

    adventurous-waiter-4058

    03/30/2023, 6:02 AM
Hi All, I am new to DataHub. I am trying to ingest metadata from the Glue catalog to DataHub REST and am facing an error with `python3 -m datahub ingest -c demo.dhub.yaml`:
[2023-03-30 05:41:53,274] INFO     {datahub.cli.ingest_cli:173} - DataHub CLI version: 0.10.1
[2023-03-30 05:41:53,592] ERROR    {datahub.entrypoints:192} - Command failed: Failed to set up framework context: Failed to instantiate a valid DataHub Graph instance
DataHub is hosted on AWS EKS and it's up and running. Is there any way to resolve this, or to downgrade DataHub to an older version?
    ✅ 2
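That "Failed to instantiate a valid DataHub Graph instance" error usually means the CLI can't reach GMS at the server configured in the recipe's sink (or via `datahub init`). A sketch of a programmatic run that makes the sink explicit; the glue source options and the in-cluster GMS address shown are assumptions to adjust for your deployment:

```python
# A recipe expressed as a dict: the same structure demo.dhub.yaml would hold.
recipe = {
    "source": {
        "type": "glue",
        "config": {
            "aws_region": "us-east-1",  # assumption: set your actual region
        },
    },
    "sink": {
        "type": "datahub-rest",
        # Inside EKS this should be the GMS service address, not localhost.
        "config": {"server": "http://datahub-datahub-gms:8080"},
    },
}

def run_recipe(config: dict) -> None:
    # Import locally so the recipe dict stays inspectable without acryl-datahub.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(config)
    pipeline.run()
    pipeline.raise_from_status()

if __name__ == "__main__":
    run_recipe(recipe)
```

A quick `curl http://<gms-host>:8080/config` from the machine running the CLI is a fast way to confirm the sink address is reachable.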
  • p

    polite-afternoon-10256

    03/30/2023, 7:15 AM
Hi, Team. Does DataHub profiling support executing only new partitions each time ingestion runs, i.e. profiling only incremental data on Hive?
  • p

    proud-dusk-671

    03/30/2023, 7:30 AM
Does DataHub have native support for Snowflake? What I mean by that: is it possible to manage data products and their interactions on Snowflake through DataHub?