# getting-started
  • f

    few-rainbow-57094

    07/12/2022, 1:21 AM
    Good day! I had the local version of DataHub up and running, but deleted it to try a fresh restart. Now I'm getting this error:
    datahub-graphql-core:compileJava FAILED
    or this
    failed to create parent directory
    Has anyone ever had this error before? Could it be linked to my JDK?
  • g

    gorgeous-library-38151

    07/12/2022, 7:06 AM
    What does DataHub suggest for dealing with unstructured data such as zip files? Is DataHub able to manage them?
  • m

    melodic-match-31187

    07/12/2022, 7:23 AM
    Hi everyone! I am new to DataHub! We have a use case we are evaluating DataHub for. Our raw data comes from Kafka into GCS in Hudi format (raw layer). Next, we want to create derived tables (a derived layer) after a proper schema validation check. We will use Spark to move and process data from the raw layer to the derived layer. What we want to achieve via DataHub:
    • Store the schemas of tables in DataHub
    • Get the schemas in Spark from DataHub
    • Apply the schemas to the data
    PS: We don't want to use DataHub only as a schema registry, but this is one of the use cases we want to solve.
  • l

    little-spring-72943

    07/12/2022, 1:12 PM
    I am trying to use the Python emitter to emit Databricks notebook details to DataHub. Any examples?
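For illustration only, here is a minimal standard-library sketch of the kind of request a metadata emitter sends to DataHub GMS. It assumes the notebook is modeled as a dataset on a `databricks` platform, which is a modeling choice made here and not something stated in the thread. The helper names (`dataset_urn`, `build_proposal`, `build_request`), the notebook path, and the custom property keys are all hypothetical. In practice the `DatahubRestEmitter` class from the `acryl-datahub` package is the usual route; this only shows the shape of the payload.

```python
import json
import urllib.request


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN string for the given platform, name and environment."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def build_proposal(entity_urn: str, aspect_name: str, aspect: dict) -> dict:
    """Wrap one aspect in a metadata change proposal payload.

    The aspect body is serialized to a JSON string inside the outer JSON.
    """
    return {
        "proposal": {
            "entityType": "dataset",
            "entityUrn": entity_urn,
            "changeType": "UPSERT",
            "aspectName": aspect_name,
            "aspect": {
                "value": json.dumps(aspect),
                "contentType": "application/json",
            },
        }
    }


def build_request(gms_url: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request to the GMS ingest endpoint."""
    return urllib.request.Request(
        f"{gms_url}/aspects?action=ingestProposal",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-RestLi-Protocol-Version": "2.0.0",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical notebook name and properties, for illustration only.
    urn = dataset_urn("databricks", "workspace.my_notebook")
    payload = build_proposal(
        urn,
        "datasetProperties",
        {
            "description": "Databricks notebook registered via a script",
            "customProperties": {"notebook_path": "/Users/me/my_notebook"},
        },
    )
    request = build_request("http://localhost:8080", payload)
    print(request.full_url)
    print(json.dumps(payload, indent=2))
    # Sending is left out on purpose; urllib.request.urlopen(request) would post it.
```

Keeping the payload construction separate from the network call makes it easy to inspect what would be sent before pointing the script at a real GMS instance.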
  • s

    steep-carpet-52398

    07/12/2022, 2:34 PM
    Hi, I'm on RHEL and I'm getting some errors when I try to deploy DataHub with quickstart. I'm using the latest DataHub version. Errors:
    - kafka-setup is still running
    - schema-registry is not running
    - broker is not running
    - datahub-frontend-react is running but not healthy
    - datahub-gms is still starting
    - elasticsearch-setup is still running
    - elasticsearch is running but not healthy
    Help! Does anyone know how I can fix it? Thanks.
  • f

    few-rainbow-57094

    07/12/2022, 7:21 PM
    Good day people! I have more questions 😄 With the bigquery-usage feature, is it possible to get the most used datasets/tables?
  • a

    adamant-mouse-7290

    07/13/2022, 11:47 AM
    Hey guys, is there a way to restrict access to specific assets to certain users/groups? Thanks in advance.
  • b

    blue-crowd-84759

    07/13/2022, 3:39 PM
    I’m trying to load run_results information from dbt using the run_results.json, but my validation tab leads to a blank page (essentially an about:blank). Am I missing something here?
  • w

    wooden-jelly-68313

    07/13/2022, 3:46 PM
    For anyone else hitting the error when building psutil while installing the DataHub CLI: I found that the Python distribution you're using matters (not just the version). Switching to the Python from python.org solved the problem (my default python3 was coming from a MinGW installation, which caused the issue).
  • f

    faint-television-78785

    07/13/2022, 3:59 PM
    re: custom Actions, the docs say this about installing a custom action:
    "The easiest way to do this is to just place it in the same directory as your configuration file, in which case the module name is the same as the file name - in this case it will be custom_action."
    Right now my file is my_action.py, in the same dir as my my_action.yaml config file. However, having type: my_action:ActionClassInside doesn’t work. What should the correct format be?
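As general background, a `module:Class` string of this kind is commonly resolved in Python by importing the module and then looking up the class by name. The sketch below shows that generic technique under the assumption that the module file sits next to the config file. It is not DataHub Actions' actual loader, and `resolve_type` is a hypothetical helper.

```python
import importlib
import sys
from pathlib import Path


def resolve_type(type_spec: str, config_dir: str):
    """Resolve a 'module_name:ClassName' string to a class object.

    Assumption: the module file lives in config_dir, so that directory
    is put on sys.path before the import is attempted.
    """
    module_name, sep, class_name = type_spec.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(f"expected 'module:Class', got {type_spec!r}")

    directory = str(Path(config_dir).resolve())
    if directory not in sys.path:
        sys.path.insert(0, directory)

    # Make sure freshly created files are visible to the import system.
    importlib.invalidate_caches()
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

With this approach, a failure shows up either as a `ModuleNotFoundError` (the directory is not importable or the file name differs from the module name) or as an `AttributeError` (the class name does not match), which helps narrow down which half of the string is wrong.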
  • s

    sparse-raincoat-42898

    07/13/2022, 5:18 PM
    Hi all, I have CSV files in SFTP and Azure Blob Storage. How should I configure the profiling? It appears that there is only an option for local files and S3 (data-lake-files), but not for SFTP or Azure Blob Storage. Is there a recommendation?
  • b

    busy-airport-23391

    07/13/2022, 6:31 PM
    Is it possible to send MetadataChangeProposals via the Java Kafka emitter to the topic MetadataChangeEvent_v4 instead of MetadataChangeProposal_v1?
  • w

    wooden-jelly-68313

    07/14/2022, 12:44 PM
    How do you run the quickstart when behind a company proxy? I'm getting an SSL self-signed certificate error when running datahub docker quickstart.
  • b

    bland-orange-13353

    07/14/2022, 5:42 PM
    This message was deleted.
  • i

    incalculable-football-44453

    07/14/2022, 6:57 PM
    I know this question about using OpenSearch (1.2) as an alternative to a hosted Elasticsearch has been asked before, but it doesn’t fully work for us:
    ....
    elasticsearchSetupJob:
          enabled: true
          image:
            repository: linkedin/datahub-elasticsearch-setup
            tag: "v0.8.40"
          extraEnvs:
            - name: USE_AWS_ELASTICSEARCH
              value: "true"    
    ....
    It triggers a job that finishes successfully. Data ingestion works fine and updates the ES index, but analytics fail with:
    type	:	illegal_argument_exception
    reason	:	Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.
    now I can successfully send POST to -
    <https://opensearch-domain-endpoint:443/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true>
    Because of the above, the GMS k8s pod is getting evicted periodically, which affects the whole DataHub setup. Any suggestions?
  • b

    bitter-insurance-49151

    07/15/2022, 11:46 AM
    elasticsearch is up-to-date
    zookeeper is up-to-date
    Starting mysql ... 
    broker is up-to-date
    Starting mysql               ... done
    schema-registry is up-to-date
    Starting kafka-setup         ... 
    datahub-gms is up-to-date
    mysql-setup is up-to-date
    Starting elasticsearch-setup ... done
    Starting kafka-setup         ... done
    ..............
    Unable to run quickstart - the following issues were detected:
    - datahub-gms is still starting
    - mysql-setup is still running
    - mysql is not running
  • b

    bitter-insurance-49151

    07/15/2022, 11:46 AM
    sos
  • g

    gorgeous-library-38151

    07/16/2022, 7:57 AM
    How can I use the schemaMetadata modeling and profiling functionality for S3 objects without ingesting the results into DataHub (that is, without running 'datahub ingest -c')? I found the ability to extract schemaMetadata and profile each column very helpful, but I want to manage the results myself.
  • l

    lemon-zoo-63387

    07/18/2022, 2:49 AM
    Hi everyone, I installed DataHub with python3. In which directory are the docker-compose files saved at startup? Thanks in advance for your help. https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/cli/docker.py
  • b

    bright-cpu-56427

    07/18/2022, 8:47 AM
    Hi guys! What is the difference between "inherits" and "contains" in the glossary?
  • h

    helpful-librarian-40144

    07/19/2022, 7:27 AM
    Hi everyone, what's the default Grafana user/password for the monitoring installation?
  • s

    square-hair-99480

    07/19/2022, 7:59 AM
    Hey friends, is anyone using key pair authentication for Snowflake ingestion? I have my initial DataHub deployed on an AWS EC2 instance using docker-compose, but I am not sure where I should store the public key on the EC2 instance. Should I create a specific volume, and for which service/container: datahub-gms?
  • e

    elegant-article-21703

    07/19/2022, 11:10 AM
    Hi everyone! We are in the process of uploading the OpenAPI definition to Microsoft API Manager, and our colleagues on the team have asked us to upload the JSON export of the Swagger UI. Is it possible to access the JSON format somehow? Thanks in advance!
  • l

    lemon-engine-23512

    07/19/2022, 4:03 PM
    Hello everyone. I am trying to deploy DataHub on AWS Kubernetes, and I am new to Kubernetes as well. Is there a way to deploy DataHub as a service, rather than using kubectl to run Helm?
  • c

    cool-dinner-21847

    07/19/2022, 5:53 PM
    Hello everyone, is there any documentation available for role-based access in DataHub? I tried applying some privileges to a policy, but when I log back in I can still see all the tabs. FYI, I applied only the 'View Analytics' privilege, and I can see the other tabs too.
  • a

    ancient-processor-65867

    07/19/2022, 5:54 PM
    hi everyone. Anyone using DataHub with Data Mesh? Was wondering what DataHub Entity type you use to catalog a data product?
  • s

    salmon-angle-92685

    07/20/2022, 8:18 AM
    Hello guys, I've set the glossary terms via file ingestion. Then I had to change the names of some of the glossary terms and, instead of reingesting the glossary via the YAML file, I renamed them directly in the UI. However, the URNs kept the old names. How should I correct this? Thanks!
  • q

    quaint-kite-80251

    07/20/2022, 8:56 AM
    I have deployed DataHub from the frontend and GMS containers, and after some time both containers go into an unhealthy state, and the GMS container also goes into an exited state. When I checked the log of the datahub-frontend container, I saw the exception below:
    java.net.UnknownHostException: broker
        at java.net.InetAddress.getAllByName0(InetAddress.java:1282)
        at java.net.InetAddress.getAllByName(InetAddress.java:1194)
        at java.net.InetAddress.getAllByName(InetAddress.java:1128)
        at org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:104)
        at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:403)
        at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:363)
        at org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:151)
        at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:943)
        at org.apache.kafka.clients.NetworkClient.access$600(NetworkClient.java:68)
        at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1114)
        at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1005)
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:537)
        at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:331)
        at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:238)
        at java.lang.Thread.run(Thread.java:748)
    073519 [kafka-producer-network-thread | datahub-frontend] WARN o.apache.kafka.clients.NetworkClient - [Producer clientId=datahub-frontend] Error connecting to node broker:29092 (id: 1 rack: null)
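The `UnknownHostException: broker` in the log means the host name `broker` could not be resolved from where the frontend runs. A small sketch for checking name resolution from a given environment follows. `can_resolve` is a hypothetical helper, and the sketch only tests DNS or hosts-file lookup, not whether Kafka is actually listening on port 29092.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the local resolver can map hostname to an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # 'broker' is the Kafka host name taken from the log message.
    for host in ("broker", "localhost"):
        status = "resolves" if can_resolve(host) else "does NOT resolve"
        print(f"{host}: {status}")
```

Running this inside the same container or network namespace as the frontend shows whether the name is visible there; a name that only exists on another Docker network would fail the lookup.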
  • f

    fancy-bear-96983

    07/20/2022, 9:10 AM
    My team is evaluating options for data lineage and catalog solutions. We have done a POC with Apache Atlas and want to compare it with DataHub, which provides a lot more connectors (we are particularly interested in RDBMS sources: Postgres and MySQL). I am curious: does DataHub support real-time updates for Postgres, or is it based on batch jobs?
  • l

    lemon-engine-23512

    07/20/2022, 9:42 AM
    Hello everyone. For a production deployment of DataHub on AWS, did everyone follow the steps in the documentation, or did you create custom images or a CDK stack for it?