# getting-started

    great-fall-93268

    12/16/2022, 3:05 AM
Hello everyone, I have installed DataHub with Docker successfully and changed the GMS port to 58080 during quickstart. I have ingested some datasets and now want to use the datahub delete command, but how can I tell datahub that I changed the port? The command doesn't work because the CLI is still configured to use port 8080. Could anyone please help me with this?
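One way to do this (a hedged sketch: it assumes your CLI version honors the `DATAHUB_GMS_URL` environment variable, and the URN below is a placeholder):

```
# Point the CLI at the non-default GMS port before running delete;
# `datahub init` can persist the same setting in ~/.datahubenv.
export DATAHUB_GMS_URL=http://localhost:58080
datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)" --soft
```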

    ripe-eye-60209

    12/16/2022, 2:09 PM
Hello Team, is there any function in the Python SDK, or another way, to remove an existing dataset given its URN?
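A minimal sketch of a soft delete via the Python emitter, assuming the `acryl-datahub` package and a placeholder URN (soft-deleted entities disappear from the UI but keep their metadata):

```python
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import StatusClass

emitter = DatahubRestEmitter("http://localhost:8080")
urn = "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"

# Emitting Status(removed=True) is the same soft delete that
# `datahub delete --soft` performs.
emitter.emit(
    MetadataChangeProposalWrapper(entityUrn=urn, aspect=StatusClass(removed=True))
)
```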

    full-car-92006

    12/16/2022, 2:11 PM
hello everyone! I'm starting to explore DataHub, and from what I have seen it looks really nice! Well done! I have created a simple dbt project and run it against Snowflake. I did the configuration on the DataHub side and I can see the data in the product. When I go to lineage, I can see lineage between tables but nothing at column level (even though I can see columns on the chart, it doesn't show a line between columns). Is there any limitation I should read about / be aware of? How can I debug this?

    worried-animal-81235

    12/16/2022, 10:08 PM
I'm new to DataHub and trying to evaluate it as a potential solution for our growing metadata management requirements. From what I've read so far, there seem to be a lot of great features built into DataHub. However, one thing I am not clear about is how DataHub relates to Hive Metastore (HMS). Can DataHub replace HMS? Or, even better, does DataHub offer an HMS-compatible interface, so I can point my query engines such as AWS Athena, PrestoDB/Trino/Spark at DataHub to query my external tables (content stored on S3), for example? That way there would be no need for us to maintain a separate HMS deployment and manually keep the metadata synced between DataHub and HMS.

    average-dinner-25106

    12/18/2022, 9:05 AM
Hi, I have one question: how can I search the documentation of a glossary term? We made several glossary terms that are important in our company, with documentation explaining them in detail. To provide contextual search, we want to check whether words in the documentation of a glossary term can be searched. Unfortunately, the DataHub UI doesn't seem to provide this level of search. Is this true? If not, how can we find them using the search engine? As far as I know, DataHub uses Elasticsearch, which may provide this.

    few-tent-85021

    12/18/2022, 6:12 PM
Help! Can I use OpenSearch instead of Elasticsearch for a Helm install? I know the docs at https://datahubproject.io/docs/deploy/aws/#elasticsearch-service have some help for AWS OpenSearch. What changes do I need to make to use OpenSearch locally? ty 🎄
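A hedged sketch of the Helm values involved: point `global.elasticsearch` at the local OpenSearch service and, following the AWS guide linked above, set `USE_AWS_ELASTICSEARCH` on the setup job (the service name below is a placeholder, and the exact keys may differ by chart version):

```yaml
global:
  elasticsearch:
    host: "opensearch-cluster-master"   # placeholder service name
    port: "9200"

elasticsearchSetupJob:
  extraEnvs:
    - name: USE_AWS_ELASTICSEARCH
      value: "true"
```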

    famous-kite-14798

    12/19/2022, 9:23 AM
Hi, is there any update about BigQuery column-level lineage? It is mentioned in this video that it would land in Q4: https://www.youtube.com/watch?v=FjkNySWkghY&t=2472s


    some-alligator-9844

    12/21/2022, 9:17 AM
Hi, I am running the DataHub GMS service on my local machine, pointing to external resources, but I get a RestLiServiceException when hitting http://localhost:8000. I'm using the configuration and commands below.

Environment variables:

```
export DATAHUB_SERVER_TYPE="quickstart"
export DATAHUB_TELEMETRY_ENABLED="true"
export DATASET_ENABLE_SCSI="false"
export EBEAN_DATASOURCE_USERNAME="datahub"
export EBEAN_DATASOURCE_PASSWORD="datahub"
export EBEAN_DATASOURCE_HOST="<HOST>:3306"
export EBEAN_DATASOURCE_URL="jdbc:mysql://<HOST>:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8&enabledTLSProtocols=TLSv1.2"
export EBEAN_DATASOURCE_DRIVER="com.mysql.jdbc.Driver"
export KAFKA_BOOTSTRAP_SERVER="<HOST>:29092"
export KAFKA_SCHEMAREGISTRY_URL="http://<HOST>:8081"
export ELASTICSEARCH_HOST="<HOST>"
export ELASTICSEARCH_PORT="9200"
export NEO4J_HOST="http://<HOST>:7474"
export NEO4J_URI="bolt://<HOST>"
export NEO4J_USERNAME="neo4j"
export NEO4J_PASSWORD="datahub"
export JAVA_OPTS="-Xms1g -Xmx1g"
export GRAPH_SERVICE_IMPL="neo4j"
export ENTITY_REGISTRY_CONFIG_PATH="/Users/pamahato/github/datahub-project/datahub/metadata-models/src/main/resources/entity-registry.yml"
export ENTITY_SERVICE_ENABLE_RETENTION="true"
export MAE_CONSUMER_ENABLED="true"
export MCE_CONSUMER_ENABLED="true"
export PE_CONSUMER_ENABLED="true"
export UI_INGESTION_ENABLED="true"
```

Build:

```
./gradlew metadata-service:war:build
```

Run:

```
./gradlew metadata-service:war:run -Dlogback.debug=true
```

Hitting it from Postman returns:

```
{"exceptionClass":"com.linkedin.restli.server.RestLiServiceException","stackTrace":"com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]\n\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:202)\n\tat com.linkedin.restli.server.RestRestLiServer.buildPreRoutingRestException(RestRestLiServer.java:254)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:228)\n\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:215)\n\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:171)\n\tat com.linkedin.restli.server.RestLiServer.handleRequest(
```
    Datahub-gms-restli-exception.txt

    lemon-appointment-96333

    12/21/2022, 10:05 AM
Hi, I have a basic question about DataHub: imagine I have a dataset which is split across two data stores, with the latest data in Postgres and older data in S3 (with the same schema). Is it possible to set up a DataHub catalog to express this? Or can a catalog entry only ever point to one data store per dataset?

    miniature-branch-33689

    12/21/2022, 4:27 PM
hey guys. Is there a way to link SageMaker Feature Groups to other lineage elements like Databricks tables? Can I ingest such relations? Can I push them from my code?
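Lineage can be pushed from code; here is a hedged sketch that treats both sides as datasets with placeholder names (SageMaker feature groups may actually be ingested as ML feature table entities, in which case the URNs would need adjusting):

```python
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    DatasetLineageTypeClass,
    UpstreamClass,
    UpstreamLineageClass,
)

emitter = DatahubRestEmitter("http://localhost:8080")
upstream = make_dataset_urn(platform="databricks", name="db.schema.source_table", env="PROD")
downstream = make_dataset_urn(platform="sagemaker", name="my_feature_group", env="PROD")

# Attach an upstreamLineage aspect to the downstream entity.
lineage = UpstreamLineageClass(
    upstreams=[UpstreamClass(dataset=upstream, type=DatasetLineageTypeClass.TRANSFORMED)]
)
emitter.emit(MetadataChangeProposalWrapper(entityUrn=downstream, aspect=lineage))
```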

    careful-nightfall-53251

    12/21/2022, 5:20 PM
Hello, can anyone help me out with how to deploy DataHub on a local machine?
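For reference, the documented local deployment is the Docker quickstart (a sketch; it assumes Docker is running and Python 3 is installed):

```
python3 -m pip install --upgrade acryl-datahub
python3 -m datahub docker quickstart
# The UI comes up at http://localhost:9002 (default login: datahub / datahub)
```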

    thankful-fireman-70616

    12/21/2022, 5:34 PM
Hi all - I'm new to DataHub and have a basic question. I know that I can connect DataHub to Databricks; however, can I configure Great Expectations on top of Databricks tables? Is there a way? Any tutorials or related material?
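There is a Great Expectations integration; a hedged sketch of the checkpoint `action_list` entry that pushes validation results to DataHub (the `server_url` below is a placeholder):

```yaml
action_list:
  - name: datahub_action
    action:
      module_name: datahub.integrations.great_expectations.action
      class_name: DataHubValidationAction
      server_url: http://localhost:8080
```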

    bumpy-pilot-52145

    12/21/2022, 9:39 PM
Is there a way to access the raw data backing the DataHub analytics page?
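As a starting point: the analytics charts are served from Elasticsearch, so a hedged sketch of peeking at the raw events (assuming the default `datahub_usage_event` index and a locally exposed Elasticsearch):

```
curl -s 'http://localhost:9200/datahub_usage_event/_search?size=5&pretty'
```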

    plain-cricket-83456

    12/22/2022, 2:00 AM
I have a question: is there a session timeout / automatic logout function, and where can I set the duration of the session?
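A hedged sketch of the relevant datahub-frontend environment variables (names taken from the authentication docs as I understand them; values are examples):

```
# On the datahub-frontend container:
AUTH_SESSION_TTL_HOURS=24      # browser session cookie lifetime
MAX_SESSION_TOKEN_AGE=24h      # session token expiry
```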

    breezy-shoe-41523

    12/22/2022, 3:15 AM
Hi team! I used to update the custom properties of datasets with MCP emits, but now I get a token error because this requires one MCP emit per dataset, which makes too many MCPs. I want to batch the updates instead but cannot find out how. Can you tell me how I can batch-update MCPs, or suggest another way to do this? Thanks!
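As far as I know there is no single batch endpoint in this version, but one emitter (and one token) can be reused across many MCPs; a hedged sketch with placeholder data:

```python
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

# One emitter, one token, many MCPs.
emitter = DatahubRestEmitter("http://localhost:8080", token="<personal-access-token>")

updates = {
    "urn:li:dataset:(urn:li:dataPlatform:hive,table_a,PROD)": {"owner_team": "data-eng"},
    "urn:li:dataset:(urn:li:dataPlatform:hive,table_b,PROD)": {"owner_team": "analytics"},
}
for urn, props in updates.items():
    # Note: upserting datasetProperties replaces the whole aspect.
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityUrn=urn,
            aspect=DatasetPropertiesClass(customProperties=props),
        )
    )
```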

    freezing-account-90733

    12/22/2022, 6:06 AM
Hi Team, can you please provide a code snippet for deleting a URN through the Python SDK?
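A hedged sketch: recent versions of the SDK expose `delete_entity` on the graph client (on older versions, emit a `Status(removed=True)` aspect as in the sketch further up this page):

```python
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
urn = "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"

# hard=False soft-deletes (hides the entity); hard=True removes it entirely.
graph.delete_entity(urn, hard=False)
```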

    kind-dusk-91074

    12/23/2022, 6:34 AM
Hello Team, I am new to DataHub and am trying to ingest metadata from MySQL, but the run is stuck at pending. Can anyone help?
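If a UI ingestion run sits at pending, the actions container (which executes UI-scheduled runs) is usually the first place to look; the container name below is taken from the default quickstart:

```
docker logs datahub-datahub-actions-1
```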

    bitter-translator-92563

    12/23/2022, 7:35 AM
Hi guys, just a short question on ingesting OpenAPI: it looks like "swagger_file" should be a relative path, not an absolute one. But we have many cases where the swagger file is located elsewhere. Is there a way to specify an absolute path to the swagger file?

    proud-policeman-19830

    12/23/2022, 5:30 PM
Hey guys, would anyone have any snippets showing how to set the deprecation state of a dataset from Python (looking at Airflow here)? I tried adding a DatasetDeprecationClass aspect to a MetadataChangeEvent, but that results in DataHub blowing up (getting 500s when drilling down to the datasets in the UI).

    glamorous-tomato-92925

    12/25/2022, 12:54 PM
Hi all - when I run python -m datahub docker quickstart I get an error:
```
Unable to run quickstart - the following issues were detected:
- datahub-frontend-react is running but not healthy
```

```
$ python -m datahub version
DataHub CLI version: 0.9.2
Python version: 3.8.12 (default, Dec 25 2022, 12:32:27)
[Clang 13.0.0 (clang-1300.0.27.3)]
```

```
$ docker ps
CONTAINER ID   IMAGE                                   COMMAND                  CREATED          STATUS                      PORTS                                        NAMES
3dc05232d5c0   confluentinc/cp-schema-registry:7.2.0   "/etc/confluent/dock…"   49 minutes ago   Up 19 minutes               0.0.0.0:8081->8081/tcp                       schema-registry
8f2a0107b619   acryldata/datahub-actions:head          "/bin/sh -c 'dockeri…"   49 minutes ago   Up 20 minutes                                                            datahub-datahub-actions-1
50e4aecb8870   linkedin/datahub-frontend-react:head    "/bin/sh -c ./start.…"   49 minutes ago   Up 20 minutes (unhealthy)   0.0.0.0:9002->9002/tcp                       datahub-frontend-react
291139c1c039   confluentinc/cp-kafka:7.2.0             "/etc/confluent/dock…"   49 minutes ago   Up 20 minutes               0.0.0.0:9092->9092/tcp                       broker
2a9175da7b1f   linkedin/datahub-gms:head               "/bin/sh -c /datahub…"   49 minutes ago   Up 20 minutes (healthy)     0.0.0.0:8080->8080/tcp                       datahub-gms
8144ad199d12   mariadb:10.5.8                          "docker-entrypoint.s…"   49 minutes ago   Up 20 minutes               0.0.0.0:3306->3306/tcp                       mysql
710e6914aaf8   elasticsearch:7.9.3                     "/tini -- /usr/local…"   49 minutes ago   Up 20 minutes (healthy)     0.0.0.0:9200->9200/tcp, 9300/tcp             elasticsearch
6583a5b77275   confluentinc/cp-zookeeper:7.2.0         "/etc/confluent/dock…"   49 minutes ago   Up 20 minutes               2888/tcp, 0.0.0.0:2181->2181/tcp, 3888/tcp   zookeeper
```

    glamorous-tomato-92925

    12/25/2022, 1:22 PM
I restarted the Docker daemon, then reran python -m datahub docker quickstart and got a new error:

```
Unable to run quickstart - the following issues were detected:
- kafka-setup is still running
- datahub-gms is running but not healthy
```

    glamorous-tomato-92925

    12/25/2022, 1:27 PM
Running docker logs kafka-setup shows the error:

```
[kafka-admin-client-thread | adminclient-1] INFO org.apache.kafka.clients.admin.internals.AdminMetadataManager - [AdminClient clientId=adminclient-1] Metadata update failed
org.apache.kafka.common.errors.TimeoutException: Call(callName=fetchMetadata, deadlineMs=1671973739261, tries=1, nextAllowedTryMs=1671973739362) timed out at 1671973739262 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
[kafka-admin-client-thread | adminclient-1] WARN org.apache.kafka.clients.NetworkClient - [AdminClient clientId=adminclient-1] Connection to node -1 (broker/172.18.0.7:29092) could not be established. Broker may not be available.
```

    acoustic-ghost-64885

    12/26/2022, 4:45 AM
How can I delete glossary terms and term groups using the API?
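The same soft-delete pattern from earlier in this page applies to glossary URNs; a hedged sketch with placeholder term/group names:

```python
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import StatusClass

emitter = DatahubRestEmitter("http://localhost:8080")
for urn in ["urn:li:glossaryTerm:MyTerm", "urn:li:glossaryNode:MyTermGroup"]:
    emitter.emit(
        MetadataChangeProposalWrapper(entityUrn=urn, aspect=StatusClass(removed=True))
    )
```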

    flaky-businessperson-14858

    12/27/2022, 1:43 AM
👋 Hi everyone! How can I create a Post using GraphQL?
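A hedged sketch of the `createPost` mutation, sent via the Python graph client (field values are placeholders; the input shape follows the posts feature as I understand it):

```python
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080", token="<token>"))

# Create a home-page announcement post.
graph.execute_graphql(
    """
    mutation {
      createPost(
        input: {
          postType: HOME_PAGE_ANNOUNCEMENT
          content: {
            contentType: TEXT
            title: "Scheduled maintenance"
            description: "DataHub will be unavailable Saturday 02:00-04:00 UTC."
          }
        }
      )
    }
    """
)
```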

    adorable-magazine-49274

    12/27/2022, 4:40 AM
Hi Team, I'm James. I'm trying to launch DataHub on AWS this time. When choosing instance types for the DataHub instances, can you give me some advice on which computing resource to focus on: memory, disk, or CPU?

    lively-notebook-84369

    12/27/2022, 1:03 PM
Hi all, I'm trying to use DataHub alongside Airflow and would like to know if it's possible to emit additional metadata about outlet datasets directly from Airflow / the Airflow DataHub plugin. So far I got it to register output datasets, but only the URN, and I'm not sure of the best way to expand the available metadata of the outputs (S3 dataset output, but hosted on MinIO).
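One hedged option: enrich the outlet dataset yourself by emitting extra aspects from within a task, alongside the URN-only lineage the plugin records (the platform, name, and GMS address below are placeholders):

```python
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

emitter = DatahubRestEmitter("http://datahub-gms:8080")
urn = make_dataset_urn(platform="s3", name="my-bucket/path/table", env="PROD")

# Attach a description and custom properties to the outlet dataset.
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=urn,
        aspect=DatasetPropertiesClass(
            description="Daily export written by the my_dag DAG",
            customProperties={"hosted_on": "minio"},
        ),
    )
)
```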

    bitter-translator-92563

    12/28/2022, 8:14 PM
Hi all. I'm investigating the glossary capabilities of DataHub and have a few questions; I would appreciate your advice.
1. When I create a term group through the UI and then try to add terms to it through ingestion, DataHub creates a new term group instead of ingesting into the existing one. The URN of the group created through the UI contains a UUID (for example "urn:li:glossaryTerm:05eed122-b332-45b9-bc68-4e3da85d59a2"), while the URN of the group created through ingestion consists of the group name ("urn:li:glossaryNode:NewGroupName"). Is this behaviour of DataHub correct? Is there a way to fix it and ingest groups without doubling them?
2. Another issue is similar to the first one, but relates to glossary terms. If I add a term through the UI and then try to add a term with the same name through ingestion, DataHub makes a "copy" of it with another URN (containing the term name) instead of updating the existing one. Is there a way to fix this?
3. This one is about related terms. When adding a term through glossary ingestion I can specify terms that should be related to the one I'm creating. But none of the terms I specify as related are visible in the glossary (the only place I can see them is the "Related Terms" tab of the term I'm creating). I can't find these related terms through search, and I can't see them when making relations with a table/column, so it seems I can't use them in any way. At the same time, if I create a term in the glossary and then mark it "related" to any other glossary term through the UI, that related term is visible in the glossary, available in search results, and linkable to a table/column. So it looks like related terms created through ingestion are impossible to use. Is there a way to deal with this issue?

    freezing-account-90733

    12/28/2022, 11:25 PM
Hi Team, how can I detect whether a dataset exists using the Python SDK?
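A hedged sketch using the graph client's `exists` helper (available in recent `acryl-datahub` versions; the URN is a placeholder):

```python
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
urn = "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"

print(graph.exists(urn))  # True if the entity has been ingested
```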

    thankful-fireman-70616

    12/25/2022, 6:08 AM
Hello All - probably a very basic question: I recently ingested some data from PostgreSQL using UI ingestion, and I'm not seeing the lineage, queries, and validation fields enabled for me. Is there any specific configuration I need to enable?
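For the stats side specifically, profiling has to be switched on in the recipe; a hedged sketch for the postgres source (connection details are placeholders):

```yaml
source:
  type: postgres
  config:
    host_port: "localhost:5432"
    database: mydb
    username: datahub
    password: datahub
    profiling:
      enabled: true
```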

    plain-cricket-83456

    12/30/2022, 1:55 AM
I wonder if there is a configuration item that can change the password requirements for new users (such as length, allowed characters, etc.). Also, is there a concept of email verification that sends a confirmation email to the registering address? That way, if I know someone else's email address and sign up for DataHub using it, other people would not be locked out of registering with their own address.