# getting-started
  • damp-minister-31834 (01/04/2022, 1:56 AM)
    Hello all, when I start DataHub via docker-compose I get the error below. What's wrong and what should I do?
    datahub-gms               | 01:49:43.948 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:69 - Ingesting default policy with urn urn:li:dataHubPolicy:1
    datahub-gms               | 01:49:44.068 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:69 - Ingesting default policy with urn urn:li:dataHubPolicy:2
    datahub-gms               | 01:49:44.151 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:69 - Ingesting default policy with urn urn:li:dataHubPolicy:3
    datahub-gms               | 01:49:45.316 [main] ERROR c.l.metadata.boot.BootstrapManager:32 - Caught exception while executing bootstrap step IngestPoliciesStep. Exiting...
    datahub-gms               | org.apache.kafka.common.errors.SerializationException: Error serializing Avro message
    datahub-gms               | Caused by: java.net.SocketException: Unexpected end of file from server
    datahub-gms               | 	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:851)
    datahub-gms               | 	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
    datahub-gms               | 	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:706)
    datahub-gms               | 	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1593)
    datahub-gms               | 	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)
    datahub-gms               | 	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    datahub-gms               | 	at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:272)
    datahub-gms               | 	at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:351)
    datahub-gms               | 	at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:494)
    datahub-gms               | 	at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:485)
    datahub-gms               | 	at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:458)
    datahub-gms               | 	at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(CachedSchemaRegistryClient.java:206)
    datahub-gms               | 	at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:268)
    datahub-gms               | 	at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:244)
    datahub-gms               | 	at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:74)
    datahub-gms               | 	at io.confluent.kafka.serializers.KafkaAvroSerializer.serialize(KafkaAvroSerializer.java:59)
    datahub-gms               | 	at org.apache.kafka.common.serialization.Serializer.serialize(Serializer.java:62)
    datahub-gms               | 	at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:902)
    datahub-gms               | 	at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:862)
    datahub-gms               | 	at com.linkedin.metadata.dao.producer.EntityKafkaMetadataEventProducer.produceMetadataChangeLog(EntityKafkaMetadataEventProducer.java:142)
    datahub-gms               | 	at com.linkedin.metadata.entity.EntityService.produceMetadataChangeLog(EntityService.java:421)
    datahub-gms               | 	at com.linkedin.metadata.entity.EntityService.ingestProposal(EntityService.java:327)
    datahub-gms               | 	at com.linkedin.metadata.boot.steps.IngestPoliciesStep.ingestPolicy(IngestPoliciesStep.java:95)
    datahub-gms               | 	at com.linkedin.metadata.boot.steps.IngestPoliciesStep.execute(IngestPoliciesStep.java:70)
    datahub-gms               | 	at com.linkedin.metadata.boot.BootstrapManager.start(BootstrapManager.java:30)
    datahub-gms               | 	at com.linkedin.metadata.boot.BootstrapManagerApplicationListener.onApplicationEvent(BootstrapManagerApplicationListener.java:29)
    datahub-gms               | 	at com.linkedin.metadata.boot.BootstrapManagerApplicationListener.onApplicationEvent(BootstrapManagerApplicationListener.java:16)
    datahub-gms               | 	at org.springframework.context.event.SimpleApplicationEventMulticaster.doInvokeListener(SimpleApplicationEventMulticaster.java:172)
    datahub-gms               | 	at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:165)
    datahub-gms               | 	at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:139)
    datahub-gms               | 	at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:403)
    datahub-gms               | 	at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:360)
    datahub-gms               | 	at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:897)
    datahub-gms               | 	at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:553)
    datahub-gms               | 	at org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:401)
    datahub-gms               | 	at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:292)
    datahub-gms               | 	at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:103)
    datahub-gms               | 	at org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:921)
    datahub-gms               | 	at org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:554)
    datahub-gms               | 	at org.eclipse.jetty.server.handler.ContextHandler.startContext(ContextHandler.java:888)
    datahub-gms               | 	at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:357)
    datahub-gms               | 	at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1443)
    datahub-gms               | 	at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1407)
    datahub-gms               | 	at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:821)
    datahub-gms               | 	at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:276)
    datahub-gms               | 	at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:524)
    datahub-gms               | 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
    datahub-gms               | 	at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
    datahub-gms               | 	at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:117)
    datahub-gms               | 	at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:106)
    datahub-gms               | 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
    datahub-gms               | 	at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
    datahub-gms               | 	at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:117)
    datahub-gms               | 	at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:106)
    datahub-gms               | 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
    datahub-gms               | 	at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
    datahub-gms               | 	at org.eclipse.jetty.server.Server.start(Server.java:407)
    datahub-gms               | 	at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:110)
    datahub-gms               | 	at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:106)
    datahub-gms               | 	at org.eclipse.jetty.server.Server.doStart(Server.java:371)
    datahub-gms               | 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
    datahub-gms               | 	at org.eclipse.jetty.runner.Runner.run(Runner.java:520)
    datahub-gms               | 	at org.eclipse.jetty.runner.Runner.main(Runner.java:565)
    datahub-gms               | 2022/01/04 01:49:46 Command exited with error: exit status 1
    datahub-gms exited with code 1
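The trace shows GMS dying while registering an Avro schema, i.e. the connection to the Kafka schema registry dropped mid-boot. A minimal way to check the registry from the quickstart stack (container names, ports, and compose service names below are the quickstart defaults and may differ in your setup):

```bash
# Are the Kafka-side containers up and healthy?
docker ps --format '{{.Names}}\t{{.Status}}' | grep -E 'schema-registry|broker|zookeeper'

# The registry should answer on port 8081 (the /subjects endpoint lists
# registered schemas); run this from the host or inside the GMS container.
curl -s http://localhost:8081/subjects

# If the registry came up late or unhealthy, restarting in dependency order often helps.
docker-compose restart schema-registry datahub-gms
```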
  • breezy-camera-11182 (01/04/2022, 4:57 AM)
    Hi, is it possible to deploy DataHub without the MAE Consumer and MCE Consumer, since I only ingest metadata using the REST API? I saw those two components are optional but couldn't find a way to disable them within the helm chart https://github.com/acryldata/datahub-helm/tree/master/charts/datahub. Thanks in advance!
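A sketch of a values override for this, assuming the chart exposes a `global.datahub_standalone_consumers_enabled` condition flag (the key name is an assumption to verify against your chart version). When standalone consumers are disabled, GMS processes MAE/MCE internally, so REST API ingestion keeps working:

```yaml
# values-override.yaml (sketch; verify key names against your datahub-helm version)
global:
  # Assumed flag: when false, the datahub-mae-consumer and datahub-mce-consumer
  # deployments are not created and GMS consumes the topics itself.
  datahub_standalone_consumers_enabled: false
```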
  • few-air-56117 (01/04/2022, 1:26 PM)
    Hi guys, how can I change the default password of the datahub user?
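One common approach, assuming the JAAS-based login that datahub-frontend ships with, where credentials live in a `user.props` file inside the frontend image (the file path and compose service name below are the quickstart defaults; verify for your deployment):

```bash
# user.props holds username:password pairs, one per line (sketch)
echo 'datahub:MyNewSecret' > user.props

# Mount it over the image default, e.g. via a docker-compose override:
#   services:
#     datahub-frontend-react:
#       volumes:
#         - ./user.props:/datahub-frontend/conf/user.props
docker-compose up -d datahub-frontend-react
```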
  • adamant-sugar-28445 (01/04/2022, 1:57 PM)
    Hi everyone. I want to ingest metadata from Airflow into DataHub. In my Airflow .py file, I set inlets and outlets and configured the connection to DataHub (as described in https://datahubproject.io/docs/lineage/airflow). The inlets and outlets reference HDFS and look like `outlets={"datasets": [Dataset("hdfs", "/general/project1/folder1/file1.parquet")]}`. The problem is that file1's schema didn't show up in the DataHub UI, and I see the whole path rather than the file name alone in the UI. Can anyone tell me what the cause is?
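For context, a task definition in the style that lineage doc describes (a sketch: the operator, paths, and the Airflow-1-style dict form mirror the question; Airflow 2 passes plain lists, `inlets=[Dataset(...)]`):

```python
from airflow.operators.bash import BashOperator
from datahub_provider.entities import Dataset

# Lineage-only references: DataHub creates a dataset entity whose name is the
# full path (which is why the whole path shows in the UI), and no schema is
# attached unless the same dataset is also ingested by a source connector.
copy_task = BashOperator(
    task_id="copy_parquet",
    bash_command="echo copy",
    inlets={"datasets": [Dataset("hdfs", "/general/project1/folder1/file1.parquet")]},
    outlets={"datasets": [Dataset("hdfs", "/general/project1/out/file1.parquet")]},
)
```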
  • miniature-eve-89383 (01/04/2022, 6:54 PM)
    Hi, is there some kind of RBAC in Datahub? i.e. can we make sure that one user only has access to specific (meta)data?
  • calm-librarian-24047 (01/04/2022, 9:21 PM)
    Hi, just started to play with DataHub... I was curious about trying to run the quickstart with podman instead of docker on Rocky/CentOS 8 Linux. I'm getting an error that docker isn't running.
  • loud-holiday-22352 (01/05/2022, 7:07 AM)
    @gorgeous-machine-88582 Hello, when I run `datahub ingest -c ./recipe.yml` I get an error: `ValueError: This version of acryl-datahub requires GMS v0.8.0 or higher`. Is there something wrong with the installation? Thank you.
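That error usually means the CLI cannot reach GMS at the address configured in the recipe's sink, or is getting a response from something that is not GMS. A quick sanity check, assuming the quickstart's default GMS endpoint:

```bash
# GMS reports its build/config info at /config; the CLI's version check reads this.
curl -s http://localhost:8080/config

# Compare server and CLI versions, and upgrade the CLI if they disagree.
datahub version
pip install --upgrade acryl-datahub
```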
  • rhythmic-kitchen-64860 (01/05/2022, 9:21 AM)
    hi, is there an example of using the updateDataset mutation?
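For reference, a sketch of an updateDataset call against the `/api/graphql` endpoint. The urn is illustrative, and the input fields follow the public GraphQL schema but should be checked against your DataHub version:

```graphql
mutation {
  updateDataset(
    urn: "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
    input: { editableProperties: { description: "Updated via GraphQL" } }
  ) {
    urn
  }
}
```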
  • lemon-hydrogen-83671 (01/05/2022, 4:52 PM)
    Hi folks, I'm trying to run `datahub-gms` with different topic names by setting the environment variables specified here: https://datahubproject.io/docs/how/kafka-config/#datahub-gms, but when I try running the container I see the following stacktrace (see thread). Looks like it's not honouring that config 😞 Any suggestions?
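For comparison, a sketch of how those variables are typically passed to the GMS container in docker-compose (the variable names come from that kafka-config page; the topic values are illustrative):

```yaml
# docker-compose override for the datahub-gms service (sketch)
services:
  datahub-gms:
    environment:
      - METADATA_CHANGE_PROPOSAL_TOPIC_NAME=MyMetadataChangeProposal_v1
      - METADATA_CHANGE_LOG_VERSIONED_TOPIC_NAME=MyMetadataChangeLog_Versioned_v1
      - METADATA_CHANGE_LOG_TIMESERIES_TOPIC_NAME=MyMetadataChangeLog_Timeseries_v1
```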
  • nice-autumn-10105 (01/05/2022, 7:19 PM)
    Hello, just ingested some mssql sources with profiling turned on. I really like it. Now the magic will really start to happen if I can get upstream and downstream pipelines set up. But we are a bit of a legacy ETL shop: SSIS packages. We do have a lot of Spark acting like ETL, but we do not have Airflow. Is there a path for me to get the SSIS pipelines and pyspark pipelines imported?
  • white-school-8018 (01/05/2022, 9:57 PM)
    Hello, do we have integrations to fetch metadata from Segment (https://segment.com/)?
  • wide-helicopter-97009 (01/06/2022, 10:33 PM)
    I got this error when I run `./gradlew build`; is there any solution for this?
  • white-school-8018 (01/07/2022, 5:04 AM)
    Hello, how do we handle PII data with DataHub? Is there a specific process we can follow to achieve this?
  • few-air-56117 (01/07/2022, 11:51 AM)
    Hi guys, how can I create a new user and set a password?
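If you are on the JAAS `user.props` mechanism (see the default-password question above), new logins are just additional lines in that file. A sketch, with the same quickstart-default service name caveat:

```bash
# Append a username:password pair and restart the frontend to pick it up.
echo 'newuser:NewUserSecret' >> user.props
docker-compose restart datahub-frontend-react
```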
  • bland-translator-47564 (01/08/2022, 9:47 AM)
    Hi folks! Inside a data catalog, words in a name usually join without underscores, for example "goodclients". I tried a search in the demo for "formation". The search result is empty; I expected the result to include entities matching "*in*formation". As I understand it, the current search doesn't use n-grams, which would let partial words match whole words in search results. N-grams could also give more relevant search by entity name. Has anybody thought about using n-grams for search?
  • billowy-lock-72499 (01/10/2022, 10:43 AM)
    Hi, I have a question regarding alerts and notifications: how do we (users) get notified if any changes are made in DataHub?
  • wide-helicopter-97009 (01/10/2022, 4:11 PM)
    Hi Team, I got this error when running your quickstart script, `./docker/quickstart.sh`
  • nice-autumn-10105 (01/11/2022, 9:43 PM)
    Hello, any advice on how to document stored procedures that define the CRUD process for a given table? I have a table with 5 stored procedures that control the data in that table. What's the best approach to show that in DataHub?
  • breezy-noon-83306 (01/12/2022, 1:30 PM)
    Hello everybody! I am starting a Data Governance plan. What do you recommend I do? What can DataHub do related to that? Thanks!!
  • miniature-television-17996 (01/13/2022, 10:26 AM)
    Hello! I found UpstreamLineage https://github.com/linkedin/datahub/blob/97e966003710aba18f7a2ecf5af0686504359da5/[…]odels/src/main/pegasus/com/linkedin/dataset/UpstreamLineage.pdl. I ingested two tables and now I would like to join them (not manually using JSON): T1 -> Procedure1 -> T2
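A sketch of emitting that link programmatically with the Python emitter; the platform, table names, and GMS address are illustrative. Note this links T1 to T2 directly, while modelling Procedure1 itself would take a DataJob entity in between:

```python
import datahub.emitter.mce_builder as builder
from datahub.emitter.rest_emitter import DatahubRestEmitter

# Build an MCE carrying an UpstreamLineage aspect: T1 is upstream of T2.
lineage_mce = builder.make_lineage_mce(
    [builder.make_dataset_urn("mssql", "db.schema.T1")],  # upstream(s)
    builder.make_dataset_urn("mssql", "db.schema.T2"),    # downstream
)

# Send it to GMS over REST.
emitter = DatahubRestEmitter("http://localhost:8080")
emitter.emit_mce(lineage_mce)
```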
  • chilly-wolf-28627 (01/13/2022, 2:18 PM)
    Hi, quick question: I see DataHub supports Elastic for its Search Index and Graph Index. Does it also support OpenSearch, and what are the plans going forward?
  • dazzling-appointment-34954 (01/17/2022, 3:58 PM)
    Hey guys, I have a quick question regarding the glossary: is there any way to create more advanced documentation for glossary items? For now I understand we can only ingest glossary items and not create them in the UI. Anyway, I think it is quite limiting that it is only possible to add a few words of text to a glossary item. Is additional information supposed to be entered through Properties? And what exactly is Schema for in this regard? Thanks in advance for some information on this! 🙂
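For what it's worth, richer term documentation is usually maintained in the file-based glossary source and re-ingested. A sketch (the source type and file layout follow the business glossary docs but should be verified against your version):

```yaml
# recipe.yml (sketch)
source:
  type: datahub-business-glossary
  config:
    file: ./business_glossary.yml
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080

# business_glossary.yml then holds nodes/terms with longer descriptions, e.g.:
#   version: 1
#   source: DataHub
#   nodes:
#     - name: Classification
#       description: Classification-related terms
#       terms:
#         - name: Sensitive
#           description: A longer, multi-sentence definition can live here.
```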
  • alert-beach-77662 (01/18/2022, 10:08 AM)
    How do I take a backup of DataHub in case of a reinstallation?
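A common approach, assuming the quickstart's MySQL container (container name, credentials, and database below are the quickstart defaults): back up the `metadata_aspect_v2` table, which is the source of truth; search and graph indices can be rebuilt from it afterwards:

```bash
# Dump the primary metadata store.
docker exec mysql mysqldump -u datahub -pdatahub datahub metadata_aspect_v2 > backup.sql

# After reinstalling, load the dump back...
docker exec -i mysql mysql -u datahub -pdatahub datahub < backup.sql

# ...then rebuild the Elasticsearch/graph indices from the SQL store.
./docker/datahub-upgrade/datahub-upgrade.sh -u RestoreIndices
```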
  • alert-pager-8342 (01/18/2022, 9:44 PM)
    Hi all! Just got started with DataHub: all of the docker containers are running, with a successful run of the test data ingestion. I'm also able to ingest from my MySQL database to a file, but when trying to use the REST sink, I am getting an error (see thread).
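For comparison, a minimal recipe with the REST sink (hosts and credentials are placeholders). A common first check is that `server` points at GMS on port 8080 rather than the frontend on 9002:

```yaml
# recipe.yml (sketch)
source:
  type: mysql
  config:
    host_port: localhost:3306
    username: user
    password: pass
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080  # GMS endpoint, not the UI
```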
  • careful-engine-38533 (01/20/2022, 7:40 AM)
    I try to delete datasets using the following command: `datahub delete --env PROD --entity_type dataset --platform mongodb`, but it fails with the error below. Any help?
    java.lang.NullPointerException
    	at com.linkedin.metadata.entity.ebean.EbeanEntityService.deleteUrn(EbeanEntityService.java:577)
    	at com.linkedin.metadata.resources.entity.EntityResource.lambda$deleteEntity$13(EntityResource.java:313)
    	at com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:30)
    	... 81 more
    'message': 'java.lang.NullPointerException', 'status': 500
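For reference, the two delete modes the CLI exposes (flag names per the delete docs; verify with `datahub delete --help` on your version):

```bash
# Soft delete (the default): hides the entities but keeps the rows.
datahub delete --env PROD --entity_type dataset --platform mongodb --soft

# Hard delete: physically removes the rows; use with care.
datahub delete --env PROD --entity_type dataset --platform mongodb --hard
```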
  • bitter-toddler-42943 (01/21/2022, 5:09 AM)
    [2022-01-21 14:06:25,551] ERROR {datahub.entrypoints:115} -
    File "/usr/local/lib/python3.6/site-packages/datahub/entrypoints.py", line 102, in main
         99   def main(**kwargs):
        100       # This wrapper prevents click from suppressing errors.
        101       try:
    --> 102           sys.exit(datahub(standalone_mode=False, **kwargs))
        103       except click.exceptions.Abort:
        ..................................................
        kwargs = {}
        datahub = <Group datahub>
        click.exceptions.Abort = <class 'click.exceptions.Abort'>
        ..................................................
    File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1128, in __call__
       1126   def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any:
       (...)
    --> 1128       return self.main(*args, **kwargs)
        ..................................................
  • few-air-56117 (01/21/2022, 11:44 AM)
    Hi, how can I update the DataHub version?
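A sketch for the pip-installed CLI plus docker quickstart case (other deployments differ):

```bash
# Upgrade the CLI.
pip install --upgrade acryl-datahub
datahub version

# Re-run quickstart to pull newer server images.
datahub docker quickstart
```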
  • crooked-market-47728 (01/21/2022, 4:32 PM)
    Hi everyone! I'm deploying DataHub to AWS and have some questions about the official DataHub AWS setup guide. Deploy type: Kubernetes (EKS).
    1. Can I change the default MySQL RDS to Aurora Postgres? I see part of the example says `hostForMysqlClient: "<<rds-endpoint>>"`. Can I put a Postgres endpoint there (changing to port 5432, the driver, etc.)? I know Postgres is compatible with the Docker deployment, but I didn't see anything about it in the Kubernetes examples.
    2. Elasticsearch Service: AWS now uses OpenSearch, which is very similar to Elasticsearch Service (the last Elasticsearch version of the service is 7.10, and you can upgrade to OpenSearch). Is OpenSearch compatible with DataHub, or is it better to stay with Elasticsearch Service 7.10? Thanks!
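On point 1, a sketch of what a Postgres-flavoured values override can look like. The key names follow the datahub-helm values layout but are assumptions to verify against your chart version; the endpoint and secret names are placeholders:

```yaml
global:
  sql:
    datasource:
      host: "<<postgres-endpoint>>:5432"
      hostForpostgresqlClient: "<<postgres-endpoint>>"  # assumed key name; check the chart
      port: "5432"
      url: "jdbc:postgresql://<<postgres-endpoint>>:5432/datahub"
      driver: "org.postgresql.Driver"
      username: "datahub"
      password:
        secretRef: postgresql-secrets
        secretKey: postgres-password
```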
  • nutritious-bird-77396 (01/21/2022, 4:50 PM)
    We are thinking of using DataHub for some data quality use cases, pushing quality metrics to the event stream. As the metrics volume would be much higher than traditional metadata, we're wondering if we should use NoSQL instead of the MySQL/PostgreSQL we chose earlier as the data store. Has anyone in the community tried a NoSQL store as the datastore for DataHub? If so, which NoSQL store was implemented?
  • breezy-controller-54597 (01/24/2022, 3:01 AM)
    I would like to use "datahub put" to grant properties to an already extracted dataset, is there a page that explains the procedure?