# getting-started

    bitter-dusk-52400

    05/05/2022, 8:45 AM
Hi @better-orange-49102, @helpful-optician-78938, @big-carpet-38439, and team. To save task instance run details we can use the DataProcessInstance entity. Similarly, is there any way to save pipeline run details in DataHub?
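For reference, a minimal sketch of emitting a run as a DataProcessInstance with the Python REST emitter; the URN, timestamp, and names below are made up for illustration, and this is one possible way to model a pipeline-level run rather than a confirmed API for it:
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AuditStampClass,
    ChangeTypeClass,
    DataProcessInstancePropertiesClass,
)

emitter = DatahubRestEmitter("http://localhost:8080")

# Hypothetical URN for one run of a pipeline; a task run would look the same.
run_urn = "urn:li:dataProcessInstance:my_pipeline_run_20220505"

emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityType="dataProcessInstance",
        entityUrn=run_urn,
        changeType=ChangeTypeClass.UPSERT,
        aspectName="dataProcessInstanceProperties",
        aspect=DataProcessInstancePropertiesClass(
            name="my_pipeline run 2022-05-05",
            created=AuditStampClass(time=1651740300000, actor="urn:li:corpuser:datahub"),
        ),
    )
)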

    gentle-camera-33498

    05/06/2022, 12:52 AM
Hello everyone! I created a DataPlatform entity for a data quality tool and created an Assertion entity with the dataPlatformInstance aspect relating this Assertion entity to the new DataPlatform entity. In the UI, the DataPlatform appears and the count of related entities is correct. But when I click on the DataPlatform icon to list all its instances, none appear. Could someone tell me what I did wrong?

    fresh-monitor-41243

    05/06/2022, 4:53 PM
Newbie! I did the Docker quickstart yesterday and all seemed fine, though it said the datahub user did not have permission to ingest data through the UI, and I was unable to get dummy data ingested via the File recipe. I shut down my Docker containers for the night and started them back up today; now the UI is no longer available and I get this in the frontend container logs:
    16:47:06 [application-akka.actor.default-dispatcher-5] INFO  o.a.kafka.common.utils.AppInfoParser - Kafka version: 2.3.0
    16:47:06 [application-akka.actor.default-dispatcher-5] INFO  o.a.kafka.common.utils.AppInfoParser - Kafka commitId: fc1aaa116b661c8a
    16:47:06 [application-akka.actor.default-dispatcher-5] INFO  o.a.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1651855626390
    16:47:07 [kafka-producer-network-thread | datahub-frontend] INFO  org.apache.kafka.clients.Metadata - [Producer clientId=datahub-frontend] Cluster ID: 9g92g2gkQ0CAzW82CbxqTA
    16:48:49 [application-akka.actor.default-dispatcher-9] ERROR application -
    ! @7nh6pkgij - Internal server error, for (GET) [/] ->
    play.api.UnexpectedException: Unexpected exception[NullPointerException: Null stream]
    at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:340)
    at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:263)
    at play.core.server.AkkaHttpServer$$anonfun$1.applyOrElse(AkkaHttpServer.scala:443)
    at play.core.server.AkkaHttpServer$$anonfun$1.applyOrElse(AkkaHttpServer.scala:441)
    at scala.concurrent.Future.$anonfun$recoverWith$1(Future.scala:417)
    at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
    at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:92)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:85)
    at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:92)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:49)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
I nuked it and restarted the containers: same thing. Pruned my Docker state: same thing. Since I'm JUST starting to look at this, I'm not really sure why it worked yesterday but not today! haha. Any thoughts?

    handsome-football-66174

    05/06/2022, 4:55 PM
Hi everyone, I am trying to test DataHub with Deequ. Is there documentation for the integration?

    fresh-monitor-41243

    05/06/2022, 7:44 PM
I've done some reading here in this channel and searched a bit through the Slack workspace, so I know a bit more now than I did before, but that is still very little. I'm trying to quickly trial the Docker quickstart version of DataHub and put in some dummy metadata, just to see how our kind-of-weird data might look. I have a handy CSV of metadata that technically describes some data in AWS S3 (so maybe someday we set up Glue and move forward). I understand now that the File source needs input in the format produced by the File sink (haha, putting metadata around the metadata that I want to use to describe my data... this gets pretty meta fast, eh?). Is there a quick and dirty way to manually create a JSON from my existing csv-transformed-into-a-json that will have the right "bindings" for ingestion? I see the example mce-files directory, but I'm not sure where I can shove in my JSON tidbits to fake a suitable file.
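If it helps, a quick-and-dirty alternative to hand-crafting File-source JSON is to emit the metadata directly from the CSV with the Python SDK; a minimal sketch, assuming acryl-datahub is installed and the CSV has hypothetical name and description columns:
import csv

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

emitter = DatahubRestEmitter("http://localhost:8080")

with open("my_metadata.csv") as f:  # hypothetical file; one dataset per row
    for row in csv.DictReader(f):
        emitter.emit_mcp(
            MetadataChangeProposalWrapper(
                entityType="dataset",
                entityUrn=make_dataset_urn(platform="s3", name=row["name"], env="PROD"),
                changeType=ChangeTypeClass.UPSERT,
                aspectName="datasetProperties",
                aspect=DatasetPropertiesClass(description=row["description"]),
            )
        )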

    best-umbrella-24804

    05/09/2022, 3:16 AM
Hello, I'm trying to Google these things and find them in the documentation, but I'm having a lot of trouble. I was wondering how I can emit metadata regarding runs, queries, and stats, as below

    quick-family-76114

    05/09/2022, 5:56 AM
Hi team, I have one query: can we use MySQL 8 as the store in DataHub?

    brash-sundown-77702

    05/09/2022, 5:23 PM
Hi Team, I am Satish from Dell. We are working on a project to process DICOM JSON metadata and programmatically create PDL schema files as we use DataHub to process the data. We use the Maven artifact https://mvnrepository.com/artifact/com.linkedin.pegasus/data/27.7.18 to do the programmatic conversion to PDL schema. Sample DICOM data can be found here: https://dicom.nema.org/dicom/2013/output/chtml/part18/sect_F.4.html. We are facing two issues. Issue 1: JSON keys starting with a numeric character throw the following exception:
Exception in thread "main" java.lang.IllegalArgumentException: 1,93: "00100030" is an invalid field name.
	at com.linkedin.data.template.DataTemplateUtil.parseSchema(DataTemplateUtil.java:313)
	at com.linkedin.data.template.DataTemplateUtil.parseSchema(DataTemplateUtil.java:291)
The Java code snippet used is:
SchemaToPdlEncoder schemaToPdlEncoder = new SchemaToPdlEncoder(fileWriter);
RecordDataSchema recordDataSchema = (RecordDataSchema) DataTemplateUtil.parseSchema(
    "{\"type\":\"record\",\"name\":\"Adjusteddicompdl\",\"namespace\":\"resources.practice.dicom\",\"fields\":[{\"name\":\"00100030\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"record\",\"name\":\"Value\",\"fields\":[{\"name\":\"dummyKey\",\"type\":\"string\"}]}}}]}");
schemaToPdlEncoder.encode(recordDataSchema);
Here the key being used is "00100030" as a String. If the same key starts with a letter, say "d00100030", it works fine. Issue 2: We need a PDL (array) schema for the following DICOM JSON data, which contains an array of strings for the key "Key":
    {
      "d00091002": {
        "Key": [
          "z0x9c8v7",
          "z0x9c8v8"
        ]
      }
    }
As far as we can tell from the PDL schema documentation (https://linkedin.github.io/rest.li/pdl_schema), we can only express a PDL schema if "Key" has an array of key/value records, like the one below:
{
  "d00091002": {
    "Key": [{"dummyKey":"z0x9c8v7"},{"dummyKey":"z0x9c8v"}]
  }
}
whose corresponding PDL schema is:
record dicomInfo {
  d00091002: array[record Value {
    dummyKey: string
  }]
}
    Please let us know if you need more details.
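For Issue 1, one possible workaround is to rewrite the numeric keys into valid Pegasus field names before calling parseSchema, since Pegasus field names must be valid identifiers. A sketch only, assuming Jackson is on the classpath; the "t" prefix is an arbitrary choice:
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.util.ArrayList;
import java.util.List;

public class DicomKeySanitizer {
    // Recursively prefix keys that start with a digit, e.g. "00100030" -> "t00100030".
    static void sanitize(JsonNode node) {
        if (node instanceof ObjectNode) {
            ObjectNode obj = (ObjectNode) node;
            List<String> names = new ArrayList<>();
            obj.fieldNames().forEachRemaining(names::add);
            for (String name : names) {
                JsonNode child = obj.get(name);
                sanitize(child);
                if (!name.isEmpty() && Character.isDigit(name.charAt(0))) {
                    obj.remove(name);
                    obj.set("t" + name, child);
                }
            }
        } else if (node.isArray()) {
            node.forEach(DicomKeySanitizer::sanitize);
        }
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        JsonNode root = mapper.readTree("{\"00100030\":{\"Key\":[\"z0x9c8v7\"]}}");
        sanitize(root);
        System.out.println(root); // {"t00100030":{"Key":["z0x9c8v7"]}}
    }
}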

    gentle-camera-33498

    05/09/2022, 7:29 PM
Hello everyone! I read the GraphQL API docs, and I saw that a request to list entities is not possible yet. Could someone tell me when this feature will be available?
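In the meantime, a wildcard search can approximate listing entities of a given type; a sketch against the GraphQL endpoint (the entity type and paging values here are illustrative):
query {
  search(input: { type: DATASET, query: "*", start: 0, count: 10 }) {
    total
    searchResults {
      entity {
        urn
        type
      }
    }
  }
}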

    crooked-furniture-44524

    05/09/2022, 9:13 PM
    Hey folks, I am hoping to take a single SQL query and extract the tables and columns from it. For example, for the following query:
    SELECT tbl1.id 
    FROM tbl1 
    JOIN tbl2 on tbl1.join_id=tbl2.join_id
it should return:
• tables_used = [tbl1, tbl2]
• columns_used = [tbl1.id, tbl1.join_id, tbl2.join_id]
A trickier example would be when I do
SELECT *
where columns_used would instead need to be all columns of tbl1 and tbl2. Is this something that is possible with the DataHub project? Thanks in advance for any tips!
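For the table-level part, one option is the sqllineage library, which DataHub's ingestion also uses for SQL parsing; a minimal sketch (resolving SELECT * to concrete columns would still require schema information from a catalog):
from sqllineage.runner import LineageRunner

sql = """
SELECT tbl1.id
FROM tbl1
JOIN tbl2 ON tbl1.join_id = tbl2.join_id
"""

# Prints the source tables, i.e. tbl1 and tbl2.
print(LineageRunner(sql).source_tables())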

    salmon-rose-54694

    05/10/2022, 2:37 AM
Hi, I found managerUrn in the code, but how can I view a user's manager in the UI?

    sparse-raincoat-42898

    05/10/2022, 3:04 AM
Hello, I am trying to follow the quickstart guide but am getting the error "Docker doesn't seem to be running. Did you start it?" I confirmed my Docker is running.

    astonishing-dusk-99990

    05/10/2022, 2:24 PM
Hi All, I'm currently having trouble ingesting data from dbt into DataHub. I followed the installation steps here: https://datahubproject.io/docs/cli, and I have the three files needed to ingest dbt data, following this documentation: https://datahubproject.io/docs/generated/ingestion/sources/dbt. But after I ran the ingestion, DataHub errored out and did not detect the file path location, even though I put exactly the same location as in the config. Can anyone help me?
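For comparison, a minimal recipe sketch run from Python; the paths and target platform are hypothetical, and the key point is that paths are resolved on the machine (or container) actually executing the ingestion, not wherever the files happen to live locally:
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "dbt",
            "config": {
                "manifest_path": "/abs/path/target/manifest.json",
                "catalog_path": "/abs/path/target/catalog.json",
                "sources_path": "/abs/path/target/sources.json",
                "target_platform": "postgres",  # hypothetical warehouse platform
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)
pipeline.run()
pipeline.raise_from_status()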

    wonderful-smartphone-35332

    05/10/2022, 3:27 PM
Hi All, I have followed the guide to do the quickstart DataHub Helm deployment. I needed to switch a few things (our k8s clusters require that all resources are explicitly defined), but beyond that it is working. I am also using all of the Confluent Platform charts and have switched the broker URLs as needed. Now, trying to ingest some data, it looks like my datahub user has limited access; any thoughts on what I could do?
Failed to create ingestion source!: Unauthorized to perform this action. Please contact your DataHub administrator.
I also don't have permission to view my user page. Thanks in advance :)

    adorable-receptionist-20059

    05/10/2022, 10:44 PM
Can we run DataHub without Kafka, i.e. just using the GMS API? Any downsides to this?

    salmon-rose-54694

    05/11/2022, 8:05 AM
I see groups under CorpGroupInfoClass; does this work?
from datahub.metadata.schema_classes import CorpGroupInfoClass

CorpGroupInfoClass(
    email=email,
    admins=owners,    # list of corpuser URNs
    members=members,  # list of corpuser URNs
    groups=[],
)

    handsome-stone-44066

    05/11/2022, 11:09 AM
Hello everyone. I have a problem: DataHub's mysql plugin is disabled. I ran this command:
    python3 -m pip install 'acryl-datahub[mysql]'
but it doesn't work.
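As a sanity check, the CLI can report which ingestion plugins it actually picked up (assuming a reasonably recent acryl-datahub, and that you run it in the same Python environment you installed into):
datahub check plugins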

    wonderful-egg-79350

    05/12/2022, 4:46 AM
Hello everyone. Where is the DataHub configuration for changing the DB connection? I want to know the Docker container name and the path. I deployed DataHub using Docker containers. The reason I ask is that I am trying to change the metadata database from MySQL (DataHub's default) to MSSQL (a database on another server).
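For what it's worth, in the Docker deployment the metadata store is configured through environment variables on the datahub-gms container rather than a config-file path; a sketch with placeholder values (I can't confirm MSSQL is a supported metadata store, and it would at minimum need its JDBC driver available in the image):
EBEAN_DATASOURCE_HOST=my-db-host:1433
EBEAN_DATASOURCE_URL=jdbc:sqlserver://my-db-host:1433;databaseName=datahub
EBEAN_DATASOURCE_USERNAME=datahub
EBEAN_DATASOURCE_PASSWORD=datahub
EBEAN_DATASOURCE_DRIVER=com.microsoft.sqlserver.jdbc.SQLServerDriver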

    great-nest-9369

    05/12/2022, 7:04 AM
Hi, team. Does DataHub profiling support profiling only new partitions each time ingestion is executed?

    hallowed-analyst-96384

    05/12/2022, 7:08 AM
Hi everyone, I successfully ingested the business glossary into DataHub directly from our project's CI/CD. The issue is that I want these business glossary terms to be in our project documentation as well. What is the best way to achieve this?

    astonishing-dusk-99990

    05/12/2022, 9:42 AM
Hi All, I'm sorry to ask again; currently I'm installing and running DataHub with
datahub docker quickstart --quickstart-compose-file=docker-compose.quickstart.yml
Can I change the MySQL credentials in that .yml to point at a MySQL RDS instance? Thank you

    most-plumber-32123

    05/12/2022, 10:04 AM
Hi All, I successfully ingested the datasets from Snowflake into DataHub, but I cannot see the lineage between the datasets; it is greyed out in the UI. Is the lineage between datasets captured automatically?

    chilly-gpu-46080

    05/12/2022, 11:35 AM
Hi All, I'm trying out DataHub and was able to host it using Docker (completely vanilla). I was then able to ingest some SQL Server and MongoDB metadata via the CLI. However, when I try to define the ingestion in the UI, it is not able to connect to MongoDB or SQL Server. Any ideas?

    most-plumber-32123

    05/12/2022, 12:22 PM
Hi all, I have installed DataHub via Docker and want to grant UI access to my team to play around. When I try to create a user from the UI, I don't see any option to create a new user, only one to create a new group.

    chilly-gpu-46080

    05/13/2022, 5:20 AM
    Hi All, how does one add new users to DataHub?

    astonishing-dusk-99990

    05/13/2022, 6:38 AM
Hi All, I want to use Google authentication for DataHub, and based on the documentation you need to put these variables in
    docker/datahub-frontend/env/docker.env
    AUTH_OIDC_ENABLED=true
    AUTH_OIDC_CLIENT_ID=your-client-id
    AUTH_OIDC_CLIENT_SECRET=your-client-secret
AUTH_OIDC_DISCOVERY_URI=https://accounts.google.com/.well-known/openid-configuration
    AUTH_OIDC_BASE_URL=your-datahub-url
    AUTH_OIDC_SCOPE="openid profile email"
    AUTH_OIDC_USER_NAME_CLAIM=email
    AUTH_OIDC_USER_NAME_CLAIM_REGEX=([^@]+)
Question: can I put this into the docker-compose file? If so, which container should it go in? Thank you. Note: I'm using
    datahub docker quickstart --quickstart-compose-file=docker-compose.quickstart.yml
    to run datahub
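A sketch of how that could look in docker-compose.quickstart.yml, assuming the frontend service is named datahub-frontend-react as in the quickstart file (these OIDC variables belong to the frontend container):
  datahub-frontend-react:
    environment:
      - AUTH_OIDC_ENABLED=true
      - AUTH_OIDC_CLIENT_ID=your-client-id
      - AUTH_OIDC_CLIENT_SECRET=your-client-secret
      - AUTH_OIDC_DISCOVERY_URI=https://accounts.google.com/.well-known/openid-configuration
      - AUTH_OIDC_BASE_URL=your-datahub-url
      - AUTH_OIDC_SCOPE=openid profile email
      - AUTH_OIDC_USER_NAME_CLAIM=email
      - AUTH_OIDC_USER_NAME_CLAIM_REGEX=([^@]+)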

    great-cpu-72376

    05/13/2022, 1:51 PM
Hi, I would like to install DataHub to test the integration with several tools we have in our company. I've seen the documentation and the quickstart description; is it described anywhere how to deploy using a docker-compose file without the datahub CLI tool? We need to set several parameters in those files (e.g. networks, volumes, env variables, etc.). I downloaded the project from GitHub and switched to tag 0.8.34; under docker/ I see several docker-compose files. Which should I use as the reference for a starting installation? Thanks!

    ambitious-lizard-47888

    05/13/2022, 6:04 PM
Hi Team, I installed DataHub on my local VM using the installation instructions provided here: https://datahubproject.io/docs/quickstart. After that I am able to launch the front end, and all works. I ingested the metadata using the sample scripts given; it shows as successfully ingested, but the metadata does not appear in the UI. I also tried to create a new Domain; after saving, it shows the domain was created, but again it does not persist. Am I missing anything here? For the UI I am accessing http://localhost:9002/

    sticky-dawn-95000

    05/14/2022, 11:36 PM
Hi Team, is there a way to remove the metadata for all business glossaries using the DataHub CLI command "datahub delete ~~~"? If so, how can I do that? Please help me :)
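One possibility, assuming a reasonably recent CLI (flag names have varied across versions, so a dry run first is safest, then re-run without --dry-run):
datahub delete --entity_type glossaryTerm --hard --dry-run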

    great-cpu-72376

    05/16/2022, 12:13 PM
Hi, I am testing DataHub and I have created two connections to two databases. These databases have the same name and have some tables whose names are in common, and I launched the ingestion execution with profiling enabled. After some hours the run has not finished yet, but what is not clear to me is the handling of the tables with names in common. For example, when I search for table A I should find it twice, but instead I find it only once; is that normal? Another question: is it possible to find which ingestion inserted a table into DataHub? I cannot find it. Thanks.