Hi is it possible to create an empty dataset and then afterw DataHub #getting-started

dazzling-alarm-64985

10/12/2022, 6:18 AM

Hi, is it possible to create an "empty" dataset, and then afterwards use kafka schema-registry ingestion to add a schema to this dataset?

dazzling-alarm-64985

10/12/2022, 6:24 AM

My use case is that I want to "create" a data product before there is any actual data or schema in schema-registry. In this creation process I will add the topic and ACL rules, create certificates for the users and, most importantly, add business and technical metadata to the data product. I'm trying to avoid making this a multi-stage process that requires human intervention.

better-orange-49102

10/12/2022, 6:28 AM

You can pre-create an empty dataset, but you need to know which URN the ingestion will map to (you can't force the ingestion process to use a custom URN, unless you want to tweak the code). There is a particular pattern to the URN, though, once you understand how it's formed.
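To illustrate the URN pattern mentioned above, here is a minimal sketch. It assumes the Kafka ingestion uses DataHub's standard dataset URN layout with the topic name as the dataset name, optionally prefixed by a platform instance. The function name `kafka_dataset_urn`, the topic `orders.v1`, and the instance `cluster-a` are hypothetical; the DataHub Python SDK ships its own helper, `make_dataset_urn` in `datahub.emitter.mce_builder`, which is probably the better choice in real code.

```python
from typing import Optional


def kafka_dataset_urn(
    topic: str,
    env: str = "PROD",
    platform_instance: Optional[str] = None,
) -> str:
    """Build a dataset URN for a Kafka topic (assumed default naming)."""
    # With a platform instance configured, the instance is prefixed to the name.
    name = f"{platform_instance}.{topic}" if platform_instance else topic
    return f"urn:li:dataset:(urn:li:dataPlatform:kafka,{name},{env})"


print(kafka_dataset_urn("orders.v1"))
# urn:li:dataset:(urn:li:dataPlatform:kafka,orders.v1,PROD)
```

If the pre-created dataset and the later ingestion run disagree on `env` or on the platform instance, they will end up as two separate datasets, so it is worth checking the URN of an already ingested topic first.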

dazzling-alarm-64985

10/12/2022, 7:37 AM

@better-orange-49102 ok thanks 🙂

famous-florist-7218

10/12/2022, 7:54 AM

I can confirm that it probably works. Like @better-orange-49102 said above, I can push any “empty” entities, even the whole pipeline, from the emitter. Then the ingestion job will enrich them with the metadata.
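As a rough sketch of pushing an "empty" dataset from the emitter, the snippet below uses the DataHub Python SDK (`acryl-datahub`) to emit only a `datasetProperties` aspect, with no schema. The GMS address `http://localhost:8080`, the custom property keys, and both function names are assumptions for illustration; depending on the SDK version, `MetadataChangeProposalWrapper` may also need explicit `entityType` and `changeType` arguments.

```python
from typing import Dict


def placeholder_properties(owner_team: str, purpose: str) -> Dict[str, str]:
    """Metadata to attach before any schema exists (hypothetical keys)."""
    return {"owner_team": owner_team, "purpose": purpose, "status": "provisioned"}


def emit_empty_dataset(
    urn: str,
    description: str,
    props: Dict[str, str],
    gms_server: str = "http://localhost:8080",  # assumed local GMS endpoint
) -> None:
    """Create the dataset by emitting a properties aspect only (no schema)."""
    # Imported lazily so the helper above works without the SDK installed.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    emitter = DatahubRestEmitter(gms_server=gms_server)
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityUrn=urn,
            aspect=DatasetPropertiesClass(
                description=description,
                customProperties=props,
            ),
        )
    )
```

Calling `emit_empty_dataset` with the URN the Kafka ingestion is expected to produce should let a later ingestion run add the schema aspect to the same entity, as described in the message above.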

dazzling-alarm-64985

10/12/2022, 8:01 AM

I'm struggling a bit to decide what the best course of action is regarding metadata ingestion. My first thought was to use our existing CI/CD pipelines with a YAML file in git. At first this felt like a very good IaC-style solution, but this weekend I realized that it does not make the metadata dynamic or flexible: a metadata change would require a git commit. To mitigate this I thought that users should be able to enrich or change metadata in DataHub afterwards, but this would leave us in a situation where the git metadata would go out of date very fast. Bleh.

dazzling-alarm-64985

10/12/2022, 8:02 AM

And it would make the ingestion process very complex, as it would need to merge content between the git YAML file and the existing metadata in DataHub, since users would probably make changes on both ends ... 🙂

ambitious-magazine-36012

10/12/2022, 3:17 PM

For the empty dataset creation, do you need to define a custom source for this, or can you just use existing sources with an empty table?

ambitious-magazine-36012

10/12/2022, 3:18 PM

Is there a way to create a dataset with an API?
