Hi I was interested in using datahub without connecting dire DataHub #getting-started


hallowed-market-13473

06/02/2023, 9:54 PM

Hi, I was interested in using datahub without connecting directly to the raw data, but instead providing the files containing the metadata itself - for instance, a JSON file containing all the tables in a database, as well as the schema for each table. Is there documentation on the recommended way of doing this? I was thinking of using JSONSchema as the datasource, but wasn’t sure if that was the best/recommended way.

hallowed-market-13473

06/02/2023, 9:55 PM

This is what it looks like right now using the two scripts below.

hallowed-market-13473

06/02/2023, 9:56 PM

test_schema.json

hallowed-market-13473

06/02/2023, 9:56 PM

test_ingest.py

modern-artist-55754

06/03/2023, 3:15 AM

You probably have to use both the JSON Schema source and the CSV enricher if you want to have tags, terms, owners, etc. Why don't you want to have DataHub connect to the source?

hallowed-market-13473

06/04/2023, 6:26 PM

Oh okay, thanks. Is there a good guide on how this is done? The main reason I don’t want to connect it to the raw data itself is to avoid dealing with permissions-related issues. At some point in the future we might switch to connecting to the data itself, but for now it would be easier to adopt if we could provide only the metadata directly, since that is already being maintained.

astonishing-answer-96712

06/06/2023, 6:28 PM

Hi, have you looked at the JSON schema source? https://datahubproject.io/docs/generated/ingestion/sources/json-schema/

hallowed-market-13473

06/07/2023, 5:29 PM

@astonishing-answer-96712 yeah, but it looks like it’s limited in its functionality (it can’t extract tags/ownership)

modern-artist-55754

06/08/2023, 12:28 PM

You have to do 2 passes. In the first pass, use the JSON Schema source to create the datasets; in the second, use csv-enricher to populate tags, owners, etc. So your metadata needs to be split into 2 files... At least that is how I see it.
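A rough sketch of what the two passes could look like from Python, assuming the `acryl-datahub` package is installed with the `json-schema` and `csv-enricher` plugins. The server URL, platform name, file paths, and the example URN are placeholders for illustration, not values from this thread. The dataset URN in the CSV has to match whatever the first pass actually created, so it is worth checking it in the UI before running the second pass.

```python
import csv

# Hypothetical values for illustration only; adjust to your deployment.
GMS_SERVER = "http://localhost:8080"
PLATFORM = "hive"

# Column layout expected by the csv-enricher source.
CSV_FIELDS = [
    "resource", "subresource", "glossary_terms", "tags",
    "owners", "ownership_type", "description", "domain",
]


def _sink() -> dict:
    """REST sink pointing at the (assumed) DataHub GMS endpoint."""
    return {"type": "datahub-rest", "config": {"server": GMS_SERVER}}


def schema_recipe(schema_path: str) -> dict:
    """Pass 1: create datasets from JSON Schema files."""
    return {
        "source": {
            "type": "json-schema",
            "config": {"path": schema_path, "platform": PLATFORM},
        },
        "sink": _sink(),
    }


def enricher_recipe(csv_path: str) -> dict:
    """Pass 2: attach tags, owners, terms, etc. to existing datasets."""
    return {
        "source": {
            "type": "csv-enricher",
            "config": {
                "filename": csv_path,
                "write_semantics": "PATCH",  # add to existing metadata
                "delimiter": ",",
                "array_delimiter": "|",
            },
        },
        "sink": _sink(),
    }


def write_enrichment_csv(csv_path: str, rows: list) -> None:
    """Write enrichment rows; columns missing from a row are left empty."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=CSV_FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def run_recipe(recipe: dict) -> None:
    """Run one ingestion pass (imports DataHub lazily; needs a live server)."""
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```

Usage would be roughly: call `run_recipe(schema_recipe("schemas/"))`, then write a CSV with rows such as `{"resource": "urn:li:dataset:(urn:li:dataPlatform:hive,my_table,PROD)", "tags": "[urn:li:tag:Legacy]", "owners": "[urn:li:corpuser:jdoe]", "ownership_type": "TECHNICAL_OWNER"}`, and finally call `run_recipe(enricher_recipe("enrich.csv"))`. The same two recipes can also be kept as YAML files and run with `datahub ingest -c <recipe>.yml`.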
