# ingestion
f
Hi guys, first, I really like the product and am seriously considering suggesting it for our organization. I've been looking everywhere for an elegant solution such as DataHub. I am doing a small PoC and would like to see it loaded with our data. We have Oracle as our main source, and I saw that it is not yet supported. As a workaround I would like to ingest some manual extracts, i.e. from a file source. But I'm having difficulty understanding how to get the file into a shape that the ingestor can understand. I couldn't find anything in your documentation. E.g. we have some metadata already trapped in some tables that could easily be exported as JSON or CSV, or anything else. What are the required fields, data structure, etc. to ingest this?
b
Currently the ingestion framework can be configured to ingest from files. However, the files must contain well-formed metadata that conforms to the MetadataChangeEvent (MCE) schema. For your reference: https://datahubproject.io/docs/metadata-ingestion#file-file. This directory contains some files that can serve as examples. In your case, you'll have to produce files that contain MCE JSON objects and then configure the ingestion framework to use the file as a source.
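To make the "file of MCE JSON objects" idea concrete, here is a rough Python sketch that builds one dataset event and writes it out as a JSON array. Treat it as an illustration only: the key names (`proposedSnapshot`, the `com.linkedin.pegasus2avro...` type keys, `DatasetProperties`) are assumptions modelled on how example MCE files tend to look and may not match the schema of your DataHub version, so compare the output against the example files before ingesting. The helper names (`make_dataset_urn`, `make_dataset_mce`, `write_mce_file`) are hypothetical and not part of DataHub.

```python
import json

# ASSUMPTION: these type keys mirror the example MCE files; verify them
# against the MetadataChangeEvent schema of the DataHub version in use.
SNAPSHOT_KEY = "com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot"
PROPERTIES_KEY = "com.linkedin.pegasus2avro.dataset.DatasetProperties"


def make_dataset_urn(platform, name, env="PROD"):
    """Build a dataset URN string for the given platform and dataset name."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def make_dataset_mce(platform, name, description):
    """Return one MCE-shaped dict carrying a single properties aspect.

    The real schema may require more fields per aspect; this is a sketch.
    """
    return {
        "auditHeader": None,
        "proposedSnapshot": {
            SNAPSHOT_KEY: {
                "urn": make_dataset_urn(platform, name),
                "aspects": [{PROPERTIES_KEY: {"description": description}}],
            }
        },
        "proposedDelta": None,
    }


def write_mce_file(mces, path):
    """Write a list of MCE dicts to `path` as a JSON array."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(mces, f, indent=2)
```

Usage would be something like `write_mce_file([make_dataset_mce("oracle", "hr.employees", "Employee master data")], "mces.json")`, with the resulting file then referenced from the file source configuration.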
g
Hi @faint-hair-91313 - let me just add an Oracle ingestion source for you
@faint-hair-91313 I've made a basic PR for this - would love if you could give it a try and let me know if it works, as I'm not too familiar with Oracle DBs. https://github.com/linkedin/datahub/pull/2347
f
Thanks @gray-shoe-75895, I cannot test this just yet, as I don't have a final integrated environment set up.
@big-carpet-38439, thanks for coming back. I see; I think using the REST API sink is also a good choice. I eventually found some documentation and tested a few things, but I need to dive deeper into real examples with our own data to get it sorted out. My main pain is converting our dataset definitions into the required format.
b
I see - you'd likely need to construct a script to convert from the form you have to the DataHub model
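As a starting point for such a script, here is a minimal Python sketch of the first half of the job: reading a column-level CSV extract and grouping it into one record per dataset. The CSV column names (`schema_name`, `table_name`, `column_name`, `data_type`, `column_comment`) are hypothetical and stand in for whatever your export actually produces. The output keys `fieldPath` and `nativeDataType` are chosen to resemble DataHub's schema field naming, but that is an assumption to check against the model; each record would still need to be wrapped in whatever structure the MCE schema requires.

```python
import csv
import io


def group_columns_by_table(csv_text):
    """Group column-level CSV rows into one dict per table.

    ASSUMPTION: the input has the hypothetical headers schema_name,
    table_name, column_name, data_type and (optionally) column_comment.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    datasets = {}  # insertion-ordered: tables appear in first-seen order
    for row in reader:
        name = f"{row['schema_name']}.{row['table_name']}"
        dataset = datasets.setdefault(name, {"name": name, "fields": []})
        dataset["fields"].append(
            {
                "fieldPath": row["column_name"],
                "nativeDataType": row["data_type"],
                "description": row.get("column_comment") or "",
            }
        )
    return list(datasets.values())


SAMPLE = """schema_name,table_name,column_name,data_type,column_comment
HR,EMPLOYEES,ID,NUMBER,Primary key
HR,EMPLOYEES,NAME,VARCHAR2,Full name
HR,DEPARTMENTS,ID,NUMBER,
"""

if __name__ == "__main__":
    for dataset in group_columns_by_table(SAMPLE):
        print(dataset["name"], [f["fieldPath"] for f in dataset["fields"]])
```

Keeping the grouping step separate from the DataHub-specific wrapping means the extract logic can be tested on its own and reused if the target format changes.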