Hello
#general, I'd like to raise a question f
Can we use DataHub to store the metadata itself, rather than only the metadata structure?
Here is some context to help explain my question:
My application processes a high volume of inbound feeds daily for several customers. Each feed is transformed to a data model, processed to clean, compute, and add some information, and then stored as a Parquet file at a given location (so far we are working on Hadoop, but this may change at some point).
For each of those feeds, I'd be interested in storing information such as:
- the version of the parser/processor/etc. that generated it
- the version of the data model used (and whether it's deprecated)
- the inbound feed that generated it
- the date when it was generated
- the location where I can find the output/input feed
- the customer owning the inbound feed
Those are just a few examples of the metadata I would like to attach to a given dataset and store, with the main purpose of searching through it later on.
At the same time, I would need to implement an ACL to restrict access to that metadata.
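To make the list above more concrete, here is a rough sketch in Python of the kind of record I have in mind and the kind of search I'd like to run on it. Every name here (`FeedMetadata`, its fields, `search`, the sample paths and customers) is made up for illustration; none of it comes from DataHub's model or API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, List


@dataclass(frozen=True)
class FeedMetadata:
    """Hypothetical per-dataset record; not a DataHub type."""
    processor_version: str      # version of the parser/processor that generated it
    datamodel_version: str      # version of the data model used
    datamodel_deprecated: bool  # whether that data model version is deprecated
    inbound_feed: str           # inbound feed that generated it
    generated_on: date          # date when it was generated
    input_location: str         # where the input feed can be found
    output_location: str        # where the output Parquet file can be found
    customer: str               # customer owning the inbound feed


def search(records: List[FeedMetadata], **criteria: Any) -> List[FeedMetadata]:
    """Return the records whose fields equal every given criterion."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


# Sample data, invented purely for the example.
records = [
    FeedMetadata("1.4.0", "v2", False, "orders-feed", date(2020, 3, 1),
                 "/in/acme/orders.csv", "/out/acme/orders.parquet", "acme"),
    FeedMetadata("1.3.2", "v1", True, "orders-feed", date(2020, 2, 1),
                 "/in/globex/orders.csv", "/out/globex/orders.parquet", "globex"),
]

# Example query: all datasets still produced with a deprecated data model.
deprecated = search(records, datamodel_deprecated=True)
```

The in-memory list is only a stand-in; in practice I'd expect the storage and search to be handled by whatever metadata tool we end up using.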
I'm currently analyzing DataHub to assess whether it could satisfy those requirements.
I therefore cloned the repository and tried some data ingestion.
I've played with Rest.li to create some datasets.
My first impression is that DataHub is mainly meant to store, manage, and search through the metadata structure only, but maybe I'm approaching this tool from the wrong point of view.
Given the use case I described above, could you tell me whether DataHub can fit my needs once I have implemented the appropriate extensions to the current model?