# advice-metadata-modeling
g
I have a bunch of datasets (each representing tabular data) that I'd like to make available to users in multiple formats (Excel, JSON, CSV, GeoJSON, etc.). Is there a metadata standard for describing, for each dataset, the download links and the format each link represents? If not, any suggestions for how to tailor things in DataHub would be much appreciated. Ideally, the metadata and links would be self-describing, so that I could write a small programmatic client that queries DataHub for available datasets and then lists the formats available for download.
b
Since each of the formats is a transformation of the original (tabular) dataset, I'd use DataHub's lineage mechanism to keep track of the links.
b
+1! As for how to store the download formats, I would recommend using custom properties (perhaps the DatasetProperties.customProperties field). Your client would read this aspect, look up a mapping of format type -> location where it is stored (e.g. S3), and then proceed to download.
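A rough sketch of the client-side lookup, in case it helps. It assumes a made-up convention of one custom property per format, keyed `download.<format>`. The prefix, the helper names, and the S3 paths are all hypothetical. Fetching the aspect from DataHub is left out, so the function just takes the customProperties map (string -> string) as a plain dict.

```python
from typing import Dict, Mapping

# Hypothetical convention (not a DataHub standard):
# one custom property per format, keyed "download.<format>".
DOWNLOAD_PREFIX = "download."


def available_formats(custom_properties: Mapping[str, str]) -> Dict[str, str]:
    """Return {format: location} for every download entry in customProperties."""
    return {
        key[len(DOWNLOAD_PREFIX):].lower(): location
        for key, location in custom_properties.items()
        if key.startswith(DOWNLOAD_PREFIX) and location
    }


def location_for(custom_properties: Mapping[str, str], fmt: str) -> str:
    """Look up where a given format is stored, or raise if it is not offered."""
    formats = available_formats(custom_properties)
    try:
        return formats[fmt.lower()]
    except KeyError:
        raise LookupError(
            f"format {fmt!r} not available; have {sorted(formats)}"
        ) from None


# Example customProperties as a client might receive them (made-up values).
example_props = {
    "owner_team": "analytics",
    "download.csv": "s3://example-bucket/sales/sales.csv",
    "download.json": "s3://example-bucket/sales/sales.json",
    "download.geojson": "s3://example-bucket/sales/sales.geojson",
}

print(sorted(available_formats(example_props)))
print(location_for(example_props, "CSV"))
```

Keeping the format in the key name means the client can list formats without a separate schema, and unrelated custom properties (like `owner_team` above) are simply ignored.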