bumpy-pharmacist-66525
04/26/2023, 11:00 AM
emit_chart_mces which will grab all of the datasets in Superset, and then filter out the non-virtual datasets
• Looker explore (which seems to be equivalent to a Superset virtual dataset) seems to be my best bet in terms of what to follow when creating support for ingesting virtual datasets
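A rough sketch of the filtering step, not taken from the existing Superset source. It assumes each dataset record is a dict that marks virtual datasets either with kind == "virtual" or with a non-empty sql field; both field names are assumptions to check against what your Superset version returns, and filter_virtual_datasets and the sample records are hypothetical.

```python
from typing import Any, Dict, List


def filter_virtual_datasets(datasets: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the dataset records that look virtual.

    Assumption: a record is virtual if kind == "virtual" or if it carries
    a non-empty "sql" definition. Neither field name comes from the thread.
    """
    virtual = []
    for ds in datasets:
        has_sql = bool((ds.get("sql") or "").strip())
        if ds.get("kind") == "virtual" or has_sql:
            virtual.append(ds)
    return virtual


# Hypothetical records, shaped like one page of dataset results.
sample = [
    {"id": 1, "table_name": "orders", "kind": "physical", "sql": None},
    {"id": 2, "table_name": "orders_by_day", "kind": "virtual", "sql": "SELECT ..."},
    {"id": 3, "table_name": "legacy_view", "sql": "SELECT 1"},
]
virtual_ids = [ds["id"] for ds in filter_virtual_datasets(sample)]
```

Keeping the check in one small function makes it easy to swap in whichever field turns out to be reliable.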
I can also think of some blockers/pain points:
• Physical datasets have a single underlying database id and table name; however, virtual datasets can have multiple (or none at all). Would the DatasetSnapshot class allow me to create a dataset with multiple underlying tables/databases or with no underlying tables/databases?
• I'm not too sure how the lineage will work for virtual datasets. Is it created automatically when you create a MetadataChangeEvent? If not, can someone point me to an example of an ingestion source/line which does it?
• What is the minimum set of parameters I need to create an object of the DatasetSnapshot class? I tried looking at other sources as examples, but I couldn't find anything useful regarding the minimum/default set of parameters to send it (and/or is there a list which states all of the possible parameters?)
Are there any assumptions I made which are not correct? What do you think about the blockers/pain points? Is there anything else you think I should know before I start making modifications to that ingestion source?
bumpy-pharmacist-66525
04/26/2023, 11:02 AM
astonishing-answer-96712
04/26/2023, 5:10 PM
orange-night-91387
04/26/2023, 5:36 PM
• What is the minimum set of parameters I need to create an object of the DatasetSnapshot class. I tried looking at other sources as examples but I couldn't find anything useful in regards to what the minimum/default set of parameters to send it is (and/or is there a list which states all of the possible parameters?)
We've generally moved away from the Snapshot-based approach towards MCPs, which are aspect-oriented. Each aspect has its own set of required properties. To create these you can use the MetadataChangeProposalWrapper class, like this example in the Redshift connector.
• Physical datasets have a single underlying database id and table name, however, virtual datasets can have multiple (or none at all). Would the DatasetSnapshot class allow me to create a dataset with multiple underlying tables/databases or with no underlying tables/databases?
There are a few different choices for modeling here. From looking at what a virtual dataset is in Superset, this seems like it would be modeled as a dataset with lineage to the underlying physical datasets, but containers might also make sense if they're designed more as a logical collection of data assets within Superset. This might warrant further discussion with our ingestion team though.
• I'm not too sure how the lineage will work for virtual datasets, is it created automatically when you create a MetadataChangeEvent? If not, can someone point me to an example of an ingestion source/line which does it?
Lineage is created through the UpstreamLineage aspect; examples of how this gets constructed are available in the ingestion code.
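To make the "dataset with lineage to the underlying physical datasets" idea concrete, here is a minimal plain-Python sketch. It does not use the DataHub SDK: dataset_urn and upstream_lineage_payload are hypothetical local helpers, the platform and table names are made up, and the dict only mirrors my understanding of the UpstreamLineage aspect's shape (an upstreams list whose entries carry a dataset URN and a lineage type). In a real source you would build the aspect with the SDK classes and wrap it in MetadataChangeProposalWrapper.

```python
from typing import Dict, List, Tuple


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Local helper: format a DataHub dataset URN string."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def upstream_lineage_payload(upstream_tables: List[Tuple[str, str]]) -> Dict[str, list]:
    """Build a dict shaped like an UpstreamLineage aspect.

    upstream_tables holds (platform, qualified table name) pairs and may be
    empty, which covers virtual datasets with no underlying tables.
    """
    return {
        "upstreams": [
            {"dataset": dataset_urn(platform, table), "type": "VIEW"}
            for platform, table in upstream_tables
        ]
    }


# Hypothetical virtual dataset built on two physical tables.
virtual_urn = dataset_urn("superset", "orders_by_day")
lineage = upstream_lineage_payload(
    [("postgres", "shop.public.orders"), ("postgres", "shop.public.customers")]
)
# A virtual dataset with no underlying tables just gets an empty upstream list.
no_lineage = upstream_lineage_payload([])
```

Since the upstreams are a list, zero, one, or many underlying tables all fit the same shape, which would address the multiple-or-none concern without needing a special case.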
bumpy-pharmacist-66525
04/27/2023, 11:41 AM
gray-shoe-75895
04/27/2023, 10:51 PM
bumpy-pharmacist-66525
04/28/2023, 11:40 AM