
DataHub

<@UV0M2EB8Q> Is there a list of supported platforms? I ingested a MariaDB database using the mysql connector and it worked, but it shows up as mysql in DataHub, which I would like to correct to "mariadb". I was thinking of adding `underlying_platform` as an option in the mysql source. What would be the correct value here: "mariadb" or something else?

Is it present for `s3`? I was thinking of allowing `s3` as `underlying_platform` in the glue data source

I see an s3 dataset is present at <https://demo.datahubproject.io/dataset/urn:li:dataset:(urn:li:dataPlatform:s3,datahubproject-demo-pipelines.entity_aspect_splits.all_entities,PROD)/schema> without a schema. But the s3 source uses glue (<https://datahubproject.io/docs/metadata-ingestion/source_docs/s3>), so how is it s3 then? Was this ingested some other way?

<@U027B7R23J4> any luck figuring this out? trying to do the same

<@U02GSKURF33> what have you been trying to do?

<@UV0M2EB8Q> I'm also trying to ingest some data from s3 (mostly files to manually tag), and am wondering how the s3 datasets like <https://demo.datahubproject.io/dataset/urn:li:dataset:(urn:li:dataPlatform:s3,datahubproject-demo-pipelines.entity_aspect_splits.all_entities,PROD)/Schema?is_lineage_mode=false|this> are populated in the demo. When I ingest via glue crawler it shows up as "glue" data, so am wondering if the ingestion pattern used in the demo is different?
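
For anyone following along: the platform shown in the UI is part of the dataset URN itself, which is why the demo dataset shows as "s3" while a Glue-crawled table shows as "glue". A minimal sketch that pulls the URN from the demo link above apart, using only the standard library. The `parse_dataset_urn` helper is hypothetical, not part of the DataHub SDK:

```python
import re

# Shape taken from the demo link above:
# urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
DATASET_URN = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:(?P<platform>[^,]+),"
    r"(?P<name>.+),(?P<env>[^,)]+)\)$"
)


def parse_dataset_urn(urn: str) -> dict:
    """Split a dataset URN into platform, name and env (hypothetical helper)."""
    match = DATASET_URN.match(urn)
    if match is None:
        raise ValueError(f"not a dataset URN: {urn!r}")
    return match.groupdict()


demo = (
    "urn:li:dataset:(urn:li:dataPlatform:s3,"
    "datahubproject-demo-pipelines.entity_aspect_splits.all_entities,PROD)"
)
parts = parse_dataset_urn(demo)
print(parts["platform"])  # the platform segment, here "s3"
```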

Got it… this S3 dataset gets populated thru lineage edges emitted by the Airflow tasks that read / write to it

In your case, do you have an s3 folder that a glue table points to?

i guess the desire is to surface to end users that the "dataset" they're looking at is an s3 dataset as opposed to a glue artifact (if possible)

makes sense! would it be okay to see lineage from the Glue dataset to the S3 dataset (<@U02GSKURF33>)?

might be okay - just trying to get a sense of what's possible for now. I could imagine us wanting to add our own custom lineage data (non-glue) in the future as well.

We def will be adding Glue -> S3 auto-lineage-edge very shortly

adding custom lineage is quite easy… you just need to use the Python SDK and emit away :slightly_smiling_face:
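
A rough sketch of what "emit away" could look like with the `acryl-datahub` Python package, using `make_dataset_urn`, `make_lineage_mce` and `DatahubRestEmitter.emit_mce`. The server URL, the dataset names, and the choice of the S3 location as upstream of the Glue table are all assumptions for illustration; adjust them to your setup:

```python
def emit_lineage(
    upstream_platform: str,
    upstream_name: str,
    downstream_platform: str,
    downstream_name: str,
    gms_server: str = "http://localhost:8080",  # assumed local GMS endpoint
) -> None:
    """Emit one upstream -> downstream lineage edge between two datasets."""
    # Imported lazily so this file can be loaded without the SDK installed.
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    # The platform argument decides what the UI shows (e.g. "s3" vs "glue").
    upstream_urn = builder.make_dataset_urn(upstream_platform, upstream_name)
    downstream_urn = builder.make_dataset_urn(downstream_platform, downstream_name)

    # One metadata change event describing the lineage edge.
    lineage_mce = builder.make_lineage_mce([upstream_urn], downstream_urn)

    emitter = DatahubRestEmitter(gms_server)
    emitter.emit_mce(lineage_mce)


# Example call (hypothetical names; needs a running DataHub instance):
# emit_lineage("s3", "my-bucket/path/to/folder", "glue", "my_database.my_table")
```

Since the referenced datasets are identified purely by URN, the same call should work for non-Glue platforms too, which covers the custom-lineage case mentioned above.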