
DataHub

From what I can tell, DataHub doesn't currently support OpenLineage ingestion for lineage. Is that true, or is it on the roadmap? I see DataHub is listed in the OpenLineage docs here: <https://openlineage.io/getting-started/>

<@U02GSKURF33> That's correct: we don't yet have any connectors for OpenLineage. I'd be curious to learn which use cases you are trying to address. Often, just getting metadata out of tools is the hardest part, as opposed to representing it in a standard format, but we will build the adapters if the use cases are compelling.

<@U01C3DMG2GL> We actually have an internal workflow engine (open source here: <https://github.com/insitro/redun>) that we use for all of our internal analysis pipelines. I want to write something that takes the internal executions and gets the S3 artifacts and metadata into DataHub automatically. From the OpenLineage docs, we originally thought OpenLineage --> DataHub was already done, so we would only need to implement redun --> OpenLineage. Any suggestions? It now seems the only benefit of an OpenLineage representation in the middle would be if DataHub implements its own OpenLineage ingestion, or if we want to reuse the lineage in other tools that support it.
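For reference, the redun --> OpenLineage half of that plan amounts to turning each execution into an OpenLineage `RunEvent`. Below is a minimal sketch using only the standard library. It does not call redun's API: the task name and S3 URIs are assumed to have already been pulled from an execution record, and the job namespace `redun`, the `PRODUCER` URI, and the spec version in `SCHEMA_URL` are illustrative assumptions. S3 datasets follow OpenLineage's naming convention (namespace `s3://<bucket>`, name `<object key>`).

```python
import json
import uuid
from datetime import datetime, timezone
from urllib.parse import urlparse

# Hypothetical producer URI identifying whatever code emits these events.
PRODUCER = "https://example.com/redun-openlineage-emitter"
# Assumed spec version; check the current OpenLineage spec before relying on it.
SCHEMA_URL = "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"


def s3_dataset(uri: str) -> dict:
    """Map an S3 URI to an OpenLineage dataset: namespace s3://<bucket>, name <key>."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    return {"namespace": f"s3://{parsed.netloc}", "name": parsed.path.lstrip("/")}


def run_event(job_name, inputs, outputs, event_type="COMPLETE",
              run_id=None, namespace="redun"):
    """Build an OpenLineage RunEvent dict for one (hypothetical) redun task execution."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [s3_dataset(u) for u in inputs],
        "outputs": [s3_dataset(u) for u in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


if __name__ == "__main__":
    event = run_event(
        "align_reads",  # hypothetical task name
        ["s3://my-bucket/raw/sample.fastq"],
        ["s3://my-bucket/aligned/sample.bam"],
    )
    print(json.dumps(event, indent=2))
```

The resulting JSON could then be sent with any OpenLineage transport; on its own it does nothing for DataHub until something on the DataHub side consumes it.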

It would be trivial to emit lineage edges using the DataHub Python SDK. Using an intermediate representation only makes sense if there are multiple sinks that all support the intermediate format, as you said.

The recipes for emitting lineage edges are here:

<https://datahubproject.io/docs/lineage/sample_code/>
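As a rough illustration of the direct route, here is a minimal sketch that turns S3 URIs into DataHub dataset URNs and emits one lineage edge set with the SDK calls used in the lineage sample code (`make_lineage_mce`, `DatahubRestEmitter.emit_mce`). Two assumptions to flag: the URN helper is a hand-written mirror of the string format `make_dataset_urn("s3", name, env)` produces, and the S3 dataset name is assumed to be `<bucket>/<key>` without the `s3://` prefix, which should be checked against how your S3 datasets are already named in DataHub. The `acryl-datahub` package is imported lazily so the URN helper works without it.

```python
from urllib.parse import urlparse


def s3_dataset_urn(uri: str, env: str = "PROD") -> str:
    """Build a DataHub dataset URN for an S3 object.

    Mirrors the format of mce_builder.make_dataset_urn("s3", name, env).
    Assumption: the dataset name is "<bucket>/<key>" (no s3:// prefix).
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    name = f"{parsed.netloc}/{parsed.path.lstrip('/')}"
    return f"urn:li:dataset:(urn:li:dataPlatform:s3,{name},{env})"


def emit_lineage(upstream_uris, downstream_uri, gms_server="http://localhost:8080"):
    """Emit upstreams -> downstream lineage to a DataHub GMS instance.

    Requires the acryl-datahub package; imported here so that the URN
    helper above has no third-party dependency.
    """
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    lineage_mce = builder.make_lineage_mce(
        [s3_dataset_urn(u) for u in upstream_uris],
        s3_dataset_urn(downstream_uri),
    )
    DatahubRestEmitter(gms_server).emit_mce(lineage_mce)
```

A call such as `emit_lineage(["s3://my-bucket/raw/sample.fastq"], "s3://my-bucket/aligned/sample.bam")` would run once per task output at the end of an execution, assuming GMS is reachable at the given address (`localhost:8080` is the quickstart default).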

redun looks interesting, BTW. Will check it out!

Agreed, I was looking at those last night and it feels like the easiest way to go about this. BTW, are there any plans to implement OpenLineage <-> DataHub, or not at the moment? It's misleading that the OpenLineage site has the DataHub logo front and center, but I appreciate that's obviously not under DataHub's control.

How much effort (roughly) is required to develop an OpenLineage integration connector? Are there any tutorials or examples?
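One plausible core for such a connector is a function that maps an OpenLineage `RunEvent` to DataHub lineage. The minimal sketch below assumes job-level lineage (every output is derived from every input), handles only S3 datasets (namespace `s3://<bucket>`), and uses the same assumed `<bucket>/<key>` naming for DataHub S3 URNs. A real connector would also need a way to receive events, mappings for other platforms, and possibly column-level lineage; none of that is shown here.

```python
import json


def openlineage_dataset_to_urn(dataset: dict, env: str = "PROD") -> str:
    """Map an OpenLineage dataset {namespace, name} to a DataHub dataset URN.

    Only the S3 convention (namespace "s3://<bucket>") is handled; other
    namespaces would need their own platform mapping.
    """
    namespace, name = dataset["namespace"], dataset["name"]
    if not namespace.startswith("s3://"):
        raise ValueError(f"unsupported namespace: {namespace}")
    bucket = namespace[len("s3://"):].rstrip("/")
    return f"urn:li:dataset:(urn:li:dataPlatform:s3,{bucket}/{name.lstrip('/')},{env})"


def lineage_from_run_event(event_json: str) -> dict:
    """Return {downstream_urn: [upstream_urns]} for a COMPLETE RunEvent.

    Events of other types (START, FAIL, ...) produce no lineage here.
    """
    event = json.loads(event_json)
    if event.get("eventType") != "COMPLETE":
        return {}
    upstreams = [openlineage_dataset_to_urn(d) for d in event.get("inputs", [])]
    return {
        openlineage_dataset_to_urn(d): list(upstreams)
        for d in event.get("outputs", [])
    }
```

Each `downstream: upstreams` entry has the shape that the SDK's `make_lineage_mce(upstream_urns, downstream_urn)` takes, so emitting is a short loop once the mapping is settled.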