# ingestion
s
Hi team 👋 I want to get the lineage between Kafka and Snowflake. I have a Kafka Connect sink connector connecting the two platforms. I know that Kafka sink connectors are not implemented in DataHub yet. But is there any option to implement it using a custom emitter or something like that? Any suggestions? Thanks!!
g
Hey @salmon-area-51650 - are you referring to https://datahubproject.io/docs/metadata-ingestion/source_docs/kafka-connect/? DataHub does have a Kafka Connect source.
s
Not really. According to the documentation, only source connectors and the BigQuery sink connector are supported:
```
JDBC and Debezium source connectors
BigQuery sink connector
```
But I have a Snowflake sink connector configured, and I want to build the lineage between Kafka and Snowflake. cc @green-football-43791
g
Ah I see @salmon-area-51650 - in that case, you may want to take a look at the YAML-Specified Dataset Lineage feature that was recently released!
this lets you write custom lineage edges and ingest them manually
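For reference, a minimal lineage file could look roughly like this (a sketch based on the file-based lineage docs; the dataset names here are placeholders):
```yaml
# lineage.yml -- illustrative only; dataset names are placeholders
version: 1
lineage:
  - entity:
      name: my_db.my_schema.my_table
      type: dataset
      env: PROD
      platform: snowflake
    upstream:
      - entity:
          name: my_topic
          type: dataset
          env: PROD
          platform: kafka
```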
Alternatively, you could use the Python or Java emitters to emit arbitrary metadata:
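e.g. with the Python REST emitter, you could attach a Kafka topic as an upstream of a Snowflake table, something roughly like this (a sketch against a recent acryl-datahub SDK; the GMS URL and dataset names are placeholders):
```python
import datahub.emitter.mce_builder as builder
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.com.linkedin.pegasus2avro.dataset import (
    DatasetLineageType,
    Upstream,
    UpstreamLineage,
)

# Placeholder GMS endpoint and dataset names
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# Declare the Kafka topic as an upstream of the Snowflake table
lineage = UpstreamLineage(
    upstreams=[
        Upstream(
            dataset=builder.make_dataset_urn("kafka", "my_topic"),
            type=DatasetLineageType.TRANSFORMED,
        )
    ]
)

emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityUrn=builder.make_dataset_urn("snowflake", "my_db.my_schema.my_table"),
        aspect=lineage,
    )
)
```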
hope that helps!
s
Yeah! These two approaches can definitely help me. Thanks @green-football-43791! Just one more question. I can emit lineage with these approaches, but… how can I remove an existing lineage between two entities? Is that possible? Thanks!
g
Right now, the best way I'd recommend to remove an existing lineage edge is to emit a new UpstreamLineage aspect without that element
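since UpstreamLineage is replaced wholesale on write, that would look roughly like this (a sketch; the graph-client method names vary across SDK versions, and the URNs are placeholders):
```python
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
from datahub.metadata.com.linkedin.pegasus2avro.dataset import UpstreamLineage

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

# Placeholder URNs
downstream = "urn:li:dataset:(urn:li:dataPlatform:snowflake,my_db.my_schema.my_table,PROD)"
unwanted_upstream = "urn:li:dataset:(urn:li:dataPlatform:kafka,my_topic,PROD)"

# Read the current lineage, drop the unwanted edge, write the aspect back
# (get_aspect is the method in recent SDKs; older ones exposed get_aspect_v2)
current = graph.get_aspect(entity_urn=downstream, aspect_type=UpstreamLineage)
if current is not None:
    current.upstreams = [u for u in current.upstreams if u.dataset != unwanted_upstream]
    graph.emit_mcp(
        MetadataChangeProposalWrapper(entityUrn=downstream, aspect=current)
    )
```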
p
@salmon-area-51650 - have you had any success on this endeavour? We'd be interested in the same functionality. I wonder if it would be useful to try to extend the existing kafka-connect source so that it also handles the Snowflake sink. In my eyes this would be the best solution. However, I'm not sure where to create a PR for this: on the datahub project's Git repo or on the acryldata one? It seems that the PyPI package is provided by acryldata...
If we went for extending the existing kafka-connect source, I'd suggest solely evaluating `snowflake.topic2table.map` as a starting point (see the sketch below). That would be much easier than also implementing the topic→table renaming logic.
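For what it's worth, that config value is just comma-separated `topic:table` pairs, so the extraction itself could be as small as this (a sketch; the helper name is made up):
```python
def parse_topic2table_map(raw: str) -> dict[str, str]:
    """Parse a snowflake.topic2table.map value, e.g. 'topic1:table1,topic2:table2',
    into {topic: table} pairs that could be turned into lineage edges."""
    mapping: dict[str, str] = {}
    for pair in raw.split(","):
        topic, sep, table = pair.strip().partition(":")
        if sep and topic and table:
            mapping[topic.strip()] = table.strip()
    return mapping
```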
s
No, I have not started working on that yet.
> I wonder if it would be useful to try to extend the existing kafka-connect source so that it also handles the Snowflake sink. In my eyes this would be the best solution.
+1