# ingestion
w
Hey folks! We are looking into adopting DataHub as our backend for a data catalog. Our data pipelines rely heavily on Kafka with Schema Registry as the transport layer. DataHub amazingly displays auto-ingested topics with schemas that follow the TopicNameStrategy (basically, the schema name is generated from the topic name); however, it lacks support for the other two (RecordNameStrategy, TopicRecordNameStrategy). Are there any plans to support these strategies in the future? Is anybody working on this already? (We might be able to help if the answer is no.) Thanks for the awesome work, so far we like the product a lot!
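For readers less familiar with the three strategies, here is a minimal sketch of how each one derives the Schema Registry subject name. The function name `subject_name` and the example topic and record names are made up for illustration; only the naming patterns themselves follow Confluent's documented behavior.

```python
def subject_name(strategy: str, topic: str, record_name: str, is_key: bool = False) -> str:
    """Return the Schema Registry subject a serializer would register under.

    `record_name` is the fully qualified record name, e.g. "com.acme.OrderCreated"
    (a hypothetical example, not taken from the thread).
    """
    if strategy == "TopicNameStrategy":
        # Subject depends only on the topic: "<topic>-key" or "<topic>-value".
        return f"{topic}-{'key' if is_key else 'value'}"
    if strategy == "RecordNameStrategy":
        # Subject depends only on the record type; the topic is not part of it.
        return record_name
    if strategy == "TopicRecordNameStrategy":
        # Subject combines both: "<topic>-<fully qualified record name>".
        return f"{topic}-{record_name}"
    raise ValueError(f"unknown strategy: {strategy}")


print(subject_name("TopicNameStrategy", "orders", "com.acme.OrderCreated"))
print(subject_name("RecordNameStrategy", "orders", "com.acme.OrderCreated"))
print(subject_name("TopicRecordNameStrategy", "orders", "com.acme.OrderCreated"))
```

Only the first strategy lets a catalog look up the schema from the topic name alone, which is why the other two need extra handling.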
b
Hey Victor! Thanks for the details. How would you like the experience to look in the UI? It would still show the topic + associated schema, right?
w
Yes, the idea would be to use the same Schema tab for the topic as we have now, but parse the additional formats/strategies. RecordNameStrategy could be difficult, as we would need to consume from the topic (or maybe there's an easier way?)
m
Hi @witty-airline-46094, it seems like both RecordNameStrategy and TopicRecordNameStrategy require processing the data to figure out what schemas are attached to the topic
it would also mean that the Schema tab for the topic would have to indicate a plurality of Schemas that exist for this topic
from what I can tell, Confluent Control Center also doesn't support these strategies in the UI today (probably for the same reason?)
@witty-airline-46094 would love to collaborate with you on figuring out what a good implementation for this would look like
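As a starting point for that discussion, here is a rough sketch of one possible heuristic, not something confirmed in this thread: group the registry's subject names by topic prefix so that a topic can carry several schemas. It assumes the subject list has already been fetched (for example from the Schema Registry subjects listing) and that the set of topic names is known; `subjects_for_topic` and all sample names are hypothetical. It can only attribute TopicNameStrategy and TopicRecordNameStrategy subjects; RecordNameStrategy subjects contain no topic name, so they would still need the data itself.

```python
from typing import Dict, Iterable, List


def subjects_for_topic(
    topic: str, subjects: Iterable[str], all_topics: Iterable[str]
) -> Dict[str, List[str]]:
    """Guess which subjects belong to `topic` based on naming alone."""
    prefix = topic + "-"
    # Topics such as "orders-eu" share the "orders-" prefix; their subjects
    # must not be attributed to "orders".
    longer_topics = [t for t in all_topics if t != topic and t.startswith(prefix)]

    result: Dict[str, List[str]] = {"topic_name": [], "topic_record_name": []}
    for subject in subjects:
        if not subject.startswith(prefix):
            continue  # includes RecordNameStrategy subjects, which have no topic part
        if any(subject.startswith(t + "-") for t in longer_topics):
            continue  # belongs to a longer topic name
        suffix = subject[len(prefix):]
        if suffix in ("key", "value"):
            result["topic_name"].append(subject)
        else:
            # Assumed to be "<topic>-<record name>"; this is a heuristic guess.
            result["topic_record_name"].append(subject)
    return result


sample_subjects = [
    "orders-value",
    "orders-com.acme.OrderCreated",
    "orders-com.acme.OrderCancelled",
    "orders-eu-value",
    "com.acme.Payment",
]
print(subjects_for_topic("orders", sample_subjects, ["orders", "orders-eu"]))
```

The result for a topic is a list of subjects rather than a single one, which is the "plurality of schemas" the Schema tab would have to display.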
w
Sounds good
Let me talk to the team and get back to you