# getting-started
w
Hi, I'm in the process of evaluating DataHub. I have looked at the existing architecture, and I understand that the ingestion framework supports a deployment where Kafka is not a required dependency: "The Ingestion Framework is a modular, extensible Python library for extracting Metadata from external source systems (e.g. Snowflake, Looker, MySQL, Kafka), transforming it into DataHub's Metadata Model, and writing it into DataHub via either Kafka *or using the Metadata Store Rest APIs directly*" In the quickstart I see a Docker Kafka broker image being pulled. Is there an option to deploy DataHub without Kafka?
b
No. Kafka is used both as a way to bring ingestion data into DataHub (optional) and as an internal messaging bus (compulsory).
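To make the "optional" part concrete, here is a minimal sketch of how the sink choice might look in an ingestion recipe, written as plain Python dicts. The sink type names `datahub-rest` and `datahub-kafka` are the ones I believe the ingestion framework uses; the MySQL source settings, hosts, and ports are placeholder assumptions and are not taken from this thread.

```python
import json

# Hypothetical source settings; replace with your own system.
SOURCE = {
    "type": "mysql",
    "config": {"host_port": "localhost:3306", "database": "example_db"},
}


def build_recipe(use_kafka: bool) -> dict:
    """Return a recipe dict that differs only in its sink."""
    if use_kafka:
        # Ingestion goes through Kafka (assumed local broker and schema registry).
        sink = {
            "type": "datahub-kafka",
            "config": {
                "connection": {
                    "bootstrap": "localhost:9092",
                    "schema_registry_url": "http://localhost:8081",
                }
            },
        }
    else:
        # Ingestion goes straight to the metadata service over REST (assumed local URL).
        sink = {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}}
    return {"source": SOURCE, "sink": sink}


rest_recipe = build_recipe(use_kafka=False)
kafka_recipe = build_recipe(use_kafka=True)
print(json.dumps(rest_recipe, indent=2))
```

Either way, the deployment itself still runs a Kafka broker for the internal bus; only the ingestion path changes.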
w
And it uses Confluent for all Kafka-related dependencies, including the schema registry, the actual broker, and ZooKeeper, right?
b
Yup
w
thanks!
So when you say internal messaging bus, do you mean the communication between the relevant components, for example between the frontend, GMS, and the underlying data stores (MySQL and Elasticsearch)? Can you give me a basic example of what flows through that internal message bus?
b
@big-carpet-38439 I'm no good at explaining this haha
w
😛 I'm also sorry for the rather naive and ignorant question
m
@worried-motherboard-80036: I think we get this question about once a week at least, so we need to improve our documentation on this 🙂
if you look at the picture on this page: https://datahubproject.io/docs/architecture/metadata-serving
it should help you understand how datahub's serving tier uses Kafka
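For a rough intuition of what flows over the bus, here is a toy, in-memory sketch (not DataHub code). It assumes my reading of the serving tier: the metadata service persists a change to its primary store, then emits a change event, and a separate consumer uses that event to update the search index. The class, the topic name, and the simplified URN are all hypothetical.

```python
from collections import defaultdict


class ToyBus:
    """In-memory stand-in for Kafka: each topic maps to subscriber callbacks."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Deliver the event to every handler subscribed to this topic.
        for handler in self.subscribers[topic]:
            handler(event)


CHANGE_LOG_TOPIC = "metadata-change-log"  # hypothetical topic name

primary_store = {}  # stands in for MySQL
search_index = {}   # stands in for Elasticsearch


def index_consumer(event):
    """Consumer side: keep the search index in sync with committed changes."""
    search_index[event["urn"]] = event["aspect"]


def write_metadata(bus, urn, aspect):
    """Service side: persist first, then announce the change on the bus."""
    primary_store[urn] = aspect
    bus.publish(CHANGE_LOG_TOPIC, {"urn": urn, "aspect": aspect})


bus = ToyBus()
bus.subscribe(CHANGE_LOG_TOPIC, index_consumer)
write_metadata(bus, "urn:li:dataset:example", {"description": "sample table"})
print(search_index)
```

The point of the indirection is that the writer never talks to the index directly, which is why the bus is needed even when ingestion itself goes over REST.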
w
it's also my bad, I didn't dive into it enough before asking the question. I'm coming from evaluating OpenMetadata, which has a much simpler design
I've managed to get a better understanding by looking at which topics exist, ingesting a sample table, and then consuming from the topics to see what's "out there"
m
@worried-motherboard-80036: there is a recent documentation improvement that focuses on the events: https://datahubproject.io/docs/what/mxe