# ingestion
Hi - from reading previous threads, I understand that DataHub requires Kafka and that there are plans to support other pub-sub systems in the future as well. We're currently running POCs with various open source data discovery tools, and DataHub is definitely a great candidate. But... my company uses the Google Cloud Platform and wants to use "standard" GCP components as much as possible. Our engineers are reluctant to choose a solution that needs Kafka and would much prefer to work with Google Cloud Pub/Sub. Are other people facing the same challenge? Are there plans to support Google Cloud Pub/Sub specifically? Any idea of what it would take for us to contribute Pub/Sub support to the project ourselves?
Hi Maurice, first of all, thanks for considering DataHub! We haven't heard of this being a showstopper for other companies, since the major cloud providers often offer a managed Kafka service. DataHub's dependency on Kafka can of course be abstracted away and replaced with another stream system, but we haven't estimated the effort involved. Would that be very important for your evaluation?
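To make "abstracting away" the Kafka dependency concrete, here is a minimal sketch of the general pattern, assuming the producer side is hidden behind a small interface. This is not DataHub's actual code: `EventPublisher`, `InMemoryPublisher`, `PubSubPublisher`, `emit_metadata_change`, and the topic name `metadata-change-events` are all hypothetical. The Pub/Sub adapter wraps the `google-cloud-pubsub` client (`PublisherClient.topic_path` and `PublisherClient.publish`), and a Kafka adapter would wrap a Kafka producer in the same way. A real port would also need the consumer side abstracted.

```python
import json
from typing import Dict, List, Protocol


class EventPublisher(Protocol):
    """Hypothetical abstraction over the message bus (Kafka, Pub/Sub, ...)."""

    def publish(self, topic: str, payload: bytes) -> None: ...


class InMemoryPublisher:
    """Stand-in backend for tests: records published messages per topic."""

    def __init__(self) -> None:
        self.messages: Dict[str, List[bytes]] = {}

    def publish(self, topic: str, payload: bytes) -> None:
        self.messages.setdefault(topic, []).append(payload)


class PubSubPublisher:
    """Hypothetical adapter over the google-cloud-pubsub client library."""

    def __init__(self, project_id: str) -> None:
        # Imported lazily so this module loads without the dependency installed.
        from google.cloud import pubsub_v1

        self._client = pubsub_v1.PublisherClient()
        self._project_id = project_id

    def publish(self, topic: str, payload: bytes) -> None:
        topic_path = self._client.topic_path(self._project_id, topic)
        # publish() returns a future; result() blocks until the server acks.
        self._client.publish(topic_path, payload).result()


def emit_metadata_change(publisher: EventPublisher, urn: str, aspect: dict) -> None:
    """Application code depends only on the abstraction, never on a specific broker."""
    event = {"urn": urn, "aspect": aspect}
    # Hypothetical topic name, chosen for illustration only.
    publisher.publish("metadata-change-events", json.dumps(event).encode("utf-8"))
```

Usage: `emit_metadata_change(InMemoryPublisher(), "urn:li:dataset:example", {"description": "demo"})` runs without any broker. Swapping in `PubSubPublisher("my-gcp-project")` (a placeholder project ID) changes the transport without touching the calling code, which is the property a Pub/Sub contribution would rely on.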
Hi Shirshanka - thanks for the quick reply! Indeed, the ability to use "standard GCP components" is an important aspect of our evaluation. My company does not have a managed Kafka instance that we can use, and our IT Security team isn't very fond of SaaS offerings (such as Confluent.io and Aiven.io). Basically, if we want to use DataHub today, we would have to run our own Kafka cluster for the sole purpose of data discovery. As mentioned previously, the engineers I'm working with to evaluate the various data discovery tools are reluctant to run a Kafka cluster. Unfortunately, this means DataHub is off the table for us for now, at least until I can either convince IT Security to allow SaaS for Kafka or convince the engineers to contribute to the project by developing Pub/Sub messaging support ourselves.