I am doing `datahub docker quickstart` and it is p...
# getting-started
s
I am doing `datahub docker quickstart` and it is pulling a bunch of things
Pulling elasticsearch          ... downloading (68.7%)
Pulling elasticsearch-setup    ... done
Pulling mysql                  ... pull complete
Pulling datahub-gms            ... pull complete
Pulling datahub-frontend-react ... done
Pulling mysql-setup            ... 
Pulling zookeeper              ... waiting
Pulling broker                 ... 
Pulling schema-registry        ... downloading (49.4%)
Pulling kafka-setup            ...
Are all of these hard dependencies of datahub?
The main concern is the mysql one. Can we instead use a hosted solution like Cloud SQL on GCP?
I am hoping that the metadata is stored in MySQL, and that Kafka, the schema registry, and Elasticsearch only hold temporary data that can be re-created from the metadata?
h
Postgres databases work as well, and MariaDB as a sibling to MySQL. Not sure about other dialects. But there is no need to host the DB yourself; managed solutions are the scalable option.
In principle the metadata can be recreated, but the info stored in the MySQL database does keep track of versions of the metadata (which might or might not be of value to you).
Info in Kafka is usually transient (depending, of course, on how you configure it), and in ES there is only one version stored, so it can be recreated easily.
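To illustrate the point about versions, here is a minimal sketch of why a search index that holds only one version per item can be rebuilt from a versioned SQL store. Everything in it is hypothetical: the `AspectRow` shape, the `rebuild_index` helper, the sample URNs, and the assumption that the highest version number is the latest. It is a toy model, not DataHub's actual schema or reindexing code.

```python
from typing import Dict, List, NamedTuple, Tuple


class AspectRow(NamedTuple):
    """Hypothetical row in a versioned SQL metadata table."""
    urn: str
    aspect: str
    version: int  # assumption for this sketch: higher number = newer
    value: str


def rebuild_index(rows: List[AspectRow]) -> Dict[Tuple[str, str], str]:
    """Derive a one-version-per-key index from the full version history."""
    latest: Dict[Tuple[str, str], AspectRow] = {}
    for row in rows:
        key = (row.urn, row.aspect)
        # Keep only the newest version seen so far for each (urn, aspect).
        if key not in latest or row.version > latest[key].version:
            latest[key] = row
    return {key: row.value for key, row in latest.items()}


# The SQL store keeps every version ...
rows = [
    AspectRow("urn:example:dataset1", "ownership", 1, "alice"),
    AspectRow("urn:example:dataset1", "ownership", 2, "bob"),
    AspectRow("urn:example:dataset2", "ownership", 1, "carol"),
]

# ... while the index holds only the latest, so it can be recreated at any time.
index = rebuild_index(rows)
print(index)
```

The reverse does not work: the version history cannot be recovered from the index alone, which is why the SQL database is the part worth putting on a managed, backed-up service.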
s
I can disable it in the helm chart but then where do I configure the sql connection?
b
Glad you found it!
m
@square-activity-64562 have you found the specific guides for AWS and GCP at https://datahubproject.io/docs/deploy/aws/ and https://datahubproject.io/docs/deploy/gcp/ ?
s
Yes, I saw them. Thanks for mentioning them. I have the GKE cluster and I know how to create an ingress in GKE. Other than that, the guide mainly links to https://datahubproject.io/docs/datahub-kubernetes. This doc needs updating to point to the helm chart repo. Currently it is written from the perspective of someone developing datahub, which might not be the general case. I am assuming https://github.com/linkedin/datahub/blob/master/datahub-kubernetes/README.md is the source for this doc? I'll try to send a PR in the coming few days to fix this and to add a node selector option to the helm chart. I need this to try things out for our use case. I probably went into overdrive. I will try to read the architecture and some more docs and then try to get things running in our GKE
e
We haven’t set up a job that automatically posts to helm.datahubproject.io which is why I haven’t updated the docs yet.