# getting-started
g
Hi, our company is very interested in integrating DataHub into our system. Are there any prerequisites? What are the steps, in order, for deploying DataHub? How can we back up metadata? If we deploy with Docker, which part must be persisted so that metadata survives a Docker restart? Can you share your experience with deploying it? We need a document describing how to do this, and we would really appreciate your support. It would be great if we could discuss this in detail. Here is my email address: phamminhsyhcmus@gmail.com
b
Hi there! Welcome to DataHub. We'll follow up soon 🙂
e
Hi! We depend on a few storage layers: MySQL (or another relational DB), Elasticsearch, Neo4j, and Kafka. If you already have these components running in your environment, deployment will be easier. Once these storage layers are deployed, you can deploy the DataHub containers, which use these storage services to start up DataHub. Please find more details at https://datahubproject.io/docs/datahub-kubernetes
MySQL (or another relational DB) is the source of truth for DataHub. You can recreate the other components from the local DB. The script to do so will be out soon. As such, you just need to back up MySQL for now.
For docker-compose, we have persistent volumes for all storage components. As such, destroying the containers will not destroy the underlying data unless you run ./nuke.sh or
datahub docker nuke
in which case all the volumes are destroyed as well.
The above document is a general doc on deploying datahub on an existing kubernetes cluster. https://datahubproject.io/docs/deploy/aws is a guide on how to quickstart on AWS in general. We will release a guide on starting on GCP by end of this week!
g
Once we are running DataHub, how can we update to a new version when one is released on Git?
e
We will bump the Helm chart version every time we release a new version! You can also set the image tags to the released version, like v0.8.3.
g
We’ve decided to deploy DataHub with Kubernetes, but we have some problems: 1. We already have Kafka and Elasticsearch available in our system. How do we integrate them with DataHub? Can you give me an example? 2. How can we back up MySQL so that data persists with Kubernetes? What are the steps to do so?
b
It might be better to jump on a call here
e
For 1, check out this section in the AWS guide https://datahubproject.io/docs/deploy/aws/#use-aws-managed-services-for-the-storage-layer Ignoring the provisioning part, it shows you which part of the values.yaml to change!
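As a rough illustration of what those values.yaml changes amount to, here is a minimal sketch that turns a set of overrides into `helm upgrade --set` arguments. The key names under `global` (`elasticsearch.host`, `elasticsearch.port`, `kafka.bootstrap.server`, `kafka.zookeeper.server`) are assumed to match the chart's values.yaml, and the hostnames, release name, and chart path are hypothetical placeholders, so check them against your own copy of the chart.

```python
from typing import Any, Dict, List


def flatten_overrides(values: Dict[str, Any], prefix: str = "") -> List[str]:
    """Flatten a nested dict into Helm `--set key.path=value` arguments."""
    args: List[str] = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so {"a": {"b": 1}} becomes "--set a.b=1".
            args.extend(flatten_overrides(value, path))
        else:
            args.extend(["--set", f"{path}={value}"])
    return args


# Hypothetical endpoints for services that already run outside the chart.
# The key names are an assumption about the `global` section of values.yaml.
EXISTING_SERVICES: Dict[str, Any] = {
    "global": {
        "elasticsearch": {"host": "elasticsearch.internal.example", "port": 9200},
        "kafka": {
            "bootstrap": {"server": "kafka.internal.example:9092"},
            "zookeeper": {"server": "zookeeper.internal.example:2181"},
        },
    }
}


def helm_upgrade_command(release: str, chart: str) -> List[str]:
    """Build (but do not run) a `helm upgrade --install` command line."""
    return ["helm", "upgrade", "--install", release, chart] + flatten_overrides(
        EXISTING_SERVICES
    )
```

Printing `" ".join(helm_upgrade_command("datahub", "./datahub"))` gives a command you can review before running it. Editing the same keys directly in values.yaml is equivalent and easier to keep under version control.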
g
I found this in the datahub/datahub-kubernetes/datahub/values.yaml file. Are we going to use your MySQL setup job configuration? It seems to lack the name and mount path in the volume mounts section, which is exactly what we need to back up data. How can we change the configuration to do that?
e
I don’t fully understand. This is just a setup job that runs once on upgrade. Where are you putting the backup logic?
g
Can you show me where I can put the backup logic if there is something I want to change?
e
What is the backup logic you have in mind? Let’s see if it fits in any of our existing jobs
g
I want to set where the bound persistent volume should be mounted within our MySQL Kubernetes container.
It's just like creating a PersistentVolumeClaim and then mounting the data there in order to persist it.
e
ah interesting
this should be a setting in mysql container itself?
g
Not really! We want to set up somewhere to mount the data so that it persists. Is there any way to do that? Could you share with us how you back up data?
e
For us, we use the s3 backup file in parquet and read it to restore. We don’t have support for mysql dumps yet, but we will work on it soon.
^ using AWS RDS as our db
g
We don't have AWS available yet. How long will it take you to support MySQL dumps? Is there any solution for us right now?
e
what kind of persistent store can you use? The dumps need to be stored somewhere outside kubernetes
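In the meantime, if the database runs as a pod on Kubernetes, a plain MySQL-level dump written to a machine outside the cluster may be enough as a stopgap. The sketch below is only an illustration, not a DataHub feature: it assumes the MySQL container exposes `MYSQL_ROOT_PASSWORD` in its environment (as common MySQL images do), and the namespace, pod name, and database name are hypothetical placeholders.

```python
import shlex
import subprocess
from typing import List


def build_dump_command(namespace: str, pod: str, database: str) -> List[str]:
    """Build a `kubectl exec` command that runs mysqldump inside the MySQL pod.

    Assumption: the container has MYSQL_ROOT_PASSWORD set in its environment.
    The variable is expanded by the shell inside the pod, not locally.
    """
    inner = (
        'exec mysqldump --single-transaction -uroot -p"$MYSQL_ROOT_PASSWORD" '
        + shlex.quote(database)
    )
    return ["kubectl", "exec", "-n", namespace, pod, "--", "sh", "-c", inner]


def dump_to_file(command: List[str], output_path: str) -> None:
    """Run the command and stream the SQL dump into a local file.

    The file lives on the machine running kubectl, i.e. outside the cluster.
    Raises subprocess.CalledProcessError if the dump fails.
    """
    with open(output_path, "wb") as out:
        subprocess.run(command, stdout=out, check=True)
```

For example, `dump_to_file(build_dump_command("default", "mysql-0", "datahub"), "datahub-backup.sql")` would write the dump next to wherever the script runs. From there it can be copied to whatever persistent store you have available.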
g
Could it be a file or a local MySQL instance? Can you give me some suggestions?
e
Ah, I didn’t ask yet: are you using Kubernetes for deploys, or docker-compose?
g
We are using Kubernetes for deploys
I wasn't referring to the setup job that runs once on upgrade, but to the MySQL setup job. It contains the volume mount that I need.