# all-things-deployment
s
Hi all! I'm doing some experimentation with DataHub. Is there an easy way to have the Helm chart deploy only the DataHub pods? I'm trying to get DataHub to run against hosted Elasticsearch, Postgres (or MySQL), and Kafka. For various reasons I have to deploy via Terraform, so it might be easiest to just load the required Docker containers into ECS using Terraform, set up an ALB using Terraform, and then point everything at the right place instead of trying to use Helm. This feels a little off the beaten path, so I wanted to get a sense of whether it's sensible before I go down this route.
b
Here is the documentation with instructions for using AWS managed services instead of deploying the prerequisites in the Kubernetes cluster via the Helm chart: https://datahubproject.io/docs/deploy/aws
These prerequisites need to be installed before DataHub for it to work, unless you are using AWS managed services. Could you describe the pod for elasticsearch-master-1 and see what error it shows under Events? It could be that you are out of resources.
s
So I was able to get DataHub up and running, but I had to run the prerequisites and just let them fail (except for the schema registry). The only other issue is that the upgrade job kept spinning up and getting OOMKilled.
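If your chart version exposes a resources block for the upgrade job, raising its memory limit is the usual fix for repeated OOMKills. A hypothetical values.yaml sketch (the datahubUpgrade.resources key path is an assumption; check your chart version's values.yaml for the actual field):

```yaml
# Hypothetical values.yaml fragment -- the key path is an assumption,
# verify it against your chart version before using.
datahubUpgrade:
  enabled: true
  resources:
    requests:
      memory: "512Mi"
    limits:
      memory: "1Gi"   # raise this if the upgrade job keeps getting OOMKilled
```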
Events:
  Type     Reason     Age                   From     Message
  ----     ------     ----                  ----     -------
  Warning  Unhealthy  75s (x6214 over 15h)  kubelet  Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s"
That's the Elasticsearch output, but I was able to get the elasticsearchSetup job to run and I'm using an external Elasticsearch cluster, so it's not a problem.
I was even able to get DataHub running against Redpanda instead of Kafka using the Kafka bootstrap job, which was super cool.
The computer I'm using for testing has 500 GB of RAM (400 GB free) and 42 cores, so I doubt resources are the issue.
b
There are two ways you can deploy the prerequisites (MySQL, Kafka, Elasticsearch, Neo4j (optional)) that DataHub depends on:
1. As Kubernetes components, by running the helm command
helm install prerequisites datahub/datahub-prerequisites
2. As AWS managed services
You can use either of them. If you are using AWS managed services for Elasticsearch, Postgres (or MySQL), and Kafka:
• Disable them in the prerequisites values.yaml (https://github.com/acryldata/datahub-helm/blob/master/charts/prerequisites/values.yaml) by changing
enabled: true
to
enabled: false
for each component except
schema-registry
• Install the prerequisites using
helm install prerequisites datahub/datahub-prerequisites
which will install only the schema registry in the Kubernetes cluster
• Update values.yaml for DataHub (https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/values.yaml) to point to these AWS managed services: Elasticsearch, Postgres (or MySQL), and Kafka
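For the last step, the external endpoints typically go under the global section of the DataHub chart's values.yaml. A rough sketch with placeholder hostnames (key paths follow the linked chart but may differ between versions; every endpoint value here is made up):

```yaml
# Sketch of datahub values.yaml overrides -- hostnames are placeholders,
# key paths may vary by chart version.
global:
  elasticsearch:
    host: "my-es-domain.us-east-1.es.amazonaws.com"   # placeholder AWS ES endpoint
    port: "443"
    useSSL: "true"
  kafka:
    bootstrap:
      server: "my-msk-broker:9092"                    # placeholder MSK bootstrap server
    schemaregistry:
      url: "http://prerequisites-cp-schema-registry:8081"  # in-cluster schema registry
  sql:
    datasource:
      host: "my-rds-instance:3306"                    # placeholder RDS endpoint
      url: "jdbc:mysql://my-rds-instance:3306/datahub"
      driver: "com.mysql.cj.jdbc.Driver"
      username: "datahub"
      password:
        secretRef: "mysql-secrets"                    # k8s secret holding the password
        secretKey: "mysql-password"
```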
s
The issue isn't configuring DataHub to talk to the managed services; I got that working and can successfully log in. I need a clean way to disable all the prerequisites before I try deploying on the production Kubernetes cluster. I did the above in values.yaml, but it still tries to start the containers corresponding to the prerequisites.
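Two things worth double-checking here: the overrides only take effect if the edited file is actually passed at install time (e.g. helm upgrade --install prerequisites datahub/datahub-prerequisites --values prereq-values.yaml), and the Kafka-related components in the prerequisites chart are nested under the cp-helm-charts subchart, so a single top-level flag may not reach them. A sketch of a disable-everything-but-schema-registry override file (key names taken from the linked values.yaml and may differ by chart version):

```yaml
# prereq-values.yaml -- sketch; verify key names against your chart version.
elasticsearch:
  enabled: false
neo4j:
  enabled: false
mysql:
  enabled: false
kafka:
  enabled: false
cp-helm-charts:
  enabled: true          # keep the subchart so the schema registry still deploys
  cp-schema-registry:
    enabled: true
  cp-kafka:
    enabled: false
  cp-zookeeper:
    enabled: false
```

You can confirm what Helm actually applied with helm get values prerequisites.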