# all-things-deployment
i
Hello community! I'm making an internal modification to the DataHub chart to enable HPA for the standalone consumers, GMS and Frontend. I saw that GMS can use two caching technologies, but one of them is only used when there is more than one replica. Can I use Hazelcast even though I only have a single GMS replica? Are there any other problems that HPA could cause when autoscaling these components? If everything goes well, I can contribute this to the community later.
b
Nice! Hazelcast can be used with one replica; we just didn’t want to cause confusion or have to deal with the docker-compose settings for quickstart.
Note that the parallelism limit for the consumers equals the number of partitions in the Kafka topics, so adding consumer replicas only helps while the number of replicas is <= the number of topic partitions. Any replicas beyond that will sit idle.
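A minimal sketch of the partition limit described above, assuming all consumer replicas join the same Kafka consumer group and each runs a single consumer. Kafka assigns each partition to at most one consumer in a group, so the number of consumers doing work is capped at the partition count. The function names and the 3-partition example are hypothetical.

```python
def active_consumers(replicas: int, partitions: int) -> int:
    """Consumers in one group that actually receive partitions.

    Each partition goes to at most one consumer in the group, so any
    replicas beyond the partition count get no assignment.
    """
    if replicas < 0 or partitions < 0:
        raise ValueError("replicas and partitions must be non-negative")
    return min(replicas, partitions)


def idle_consumers(replicas: int, partitions: int) -> int:
    """Replicas that are running but have no partition to read from."""
    return replicas - active_consumers(replicas, partitions)


# Hypothetical topic with 3 partitions: scaling past 3 replicas adds idle pods.
for replicas in (1, 3, 5):
    print(replicas, active_consumers(replicas, 3), idle_consumers(replicas, 3))
```

With 5 replicas on a 3-partition topic, this reports 3 active and 2 idle consumers, which is why an autoscaler's maximum for the consumers should not exceed the partition count.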
I think, particularly for the consumers, the challenge for HPA will be that the operations are typically I/O bound (reading or writing) rather than CPU or memory bound.
Please share what you find out; I think GMS can benefit the most from HPA.
i
Thanks for the feedback @brainy-tent-14503. For the frontend, we are thinking of scaling based on ingress traffic, and for GMS based on resource usage. For the consumers, we are evaluating exporting Kafka metrics through Prometheus and creating a custom metric based on topic lag. With this, we should be able to scale the consumers during periods such as many ingestions running in parallel or the execution of the index restoration job.
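A rough sketch of how a lag-based consumer scaling decision might be computed, assuming the lag metric is exposed to the HPA as an average per replica. It follows the Kubernetes HPA ratio rule, `desired = ceil(current * currentMetric / targetMetric)`, with the default 10% tolerance. The cap at the partition count is an assumption added here to reflect the partition limit mentioned earlier in the thread; in a real HPA you would get the same effect by setting `maxReplicas` no higher than the partition count. All parameter names and defaults are hypothetical.

```python
import math


def desired_replicas(
    current_replicas: int,
    total_lag: float,
    target_lag_per_replica: float,
    partitions: int,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Approximate an HPA decision for a consumer scaled on topic lag.

    currentMetric is the average lag per replica; the ratio rule is
    desired = ceil(current * currentMetric / targetMetric).
    Hypothetical extra cap: never exceed the topic's partition count.
    """
    if current_replicas < 1 or target_lag_per_replica <= 0:
        raise ValueError("need at least one replica and a positive target")
    current_avg = total_lag / current_replicas
    ratio = current_avg / target_lag_per_replica
    if abs(ratio - 1.0) <= tolerance:
        # Within tolerance: the HPA leaves the replica count unchanged.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to [min_replicas, min(max_replicas, partitions)].
    upper = min(max_replicas, partitions)
    return max(min_replicas, min(desired, upper))


# Burst of parallel ingestions: lag far above target, capped at 6 partitions.
print(desired_replicas(2, 10_000, 1_000, partitions=6))
# Lag close to target: no change.
print(desired_replicas(2, 2_100, 1_000, partitions=6))
```

Because the ratio is computed from lag rather than CPU or memory, this kind of metric tracks the I/O-bound backlog directly, which is the concern raised above about resource-based scaling for the consumers.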
I'm going to develop this internally first and, if everything works as expected (passes the stress tests), I intend to contribute it to the community chart.