# all-things-deployment
j
Hi all, we've recently ramped up traffic to our DataHub deployment and are seeing issues handling the load in several dimensions (ingestion from the Kafka emitter has relatively high latency, the UI is starting to get a bit laggy, and we sometimes get errors in the UI loading large lineage graphs). Before we start digging into the various components to find where things are getting bottlenecked, I was curious to hear from the community how they have scaled their deployments and what issues they had to work through as load increased. Any feedback/insight is appreciated!
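(For anyone debugging something similar: one way to tell whether the emitter latency is really backlog on the metadata-consumer side is to check consumer lag on the ingestion topic. Below is a minimal sketch using the confluent-kafka Python client; the bootstrap address, topic name, and consumer group id are placeholders/assumptions, so substitute whatever your deployment actually uses.)

```python
from confluent_kafka import Consumer, TopicPartition

BOOTSTRAP = "your-broker:9092"             # assumption: your Kafka bootstrap servers
TOPIC = "MetadataChangeProposal_v1"        # assumption: the topic your emitter writes to
GROUP = "generic-mce-consumer-job-client"  # assumption: the metadata consumer's group id

# Use the real consumer group id so committed() reports its offsets;
# we never subscribe or poll, so this does not disturb the group.
c = Consumer({"bootstrap.servers": BOOTSTRAP, "group.id": GROUP, "enable.auto.commit": False})

# Look up the topic's partitions, then compare committed offsets to the end offsets.
meta = c.list_topics(TOPIC, timeout=10)
parts = [TopicPartition(TOPIC, p) for p in meta.topics[TOPIC].partitions]
committed = c.committed(parts, timeout=10)

total_lag = 0
for tp in committed:
    low, high = c.get_watermark_offsets(tp, timeout=10)
    # If nothing has been committed yet, treat the low watermark as the start.
    committed_offset = tp.offset if tp.offset >= 0 else low
    lag = high - committed_offset
    total_lag += lag
    print(f"partition {tp.partition}: committed={committed_offset} end={high} lag={lag}")

print("total lag:", total_lag)
c.close()
```

If the total lag keeps growing under load, the bottleneck is on the consumer/GMS side rather than in the emitter itself.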
b
Hi there! @orange-night-91387 can provide some insight into our production settings. A couple of questions:
• How are you deploying Kafka, Elasticsearch, and MySQL? Are you using self-hosted or cloud-hosted instances of these services?
• How many replicas of GMS are you deploying? Are you separating the MAE consumer job from the GMS pods?
• How much memory are you giving the GMS pod(s)?
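(While those answers are being gathered, a quick way to see whether Elasticsearch is the pressure point behind the laggy UI and the lineage-graph errors is to look at cluster health and the search thread pool for queue buildup or rejections. A rough sketch over the REST API, assuming the search indexes live in Elasticsearch; the endpoint below is a placeholder and your cluster may need auth/TLS added.)

```python
import requests

ES_URL = "http://your-elasticsearch:9200"  # assumption: your Elasticsearch endpoint

# Overall cluster status (green/yellow/red) and pending cluster tasks.
health = requests.get(f"{ES_URL}/_cluster/health", timeout=10).json()
print("status:", health["status"], "| pending tasks:", health["number_of_pending_tasks"])

# Search thread pool per node: sustained queueing or rejections here tends to
# show up as a laggy UI and failed/slow lineage and search queries.
pools = requests.get(
    f"{ES_URL}/_cat/thread_pool/search",
    params={"format": "json", "h": "node_name,name,active,queue,rejected"},
    timeout=10,
).json()
for p in pools:
    print(p["node_name"], "active:", p["active"], "queue:", p["queue"], "rejected:", p["rejected"])
```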
j
Sorry for the delay; I didn't actually handle any of the setup for this, and I'm not that up to speed on our config, so I had to track some things down first.
• How are you deploying Kafka, Elasticsearch, and MySQL? Are you using self-hosted or cloud-hosted instances of these services?
We are using cloud-hosted instances within our internal Kafka/search deployments on top of AWS.
• How many replicas of GMS are you deploying? Are you separating the MAE consumer job from the GMS pods?
Looks like we are running 10 replicas. I don't see any separate config for the MCE job; I'll have to circle back on this one.
• How much memory are you giving the GMS pod(s)?
Again, I'm not totally clear where this is getting set, but as best I can tell these are set to 4 GB (which sounds low to me, but I don't have a lot of context; see the sketch below for one way to check this directly from the cluster). Our internal lead on this is out right now, so I'm playing a bit of catch-up.
cc: @orange-night-91387
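(Since it's not obvious where the replica count and memory are set, here is a minimal sketch using the official Kubernetes Python client that reads them straight off the GMS Deployment. The deployment name and namespace are guesses based on a typical DataHub Helm release, so adjust them to yours. Note the JVM heap is usually set separately from the container limit via a -Xmx flag in an env var, so it's worth printing those too.)

```python
from kubernetes import client, config

# Use load_incluster_config() instead if this runs inside the cluster.
config.load_kube_config()

apps = client.AppsV1Api()

# Assumed name/namespace from a typical DataHub Helm release; adjust to yours.
dep = apps.read_namespaced_deployment(name="datahub-datahub-gms", namespace="datahub")

print("replicas:", dep.spec.replicas)
for container in dep.spec.template.spec.containers:
    print(f"container {container.name}:")
    print("  requests:", container.resources.requests)
    print("  limits:  ", container.resources.limits)
    # The JVM heap (-Xmx) is often passed through an *_OPTS env var,
    # separate from the container memory limit above.
    for env in (container.env or []):
        if "OPTS" in env.name.upper() or "XMX" in env.name.upper():
            print("  jvm env:", env.name, "=", env.value)
```

Comparing the container limit with the -Xmx value usually clears up whether the 4 GB figure is the pod limit, the heap, or both.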