Apache Pinot

I'm doing some profiling to understand where the bottleneck is in the current setup, which AFAICT is 'similar' to the one used by a client.

I don't know if the latency I'm observing is something I should expect given the current setup or if I'm misconfiguring something.

It does not seem out of the ballpark, but there's definitely room for improvement

BTW, IMO, if you have your client's setup (data/query), I think that would be more helpful in profiling/optimizing

With this setup, if I set up a harness client to send 10 queries concurrently, I start seeing a couple of seconds of latency. I was really surprised by that.
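For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of such a harness. It assumes a broker reachable at `http://localhost:8099` and sends SQL to the broker's `/query/sql` endpoint. The host, port, and the table and column names in the usage note are placeholders, not the setup discussed here.

```python
import json
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def make_broker_sender(broker_url="http://localhost:8099/query/sql"):
    """Return a function that POSTs one SQL query to a Pinot broker (URL is an assumption)."""

    def send(sql):
        body = json.dumps({"sql": sql}).encode("utf-8")
        request = urllib.request.Request(
            broker_url, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return send


def measure_concurrent_latencies(send_query, sql, concurrency=10):
    """Call send_query(sql) from `concurrency` threads at once; return latencies in seconds."""

    def timed_call(_):
        start = time.perf_counter()
        send_query(sql)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_call, range(concurrency)))


def summarize(latencies):
    """Reduce a list of latencies to min / median / max."""
    ordered = sorted(latencies)
    return {"min": ordered[0], "median": statistics.median(ordered), "max": ordered[-1]}
```

Usage would look like `summarize(measure_concurrent_latencies(make_broker_sender(), "SELECT SUM(metric) FROM myTable"))`, where `myTable` and `metric` are hypothetical. Comparing the result at `concurrency=1` and `concurrency=10` shows how much of the latency comes from contention rather than from a single query.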

Fully agree with that.. I'm trying to get that.

Yeah, we have various optimizations for different cases. For example in your case of `sum() on all records`, star-tree index would work perfectly (as it will pre-compute some of the cubes).
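As a rough illustration of what a star-tree index for a `sum()` query could look like, the sketch below builds the `starTreeIndexConfigs` fragment that sits under `tableIndexConfig` in the table config. The column names (`country`, `browser`, `impressions`) are hypothetical, and `maxLeafRecords` is just an example value, so treat this as a starting point rather than a tuned config.

```python
import json

# Hypothetical dimension and metric names; replace with the table's real columns.
star_tree_index = {
    # Dimensions to pre-aggregate over, in split order.
    "dimensionsSplitOrder": ["country", "browser"],
    "skipStarNodeCreationForDimensions": [],
    # Pre-computed aggregations, written as FUNCTION__column.
    "functionColumnPairs": ["SUM__impressions"],
    # Upper bound on records scanned at a leaf node (example value).
    "maxLeafRecords": 10000,
}

# This fragment belongs under "tableIndexConfig" in the table config.
table_index_config_fragment = {"starTreeIndexConfigs": [star_tree_index]}

print(json.dumps(table_index_config_fragment, indent=2))
```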

<@U01CM89MP4G> FWIW, I've noticed the max GC pause is set to 200ms in your settings. I've recently done some optimization around that by setting it to a lower value like 20ms, and that has improved query latencies. It depends on overall GC patterns though. If you want to confirm, analyze the GC logs.
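One way to check the GC logs is to pull out the pause durations and count how many exceed the target. The sketch below assumes the JDK 9+ unified logging format (`-Xlog:gc`), where pause lines end with a duration such as `3.456ms`. JDK 8 style logs look different and would need another pattern.

```python
import re

# Matches unified-logging lines that contain "Pause" and end with "<number>ms".
PAUSE_RE = re.compile(r"Pause\b.*?(\d+(?:\.\d+)?)ms\s*$")


def pause_times_ms(lines):
    """Extract GC pause durations (milliseconds) from an iterable of log lines."""
    pauses = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            pauses.append(float(match.group(1)))
    return pauses


def count_over(pauses, threshold_ms):
    """Count pauses strictly longer than threshold_ms."""
    return sum(1 for pause in pauses if pause > threshold_ms)
```

For example, `count_over(pause_times_ms(open("gc.log")), 20)` reports how many pauses exceeded 20ms (the file name is a placeholder). If most pauses are already well under the current target, lowering it is unlikely to change much.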

<@U01CX5XB7EH> let's move the discussion here. What’s the environment you are running right now?

Does the new region have access to deep store?

If you want to be smart, you just need to copy zookeeper directory

And start same number of Pinot servers in the new cluster

And it will download the segments from GCS

Would we need downtime to copy zookeeper disks?

Or is it possible to add nodes to the zookeeper cluster in the new region and then remove nodes in the old region?

<@UDQU92KBK> was saying we can add new Pinot servers in the new region and remove the old Pinot servers once the segments are replicated to the new ones. Does that involve tagging?

Yes, this ^^ will get you zero downtime, if that is important for you

Sounds good! Is that as simple as going to the zk explorer page and adding the tags?
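As an alternative to editing ZooKeeper by hand, tags can usually be changed through the controller REST API. The sketch below only builds the request URL. It assumes the controller exposes `PUT /instances/{instanceName}/updateTags?tags=...` (worth confirming in the controller's Swagger UI for your version) and that server tags follow the `<tenant>_OFFLINE` / `<tenant>_REALTIME` convention. The controller address, instance name, and tenant name are hypothetical.

```python
from urllib.parse import quote


def tenant_tags(tenant):
    """Server tags for a tenant, following the <tenant>_OFFLINE / <tenant>_REALTIME convention."""
    return [f"{tenant}_OFFLINE", f"{tenant}_REALTIME"]


def update_tags_url(controller, instance, tags):
    """Build the URL for a PUT that replaces an instance's tags (endpoint is an assumption)."""
    return f"{controller}/instances/{quote(instance, safe='')}/updateTags?tags={','.join(tags)}"


# Hypothetical controller, instance, and tenant names.
print(update_tags_url("http://localhost:9000", "Server_10.0.0.1_8098", tenant_tags("NewRegion")))
```

After the new servers carry the tenant's tags, a table rebalance is typically what moves segments onto them; the old servers can be untagged and removed once that completes.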

Another migration question: our Kafka cluster will be moving and offsets will be reset. How can we ensure that Pinot keeps ingesting? Is there a way to do this with no downtime?

How many tables do you have in your cluster?

We have ~15 tables, all hybrid except 2 that are realtime only

I will see if we can set the offsets on the new cluster.

Is it possible to update a realtime table def to change the kafka broker url with no issues?