# feat-tracing
  • r

    Rajat Venkatesh

    05/29/2025, 5:05 AM
    Adding @Vivek Iyer Vaidyanathan @Praveen Kumar Chaganlal
  • r

    Rajat Venkatesh

    05/29/2025, 5:05 AM
    Thanks @Xiang Fu
  • r

    Rajat Venkatesh

    06/05/2025, 5:08 AM
    @Vivek Iyer Vaidyanathan @Praveen Kumar Chaganlal Can you review https://github.com/apache/pinot/pull/16004? Hopefully the first of many cleanup PRs. This one also fixes a bug with gRPC queries.
  • r

    Rajat Venkatesh

    06/05/2025, 2:31 PM
    @Vivek Iyer Vaidyanathan @Praveen Kumar Chaganlal Next issue in the pipeline: https://github.com/apache/pinot/issues/16015. I have a draft PR.
  • v

    Vivek Iyer Vaidyanathan

    06/09/2025, 3:55 AM
    Hey Rajat. I was out on vacation last week. I'll get to these PRs and issues this week.
  • r

    Rajat Venkatesh

    06/09/2025, 4:02 AM
    I've just created a parent issue: https://github.com/apache/pinot/issues/16039
  • r

    Rajat Venkatesh

    06/16/2025, 3:21 PM
    @Vivek Iyer Vaidyanathan @Praveen Kumar Chaganlal I studied a few incidents at a customer where OOM Protection kicked in. My summary is below; I can walk you through the data if required. One callout is that the feature tracks allocatedBytes and not live object size. IIUC your feature will need live object size to work best.
    • OOM protection was triggered at the right time. In all incidents:
      ◦ Logs mention that heap usage was over the threshold.
      ◦ There are long GC pauses.
      ◦ However, the JVM Heap Memory Used & % metrics show much lower usage.
        ▪︎ This is because the scraping frequency is 30s.
    • Was the right query chosen for the kill?
      ◦ Yes.
      ◦ Logs have a list of all known queries running on the server.
      ◦ The query with the largest allocation is chosen.
      ◦ However:
        ▪︎ In one incident, the largest query used 1GB of memory while total query usage was only 2.26GB.
        ▪︎ In two incidents, the largest query reported 40GB (greater than the max of 25GB).
        ▪︎ This is because the feature tracks allocated memory, not live object size.
    • Drop in heap usage after query termination:
      ◦ In 3 of 4 incidents there was a drop.
      ◦ In one incident there was no effect even though a 40GB query was killed.
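The "largest allocation wins" kill selection described above can be sketched as follows (class and method names here are hypothetical, not Pinot's actual code). Note that the tracked number is cumulative allocated bytes, which only ever grows — that is why a query can report 40GB against a 25GB heap.

```java
import java.util.Map;

// Hypothetical sketch: pick the query with the largest allocated-bytes
// count as the kill target. Because allocatedBytes is cumulative (not
// live-object size), a query that already freed most of its memory can
// still report more than the heap max.
class KillTargetPicker {
  /** Returns the queryId with the highest allocated-bytes count, or null if empty. */
  static String pickKillTarget(Map<String, Long> allocatedBytesByQuery) {
    String target = null;
    long max = -1;
    for (Map.Entry<String, Long> e : allocatedBytesByQuery.entrySet()) {
      if (e.getValue() > max) {
        max = e.getValue();
        target = e.getKey();
      }
    }
    return target;
  }
}
```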
  • v

    Vivek Iyer Vaidyanathan

    06/18/2025, 11:26 PM
    @Rajat Venkatesh After https://github.com/apache/pinot/pull/16051 is merged, can we pause for a week on your side before making the rest of the cleanups? A number of interfaces and common classes are changing with my PR open - https://github.com/apache/pinot/pull/15798. So rebasing after every change is getting tricky. And also a number of interface functions are getting marked deprecated with us adding other parameters. So @Praveen Kumar Chaganlal and I will use the next week to rebase our PR on top of your cleanups.
  • r

    Rajat Venkatesh

    06/19/2025, 2:01 AM
    Sure. I’ll open a PR to delete all the deprecated APIs after that, and then clean up the code.
  • v

    Vivek Iyer Vaidyanathan

    06/19/2025, 7:56 PM
    Thank you! 🙏
  • r

    Rajat Venkatesh

    06/20/2025, 5:35 AM
    The PR is merged. I'll pause merges that will affect you.
  • r

    Rajat Venkatesh

    06/21/2025, 9:37 AM
    FYI: Brainstorming a different way to cancel queries. The motivation is that the runner/worker thread architecture is not applicable to MSE. More discussion in https://github.com/apache/pinot/issues/16039 . I think we are making a mistake in trying to match the architecture of these two modules. Ideally, the watcher task's pseudocode should be:

        if (shouldKill) {
          _queryRunner.cancel(maxUsageTuple._queryId, /* maybe an extra parameter that this is a system cancel */);
        }

    • _queryRunner has been injected during construction of the watcher task in BaseServerStarter.
    • _queryRunner maintains a queryId -> FutureTask map.
    • PerQuery...Accountant forces a queryId -> Thread map through many thread-locals that have to be maintained and set up correctly, rather than using the map that already exists. Instead, the current architecture tries to store a handle to a root thread in a thread-local. IMO the OOM design has painted itself into a corner and is forcing us to make bad design calls. https://github.com/apache/pinot/pull/16142
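The `queryId -> Future` direction above can be sketched like this (all names are assumptions; Pinot's real query runner differs): the runner records the Future at submit time, so a watcher can cancel by queryId without any thread-locals.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: the runner keeps a queryId -> Future map populated
// at submit time, so the watcher task can cancel by queryId instead of
// holding a handle to a root thread in a thread-local.
class QueryRunnerSketch {
  private final ConcurrentMap<String, Future<?>> _running = new ConcurrentHashMap<>();
  private final ExecutorService _executor = Executors.newCachedThreadPool();

  Future<?> submit(String queryId, Runnable query) {
    Future<?> f = _executor.submit(query);
    _running.put(queryId, f);
    return f;
  }

  /** Called by the watcher task when shouldKill is true. */
  boolean cancel(String queryId) {
    Future<?> f = _running.remove(queryId);
    return f != null && f.cancel(true); // interrupts the running task
  }

  void shutdown() {
    _executor.shutdownNow();
  }
}
```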
  • r

    Rajat Venkatesh

    06/23/2025, 10:07 AM
    I am cleaning up logs in this PR. It should affect you. PTAL: https://github.com/apache/pinot/pull/16172
  • p

    Praveen Kumar Chaganlal

    06/24/2025, 3:00 AM
    @Rajat Venkatesh Can you PTAL at this PR: https://github.com/apache/pinot/pull/16051? Since Vivek got pulled into some other work, I went ahead and incorporated some of the suggestions you had. https://github.com/apache/pinot/pull/15798 is ready for another round of review.
  • r

    Rajat Venkatesh

    06/26/2025, 4:55 AM
    I made an initial pass on the PR. I'll go through it again.
  • r

    Rajat Venkatesh

    06/26/2025, 4:58 AM
    @Vivek Iyer Vaidyanathan @Praveen Kumar Chaganlal Is QueryAggregator a replacement for the WatcherTask? Can you extract that out and merge it separately? After the thread-local, the embedded watcher task is the top-most pain point and needs to be junked. One more required improvement in it is to support runtime updates of some configs, like the critical and panic thresholds.
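The runtime-updatable thresholds asked for above could look roughly like this (a minimal sketch with hypothetical names; volatile fields are one simple way to let the watcher loop see updates made from a config endpoint):

```java
// Hypothetical sketch: critical/panic thresholds held in volatile fields,
// so a config-update endpoint can change them while the watcher loop keeps
// reading fresh values on every iteration.
class WatcherConfigSketch {
  private volatile double _criticalLevel;
  private volatile double _panicLevel;

  WatcherConfigSketch(double criticalLevel, double panicLevel) {
    _criticalLevel = criticalLevel;
    _panicLevel = panicLevel;
  }

  // Called from a runtime config-update endpoint.
  void update(double criticalLevel, double panicLevel) {
    _criticalLevel = criticalLevel;
    _panicLevel = panicLevel;
  }

  // Read by the watcher task each iteration; always sees the latest values.
  boolean isCritical(double heapUsageRatio) {
    return heapUsageRatio >= _criticalLevel;
  }

  boolean isPanic(double heapUsageRatio) {
    return heapUsageRatio >= _panicLevel;
  }
}
```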
  • v

    Vivek Iyer Vaidyanathan

    06/26/2025, 5:09 PM
    > Is QueryAggregator a replacement for the WatcherTask
    Not fully the case. The watcher task still exists in the main ResourceUsageAccountant. But how the aggregation is done differs between the QueryAggregator (which just has logic copied over from PerQueryCPUMem....) and the WorkloadAggregator. Agree on the watcher task being the pain point. But we should perhaps take that up together after this code is merged. Thoughts?
  • v

    Vivek Iyer Vaidyanathan

    06/26/2025, 8:25 PM
    @Rajat Venkatesh Would you be able to review the PR by Friday? We are shooting to merge on Friday, as LinkedIn has its July 4 shutdown coming up next week.
  • r

    Rajat Venkatesh

    06/27/2025, 3:30 AM
    Sure. I’ll take a look today
  • r

    Rajat Venkatesh

    06/27/2025, 3:31 AM
    I’ll work on redesigning the watcher task. It will definitely go in after your change.
  • v

    Vivek Iyer Vaidyanathan

    06/27/2025, 6:40 AM
    Sounds good. Thank you! :)
  • r

    Rajat Venkatesh

    06/27/2025, 3:02 PM
    I've reduced my ambitions quite a bit. For now I am focusing on making configuration editable. This change should NOT affect you as I have added a default function in the interface. Issue: https://github.com/apache/pinot/issues/16218 PR: https://github.com/apache/pinot/pull/16219
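The "default function in the interface" trick mentioned above works like this (hypothetical interface name, not Pinot's actual API): a default no-op method means existing implementations compile and run unchanged.

```java
// Hypothetical sketch: adding a new capability as a default method so
// existing implementations of the interface are not broken by the change.
interface AccountantSketch {
  void aggregate();

  // New, optional capability: existing implementations inherit this
  // no-op default, so adding it is backward compatible.
  default boolean updateConfig(java.util.Map<String, String> newConfig) {
    return false; // not supported unless an implementation overrides it
  }
}

// A pre-existing implementation that predates updateConfig still compiles.
class LegacyAccountant implements AccountantSketch {
  @Override
  public void aggregate() {
    // no-op for the sketch
  }
}
```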
  • v

    Vivek Iyer Vaidyanathan

    07/07/2025, 6:44 PM
    @Rajat Venkatesh I've addressed your review comments in this PR - https://github.com/apache/pinot/pull/15798. PTAL.
  • r

    Rajat Venkatesh

    07/08/2025, 10:45 AM
    cc @Gonzalo Ortiz @Yash
  • r

    Rajat Venkatesh

    07/18/2025, 4:28 AM
    @Praveen Kumar Chaganlal Did you get a chance to ask about the reason for the GC calls re: https://github.com/apache/pinot/pull/16374?
  • p

    Praveen Kumar Chaganlal

    07/18/2025, 4:31 AM
    I am not able to find the exact details. From speaking to folks on the team, the concern at the time was accuracy. But it should be fine to go ahead with your change, since we are seeing a large performance penalty with it.
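The accuracy-vs-cost tradeoff behind those GC calls, as a sketch (hypothetical helper, not Pinot code): sampling heap usage right after an explicit GC approximates live-object size more closely, but `System.gc()` can trigger a long stop-the-world pause, which is the performance penalty that motivated removing the calls.

```java
// Hypothetical sketch: heap-usage sampling with and without a preceding
// explicit GC. With gcFirst=true the reading better reflects live objects
// (accuracy), at the cost of a potentially long full-GC pause.
class HeapSampleSketch {
  static long usedHeapBytes(boolean gcFirst) {
    if (gcFirst) {
      System.gc(); // expensive: may trigger a stop-the-world collection
    }
    Runtime rt = Runtime.getRuntime();
    return rt.totalMemory() - rt.freeMemory();
  }
}
```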
  • r

    Rajat Venkatesh

    07/22/2025, 5:33 AM
    @Vivek Iyer Vaidyanathan @Praveen Kumar Chaganlal PTAL https://github.com/apache/pinot/pull/16360