For most of Time Series /Audit data, Time Criteria is the basi... (Apache Pinot #general)


Vengatesh Babu

05/12/2021, 6:50 PM

For most time-series/audit data, a time criterion is the basic filter. E.g., for one year of data, segments created on a daily basis add up to 365 segments per year. Even queries that access only the last month's or last week's data will be scheduled to scan all segments, including the unnecessary ones. Is it possible to maintain the min/max values of the primary time column in the table metadata? Maintaining time-column metadata would help broker-side segment pruning, similar to partitioning.

Mayank

05/12/2021, 6:53 PM

Pinot already does that and prunes segments based on min-max time stamp in the segment metadata.
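To illustrate the idea behind min/max time pruning, here is a minimal Python sketch. It is not Pinot's code: `SegmentMeta`, `prune_by_time`, and the segment names are hypothetical, and the only assumption is that each segment records the smallest and largest value of its time column.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class SegmentMeta:
    """Hypothetical per-segment metadata: name plus time-column bounds."""
    name: str
    min_time: date  # earliest time-column value in the segment
    max_time: date  # latest time-column value in the segment


def prune_by_time(segments, query_start, query_end):
    """Keep segments whose [min_time, max_time] overlaps the inclusive query range."""
    return [
        s for s in segments
        if s.min_time <= query_end and s.max_time >= query_start
    ]


# One segment per day for 2021 (365 daily segments, as in the question).
first_day = date(2021, 1, 1)
segments = [
    SegmentMeta(f"seg_{i}", first_day + timedelta(days=i), first_day + timedelta(days=i))
    for i in range(365)
]

# A "last week" query: the final 7 days of the year, inclusive.
query_end = date(2021, 12, 31)
query_start = query_end - timedelta(days=6)

selected = prune_by_time(segments, query_start, query_end)
print(len(segments), len(selected))
```

The overlap test only reads the two bounds per segment, so segments outside the range are skipped without touching their data.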

Vengatesh Babu

05/12/2021, 6:58 PM

So a query that accesses the last week's data (7 segments) will be scheduled to scan only 7 segments? Does segment pruning happen at the broker level itself or at the server level?

Mayank

05/12/2021, 7:01 PM

Some pruning happens at the broker level and some at the server level.

Mayank

05/12/2021, 7:02 PM

Yes, only 7 days of segments will be processed. Pinot also has sorted and inverted indexes that can be used to further avoid scanning all the data inside these 7 segments.
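As a rough illustration of how sorted and inverted indexes avoid a full scan inside a segment, the Python sketch below uses toy data. The column names (`user_ids`, `actions`) and helper functions are hypothetical, and Pinot's actual index structures are more elaborate than this.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def sorted_range(sorted_values, value):
    """Binary-search a sorted column for the [start, end) doc-id range equal to `value`."""
    return bisect_left(sorted_values, value), bisect_right(sorted_values, value)


def build_inverted_index(values):
    """Map each distinct value to the ascending list of doc ids that contain it."""
    index = defaultdict(list)
    for doc_id, v in enumerate(values):
        index[v].append(doc_id)
    return index


# Toy segment with 8 rows; doc id = row position.
user_ids = [1, 1, 2, 2, 2, 5, 7, 7]  # sorted column
actions = ["login", "read", "read", "write", "login", "read", "write", "login"]

# Filter: user_id = 2 AND action = 'read'
start, end = sorted_range(user_ids, 2)       # contiguous doc-id range, no scan
inverted = build_inverted_index(actions)
read_docs = inverted["read"]                 # doc ids taken from the index

matching = [d for d in read_docs if start <= d < end]
print((start, end), read_docs, matching)
```

Both lookups return doc ids directly, so only the matching rows need to be read.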

Vengatesh Babu

05/12/2021, 7:12 PM

1. Based on my understanding of the documentation, partitioning helps segment pruning at the broker level itself.
2. For a query on last week's data, all 365 segments will be scheduled by the broker and only 7 segments will be processed on the server; the remaining segments will be pruned on the server based on segment metadata.
3. My suggestion is to handle the main time-column criterion the same way as the partition-column criterion, i.e., prune at the broker level to avoid unnecessary scheduling and wasted CPU.
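For reference, the partition-based pruning mentioned in point 1 can be sketched as follows. This is an illustration only: the modulo partition function, `NUM_PARTITIONS`, and the segment-to-partition map are assumptions made for the example, not Pinot's implementation.

```python
NUM_PARTITIONS = 4  # hypothetical partition count for the partition column


def partition_of(key: int) -> int:
    """Hypothetical modulo partition function."""
    return key % NUM_PARTITIONS


# Hypothetical metadata: the partitions each segment holds data for.
segment_partitions = {
    "seg_a": {0},
    "seg_b": {1},
    "seg_c": {2},
    "seg_d": {3},
    "seg_e": {1, 3},
}


def prune_by_partition(segment_partitions, key):
    """Return only the segments that can contain rows for `key`."""
    target = partition_of(key)
    return sorted(
        name for name, parts in segment_partitions.items() if target in parts
    )


# Filter: partition_column = 13, and 13 % 4 == 1
routed = prune_by_partition(segment_partitions, 13)
print(routed)
```

The same pattern applies to a time column: the router consults cached per-segment metadata and sends the query only to segments that can match.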

Vengatesh Babu

05/12/2021, 7:14 PM

Please let me know if my understanding is wrong.

Mayank

05/12/2021, 7:19 PM

Yes, we have optimized these based on real production use cases. There is always a balance, i.e., the broker needs to read metadata from ZK or cache it, so that is the overhead. But these are optimizations we consider at thousands of QPS and millisecond latency. Is your use case in that range? If not, you might be over-optimizing.

Vengatesh Babu

05/12/2021, 7:24 PM

OK, fine. For now, we expect only 500 QPS with sub-100 ms latency. We will test and let you know if there is any issue due to overscheduling.

Mayank

05/12/2021, 7:25 PM

Yeah, server-level pruning + partitioning + sorting + inverted index + replica groups will give you much better performance than that.
