# general
n
@User and I were discussing my team modifying the Pinot server to include a lazy mode that would set it to lazily pull segments as they are requested, using an LRU cache. It should just take some modification to the `SegmentDataManager` and maybe the table manager. This would allow using S3 as the primary storage, with Pinot as the query/caching layer for long-term historical tiers of data. Similar to the tiering example, you'd have a third set of lazy servers for reading data older than 2 weeks. This is explicitly to avoid large EBS volume costs for very large data sets.

My main concern is this: a moderately sized dataset for us is 130GB a day. We have some that can be in the terabyte range per day. Using 500MB segments, you're looking at ~260 segments a day, so roughly 95k segments a year. In this case, broker pruning is very important, because any segment query sent to the lazy server means materializing data from S3. This data is mainly time series, which means segments would be in time-bound chunks. Does the Pinot broker prune segments by time? How is the broker managing segments? Does it just have an in-memory list of all segments for all tables? If so, metadata pruning will become a bottleneck for us on most queries. I'd like to see query time scale logarithmically with the size of the data.

Other concerns for us are around data types. It does not seem Pinot supports data types we commonly use, like uint64, fixed points, etc. It also doesn't seem to support nested data structures. How difficult would this be to add? Java `BigInteger` and `BigDecimal` could handle the former, assuming we implemented metadata handling. Nested data types are a little more nuanced.
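The lazy mode described above could be sketched as an LRU cache keyed by segment name, evicting the least-recently-queried segment and re-fetching from deep store on a miss. This is a minimal sketch, not Pinot's actual `SegmentDataManager`; the class name, capacity policy, and the stubbed S3 fetch are all assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an LRU cache over segments, evicting the
// least-recently-accessed one when capacity is exceeded.
class LazySegmentCache {
    private final int maxSegments;
    private final LinkedHashMap<String, byte[]> cache;

    LazySegmentCache(int maxSegments) {
        this.maxSegments = maxSegments;
        // accessOrder = true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the LRU segment.
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > LazySegmentCache.this.maxSegments;
            }
        };
    }

    byte[] getSegment(String segmentName) {
        byte[] data = cache.get(segmentName);
        if (data == null) {
            data = fetchFromDeepStore(segmentName);
            cache.put(segmentName, data); // put() triggers the eviction check
        }
        return data;
    }

    // Stand-in for an S3 download; real code would pull and untar the segment.
    private byte[] fetchFromDeepStore(String segmentName) {
        return segmentName.getBytes();
    }

    int size() {
        return cache.size();
    }
}
```

With a capacity of 2, requesting a third segment evicts the least-recently-used one, which is the behavior the lazy servers would rely on.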
k
Thanks for writing this up. Let’s create an issue and continue the discussion there. Others can jump in
n
Two separate issues, I suppose, on the data types
Unless custom data types are already supported?
k
oh we don't use Apache Jira
GitHub issues
n
Explains why there aren’t many haha
`BigDecimal` support is already there; we haven't added it as a `DataType` yet, but you can use `BYTES` to represent `BigDecimal`
Is there an issue tracking adding BigDecimal as a data type?
k
no
n
So, how is segment metadata currently handled? Is it stored in some kind of database? Is that database indexed in any way, perhaps by table?
I found the pruning code, that seems to happen in memory over some list of segments. But I’m not sure where that list of segments would come from
k
It comes from Helix (via Zookeeper)
The broker maintains a watch, and every time a new segment is created, it updates the routing table
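The bookkeeping described here can be pictured as one routing entry per Pinot table, refreshed when the watch fires. This is a simplified sketch with hypothetical names; in real Pinot the state comes from Helix's external view via ZooKeeper, which is omitted here.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of broker-side routing state:
// one set of routable segments per Pinot table.
class BrokerRouting {
    // table name -> segment names currently routable
    private final Map<String, Set<String>> routingTable = new ConcurrentHashMap<>();

    // Would be called when the Helix/ZooKeeper watch reports a new segment.
    void onSegmentAdded(String table, String segment) {
        routingTable.computeIfAbsent(table, t -> ConcurrentHashMap.newKeySet())
                    .add(segment);
    }

    Set<String> segmentsFor(String table) {
        return routingTable.getOrDefault(table, Set.of());
    }
}
```

The point of the per-table keying is that a query only ever consults its own table's segment set, which is what the next question probes.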
n
Ah. So there’s a routing table for each pinot table.
k
yes
n
So at most, you're scouring through one table's worth of segments. You said there was a time when a table had 1 million segments. How did that perform for each query? Are segments pruned by time block?
k
pruning would happen on server side
and since there are many servers, pruning would happen in parallel
n
Well, in this case you'd want to avoid that, because you can't prune without having the segment to look at, which means all of the lazy segments would materialize.
Also, from a performance standpoint, having servers prune 3 years' worth of segments for a single query isn't great.
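The alternative being argued for here is pruning by a per-segment time range kept as lightweight metadata, so a time filter never has to touch segment data on S3. A minimal sketch, assuming each segment carries a start/end timestamp in its metadata (names are illustrative):

```java
import java.util.List;

// Hypothetical lightweight metadata: one time range per segment.
record SegmentTimeRange(String name, long startMillis, long endMillis) {}

class TimePruner {
    // Keep only segments whose [start, end] range overlaps the query window;
    // everything else is pruned without ever materializing segment data.
    static List<String> prune(List<SegmentTimeRange> segments, long queryStart, long queryEnd) {
        return segments.stream()
            .filter(s -> s.endMillis() >= queryStart && s.startMillis() <= queryEnd)
            .map(SegmentTimeRange::name)
            .toList();
    }
}
```

With time-bound segments, a single-day filter would match only the handful of segments covering that day, rather than scanning 3 years' worth.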
k
yeah, let's discuss that on the issue. We might want to keep the segment metadata local or in memory
n
Especially if that query is `WHERE day = <x>`
k
agree
n
Yeah. Worth discussing on the issue. Just wanted to make sure my understanding was correct before discussing formally there