Noah Prince
10/26/2020, 1:47 PM
lazy mode that would set it to lazily pull segments as they are requested using an LRU cache. It should just take some modification to the SegmentDataManager and maybe the table manager.
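To make the lazy-pull idea concrete, here is a minimal sketch of an LRU cache that only downloads a segment the first time it is requested. It is not Pinot code: `LazySegmentCache`, `fetch`, and `fake_fetch` are hypothetical names, and the real change would live in the Java SegmentDataManager rather than in Python.

```python
from collections import OrderedDict


class LazySegmentCache:
    """Hypothetical LRU cache that pulls a segment only when it is first requested."""

    def __init__(self, capacity, fetch):
        self._capacity = capacity        # max number of segments kept locally
        self._fetch = fetch              # callable: segment name -> segment data
        self._segments = OrderedDict()   # ordered from least to most recently used

    def get(self, name):
        if name in self._segments:
            # Cache hit: mark the segment as most recently used.
            self._segments.move_to_end(name)
            return self._segments[name]
        # Cache miss: materialize the segment from deep storage.
        data = self._fetch(name)
        self._segments[name] = data
        if len(self._segments) > self._capacity:
            # Evict the least recently used segment.
            self._segments.popitem(last=False)
        return data


# Stand-in for an S3 download; records which segments were pulled.
downloads = []


def fake_fetch(name):
    downloads.append(name)
    return f"data for {name}"


cache = LazySegmentCache(capacity=2, fetch=fake_fetch)
for segment in ["a", "b", "a", "c", "b"]:
    cache.get(segment)
```

With a capacity of 2, the second request for "a" is served locally, while "b" is evicted when "c" arrives and has to be pulled again. That repeated pull is the cost that broker-side pruning would need to keep rare.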
This would allow using S3 as the primary storage, with Pinot as the query/caching layer for long-term historical tiers of data. Similar to the tiering example, you’d have a third set of lazy servers for reading data older than 2 weeks. This is explicitly to avoid large EBS volume costs for very large data sets.
My main concern is this: a moderately sized dataset for us is 130GB a day. We have some that can be in the terabyte range per day. Using 500MB segments, you’re looking at ~260 segments a day, or roughly 95k segments a year. In this case, broker pruning is very important because any segment query sent to the lazy server means materializing data from S3. This data is mainly time series, which means segments would be in time-bound chunks. Does the Pinot broker prune segments by time? How is the broker managing segments? Does it just have an in-memory list of all segments for all tables? If so, metadata pruning will become a bottleneck for us on most queries. I’d like to see query time scale logarithmically with the size of the data.
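As a rough illustration of the numbers and of the logarithmic lookup being asked for, the sketch below builds one year of time-bound segment metadata and prunes it with binary search. It assumes decimal units (1 GB = 1000 MB), a 365-day year, and contiguous, non-overlapping segments. `build_index` and `prune` are hypothetical helpers, not how the Pinot broker actually stores or prunes segment metadata.

```python
from bisect import bisect_left, bisect_right

DAY_MS = 86_400_000
# 130 GB/day at 500 MB per segment, assuming 1 GB = 1000 MB.
SEGMENTS_PER_DAY = (130 * 1000) // 500


def build_index(days):
    """Return (start_ms, end_ms, name) tuples sorted by time, half-open [start, end)."""
    segments = []
    for day in range(days):
        base = day * DAY_MS
        for i in range(SEGMENTS_PER_DAY):
            start = base + i * DAY_MS // SEGMENTS_PER_DAY
            end = base + (i + 1) * DAY_MS // SEGMENTS_PER_DAY
            segments.append((start, end, f"seg_{day}_{i}"))
    return segments


def prune(segments, starts, ends, q_start, q_end):
    """Return segments overlapping [q_start, q_end) using two binary searches."""
    lo = bisect_right(ends, q_start)   # first segment whose end is after q_start
    hi = bisect_left(starts, q_end)    # first segment starting at or after q_end
    return segments[lo:hi]


segments = build_index(365)
starts = [s[0] for s in segments]
ends = [s[1] for s in segments]

# Equivalent of a filter on a single day (day 100 here).
selected = prune(segments, starts, ends, 100 * DAY_MS, 101 * DAY_MS)
print(len(segments), len(selected))
```

Finding the matching range costs two O(log n) searches over the sorted metadata, so only the segments for the requested day would ever be routed to a lazy server. This only works as written because the segments are sorted and do not overlap; overlapping time ranges would need something like an interval tree.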
Other concerns for us are around data types. It does not seem Pinot supports data types we commonly use, like uint64, fixed-point numbers, etc. It also doesn’t seem to support nested data structures. How difficult would this be to add? Java BigInteger and BigDecimal could handle the former, assuming we implemented metadata handling. Nested data types are a little more nuanced.
Kishore G
Noah Prince
10/26/2020, 2:40 PM
Noah Prince
10/26/2020, 2:40 PM
Noah Prince
10/26/2020, 2:44 PM
Kishore G
Kishore G
Noah Prince
10/26/2020, 2:45 PM
Kishore G
Kishore G
Noah Prince
10/26/2020, 2:47 PM
Noah Prince
10/26/2020, 2:49 PM
Kishore G
Noah Prince
10/26/2020, 2:51 PM
Noah Prince
10/26/2020, 2:51 PM
Kishore G
Kishore G
Noah Prince
10/26/2020, 2:57 PM
Kishore G
Noah Prince
10/26/2020, 2:58 PM
Kishore G
Kishore G
Noah Prince
10/26/2020, 2:59 PM
Noah Prince
10/26/2020, 3:00 PM
Kishore G
Noah Prince
10/26/2020, 3:00 PM
WHERE day = <x>
Kishore G
Noah Prince
10/26/2020, 3:01 PM