# general
n
@User and I were discussing my team modifying the Pinot server to include a lazy mode that would set it to lazily pull segments as they are requested, using an LRU cache. It should just take some modification to the `SegmentDataManager` and maybe the table manager. This would allow using S3 as the primary storage, with Pinot as the query/caching layer for long-term historical tiers of data. Similar to the tiering example, you'd have a third set of lazy servers for reading data older than 2 weeks. This is explicitly to avoid large EBS volume costs for very large data sets.

My main concern is this: a moderately sized dataset for us is 130GB a day. We have some that can be in the terabyte range per day. Using 500MB segments, you're looking at ~260 segments a day, so roughly 95k segments a year. In this case, broker pruning is very important, because any segment query sent to the lazy server means materializing data from S3. This data is mainly time series, which means segments would be in time-bound chunks. Does the Pinot broker prune segments by time? How is the broker managing segments? Does it just have an in-memory list of all segments for all tables? If so, metadata pruning will become a bottleneck for us on most queries. I'd like to see query time scale logarithmically with the size of the data.

Other concerns for us are around data types. It does not seem Pinot supports data types we commonly use, like uint64, fixed points, etc. It also doesn't seem to support nested data structures. How difficult would this be to add? Java `BigInteger` and `BigDecimal` could handle the former, assuming we implemented metadata handling. Nested data types are a little more nuanced.
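The lazy mode described above could be sketched as an LRU cache keyed by segment name, evicting the least-recently-queried segment and re-fetching from deep store on a miss. This is a minimal sketch, not Pinot's actual `SegmentDataManager`; the class name, capacity policy, and the stubbed S3 fetch are all assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an LRU cache over segments, evicting the
// least-recently-accessed one when capacity is exceeded.
class LazySegmentCache {
    private final int maxSegments;
    private final LinkedHashMap<String, byte[]> cache;

    LazySegmentCache(int maxSegments) {
        this.maxSegments = maxSegments;
        // accessOrder = true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the LRU segment.
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > LazySegmentCache.this.maxSegments;
            }
        };
    }

    byte[] getSegment(String segmentName) {
        byte[] data = cache.get(segmentName);
        if (data == null) {
            data = fetchFromDeepStore(segmentName);
            cache.put(segmentName, data); // put() triggers the eviction check
        }
        return data;
    }

    // Stand-in for an S3 download; real code would pull and untar the segment.
    private byte[] fetchFromDeepStore(String segmentName) {
        return segmentName.getBytes();
    }

    int size() {
        return cache.size();
    }
}
```

With a capacity of 2, requesting a third segment evicts the least-recently-used one, which is the behavior the lazy servers would rely on.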
k
Thanks for writing this up. Let’s create an issue and continue the discussion there. Others can jump in
n
Two separate issues, I suppose, on the data types
Unless custom data types are already supported?
k
oh we don't use Apache Jira
GitHub issues
n
Explains why there aren’t many haha
`BigDecimal` support is already there; we haven't added it as a `DataType` yet, but you can use `BYTES` to represent `BigDecimal`
Is there an issue tracking adding BigDecimal as a data type?
k
no
n
So, how is segment metadata currently handled? Is it stored in some kind of database? Is that database indexed in any way, perhaps by table?
I found the pruning code, that seems to happen in memory over some list of segments. But I’m not sure where that list of segments would come from
k
It comes from Helix (via Zookeeper)
The broker maintains a watch, and every time a new segment is created, it updates the routing table
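The bookkeeping described here can be pictured as one routing entry per Pinot table, refreshed when the watch fires. This is a simplified sketch with hypothetical names; in real Pinot the state comes from Helix's external view via ZooKeeper, which is omitted here.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of broker-side routing state:
// one set of routable segments per Pinot table.
class BrokerRouting {
    // table name -> segment names currently routable
    private final Map<String, Set<String>> routingTable = new ConcurrentHashMap<>();

    // Would be called when the Helix/ZooKeeper watch reports a new segment.
    void onSegmentAdded(String table, String segment) {
        routingTable.computeIfAbsent(table, t -> ConcurrentHashMap.newKeySet())
                    .add(segment);
    }

    Set<String> segmentsFor(String table) {
        return routingTable.getOrDefault(table, Set.of());
    }
}
```

The point of the per-table keying is that a query only ever consults its own table's segment set, which is what the next question probes.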
n
Ah. So there’s a routing table for each pinot table.
k
yes
n
So at most, you're scouring through one table's worth of segments. You said there was a time when a table had 1 million segments. How did that perform for each query? Are segments pruned by time block?
k
pruning would happen on server side
and since there are many servers, pruning would happen in parallel
n
Well, in this case you'd want to avoid that, because you can't prune without having the segment to look at, which means all of the lazy segments would materialize.
Also, from a performance standpoint, having servers prune 3 years' worth of segments for a single query isn't great.
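The alternative being argued for here is pruning by a per-segment time range kept as lightweight metadata, so a time filter never has to touch segment data on S3. A minimal sketch, assuming each segment carries a start/end timestamp in its metadata (names are illustrative):

```java
import java.util.List;

// Hypothetical lightweight metadata: one time range per segment.
record SegmentTimeRange(String name, long startMillis, long endMillis) {}

class TimePruner {
    // Keep only segments whose [start, end] range overlaps the query window;
    // everything else is pruned without ever materializing segment data.
    static List<String> prune(List<SegmentTimeRange> segments, long queryStart, long queryEnd) {
        return segments.stream()
            .filter(s -> s.endMillis() >= queryStart && s.startMillis() <= queryEnd)
            .map(SegmentTimeRange::name)
            .toList();
    }
}
```

With time-bound segments, a single-day filter would match only the handful of segments covering that day, rather than scanning 3 years' worth.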
k
yeah, let's discuss that on the issue. We might want to keep the segment metadata local or in memory
n
Especially if that query is `WHERE day = <x>`
k
agree
n
Yeah. Worth discussing on the issue. Just wanted to make sure my understanding was correct before discussing formally there