How does Pinot scale with offline tables? | Apache Pinot #general

Noah Prince

10/25/2020, 6:42 PM

How does Pinot scale with offline tables? I get the impression that every offline segment is loaded into an active offline server, which implies all of your offline data is loaded on some server. This seems very expensive, especially for something like 2-year-old data. Does Pinot lazily load old segments based on query demand? And how do indexes scale for offline tables?

Mayank

10/25/2020, 6:47 PM

We mmap the indexes, so they get paged in as needed. Depending on your SLA requirements, you can use SSD or regular HDD on the server nodes.
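To illustrate what "paged in as needed" means, here is a minimal Python sketch of memory-mapping a file and reading a small slice of it. It is not Pinot code: the file, the `.idx` suffix, and the helper names `make_demo_file` and `read_slice` are made up for the example. The point is only that mapping a file does not read it up front; the operating system loads the pages that are actually touched.

```python
import mmap
import os
import tempfile


def make_demo_file():
    """Write a 1 MiB stand-in for an index file (hypothetical, not a Pinot format)."""
    fd, path = tempfile.mkstemp(suffix=".idx")
    with os.fdopen(fd, "wb") as f:
        # 4096 repetitions of the byte pattern 0..255
        f.write(bytes(range(256)) * 4096)
    return path


def read_slice(path, offset, length):
    """Map the whole file read-only and copy out bytes [offset, offset + length).

    Mapping is cheap: the OS pages data in from disk only when the
    corresponding region of the mapping is accessed.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return bytes(mm[offset:offset + length])


path = make_demo_file()
chunk = read_slice(path, 4096 * 10, 8)  # touches only the region around this offset
os.remove(path)
```

With this access pattern, rarely queried data costs disk space rather than RAM, which is why the choice between SSD and HDD then depends on the latency you need for cold reads.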

Noah Prince

10/25/2020, 6:49 PM

I'm talking more about something like using S3 for offline access.

Noah Prince

10/25/2020, 6:50 PM

I'm looking for something that can hit low-latency SLAs but retire old data to S3 daily or weekly. This will be large volumes of data (200k+ messages/sec), so we can't really keep all of it in normal storage.
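For a sense of scale, a quick back-of-the-envelope calculation in Python. The 200k messages/sec figure is from the message above; the average message size of 1,000 bytes is purely an assumption for illustration and was not stated in the thread.

```python
SECONDS_PER_DAY = 86_400


def daily_volume(msgs_per_sec, avg_bytes_per_msg):
    """Return (messages per day, raw bytes per day) before compression or indexing."""
    msgs = msgs_per_sec * SECONDS_PER_DAY
    return msgs, msgs * avg_bytes_per_msg


# 200_000 msgs/sec is from the thread; 1_000 bytes/msg is an assumed placeholder.
msgs_per_day, raw_bytes_per_day = daily_volume(200_000, 1_000)
print(f"{msgs_per_day:,} messages/day")
print(f"{raw_bytes_per_day / 1e12:.2f} TB/day raw (decimal terabytes)")
```

Under that assumed message size, this comes to roughly 17 billion messages and about 17 TB of raw data per day, before any compression.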

Noah Prince

10/25/2020, 6:51 PM

ClickHouse + Parquet files in S3 + Presto is a workable solution, but it doesn't really give you any indexing in offline mode. Pinot looked interesting in that it might bridge the gap between historical and real-time querying.

Kishore G

10/25/2020, 6:53 PM

You can use an EBS-mounted volume.

Noah Prince

10/25/2020, 6:54 PM

Right, but price-wise aren't EBS volumes much more expensive than S3 storage?

Kishore G

10/25/2020, 6:55 PM

Yes.

Kishore G

10/25/2020, 6:56 PM

We don’t have native S3 support as of now.

Yupeng Fu

10/25/2020, 7:15 PM

You can think of Pinot as an indexing engine, so you can index the fields that you will query. If you want to explore on-demand caching, there is no such thing in Pinot yet. However, you can explore a file system caching service like Alluxio, and mount S3 as the underlying storage for Pinot.
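As a rough illustration of what on-demand caching over a remote store means, here is a conceptual Python sketch of a least-recently-used segment cache. It does not use any Pinot or Alluxio API: the `SegmentCache` class, the `fetch` callback, and the segment names are all hypothetical, and an in-memory dict stands in for S3.

```python
from collections import OrderedDict


class SegmentCache:
    """Keep up to `capacity` segments locally; fetch the rest on first access."""

    def __init__(self, fetch, capacity):
        self._fetch = fetch          # callable: segment name -> bytes (the slow remote read)
        self._capacity = capacity
        self._segments = OrderedDict()
        self.misses = 0

    def get(self, name):
        if name in self._segments:
            self._segments.move_to_end(name)    # mark as most recently used
            return self._segments[name]
        self.misses += 1
        data = self._fetch(name)                # cold read from the remote store
        self._segments[name] = data
        if len(self._segments) > self._capacity:
            self._segments.popitem(last=False)  # evict the least recently used segment
        return data


# Hypothetical remote store; in practice this would be object storage such as S3.
remote = {"seg_2018_10": b"old", "seg_2020_09": b"recent", "seg_2020_10": b"new"}
cache = SegmentCache(fetch=remote.__getitem__, capacity=2)

cache.get("seg_2020_10")  # miss: fetched from remote
cache.get("seg_2020_10")  # hit: served locally
cache.get("seg_2018_10")  # miss
cache.get("seg_2020_09")  # miss: evicts seg_2020_10, the least recently used
```

The trade-off is the one raised in the thread: the first query against a cold segment pays the remote read latency, so this only suits old data with relaxed SLAs.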

Yupeng Fu

10/25/2020, 7:19 PM

Btw, I have not tried this Alluxio setup with Pinot. In theory it works, but you might have to investigate.
