Hello! I’m running a PoC with Pinot on quite heavy data. I have a table with ~20 billion rows and 5 predicate columns: pred1 has a cardinality of ~5 million unique values, pred2 and pred3 are about the same and tightly correlated, and the last one, pred5, has very low cardinality (tens of values). I need the best possible lookup speed on these predicates across the whole range (20-50 billion rows). The table currently builds bloom filters and inverted indexes for these predicates. The second problem is ingestion rate: I have no trouble reaching ~160k documents/s, which is insane relative to the resources used, but at the same time query performance is very bad. The 6 servers are kept busy by ingestion and GC, so queries take 20-50 seconds.
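For reference, this is roughly the indexing part of my table config (pred1-pred5 are placeholders for the real column names):

```json
{
  "tableIndexConfig": {
    "invertedIndexColumns": ["pred1", "pred2", "pred3", "pred5"],
    "bloomFilterColumns": ["pred1", "pred2", "pred3", "pred5"]
  }
}
```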
My current setup is 6 servers and 1 controller, with split commit to S3 enabled. Since QPS will be low, I need to keep memory allocation for indexes/segments low.
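By “split commit to S3” I mean the usual deep-store setup, roughly like this (bucket and region are placeholders):

```properties
# controller.conf
controller.data.dir=s3://<bucket>/pinot-segments
controller.local.temp.dir=/tmp/pinot-controller-tmp
controller.enable.split.commit=true
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=<region>
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

# server.conf
pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=<region>
pinot.server.segment.fetcher.protocols=file,http,s3
pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```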
Do I need to consider some kind of bucketing/hidden partitioning on the predicate values, or can Pinot serve this data within an SLA of ~1000-3000 ms with proper indexing alone? I can also imagine some kind of work delegation across servers, e.g. 3-4 servers for consuming/segment creation and 6 servers allocated to querying.
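If partitioning is the way to go, something like this is what I had in mind (just a sketch, not applied yet; pred1 as the partition column and the partition count are illustrative):

```json
{
  "tableIndexConfig": {
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "pred1": {
          "functionName": "Murmur",
          "numPartitions": 36
        }
      }
    }
  },
  "routing": {
    "segmentPrunerTypes": ["partition"]
  }
}
```

As far as I understand, for a realtime table this only prunes segments if the Kafka producer already partitions on the same column, which is why I’d set numPartitions to match my 36 Kafka partitions.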
PS: I’m running with replication 1 to save space, since the final total will be ~20 TB. Segment size currently comes out at ~460 MB (although the table config targets 1 GB), and I’m ingesting from 36 Kafka partitions.
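The segment size target is set through streamConfigs; this is roughly what I have (I’m not sure these are the right knobs, so corrections welcome):

```json
{
  "segmentsConfig": {
    "replication": "1",
    "replicasPerPartition": "1"
  },
  "tableIndexConfig": {
    "streamConfigs": {
      "streamType": "kafka",
      "realtime.segment.flush.threshold.rows": "0",
      "realtime.segment.flush.threshold.time": "24h",
      "realtime.segment.flush.threshold.segment.size": "1G"
    }
  }
}
```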
Any improvements, thoughts, or tricks are welcome! 🙂