# general
c
Hi team, we have a use case for retrieving records by scanning ~10B records and filtering by date range and id (1M ids in total), without doing any aggregations, at ~100 QPS at peak time. Is it possible to scale and configure Pinot to support this kind of use case, and if so, what's the best latency we can expect?
m
You want to fetch raw records based on date range? What’s the upper bound of records one query will fetch and what’s the avg?
Also, avg record/row length?
c
We can set the upper bound if necessary; currently there isn't one. On average, the fetched record count could be ~100k.
m
It should be fine. Can you give me a sample query structure?
For example if the query is something like
select * from <table> where id = xxx and date between (x, y)
then you can sort and partition on id (since it has an equality predicate), add a range index on date, and also use replica-groups. This will give you great performance and scalability.
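The combination above (sort + partition on id, range index on date, replica-group routing) maps to the table config roughly like this. This is a sketch, not a drop-in config: the column names `id` and `date` come from the example query, and `numPartitions: 16` is an illustrative value you'd tune for your cluster.

```json
{
  "tableIndexConfig": {
    "sortedColumn": ["id"],
    "rangeIndexColumns": ["date"],
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "id": { "functionName": "Murmur", "numPartitions": 16 }
      }
    }
  },
  "routing": {
    "segmentPrunerTypes": ["partition"],
    "instanceSelectorType": "replicaGroup"
  }
}
```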
c
Select * from table where date between '2022-09-01' and '2022-10-01' and id in (1,2,3) and filter1='something' order by date
m
Ok, will filter1 always have equality predicate, and is medium to high cardinality? If so, sort/partition on it.
c
that extra filter1 only has fewer than 5 possible values.
m
Ok, then sort on id for sure. You may not need to partition, given that read QPS is ~100.
c
By sorting you mean at ingestion time or query time?
m
ingestion time
c
Is having id in the invertedIndexColumns enough?
m
Yeah, that works too. Sorted index is better than inverted index for overall performance, as it brings in locality.
c
The thing is we would need to filter by two ids. Looks like sortedColumn only supports one column.
m
It helps with locality, which matters if you run at high throughput (> 1000 QPS), but it's ok for your case.
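Since `sortedColumn` accepts only a single column, a common pattern here is to sort on the more selective id and put an inverted index on the other. A minimal sketch, assuming hypothetical column names `id1` and `id2` for the two id columns:

```json
{
  "tableIndexConfig": {
    "sortedColumn": ["id1"],
    "invertedIndexColumns": ["id2"]
  }
}
```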
c
We have an inverted index on the id columns, but fetching 500k records still exceeded the 10s timeout.
m
How long does count(*) take for that query? Also, what's your cluster setup like (instance type, JVM conf, etc.)?
c
Count(*) took less than 500ms
m
500ms seems large.
What's the size of each record? Also, does your query have an order by?
Do you have a range index on time? What's the time granularity?
c
No order by, just select.
We don't have any columns in rangeIndexColumns; the average query latency on this table is less than 100ms.
m
What happens if you use LIMIT to get a smaller number? When does it start to break? I suspect your rows are large and that may be creating memory pressure.
c
You mean too many columns in a row?
m
Total size of a row (sum of all column values for a row)
c
It's a denormalized table with about 80 columns, a mix of string and int metrics.
It starts to break at a 10k limit.
m
What's the size of each row in bytes? How many servers, and what's their CPU/mem/JVM conf? I feel your cluster may be under-resourced for the payload.
c
Is there some Pinot query to get size of row in bytes?
m
You can query one row, and get its size outside of Pinot
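One rough way to do that: fetch a single row (e.g. `SELECT * FROM <table> LIMIT 1` via the Pinot REST API), serialize it to text, and measure the byte length. The sketch below assumes a hypothetical row payload just for illustration; the field names are made up, not from any real table.

```python
import json

# Hypothetical row, standing in for one record returned by a
# "SELECT * FROM <table> LIMIT 1" query against the Pinot broker.
row = {"id": 123, "date": "2022-09-01", "filter1": "something", "metric1": 42}

# Serialize the row to JSON text and count the UTF-8 bytes as a
# rough proxy for the uncompressed per-row payload size.
row_bytes = len(json.dumps(row).encode("utf-8"))
print(f"approx row size: {row_bytes} bytes")
```

Multiplying this per-row size by the row limit gives a ballpark for the payload the brokers must assemble per query.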
c
It's a four-broker cluster; CPU usage is less than 5% and memory is at 80%. Performance has been decent with our traffic.
m
servers?
Not the usage of CPU/mem, but what's actually available on the servers
c
4 servers in the cluster. CPU 10G and memory 100G
m
100GB memory? That seems more than enough. What's the JVM heap size?
c
It’s at 24G
m
Then your per-row size must be really big; otherwise I don't see any other reason for the issue.
c
It's about 0.5 KB per row on average, based on the table size and row count.
a
What level of latency are you looking for? P75? P90?
c
I was looking for something like p75: 1s, p95: 5s, or so. But now it's always timing out after 10 seconds, even when selecting just 5 columns with 100k rows.
m
Table size is the Pinot segment size, which is highly compressed. Can you get one row from Pinot, save it as text, and give its file size (as a proxy)?