# general
p
Hi all, I am working on improving the query latency for my realtime time series table. There is no corresponding offline table; all the data is realtime. It has about 61 billion records with 3.5 million unique ids and a size of 2.7 TB. I have a range index on the timestamp and an inverted index on the unique id. The incoming streaming data from Kafka is partitioned. The segment assignment strategy is the default balanced assignment. Stats are saying 2 servers queried, 34 segments queried, 34 segments processed and 34 segments matched. I am getting a query response time of ~2 seconds, sometimes 4 sec, and repeated querying gives me 50 ms. Would the following changes improve the query performance? (See the config sketch below the list.)
1. Changing the segment assignment strategy to Partitioned Replica-Group Segment Assignment
2. Bloom filter (does it improve performance for individual queries, or only for aggregate queries?)
3. I am assuming the star-tree index helps with aggregations and not individual records
4. We have the partitioning set to murmur in the table config
5. How can I allocate / increase the hot/warm memory?
6. Tenants are set to DefaultTenant for both server and broker. Would changing this improve things? If so, what should be changed?
7. Would enabling the default star-tree and dynamic star-tree creation help?
8. Would disabling null handling affect performance? It's currently set to true, but I don't expect null values for the indexed id and timestamp fields.
9. Should I set autoGeneratedInvertedIndex and createInvertedIndexDuringSegmentGeneration to true? They are currently false.
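A minimal sketch of where the knobs from the list above live in a Pinot realtime table config. The table name, the column names id and ts, and numPartitions are placeholders, and the // comments are annotations only (Pinot expects plain JSON). Partitioned replica-group segment assignment additionally needs an instanceAssignmentConfigMap, omitted here; check the Pinot docs for the exact shape in your version.

```jsonc
{
  "tableName": "mytable",
  "tableType": "REALTIME",
  "routing": {
    // prune segments by partition when the query filters on the partition column (id)
    "segmentPrunerTypes": ["partition"]
  },
  "tableIndexConfig": {
    "rangeIndexColumns": ["ts"],
    "invertedIndexColumns": ["id"],
    // bloom filters let equality filters (id = '...') skip segments; they don't speed up
    // aggregations by themselves, and star-tree indexes mainly help aggregations, not point lookups
    "bloomFilterColumns": ["id"],
    "nullHandlingEnabled": false,
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "id": { "functionName": "Murmur", "numPartitions": 8 }
      }
    }
  }
}
```

For id + time-range lookups, the biggest likely win is partition-based pruning (segmentPrunerTypes plus a segmentPartitionConfig that matches how the Kafka topic is partitioned), since it cuts the number of segments each query has to touch.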
m
Few questions:
• What's the read QPS?
• Broker/server VM CPU/mem?
• What are the JVM configurations?
p
It's not much currently. Even with a single query, we are getting this poor performance.
It's not being actively used; I'm just testing performance against the table in the query console.
Server mem is 32 GB and 8 CPUs; we have 42 servers.
Same configuration for the brokers, and we have 3 brokers.
Server JVM heap used is around 9 GB on average across the servers.
Server CPU is about 20%.
m
Are the local disks attached to the servers SSDs?
p
This is all set up on AWS.
m
Is the EBS SSD?
Also, can you share the broker response metadata and the log line from when the query takes 4 s?
p
Is there a way to check if the EBS is SSD?
Broker latency is 1 second.
Let me share the log.
Do you need the broker log?
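One way to check (an aside about the setup, not something confirmed in the thread) is the EBS volume type: gp2/gp3/io1/io2 are SSD-backed, while st1/sc1/standard are not. The instance ID below is a placeholder.

```sh
# List volume ID, type and IOPS for every EBS volume attached to a given instance
aws ec2 describe-volumes \
  --filters Name=attachment.instance-id,Values=i-0123456789abcdef0 \
  --query 'Volumes[*].[VolumeId,VolumeType,Iops]' \
  --output table
```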
m
Just the log line for the query request
And also the response metadata returned by broker
p
It could be any of the broker instances, right?
Should I look at each of the broker logs?
```
[BaseBrokerRequestHandler] [jersey-server-managed-async-executor-204831] requestId=17175209,table=xxx_REALTIME,timeMs=4407,docs=84901/508174840,entries=0/1358416,segments(queried/processed/matched/consuming/unavailable):36/36/36/1/0,consumingFreshnessTimeMs=1651536519201,servers=2/2,groupLimitReached=false,brokerReduceTimeMs=20,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);pinot-server-34_R=0,4383,3116753,1,-1;pinot-server-35_R=0,557,2997678,1,-1,offlineThreadCpuTimeNs=0,realtimeThreadCpuTimeNs=0
```
Also, numEntriesScannedInFilter is 0; what does that mean?
And numEntriesScannedPostFilter is 1358416 while numDocsScanned is 84901.
That seems pretty high.
m
numEntriesScannedInFilter = 0 means the index was used for filtering, which is good.
numDocsScanned = 84901 is the total number of rows selected.
numEntriesScannedPostFilter is the number of entries read during aggregation/group-by.
I don't see anything wrong so far, but one server took 4383 ms, which seems odd.
It could be due to GC on the server side, or a high ingestion rate / segment generation taking resources.
For that requestId, can you check what the server 34 log says? Going by the serverStats header (Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs), pinot-server-34_R=0,4383,3116753,1,-1 means its ResponseDelayMs was 4383, versus 557 for server 35.
Does the server log also say it took 4383 ms?
Also share the query
p
Sorry for the late response, and thank you for the responses.
If it were a GC issue, it would be sporadic, right? I see this behaviour consistently, though, whenever I first query by the id and timestamp range. It then gives 50 ms response times for subsequent queries.
Whenever I change the id, it gives the same response time of around 2 or 4 sec,
and then it comes down to 50 ms consistently,
as if it's caching or reading from hot/warm data.
Since I have only 2 days of data retained and I need all the data at once, all of my data should be hot/warm and nothing should be cold.
How can I ensure that?
m
OK, then it is likely loading mmapped segments from disk into main memory. Typically that happens if you have too much data for a node, a slow EBS volume, or too little memory left for mmap (total - Xmx).
Can you check on these?
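A rough back-of-the-envelope check using the numbers from earlier in the thread (2.7 TB of data, 42 servers, 32 GB RAM and ~9 GB of heap in use per server), assuming segments are spread roughly evenly and 2.7 TB already includes replication:

```
data per server          ≈ 2.7 TB / 42        ≈ 64 GB
RAM left for page cache  ≈ 32 GB - ~9 GB heap ≈ 20 GB (minus other overhead)
```

So only roughly a third of each server's segment data can sit in the OS page cache at once; the first query for a given id has to page its segments in from EBS, which would explain the 2-4 s first hit followed by 50 ms repeats.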
p
```
2022/05/03 13:56:26.001 INFO [QueryScheduler] [pqr-1] Processed requestId=17229229,table=channel_util_event_REALTIME,segments(queried/processed/matched/consuming)=19/19/19/1,schedulerWaitMs=0,reqDeserMs=1,totalExecMs=4507,resSerMs=1,totalTimeMs=4509,minConsumingFreshnessMs=1651586181491,broker=Broker_pinot-broker-1.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=28175,scanInFilter=0,scanPostFilter=450800,sched=fcfs,threadCpuTimeNs=0
```
Is there any way to set the memory for hot/warm data?
What should the percentage of Xmx be?
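As far as I know there is no dedicated hot/warm memory setting for completed segments: with the default mmap read mode they are served from the OS page cache, so the practical lever is how much RAM is left outside the JVM heap. A common starting point is to keep the heap at no more than about half of RAM and leave the rest to the page cache. A sketch, assuming the servers are started via pinot-admin.sh and pick up JVM options from JAVA_OPTS (adjust for your deployment, e.g. jvmOpts in the Helm chart); the sizes are illustrative assumptions, not recommendations:

```sh
# On a 32 GB / 8 CPU server: keep heap + direct memory well under total RAM so that a
# sizeable chunk (here roughly 14 GB) stays free for the OS page cache that backs the
# mmapped segments.
export JAVA_OPTS="-Xms12G -Xmx12G -XX:MaxDirectMemorySize=6G -XX:+UseG1GC"
bin/pinot-admin.sh StartServer -zkAddress <zk-host>:2181   # placeholder ZooKeeper address
```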
k
4 seconds for this query is a bit high; most likely you have disk reads involved, and most likely it's remote (EBS).
p
Yes.
It is EBS storage.
What else is expected?
Are you saying that the data is being loaded from remote storage every time a query is made?
k
p
SSD
m
Hmm, the symptoms are definitely pointing towards slow load from storage to memory. Let me ping you to get more details.