# troubleshooting
  • k

    Kishore G

    09/22/2020, 7:52 PM
    Are you looking at the JVM graph or the system memory graph?
  • s

    Subbu Subramaniam

    09/22/2020, 8:01 PM
    what is the retention of your table? By default, completed segments are kept on the same servers that consumed them. Is this a hybrid or realtime-only table? It will be useful to run the
    RealtimeProvisioningHelper
    to get an idea of memory usage
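    A rough invocation looks something like the following; exact flag names vary between Pinot versions, so treat the options as illustrative:
        # estimate per-host memory for the realtime table across a few host counts and consumption durations
        bin/pinot-admin.sh RealtimeProvisioningHelper \
          -tableConfigFile /path/to/myTable_REALTIME.json \
          -numPartitions 8 \
          -numHosts 2,4,6 \
          -numHours 2,6,12 \
          -sampleCompletedSegmentDir /path/to/one/completed/segment
    It prints a matrix of estimated memory usage per host, which helps pick a host count and consumption duration.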
  • p

    Pradeep

    09/22/2020, 8:03 PM
    This is the memory usage graph, I am looking at system memory (no other active processes live on the system)
  • p

    Pradeep

    09/22/2020, 8:04 PM
    retention is set to more than 30 days I believe, it's a hybrid table
  • p

    Pradeep

    09/22/2020, 8:04 PM
    I can try running that
  • s

    Subbu Subramaniam

    09/22/2020, 8:06 PM
    If it is a hybrid table with a frequent push on the offline side, your retention for the realtime table should be short, e.g., 5 days for a daily offline push
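    In the realtime table's segmentsConfig that is roughly (values illustrative):
        "segmentsConfig": {
          "retentionTimeUnit": "DAYS",
          "retentionTimeValue": "5",
          ...
        }
    so realtime segments older than that get dropped, on the assumption that the daily offline push already covers them.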
  • k

    Kishore G

    09/22/2020, 8:08 PM
    that's expected and is the right thing, the OS is pretty good at managing the system memory
  • p

    Pradeep

    09/22/2020, 8:17 PM
    I did observe query latencies going up when the system memory usage is high. I believe if I use off-heap memory then the page cache should be cleared up once the mmapped memory backing the realtime segments is deleted. Let me try that and see (don't want to touch the system now, will try this in off-peak hours)
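    If I read the docs right, that means setting the server config along these lines (property name from memory, worth double-checking for our version):
        pinot.server.instance.realtime.alloc.offheap=true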
  • k

    Kishore G

    09/22/2020, 8:19 PM
    can you do lsof on the realtime process
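    e.g. something like (pid and data dir are placeholders):
        # list files the Pinot server process has open / mmapped
        lsof -p <pinot-server-pid>
        # or just count segment files under the data dir
        lsof -p <pinot-server-pid> | grep <dataDir> | wc -l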
  • y

    Yash Agarwal

    09/23/2020, 7:22 AM
    Is there a way to configure which servers should be part of a single replica group? Or will Pinot assign them randomly?
  • x

    Xiang Fu

    09/23/2020, 7:31 AM
    it should be random
  • s

    Subbu Subramaniam

    09/23/2020, 5:20 PM
    @Yash Agarwal you should be able to create a znode with specific assignments of each replica group if desired. @Jackie have we documented this?
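    As a rough sketch, that znode maps each partition/replica-group pair to an explicit list of servers, along these lines (field names from memory, please verify against your cluster):
        {
          "instancePartitionsName": "myTable_OFFLINE",
          "partitionToInstancesMap": {
            "0_0": ["Server_host-a_8098", "Server_host-b_8098"],
            "0_1": ["Server_host-c_8098", "Server_host-d_8098"]
          }
        }
    where each key is <partitionId>_<replicaGroupId>.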
    👍 1
  • p

    Pradeep

    09/30/2020, 12:12 AM
    @Neha Pawar wondering if you know what's going on here, Jackie referred me to you. I have a segment which has been in
    consuming
    state for close to 20h. From zk metadata:
        "segment.creation.time": "1601350869326",
    But my table config has the segment rolling config as:
        "realtime.segment.flush.threshold.size": "0",
        "realtime.segment.flush.threshold.time": "2h",
        "realtime.segment.flush.desired.size": "500M",
  • p

    Pradeep

    09/30/2020, 5:47 PM
    Just a note on the star-tree documentation: the supported functions list DISTINCT_COUNT_HLL, but it seems the correct way to specify it is DISTINCTCOUNTHLL__<colname>
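    i.e. in the star-tree config it ends up looking roughly like this (column names are just examples):
        "starTreeIndexConfigs": [
          {
            "dimensionsSplitOrder": ["colA"],
            "functionColumnPairs": ["DISTINCTCOUNTHLL__colB"],
            "maxLeafRecords": 10000
          }
        ]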
    👍 1
  • j

    Jackie

    09/30/2020, 5:50 PM
    @Pradeep Good point, we should also support
    DISTINCT_COUNT_HLL__<colname>
  • p

    Pradeep

    09/30/2020, 5:51 PM
    I tried that but it threw an exception
  • j

    Jackie

    09/30/2020, 5:51 PM
    Yeah, will submit a fix for that
  • p

    Pradeep

    09/30/2020, 5:55 PM
    thanks
  • j

    Jackie

    09/30/2020, 6:03 PM
    @Pradeep Here is the fix: https://github.com/apache/incubator-pinot/pull/6079. Once it's merged, it should accept both formats
    👍 1
  • n

    Neha Pawar

    10/01/2020, 1:13 AM
    @Chinmay Soman ^^
  • p

    Pradeep

    10/01/2020, 5:31 PM
    Hi, I am trying to optimize a query of the format:
        select colA, distinctCountHll(colB)
        from table
        where timestamp > X
        group by colA
    We added a star-tree with dimensionsSplitOrder: ["colA"] and aggregation function DistinctCountHLL__colB. I am not seeing much query-time improvement; comparing against an aggregation grouped by colC, which is not part of the star-tree, I see very similar times. I see that the star-tree index is getting generated. Wondering if I am missing something?
  • k

    Kishore G

    10/01/2020, 5:41 PM
    @Pradeep output stats?
  • p

    Pradeep

    10/01/2020, 5:43 PM
    timeMs=8541,docs=361440597/16479407613,entries=336003488/722881194,segments(queried/processed/matched/consuming/unavailable):5229/227/227/8/0,consumingFreshnessTimeMs=1601574112264,servers=4/4,groupLimitReached=false,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);172.31.17.90_R=1,8536,68556,0;172.31.30.139_O=1,3,372,0;172.31.34.149_O=1,4,372,0;172.31.24.127_R=1,8470,69174,0,
  • p

    Pradeep

    10/01/2020, 5:44 PM
    2 servers have old data, so they don't match anything
  • k

    Kishore G

    10/01/2020, 5:44 PM
    it's still scanning a lot
  • p

    Pradeep

    10/01/2020, 5:44 PM
    yeah
  • k

    Kishore G

    10/01/2020, 5:44 PM
    @Jackie ^^
  • y

    Yash Agarwal

    10/01/2020, 5:45 PM
    Also, I think timestamp should be part of the dimensionsSplitOrder, right?
  • j

    Jackie

    10/01/2020, 5:46 PM
    Yes, Yash has the answer
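    For the star-tree to be used, every column the query filters or groups on has to be in the split order, so roughly (a sketch, not the exact config):
        "starTreeIndexConfigs": [
          {
            "dimensionsSplitOrder": ["colA", "timestamp"],
            "functionColumnPairs": ["DISTINCTCOUNTHLL__colB"]
          }
        ]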
  • j

    Jackie

    10/01/2020, 5:47 PM
    Can you try removing the filter on
    timestamp
    and see the latency?