# troubleshooting
l
hey friends, i have a question regarding `Table Consuming Latency`
I have been turning various parts of pinot off and on to see how it behaves. this time i turned off, for a while, the kafka app that produces the records to pinot, and i saw a latency increase: p99 was 160ms and now it is over a minute. when things like this happen, when do you expect pinot to get back to its regular level? does it ever get back? I was thinking that maybe as the day goes by and this topic starts to get less traffic, things come down, but I was wondering if it can recover any other way. Ofc this is still pretty fast, but I’m wondering how taking down the app for a longer time would impact the p99 times
m
Assuming the > 1min latency you are referring to is for consumption, I’d say the consumption catches up pretty fast. However, as you can imagine, it is a function of data size, number of partitions, number of servers, etc. I’d recommend testing it for the practical scenarios you think you will run into.
l
this is 2 servers, 16 partitions. the kafka app is only 1 replica; it’s back up to speed and we are processing 4k messages/sec
r
can you share how you measure the "Table Consuming Latency" during the time the app is turned off?
l
avg by (table) (pinot_server_freshnessLagMs_XXthPercentile{kubernetes_namespace="$namespace"})
when i turned off the kafka app it went up, particularly for p99
r
ok so that value is measured by
System.currentTimeMillis() - minConsumingFreshnessMs
in your case, with the app turned off, it is basically a linear function of wall-clock time
(since your minConsumingFreshnessMs is the timestamp of your last ingested kafka msg)
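To make the "linear function of wall-clock time" point concrete, here is a minimal sketch of the formula quoted above. The class and method names (`FreshnessLagSketch`, `freshnessLagMs`) and the timestamps are hypothetical, chosen for illustration only; this is not Pinot's actual code.

```java
// Hypothetical sketch: names and numbers are illustrative, not Pinot's API.
final class FreshnessLagSketch {

    // Lag as described in the thread: current wall-clock time minus the
    // timestamp of the last ingested message (minConsumingFreshnessMs).
    static long freshnessLagMs(long nowMs, long minConsumingFreshnessMs) {
        return nowMs - minConsumingFreshnessMs;
    }

    public static void main(String[] args) {
        // Assumed timestamp of the last message before the producer stopped.
        long lastIngestedMs = 1_000_000L;

        // While nothing new is ingested, minConsumingFreshnessMs stays fixed,
        // so the lag grows one-for-one with elapsed wall-clock time.
        for (long elapsedMs = 0; elapsedMs <= 60_000L; elapsedMs += 20_000L) {
            long nowMs = lastIngestedMs + elapsedMs;
            System.out.println("elapsed=" + elapsedMs + "ms lag="
                    + freshnessLagMs(nowMs, lastIngestedMs) + "ms");
        }
    }
}
```

Under this reading, a producer that is off for a minute would by itself account for a lag of about a minute, regardless of how fast the servers consume.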
is it possible for you to test turning the app back on and see how fast this metric gets back to 0, like mayank suggested?
l
right the app is on already but that metric is not going down
message has been deleted
that’s p99
r
can you run a query?
l
like MAX time or something like that on the table?
r
just `count(*)` is fine
l
yea i can
r
and check if the metric comes down
l
they don’t come down
but i’m curious how does it relate
r
upon checking the code path, that metric only updates when a query is being processed.
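A rough sketch of what "only updates when a query is being processed" would imply, assuming the reported value is recomputed on the query path and simply held in between. `QueryDrivenGaugeSketch` and its methods are hypothetical and only model the behavior described here; Pinot's real code path may differ.

```java
// Hypothetical model of a gauge refreshed only on the query path.
// None of these names come from Pinot; they are assumptions for illustration.
final class QueryDrivenGaugeSketch {
    private long minConsumingFreshnessMs; // advanced as messages are ingested
    private long reportedLagMs;           // value a metrics scrape would see

    // Ingestion moves the freshness timestamp but does not touch the gauge.
    void onMessageIngested(long messageTimestampMs) {
        minConsumingFreshnessMs = messageTimestampMs;
    }

    // Only a query recomputes the gauge, using the formula from the thread.
    void onQuery(long nowMs) {
        reportedLagMs = nowMs - minConsumingFreshnessMs;
    }

    long getReportedLagMs() {
        return reportedLagMs;
    }

    public static void main(String[] args) {
        QueryDrivenGaugeSketch gauge = new QueryDrivenGaugeSketch();
        gauge.onMessageIngested(1_000L);
        gauge.onQuery(61_000L);            // query while the producer is off
        System.out.println("after outage query: " + gauge.getReportedLagMs());

        gauge.onMessageIngested(70_000L);  // producer back, but no query yet
        System.out.println("before next query:  " + gauge.getReportedLagMs());

        gauge.onQuery(70_100L);            // next query refreshes the gauge
        System.out.println("after next query:   " + gauge.getReportedLagMs());
    }
}
```

In this model the gauge stays at its stale value after ingestion resumes and drops only once a query hits the table, which is why running a `count(*)` is a reasonable check.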
l
ohh, this cluster is actively getting some queries
r
yeah but is that specific for that table?
l
yes specific for this table
the metric too
r
oh. interesting!