# troubleshooting

Diogo Baeder

02/25/2022, 8:54 PM
Sorry to spam you guys here, but yet another question: if I see something like this in the broker logs:
requestId=14,table=<redacted>,timeMs=545,docs=259503/9327428,entries=3080570/1038012,segments
Does this mean that the query took 545 ms to produce a result? Or does it only mean that the broker processed the query in that time and then sent the data queries to the servers? I'm asking because getting all the data into my application (plus SQLAlchemy processing time) took about 40 s, so I'm wondering where all that time is being spent. (I might just do some profiling on my side, but I'm asking here because I want to understand the logs I get from Pinot better.)

Mayank

02/25/2022, 9:41 PM
The total end-to-end latency as seen by the broker is 545 ms.
Is there complex JSON deserialization happening on the client side?
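For the client-side profiling mentioned above, here is a minimal sketch that times each stage separately so the totals can be compared with the broker's `timeMs`. The stage functions `fetch_rows` and `to_datetimes` are hypothetical placeholders working on synthetic data, not `pinotdb` or SQLAlchemy calls; in a real application the fetch stage would wrap the actual query, and `cProfile` would give finer-grained detail.

```python
import time
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def timed(stage, timings):
    """Record the wall-clock duration of the enclosed block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


def fetch_rows(n):
    # Hypothetical stand-in for the pinotdb/SQLAlchemy fetch: builds
    # (epoch_millis,) tuples locally instead of querying a broker.
    return [(1_645_800_000_000 + i * 1_000,) for i in range(n)]


def to_datetimes(rows):
    # Hypothetical post-processing step: epoch millis -> timezone-aware datetimes.
    return [datetime.fromtimestamp(ms / 1000, tz=timezone.utc) for (ms,) in rows]


timings = {}
with timed("fetch", timings):
    rows = fetch_rows(100_000)
with timed("convert", timings):
    values = to_datetimes(rows)

for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.3f}s")
```

If the sum of the client-side stages is far larger than the broker's reported time, the gap is being spent after the broker has answered.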

Diogo Baeder

02/25/2022, 9:49 PM
Ah, got it! Yeah, I did some profiling. There are some parts of the `pinotdb` library that could be improved, I think, but there's also a lot of inefficient processing on my side: I first convert timestamps into `datetime` objects and then do the aggregation on them, when I should be doing the inverse, aggregating first and only then converting to `datetime`. I'm doing this aggregation in Python because it's for analysing user sessions on our website, where each session is a chunk of requests no more than 30 min apart, and I didn't find any function in Pinot that could do this sort of aggregation. But I reckon I could do much better than this.
But if you know whether it's possible to do something like this on the Pinot side (in the broker, perhaps), I'd happily favor that instead 🙂
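A minimal sketch of the aggregate-first order described above, in plain Python rather than anything Pinot or `pinotdb` provides. It assumes the timestamps are epoch milliseconds for a single user (grouping by user would happen before this step), and it reads "no more than 30 min apart" as: a gap of at most 30 minutes keeps the session open. `sessionize` and `to_utc` are hypothetical helper names. Only the session boundaries are converted to `datetime`, after the grouping is done.

```python
from datetime import datetime, timezone

# Assumption: a gap of at most 30 minutes keeps a session open.
SESSION_GAP_MS = 30 * 60 * 1000


def sessionize(timestamps_ms, gap_ms=SESSION_GAP_MS):
    """Group epoch-millisecond timestamps into (start_ms, end_ms, count) sessions.

    Works on plain integers; a new session starts whenever the gap to the
    previous request exceeds gap_ms.
    """
    sessions = []
    for ts in sorted(timestamps_ms):
        if sessions and ts - sessions[-1][1] <= gap_ms:
            start, _, count = sessions[-1]
            sessions[-1] = (start, ts, count + 1)  # extend the current session
        else:
            sessions.append((ts, ts, 1))  # gap too large: open a new session
    return sessions


def to_utc(ms):
    # Convert to datetime only after aggregation, once per boundary.
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


requests_ms = [0, 60_000, 25 * 60_000, 90 * 60_000, 100 * 60_000]
for start, end, count in sessionize(requests_ms):
    print(to_utc(start), to_utc(end), count)
```

Because the grouping compares integers, the per-row `datetime` construction is reduced to two conversions per session.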

Mayank

02/25/2022, 10:19 PM
I'm not fully sure I understand the requirement, but you can always do datetime transforms on the Pinot side.

Diogo Baeder

02/25/2022, 11:03 PM
Yeah, I know, and I already do that, but it transforms the data into a string, which I then convert into a Python `datetime.datetime` instance (which is not a string). But don't worry 🙂 By the way, I just found a quick and dirty, but somewhat reliable, way to cut 10 s out of those 40 s just by accessing some internals of `pinotdb` 😄
There's a problem in the library: when iterating over the results it gets from the Broker API, it keeps popping the first element of a `list`, which is inefficient in Python. I'll try to improve that in the library soon, if I find the time; there are other collections in Python that are more appropriate for this.
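To illustrate the `list` issue described above (this is not `pinotdb`'s actual code): `list.pop(0)` shifts every remaining element, so draining a list that way is quadratic overall, while `collections.deque.popleft()` runs in constant time. `drain_list` and `drain_deque` are hypothetical helpers used only for the comparison.

```python
import timeit
from collections import deque


def drain_list(items):
    """Consume items by repeatedly popping the head of a list (O(n) per pop)."""
    items = list(items)
    out = []
    while items:
        out.append(items.pop(0))
    return out


def drain_deque(items):
    """Same consumption pattern using collections.deque (O(1) per popleft)."""
    items = deque(items)
    out = []
    while items:
        out.append(items.popleft())
    return out


data = list(range(20_000))
assert drain_list(data) == drain_deque(data) == data

print("list.pop(0):    ", timeit.timeit(lambda: drain_list(data), number=1))
print("deque.popleft():", timeit.timeit(lambda: drain_deque(data), number=1))
```

If the results don't need to be consumed destructively, a plain `for row in rows:` loop avoids popping altogether.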