Diogo Baeder
02/25/2022, 8:54 PM
requestId=14,table=<redacted>,timeMs=545,docs=259503/9327428,entries=3080570/1038012,segments
Does this mean that a query took 545ms to yield a result? Or does it just mean that the broker processed the query in that time and then sent the data queries to the servers? I'm asking because getting all the data into my application (including SQLAlchemy processing time) took about 40s, so I'm wondering where all that time is being spent... (I might just do some profiling on my side, but I'm asking here because I want a better understanding of the logs I get from Pinot.)
Mayank
Diogo Baeder
02/25/2022, 9:49 PM
pinotdb library that could improve I think, but also a lot of inefficient processing on my side too - I first convert timestamps into datetime objects, and then do the aggregation on them, when I should actually be doing the inverse: first aggregating and only then converting to datetime.
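A minimal sketch of that "aggregate first, convert last" ordering, assuming the timestamps arrive as epoch milliseconds and that a session is a run of requests with gaps of at most 30 minutes. The function names `sessionize` and `to_datetimes` are hypothetical and not part of pinotdb or SQLAlchemy:

```python
from datetime import datetime, timezone

# Assumed gap threshold: 30 minutes, expressed in milliseconds.
SESSION_GAP_MS = 30 * 60 * 1000


def sessionize(timestamps_ms, gap_ms=SESSION_GAP_MS):
    """Group epoch-millisecond timestamps into (start_ms, end_ms, count) tuples.

    All comparisons are done on plain integers; no datetime objects are built here.
    """
    sessions = []
    for ts in sorted(timestamps_ms):
        if sessions and ts - sessions[-1][1] <= gap_ms:
            # Close enough to the previous request: extend the current session.
            start, _, count = sessions[-1]
            sessions[-1] = (start, ts, count + 1)
        else:
            # Gap too large (or first request): start a new session.
            sessions.append((ts, ts, 1))
    return sessions


def to_datetimes(sessions):
    """Convert only the session boundaries to timezone-aware UTC datetimes."""
    return [
        (
            datetime.fromtimestamp(start / 1000, tz=timezone.utc),
            datetime.fromtimestamp(end / 1000, tz=timezone.utc),
            count,
        )
        for start, end, count in sessions
    ]
```

With this ordering, the number of datetime conversions is proportional to the number of sessions rather than the number of requests.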
The reason I do this is that it's for analysing user sessions on our website, where each session is a chunk of requests no more than 30min apart, and since I didn't find any function in Pinot that could do this sort of aggregation I'm doing it in Python. But I reckon I could do much better than this.
Mayank
Diogo Baeder
02/25/2022, 11:03 PM
datetime.datetime instance (which is not a string). But don't worry 🙂
By the way, I just found a quick and dirty, but somewhat reliable, way to cut 10s from those 40s just by accessing some internals of pinotdb 😄
list, and doing this is inefficient in Python. I'll try to improve that in the library soon if I find the time; there are other collections in Python that can be more appropriate.
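As a rough illustration of why a list can be a poor fit, here is a sketch that assumes the costly operation is taking rows off the front of a list one at a time. That is an assumption, and the helper names below are hypothetical rather than pinotdb code. `list.pop(0)` shifts every remaining element, while `collections.deque.popleft()` runs in constant time:

```python
from collections import deque


def drain_list(rows):
    """Consume rows front-first from a list.

    Each pop(0) is O(n) because the remaining items are shifted left,
    so draining n rows costs O(n^2) overall.
    """
    pending = list(rows)
    consumed = []
    while pending:
        consumed.append(pending.pop(0))
    return consumed


def drain_deque(rows):
    """Consume rows front-first from a deque.

    popleft() is O(1), so draining n rows costs O(n) overall.
    """
    pending = deque(rows)
    consumed = []
    while pending:
        consumed.append(pending.popleft())
    return consumed
```

Both helpers return the rows in the same order; the difference only shows up in running time as the number of rows grows, which can be checked with the standard `timeit` module.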