https://pinot.apache.org/ logo
#general
Title
# general
x

xtrntr

07/31/2021, 12:23 AM
hello, i’m trying to figure out how pinot can support the queries i’ll like to make. my schema has the following fields:
Copy code
userId | time (15 minutes) | cell
a        0                      1
a        1                      2
a        2                      -1
a        3                      -1
a        4                      -1
a        5                      3
a        6                      3
..
z        0                      1
i’ll like to interpolate the missing values (-1), then query the result of that transformation. my understanding is that pinot doesn’t support subqueries but does so in conjunction with presto, is this the right channel to ask?
Copy code
# example of values after interpolation
userId | time (15 minutes) | cell
a        0                      1
a        1                      2
a        2                      2
a        3                      2 or 3
a        4                      3
a        5                      3
a        6                      3

# example of values after forwardFill
userId | time (15 minutes) | cell
a        0                      1
a        1                      2
a        2                      2
a        3                      2
a        4                      2
a        5                      3
a        6                      3
m

Mayank

07/31/2021, 12:26 AM
Yes you can use Pinot+Presto to get a full sql querying support. However, was a bit unclear on your requirement, what's the actual sql query you'd like to make?
x

xtrntr

07/31/2021, 12:27 AM
ah im not quite sure how to express this transformation in sql
but the input/output should give an idea of what im trying to achieve with forwardFill/interpolation?
k

Ken Krugler

07/31/2021, 12:30 AM
I’m guessing you’d need to do this interpolation via an ETL process (prior to generating segments), as otherwise your performance is going to be painfully slow if you’re basically doing series transformations for each query.
x

xtrntr

07/31/2021, 12:32 AM
that is a good idea and something i will consider, thanks @User!
for more context, one of the reasons i am looking into this possibility is because i’m building an analytical dashboard
and one of the options we extend to our users is to specify the maximum gap size for interpolation (as well as different techniques to apply interpolation)
k

Kishore G

07/31/2021, 12:53 AM
Will this be done for a single user or across all users?
x

xtrntr

07/31/2021, 12:54 AM
many users, they will be allowed to specify their custom interpolation values. but its possible that offering this may be customization taken too far
but ken’s answer has given me an idea, essentially i can get this transformation with window functions but it’ll probably be very slow
k

Kishore G

07/31/2021, 12:54 AM
I meant in a single query
x

xtrntr

07/31/2021, 12:54 AM
ah sorry. across all users
k

Kishore G

07/31/2021, 12:55 AM
What do you plan to do with so many data points?
x

xtrntr

07/31/2021, 12:56 AM
you mean after the interpolation is done?
k

Kishore G

07/31/2021, 12:56 AM
For e.g after interpolated/forward filling, how will the client use this data
Yes
x

xtrntr

07/31/2021, 12:56 AM
a simple example would be filtering for users who have been in cell x for y timebins
k

Kishore G

07/31/2021, 12:58 AM
Okay. So raw data -> window function transform + filter ?
x

xtrntr

07/31/2021, 12:59 AM
yup, that’s one use case
there might be one final aggregate step
k

Kishore G

07/31/2021, 12:59 AM
Was about to ask that next 😉
Ok..one more question, how many rows per user is and is it possible to partition it?
Also batch vs real-time
x

xtrntr

07/31/2021, 1:01 AM
so 1 user will definitively be 96 timebins (4*24) per day
well, we actually have this dashboard up already
but we use spark to run these analytical queries and we’re trying to see if we can support these queries for real time
just curious, what is the direction of your questions leading down to?
k

Kishore G

07/31/2021, 1:11 AM
To understand how much work should be done pre vs post ingestion
👍 1