Priyank Bagrecha
06/21/2022, 11:24 PM{e1:b1,e2:b2,e3:b3...}
an account can be in multiple ab tests. the query is going to be SELECT exp_id, event_ts, bucket_id, DISTINCTCOUNTHLL(account_id) FROM table WHERE exp_id = <exp id> AND event_ts > start_time and event_ts < end_time GROUP BY event_ts, bucket_id
. event_ts has a granularity of some time interval so it is not a problem of high cardinality.Mayank
Priyank Bagrecha
06/22/2022, 1:41 AMMAP_VALUE(mapColumn, 'myKey', valueColumn)
to get bucket id. i am about to give it a try.Priyank Bagrecha
06/22/2022, 1:42 AMe1:e1#b1&e2:e2#b2...
so trying to use groovy script to add a derived complex field for quick testingMayank
Priyank Bagrecha
06/22/2022, 1:59 AMPriyank Bagrecha
06/22/2022, 2:22 AM"ingestionConfig": {
"transformConfigs": [
{
"columnName": "ab_test_map",
"transformFunction": "Groovy({string_map.split('&').collectEntries{ entry -> [(entry.split(':').first()): entry.split(':').last()]}}, string_map)"
}
]
},
Priyank Bagrecha
06/22/2022, 3:10 AM{"exp":[{"expId": "e1", "bucketId": "b1"},{"expId": "e2", "bucketId": "b2"}...]}
? how do i query that? i will have value of an experiment id i.e. e1
as a parameter for the query.Mayank
Priyank Bagrecha
06/22/2022, 4:39 AMMayank
Priyank Bagrecha
06/22/2022, 5:10 AMMayank
Priyank Bagrecha
06/22/2022, 5:44 AMPriyank Bagrecha
06/22/2022, 5:46 AMmyfield
from e1:e1#b1&e2:e2#b2..
to e1:e1#b1,e2:e2#b2,...
and the field in schema is
{
"name": "myfield",
"dataType": "STRING",
"singleValueField": false
}
and then tried this query
SELECT VALUEIN(myfield, 'e1:e1#b1') AS bucket_id, COUNT(*) FROM mytable GROUP BY bucket_id
but it returns nothing. i do see entries with that though so not sure what am i missingMayank
Priyank Bagrecha
06/22/2022, 5:49 AMSELECT VALUEIN(RandomAirports, 'SEA') as airport, count(*)
FROM airlineStats
GROUP BY airport
from https://docs.pinot.apache.org/configuration-reference/functions/valuein#signature with data like
RandomAirports
SEA,PSC,PHX,MSY,ATL,TYS,DEN,CHS,PDX,LAX,EWR,SFO,PIT,RDU,RAP,LSE,SAN,SBN,IAH,OAK,BRO,JFK,SAT,ORD,ACY,DFW,BWI,TPA,BFL,BOS,SNA,ISN
SEA,PSC,PHX,MSY,ATL,TYS,DEN,CHS,PDX,LAX,EWR,SFO,PIT,RDU,RAP,LSE,SAN,SBN,IAH,OAK
SEA,PSC,PHX,MSY,ATL,TYS,DEN
null
null
null
SEA,PSC,PHX,MSY,ATL,TYS,DEN,CHS
SEA,PSC,PHX,MSY,ATL,TYS,DEN,CHS,PDX,LAX,EWR,SFO,PIT,RDU,RAP,LSE,SAN
Mayank
Priyank Bagrecha
06/22/2022, 7:13 AM"ingestionConfig": {
"transformConfigs": [
{
"columnName": "derivedfield",
"transformFunction": "Groovy({myfield.split('&')}, myfield)"
}
]
},
and then doing this query
SELECT VALUEIN(derivedfield, 'e1:e1#b1', 'e1:e1#b2') AS bucket_id,
DISTINCTCOUNTHLL(account_id) AS distinct_allocations
FROM allocation_events
GROUP BY bucket_id
to get final response like
bucket_id distinct_allocations
e1:e1#b1 403
e1:e1#b2 418
Priyank Bagrecha
06/22/2022, 7:14 AMJackie
06/22/2022, 4:58 PMderivedfield = 'e1:e1#b1'
in the query?Jackie
06/22/2022, 4:58 PMPriyank Bagrecha
06/22/2022, 5:02 PMe1:e1#b1.e2:e2#b2,...
etc but yeah what i have right now wont scale well as it scans all documentsJackie
06/22/2022, 6:29 PMVALUE_IN
should give you the same result without scanning all the documentsJackie
06/22/2022, 6:29 PMSELECT VALUEIN(myfield, 'e1:e1#b1') AS bucket_id, COUNT(*) FROM mytable GROUP BY bucket_id
-> ``SELECT VALUEIN(myfield, 'e1:e1#b1') AS bucket_id, COUNT(*) FROM mytable WHERE myfield = 'e1:e1#b1' GROUP BY bucket_id`Jackie
06/22/2022, 6:30 PMSELECT VALUEIN(derivedfield, 'e1:e1#b1', 'e1:e1#b2') AS bucket_id,
DISTINCTCOUNTHLL(account_id) AS distinct_allocations
FROM allocation_events
GROUP BY bucket_id
->
SELECT VALUEIN(derivedfield, 'e1:e1#b1', 'e1:e1#b2') AS bucket_id,
DISTINCTCOUNTHLL(account_id) AS distinct_allocations
FROM allocation_events
WHERE derivedfield IN ('e1:e1#b1', 'e1:e1#b2')
GROUP BY bucket_id
Priyank Bagrecha
06/22/2022, 6:38 PMe1:b1
etc. i have events where derivedField = e1:b1,e2:b2.e3:b3...
so to me derivedField doesn't exist in ('e1:b1', 'e1:b2'). how does this work? is derivedField in evaluating every element in the array instead of evaluating entire array?Jackie
06/23/2022, 5:04 PM