Hi Team, Working on the complex query and need hel...
# general
r
Hi Team, Working on the complex query and need help in finding the best way to achieve it. The DB table has a time range based usage record. We have to create the time series usage. For Example: Sample Table: BEG_DT END_DT REGION 2020-01-01 2021-06-09 region_a 2020-06-29 2021-06-09 region_a 2020-01-01 2020-06-29 region_a 2020-01-01 2021-06-09 region_b 2020-01-01 2021-06-09 region_b 2020-01-01 2021-06-09 region_a 2020-01-01 2021-06-09 region_a 2020-07-08 2021-06-09 region_a 2020-01-01 2020-07-08 region_a 2021-05-10 2021-06-09 region_a 2020-01-01 2021-05-10 region_a 2020-01-01 2021-06-09 region_a ...... 2020-01-01 2021-06-09 region_a Result: Date: Active count: 2020-01-01 9000 2020-01-02 8940 ...... 2021-06-09 8067 What is the best way to write such query? Solution1: Do a join query. But it is will very resource intensive. Solution 2: Do application level processing and run select query in for loop for all the time series. Which solution is better 1 or 2? Or there is better way we can achieve this.
x
For 1, it means each 1 row will fan out to many rows depends on how many days between start/end dates. It can be achieved by adding a udf to convert the start/end date to a multi value dates within the range, then group by on the each individual value. For 2 you can loop start/end date then do group by query with predicate queryDate between startDate and endDate Cost perspective, 1 is definitely more cost effective, but requires some extra dev work. 2 is more costly as you need to compute the predicate for every date, but you can achieve it today in Pinot
r
@Xiang Fu: Thanks for the reply. Trying the first approach with UDF. When I use UDF in "Group by" query, it throws error. For simulating the issue, posting the similfied query. Query: SELECT groovy('{"returnType":"INT","isSingleValue":false}', 'def data = []; def c1 = 1 ;while ( c1 < 10 ) {data.add(c1);c1=c1+1} ;return data') as udf_output, count(*) FROM airlineStats group by udf_output ERROR: [ { "errorCode": 200, "message": "QueryExecutionError:\nProcessingException(errorCode:450, messageInternalError\njava.lang.NullPointerException\n\tat org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.mergeResults(GroupByOrderByCombineOperator.java:236)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:119)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:50)" } ] Am I doing anything wrong? How can I make it work? Thanks in Advance.
x
I'm not familiar with groovy. Cc @Neha Pawar
r
@Xiang Fu: Do we have other options for writing UDF? In UDF scalar documentation, it's mentioned scalar supports only a single value output.
x
then it means groovy may not work, you need to write java code to add a new function
r
with Groovy I am getting multi-value output. only getting error when used with group by.
x
cc: @Jackie
is there null in your udf?
r
UDF doesn't have null. If you see the below query with UDF without Group by works fine. Query: SELECT groovy('{"returnType":"INT","isSingleValue":false}', '*def data = [1,2,3]; return data'*) as udf_output, * FROM airlineStats limit 1 Response: udf_output id 1,2,3 359 Query with Group by fails: SELECT groovy('{"returnType":"INT","isSingleValue":false}', '*def data = [1,2,3]; return data*') as udf_output, count(*) FROM airlineStats group by udf_output limit 1
x
hmm, your query works for me:
did you config:
Copy code
controller.disable.ingestion.groovy=false
pinot.broker.disable.query.groovy=false
r
I didn't.
x
can you config
controller.disable.ingestion.groovy=false
on controller and
pinot.broker.disable.query.groovy=false
on broker?
if you are using quickstart, you can put both lines in a file called quickstart.config, then add
-configFile quickstart.config
to the command line
r
Added "*controller.disable.ingestion.groovy=false"* and started controller "*bin/pinot-admin.sh StartController -configFileName config/pinot-controller.conf"* Added "*pinot.broker.disable.query.groovy=false"* and started broker "bin/pinot-admin.sh StartBroker -configFileName config/pinot-broker.conf" But still getting the same error. I am using Pinot 0.10.0.
x
Try newly released 0.11.0?
Do u have the logs?
For full stack trace
r
Thanks, @Xiang Fu! Works fine with 0.11.0. As the groovy script is vulnerable. Can we achieve UDF multi-value output with Scalar function?
x
Yes. You can do that
Just write a function and drop the jar should work