vishal
12/20/2022, 8:49 AM
Ashish Kumar
12/21/2022, 4:17 AM
vishal
12/21/2022, 10:03 AM
UDF - Scalar function
The steps I've followed are as below:
Created a Java project with the package name org.apache.pinot.scalar.ScalarFunc.
Pom file:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.apache.pinot.scalar.ScalarFunc</groupId>
    <artifactId>ScalarFunc</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.pinot</groupId>
            <artifactId>pinot-common</artifactId>
            <version>0.11.0</version>
        </dependency>
    </dependencies>
</project>
Main.java
package org.apache.pinot.scalar.ScalarFunc;

import org.apache.pinot.spi.annotations.ScalarFunction;

public class Main {
    public static void main(String[] args) {
        // System.out.println("Hello world!");
    }

    @ScalarFunction
    static String getdata(String ref) {
        return "hurray testing is working";
    }
}
Created the jar file using mvn clean install, moved that jar file to pinot/plugins, and then reinstalled Pinot.
Tried to run this query: select getdata(AirTime) from airlineStats limit 10 (here AirTime is a column name).
But it's returning the error below:
[
{
"message": "QueryExecutionError:\norg.apache.pinot.spi.exception.BadQueryRequestException: Unsupported function: getdata not found\n\tat org.apache.pinot.core.operator.transform.function.TransformFunctionFactory.get(TransformFunctionFactory.java:304)\n\tat org.apache.pinot.core.operator.transform.TransformOperator.<init>(TransformOperator.java:65)\n\tat org.apache.pinot.core.plan.TransformPlanNode.run(TransformPlanNode.java:71)\n\tat org.apache.pinot.core.plan.SelectionPlanNode.run(SelectionPlanNode.java:71)",
"errorCode": 200
}
]
Am I doing anything wrong?
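One thing that stands out: the Pinot docs show annotated scalar functions as public static methods, and the jar must sit in a directory Pinot actually scans for plugins at startup (e.g. via the plugins.dir system property). A hedged sketch of what the class might look like (the class name ScalarFunctions is arbitrary, not something Pinot requires):

```java
package org.apache.pinot.scalar.ScalarFunc;

import org.apache.pinot.spi.annotations.ScalarFunction;

public class ScalarFunctions {

    // The annotated method should be public static; Pinot derives the SQL
    // function name from the method name (case-insensitive), so this can be
    // invoked as getData(...) or getdata(...) in queries.
    @ScalarFunction
    public static String getData(String ref) {
        return "hurray testing is working";
    }
}
```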
reference: https://docs.pinot.apache.org/users/user-guide-query/scalar-functions#scalar-functions
Ashish Kumar
12/22/2022, 3:52 AM
Ralph Debusmann
12/22/2022, 11:00 AM
vishal
12/22/2022, 11:24 AM
id, name, position, timestamp
1,vishal,SDE1,
2,Raj, SDE2
1,vishal, SDE2
1,vishal,NULL
Here I am trying to fetch data for id=1, so the SQL query will be SELECT position FROM table WHERE id=1.
It will return 3 records, but I want only one record, which is 1,vishal,SDE2.
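(If the goal is just "latest row per id", this may be expressible without a UDF. A hedged sketch, assuming a Pinot version that has the LASTWITHTIME aggregation and that the timestamp column is populated as a valid time column; table and column names are taken from the example above:)

```sql
-- latest name/position per id, picked by the timestamp column
SELECT id,
       LASTWITHTIME(name, timestamp, 'STRING') AS name,
       LASTWITHTIME(position, timestamp, 'STRING') AS position
FROM mytable
WHERE id = 1
GROUP BY id
```

Alternatively, an upsert-enabled real-time table keeps only the latest record per primary key at query time.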
Can we achieve this through a scalar function? Let's say we pass id as an argument; the function fetches all the data for that id and runs logic that merges the rows and returns one string which is a concatenation of all columns.
EX.:
SELECT data(1) FROM table;
def data(id) {
    select all the data which has id = 1, merge those records, and return one single string
}
Is this possible? Can we query inside the function, or do we need to call an API to query data inside the function?
Shakti Singh
12/22/2022, 3:41 PM
Rostan TABET
12/23/2022, 1:29 PM
Rostan TABET
12/23/2022, 1:31 PM
"Bloom filter helps prune segments that do not contain any record matching an EQUALITY predicate." I wonder if it also includes IN predicates.
For example, can a bloom filter be useful for a predicate in the form fruit IN ('banana', 'apple', 'orange'), or do I need to change it to fruit = 'banana' OR fruit = 'apple' OR fruit = 'orange'?
Dileep Kancharla
12/23/2022, 3:00 PM
Rohit Yadav
12/26/2022, 7:20 AM
Amos Bird
12/27/2022, 7:10 AM
Amos Bird
12/28/2022, 2:45 PM
I find "Raw value forward index" very confusing. Is it just a column storage of the original data? I don't see why it's called an index.
Amos Bird
12/28/2022, 2:58 PM
Is colB reordered before doing dict-encoding in "Dictionary-encoded forward index with bit compression (default)"?
Timothy Spann
12/29/2022, 12:31 AM
Abhishek Dubey
01/02/2023, 10:45 AM
Shankar Uprety
01/04/2023, 1:17 AM
Shreyans Bhavsar
01/04/2023, 11:08 AM
Harshit
01/04/2023, 11:23 AM
Could not find index for column: gKey, type: FORWARD_INDEX, segment: /tmp/data/pinotServerData/key1_OFFLINE/key1_3_
Schema
{
  "schemaName": "key",
  "dimensionFieldSpecs": [
    {
      "name": "rootKey",
      "dataType": "STRING"
    },
    {
      "name": "gKey",
      "dataType": "STRING"
    }
  ],
  "primaryKeyColumns": [
    "gKey"
  ]
}
Table config
{
  "tableName": "key",
  "tableType": "OFFLINE",
  "isDimTable": true,
  "segmentsConfig": {
    "schemaName": "key",
    "segmentPushType": "REFRESH",
    "replication": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP"
  },
  "metadata": {
    "customConfigs": {}
  },
  "quota": {
    "storage": "200M"
  }
}
Harshit
01/05/2023, 6:46 AM
Shreeram Goyal
01/05/2023, 9:13 AM
Pratik Tibrewal
01/06/2023, 8:39 AM
date_trunc('week', date_parse(datestr, '%y-%m-%d')) + interval '3' day
Tim Berglund
Tim Berglund
Unmesh Vijay Kadam
01/09/2023, 8:05 AM
Prashant Korade
01/09/2023, 6:11 PM
Tim Berglund
Rostan TABET
01/10/2023, 9:18 AM
Cursor.fetchall has the following docstring:
Fetch all (remaining) rows of a query result, returning them as a
sequence of sequences (e.g. a list of tuples). Note that the cursor's
arraysize attribute can affect the performance of this operation.
However, the method's implementation is simply:
return list(self)
which basically creates a list by calling fetchone, i.e. self._results.pop(0), once for each element of the list self._results. I wonder if there is a reason for this, instead of something like:
res = self._results
self._results = []
return res
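A minimal stand-in (a hypothetical list-backed cursor, not the actual pinotdb implementation) comparing the two approaches; the pop-per-row path shifts the whole remaining list on each call, so it is quadratic overall, while handing the list over is constant-time:

```python
class ListCursor:
    """Hypothetical DB-API-style cursor backed by a plain Python list."""

    def __init__(self, rows):
        self._results = list(rows)

    def fetchone(self):
        # pop(0) shifts every remaining element left: O(n) per call
        return self._results.pop(0) if self._results else None

    def fetchall_via_fetchone(self):
        # mirrors list(self): one fetchone per row -> O(n^2) overall
        rows = []
        while self._results:
            rows.append(self.fetchone())
        return rows

    def fetchall_swap(self):
        # the suggested alternative: hand back the backing list, reset -> O(1)
        res, self._results = self._results, []
        return res
```

Both variants return the same rows and leave the cursor empty; they differ only in cost.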
My main concern is about possible performance issues when the query result contains many rows.
Sachin Mittal Consultant
01/10/2023, 5:08 PM
Sachin Mittal Consultant
01/10/2023, 8:08 PM
Caught exception while decoding row
These mostly seem to come from my transformation functions. I have used some built-in functions, and for those the stack trace indicated what the problems could be, and I have fixed them.
However, I also needed some Groovy scripts, and I executed those scripts standalone to check whether they work fine, and they do.
However, it still does not seem to be decoding rows, and I am now unable to figure out where the problem is.
The stack trace I get is something like:
org.apache.pinot.shaded.com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0x89
at [Source: (ByteArrayInputStream); line: 1, column: 3]
at org.apache.pinot.shaded.com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2337) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:710) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidInitial(UTF8StreamJsonParser.java:3607) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._decodeCharForError(UTF8StreamJsonParser.java:3350) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3582) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2688) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:870) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:762) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.databind.ObjectReader._bindAsTree(ObjectReader.java:2058) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.databind.ObjectReader._bindAndCloseAsTree(ObjectReader.java:2044) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.databind.ObjectReader.readTree(ObjectReader.java:1739) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.spi.utils.JsonUtils.bytesToJsonNode(JsonUtils.java:211) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder.decode(JSONMessageDecoder.java:61) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder.decode(JSONMessageDecoder.java:73) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder.decode(JSONMessageDecoder.java:37) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.spi.stream.StreamDataDecoderImpl.decode(StreamDataDecoderImpl.java:47) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.processStreamEvents(LLRealtimeSegmentDataManager.java:549) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.consumeLoop(LLRealtimeSegmentDataManager.java:434) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:629) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at java.lang.Thread.run(Thread.java:832) [?:?]
Is there a better way to debug this?
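(One observation on the trace above: "Invalid UTF-8 start byte 0x89" is raised before any transformation runs, so the raw message bytes reaching the JSON decoder are not valid UTF-8 JSON at all; 0x89 happens to be, for instance, the first byte of a PNG header. If you can dump a raw message from the stream to a file, a quick local check looks like this; the byte string here is purely illustrative:)

```python
import json

# hypothetical raw payload; 0x89 is never a valid UTF-8 start byte
raw = bytes([0x89, 0x50, 0x4E, 0x47])

try:
    json.loads(raw.decode("utf-8"))
    print("payload is valid UTF-8 JSON")
except UnicodeDecodeError as e:
    # same failure mode as the Jackson error in the stack trace
    print(f"not UTF-8 at byte {e.start}: 0x{raw[e.start]:02x}")
except json.JSONDecodeError as e:
    print(f"valid UTF-8 but not JSON: {e}")
```

If this fails on a real payload, the problem is upstream of Pinot (producer serialization, e.g. binary/Avro data fed to the JSON decoder), not in the transform configs.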