Sukesh Boggavarapu
11/16/2022, 10:29 PM
Sukesh Boggavarapu
11/16/2022, 10:30 PM
Mahesh babu
11/17/2022, 6:18 AM{
"tableName": "rcem_map_dly2",
"tableType": "OFFLINE",
"segmentsConfig": {
"segmentPushType": "APPEND",
"segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
"schemaName": "rcem_map_dly2",
"replication": "1"
},
"tenants": {
"broker": "DefaultTenant",
"server": "DefaultTenant"
},
"tableIndexConfig": {
"loadMode": "MMAP"
},
"ingestionConfig": {
"batchIngestionConfig": {
"segmentIngestionType": "APPEND",
"segmentIngestionFrequency": "DAILY"
}
},
"metadata": {}
}
Varagini Karthik
11/17/2022, 9:25 AM
java.lang.OutOfMemoryError: Java heap space
How do I increase the heap size?
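A minimal sketch of one way to raise the heap, assuming the stock pinot-admin.sh launcher (which picks up JAVA_OPTS); the sizes and component are illustrative:
export JAVA_OPTS="-Xms4G -Xmx8G -XX:+UseG1GC"
bin/pinot-admin.sh StartServer -zkAddress localhost:2181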
11/17/2022, 12:34 PMspark-submit --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand --master yarn --jars "<s3://dataplatform/jars/pinot-batch-ingestion-spark-3.2-0.11.0-shaded.jar,s3://dataplatform/jars/pinot-all-0.11.0-jar-with-dependencies.jar>" --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true --deploy-mode client --conf "spark.driver.extraClassPath=pinot-batch-ingestion-spark-3.2-0.11.0-shaded.jar:pinot-all-0.11.0-jar-with-dependencies.jar" --conf "spark.executor.extraClassPath=pinot-batch-ingestion-spark-3.2-0.11.0-shaded.jar:pinot-all-0.11.0-jar-with-dependencies.jar" --files <s3://testbucket-data/test/spark_spec92.yaml> <local://pinot-all-0.11.0-jar-with-dependencies.jar> -jobSpecFile spark_spec92.yaml
can somebody please help with this error : Caused by: java.lang.ClassNotFoundException: org.apache.pinot.plugin.filesystem.S3PinotFSPrashant Pandey
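S3PinotFS lives in the pinot-s3 plugin rather than in the Spark ingestion jar, so one thing to try (a sketch along the lines of the Pinot-on-Spark docs; the plugins path and plugin list are illustrative) is pointing the driver and executors at Pinot's plugins directory:
--conf "spark.driver.extraJavaOptions=-Dplugins.dir=/path/to/apache-pinot-0.11.0-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet"
--conf "spark.executor.extraJavaOptions=-Dplugins.dir=/path/to/apache-pinot-0.11.0-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet"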
Prashant Pandey
11/17/2022, 6:06 PM
How is Pinot determining the maxRowCount of a segment in the following cases:
1. realtime.segment.flush.threshold.rows is not set in the stream config.
2. realtime.segment.flush.threshold.rows is set to "0" in the stream config.
Here’s a log of a segment with flush time of 1h and realtime.segment.flush.threshold.rows as 0:
2022/11/17 17:44:46.259 INFO [LLRealtimeSegmentDataManager_raw_service_view_1__9__477__20221117T1744Z] [HelixTaskExecutor-message_handle_thread_23] Starting consumption on realtime consuming segment raw_service_view_1__9__477__20221117T1744Z maxRowCount 703125 maxEndTime 2022-11-17T18:44:44.653Z
The max end-time is fine, but how is it getting the maxRowCount of 703125? I could not find any such logic in the code: PartitionLevelStreamConfig#extractFlushThresholdRows.
Similarly, when realtime.segment.flush.threshold.rows is null, it again prints a seemingly arbitrary value. I want to understand how this is being calculated. My use case is to flush the segment only based on end time, i.e. after every 1h, so I set only "realtime.segment.flush.threshold.time": "1h" and no other prop.
Gerrit van Doorn
11/17/2022, 6:34 PM
Stuart Millholland
11/17/2022, 8:33 PM
kurt
11/17/2022, 11:10 PM
Abhishek Dubey
11/18/2022, 5:44 AM
Mahesh babu
11/18/2022, 8:12 AM
harnoor
11/18/2022, 10:04 AM
We are seeing the error Failed to find segment ZK metadata for segment: , which is causing high consumer lag. Can someone suggest how we can fix this issue?
Thanks
Update: Searching for similar errors here on Slack helped: https://apache-pinot.slack.com/archives/C011C9JHN7R/p1649665219641519. Restarting all the components and pausing and resuming consumption for all the tables resolved the issue.
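For anyone hitting this later, a sketch of the controller's pause/resume consumption endpoints (assuming a Pinot build recent enough to have them; host and table name are illustrative):
curl -X POST "http://localhost:9000/tables/myTable/pauseConsumption"
curl -X POST "http://localhost:9000/tables/myTable/resumeConsumption"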
Tiger Zhao
11/18/2022, 5:01 PM
kurt
11/18/2022, 7:12 PM
INSERT INTO "baseballStats"
FROM FILE 's3://my-bucket/public_data_set/baseballStats/rawdata/'
OPTION(taskName=myTask-s3)
OPTION(input.fs.className=org.apache.pinot.plugin.filesystem.S3PinotFS)
OPTION(input.fs.prop.accessKey=my-key)
OPTION(input.fs.prop.secretKey=my-secret)
OPTION(input.fs.prop.region=us-west-2)
When I open up the pinot controller web interface, use the SQL/PQL Query console, paste that query in and run it, I get a parse exception:
ProcessingException(errorCode:150, message:PQLParsingError:
org.apache.pinot.sql.parsers.SqlCompilationException: Caught exception while parsing query: INSERT INTO "baseballStats"
FROM FILE 's3://my-bucket/public_data_set/baseballStats/rawdata/'
at org.apache.pinot.sql.parsers.CalciteSqlParser.compileToPinotQuery(CalciteSqlParser.java:139)
at org.apache.pinot.sql.parsers.CalciteSqlCompiler.compileToBrokerRequest(CalciteSqlCompiler.java:35)
at org.apache.pinot.controller.api.resources.PinotQueryResource.getQueryResponse(PinotQueryResource.java:166)
...
Caused by: org.apache.calcite.sql.parser.SqlParseException: Incorrect syntax near the keyword 'FROM' at line 2, column 1.
Was expecting one of:
"/*+" ...
"(" ...
"WITH" ...
...
Caused by: org.apache.calcite.sql.parser.babel.ParseException: Incorrect syntax near the keyword 'FROM' at line 2, column 1.
Was expecting one of:
"/*+" ...
"(" ...
"WITH" ...)
So, the example in the official documentation should at least parse successfully. I’d expect it to then error because the table name, S3 location, and access key are placeholders, which I know how to fix. I’d like to get the basic syntax of this command working first. Am I possibly running the wrong version of Pinot?
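For reference, the INSERT INTO ... FROM FILE flow is executed by minions, so per the batch ingestion docs the cluster needs at least one minion running and the table config needs the task enabled; a sketch of that table-config block:
"task": {
  "taskTypeConfigsMap": {
    "SegmentGenerationAndPushTask": {}
  }
}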
kurt
11/19/2022, 8:16 AM
When I run select count(*) from my_table; I see the query response stats, but I don’t see the query results. Why?
Pinot has a built-in SQL query engine based on Calcite and also integrates with Trino and Presto for SQL query capabilities. When would I want to use the built-in SQL engine vs. Trino vs. Presto?
kurt
11/20/2022, 4:41 AM
RecordReader initialized will read a total of 51031051 records.
<snip>
Finished building StatsCollector!
Collected stats for 51031051 documents
<snip>
Start building IndexCreator!
<snip>
Assembled and processed 40733690 records from 25 columns in 346811 ms: 117.452126 rec/ms, 2936.303 cell/ms
time spent so far 0% reading (496 ms) and 99% processing (346811 ms)
at row 40733690. reading next block
block read in memory in 92 ms. row count = 6683170
Assembled and processed 47416860 records from 25 columns in 403202 ms: 117.60075 rec/ms, 2940.0188 cell/ms
time spent so far 0% reading (588 ms) and 99% processing (403202 ms)
at row 47416860. reading next block
block read in memory in 35 ms. row count = 3614191
Finished records indexing in IndexCreator!
Finished segment seal!
Converting segment: /var/pinot/minion/data/SegmentGenerationAndPushResult/tmp-27d53515-e7c4-46a5-9655-75221c493a9e/output/oemdashboard_OFFLINE_17743_19312_0 to v3 format
Reflections took 181 ms to scan 2 urls, producing 16 keys and 35 values
Initialized SegmentDirectoryLoaderRegistry with 1 segmentDirectoryLoaders: [default] in 185 ms
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007f0a8c85d166, pid=1, tid=62
#
# JRE version: OpenJDK Runtime Environment 18.9 (11.0.16+8) (build 11.0.16+8)
# Java VM: OpenJDK 64-Bit Server VM 18.9 (11.0.16+8, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# v ~StubRoutines::jbyte_disjoint_arraycopy
#
# Core dump will be written. Default location: /opt/pinot/core.1
#
# An error report file with more information is saved as:
# /opt/pinot/hs_err_pid1.log
Compiled method (c2) 1390132 7053 4 jdk.internal.misc.Unsafe::copyMemory (33 bytes)
total in heap [0x00007f0a94769010,0x00007f0a94769848] = 2104
relocation [0x00007f0a94769188,0x00007f0a947691a8] = 32
main code [0x00007f0a947691c0,0x00007f0a947694e0] = 800
stub code [0x00007f0a947694e0,0x00007f0a947694f8] = 24
metadata [0x00007f0a947694f8,0x00007f0a94769538] = 64
scopes data [0x00007f0a94769538,0x00007f0a94769778] = 576
scopes pcs [0x00007f0a94769778,0x00007f0a94769828] = 176
dependencies [0x00007f0a94769828,0x00007f0a94769830] = 8
nul chk table [0x00007f0a94769830,0x00007f0a94769848] = 24
Compiled method (c2) 1390134 7053 4 jdk.internal.misc.Unsafe::copyMemory (33 bytes)
total in heap [0x00007f0a94769010,0x00007f0a94769848] = 2104
relocation [0x00007f0a94769188,0x00007f0a947691a8] = 32
main code [0x00007f0a947691c0,0x00007f0a947694e0] = 800
stub code [0x00007f0a947694e0,0x00007f0a947694f8] = 24
metadata [0x00007f0a947694f8,0x00007f0a94769538] = 64
scopes data [0x00007f0a94769538,0x00007f0a94769778] = 576
scopes pcs [0x00007f0a94769778,0x00007f0a94769828] = 176
dependencies [0x00007f0a94769828,0x00007f0a94769830] = 8
nul chk table [0x00007f0a94769830,0x00007f0a94769848] = 24
Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled
Lee Wei Hern Jason
11/20/2022, 9:14 AM
select AVG(value) as value from table where cityID = 6 AND vehicleID IN (302) and eventTime > cast(now()-86400000 as timestamp) group by geohash, eventTime limit 10000000
"starTreeIndexConfigs": [
{
"dimensionsSplitOrder": [
"cityID",
"vehicleID",
"eventTime",
"geohash"
],
"skipStarNodeCreationForDimensions": [],
"functionColumnPairs": [
"AVG__value"
],
"maxLeafRecords": 1000
}
],
"enableDynamicStarTreeCreation": true,
kurt
11/21/2022, 3:51 PM
Which tag should I use for apachepinot/pinot-presto? I’m currently using tag pinot-0.11.0-preview. I don’t see a non-preview 0.11.0 version. Almost all the recent image tags are snapshot tags.
kurt
11/21/2022, 8:13 PM
kurt
11/21/2022, 11:08 PM
"pinot.multistage.engine.enabled": "true",
"pinot.server.instance.currentDataTableVersion": "4",
"pinot.query.server.port": "8421",
"pinot.query.runner.port": "8442"
The Helm chart offers “controller.extra.configs”, “broker.extra.configs”, “server.extra.configs”, “minion.extra.configs”, “minionStateless.extra.configs”. Do I add the four specified config settings to one of these or to something else?
For reference:
Multi-Stage-Query-Engine Docs: https://docs.pinot.apache.org/developers/advanced/v2-multi-stage-query-engine#troubleshoot
Helm Chart Values: https://github.com/apache/pinot/blob/master/kubernetes/helm/pinot/values.yaml
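A sketch of wiring these in through the chart's extra.configs blocks, assuming the four settings belong on brokers and servers (each block is appended verbatim to that component's .conf file, per the chart's values.yaml):
broker:
  extra:
    configs: |-
      pinot.multistage.engine.enabled=true
      pinot.server.instance.currentDataTableVersion=4
      pinot.query.server.port=8421
      pinot.query.runner.port=8442
server:
  extra:
    configs: |-
      pinot.multistage.engine.enabled=true
      pinot.server.instance.currentDataTableVersion=4
      pinot.query.server.port=8421
      pinot.query.runner.port=8442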
Arthur Zhou
11/22/2022, 1:27 AM
I can query baseballStats in the query console (as the screenshot shows). However, when I follow this: https://github.com/startreedata/pinot-client-go and run ./batch-quickstart, I get this error:
$ ./batch-quickstart
2022/11/21 17:24:07 Failed to connect to [::1]:2123: dial tcp [::1]:2123: connect: connection refused
2022/11/21 17:24:07 Failed to connect to 127.0.0.1:2123: dial tcp 127.0.0.1:2123: connect: connection refused
ERRO[0000] Failed to set a watcher on ExternalView path: localhost:2123/QuickStartCluster/EXTERNALVIEW/brokerResource, Error: zk: could not connect to a server
INFO[0000] Querying SQL
INFO[0000] Trying to query Pinot: select * from baseballStats limit 10
ERRO[0000] Unable to find an available broker for table baseballStats, Error: Unable to find the table: baseballStats
ERRO[0000] Unable to find the table: baseballStats
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0xa0 pc=0x1022e0200]
goroutine 1 [running]:
main.printBrokerResp(0x0)
/Users/xiaodong.zhou/Desktop/pinot_workspace/pinot-client-go/examples/batch-quickstart/main.go:38 +0x30
main.main()
/Users/xiaodong.zhou/Desktop/pinot_workspace/pinot-client-go/examples/batch-quickstart/main.go:33 +0x21c
Does anyone know why I can’t get the table baseballStats via the pinot-client-go library? Thanks.
Loïc Mathieu
11/22/2022, 3:28 PM
{
  "name": "__metadata$recordTimestamp",
  "dataType": "STRING"
}
However, all fields have the same value -9223372036854775808, which is not correct.
Any ideas?
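One thing to check: -9223372036854775808 is Long.MIN_VALUE, the placeholder you get when the record timestamp was never populated. Assuming a Kafka source, the metadata columns only carry real values when metadata extraction is enabled in streamConfigs, e.g.:
"streamConfigs": {
  "stream.kafka.metadata.populate": "true"
}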
eywek
11/22/2022, 5:00 PM
SELECT * FROM worker_datasource_637cf8beaaee000100312f92_637cf8beaaee631c90312f91_1
WHERE (("reference" = '4') OR ("reference" = '3') OR ("reference" = '1') OR ("reference" = '2'))
LIMIT 0,20
I would like to know: is it possible for Pinot to return results ordered based on the filter order?
Here I would like to have the row with reference=4 first, reference=3 second…
Currently it sorts rows based on the $docId.
Thank you
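Pinot has no notion of filter order, but an explicit ranking can be written into the query itself; a sketch, assuming a Pinot version that accepts CASE expressions in ORDER BY:
SELECT * FROM worker_datasource_637cf8beaaee000100312f92_637cf8beaaee631c90312f91_1
WHERE "reference" IN ('4', '3', '1', '2')
ORDER BY CASE "reference" WHEN '4' THEN 0 WHEN '3' THEN 1 WHEN '1' THEN 2 ELSE 3 END
LIMIT 20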
Shubham Kumar
11/23/2022, 5:40 AM
{
  "schemaName": "lineitem_spark92",
  "dimensionFieldSpecs": [
    {"name": "l_orderkey", "dataType": "INT", "defaultNullValue": 0},
    {"name": "l_partkey", "dataType": "INT", "defaultNullValue": 0},
    {"name": "l_suppkey", "dataType": "INT", "defaultNullValue": 0},
    {"name": "l_linenumber", "dataType": "INT", "defaultNullValue": 0},
    {"name": "l_returnflag", "dataType": "STRING", "defaultNullValue": "null"},
    {"name": "l_linestatus", "dataType": "STRING", "defaultNullValue": "null"},
    {"name": "l_shipdate", "dataType": "STRING", "defaultNullValue": "null"},
    {"name": "l_commitdate", "dataType": "STRING", "defaultNullValue": "null"},
    {"name": "l_receiptdate", "dataType": "STRING", "defaultNullValue": "null"},
    {"name": "l_shipinstruct", "dataType": "STRING", "defaultNullValue": "null"},
    {"name": "l_shipmode", "dataType": "STRING", "defaultNullValue": "null"},
    {"name": "l_comment1", "dataType": "STRING", "defaultNullValue": "null"}
  ],
  "metricFieldSpecs": [
    {"name": "l_quantity", "dataType": "LONG", "defaultNullValue": 0},
    {"name": "l_extendedprice", "dataType": "DOUBLE", "defaultNullValue": 0},
    {"name": "l_discount", "dataType": "DOUBLE", "defaultNullValue": 0},
    {"name": "l_tax", "dataType": "DOUBLE", "defaultNullValue": 0}
  ]
}
Schema shown in Pinot:
{
  "schemaName": "lineitem_spark92",
  "dimensionFieldSpecs": [
    {"name": "l_orderkey", "dataType": "INT", "defaultNullValue": 0},
    {"name": "l_partkey", "dataType": "INT", "defaultNullValue": 0},
    {"name": "l_suppkey", "dataType": "INT", "defaultNullValue": 0},
    {"name": "l_linenumber", "dataType": "INT", "defaultNullValue": 0},
    {"name": "l_returnflag", "dataType": "STRING"},
    {"name": "l_linestatus", "dataType": "STRING"},
    {"name": "l_shipdate", "dataType": "STRING"},
    {"name": "l_commitdate", "dataType": "STRING"},
    {"name": "l_receiptdate", "dataType": "STRING"},
    {"name": "l_shipinstruct", "dataType": "STRING"},
    {"name": "l_shipmode", "dataType": "STRING"},
    {"name": "l_comment1", "dataType": "STRING"}
  ],
  "metricFieldSpecs": [
    {"name": "l_quantity", "dataType": "LONG"},
    {"name": "l_extendedprice", "dataType": "DOUBLE"},
    {"name": "l_discount", "dataType": "DOUBLE"},
    {"name": "l_tax", "dataType": "DOUBLE"}
  ]
}
defaultNullValues are getting omitted for most of the fields. Am I doing something incorrect here?
Also, my Spark batch ingestion job is failing with:
Caused by: java.lang.NumberFormatException: For input string: "null"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
Can someone please help with this?
Mayank
Mayank
Ethan Huang
11/23/2022, 10:23 AM
Thomas Steinholz
11/23/2022, 7:05 PM
Nikhil
11/23/2022, 11:57 PM
We are seeing an issue with the RetentionManager where our segments are not being removed as expected - we are running Pinot 0.11.0. I will share the table config and controller config in thread 🧵
reallyonthemove tous
11/25/2022, 3:49 AM