Ravikumar Maddi
04/23/2021, 4:32 PM

troywinter
04/26/2021, 3:01 PM
Caught exception while transforming the record: when ingesting a realtime stream, it’s a java.lang.ClassCastException: null exception. It looks like it’s casting a null value to some type. Is this a bug?

Mohamed Sultan
04/27/2021, 11:03 AM

Vengatesh Babu
04/27/2021, 2:10 PM
https://stackoverflow.com/questions/65886253/pinot-nested-json-ingestion
Even the examples given for the JSON data type that ship with the build (githubEvents) are not working:
https://github.com/apache/incubator-pinot/tree/master/pinot-tools/src/main/resources/examples/batch/githubEvents
Schema file:
{
  "metricFieldSpecs": [],
  "dimensionFieldSpecs": [
    {
      "dataType": "STRING",
      "name": "name"
    },
    {
      "dataType": "LONG",
      "name": "age"
    },
    {
      "dataType": "STRING",
      "name": "subjects_str"
    },
    {
      "dataType": "STRING",
      "name": "subjects_name",
      "singleValueField": false
    },
    {
      "dataType": "STRING",
      "name": "subjects_grade",
      "singleValueField": false
    }
  ],
  "dateTimeFieldSpecs": [],
  "schemaName": "myTable"
}
Table Config:
{
  "tableName": "myTable",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "segmentPushType": "APPEND",
    "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
    "schemaName": "myTable",
    "replication": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "invertedIndexColumns": [],
    "noDictionaryColumns": [
      "subjects_str"
    ],
    "jsonIndexColumns": [
      "subjects_str"
    ]
  },
  "metadata": {
    "customConfigs": {}
  },
  "ingestionConfig": {
    "batchIngestionConfig": {
      "segmentIngestionType": "APPEND",
      "segmentIngestionFrequency": "DAILY",
      "batchConfigMaps": [],
      "segmentNameSpec": {},
      "pushSpec": {}
    },
    "transformConfigs": [
      {
        "columnName": "subjects_str",
        "transformFunction": "jsonFormat(subjects)"
      },
      {
        "columnName": "subjects_name",
        "transformFunction": "jsonPathArray(subjects, '$.[*].name')"
      },
      {
        "columnName": "subjects_grade",
        "transformFunction": "jsonPathArray(subjects, '$.[*].grade')"
      }
    ]
  }
}
Data.json
{"name":"Pete","age":24,"subjects":[{"name":"maths","grade":"A"},{"name":"maths","grade":"B--"}]}
{"name":"Pete1","age":23,"subjects":[{"name":"maths","grade":"A+"},{"name":"maths","grade":"B--"}]}
{"name":"Pete2","age":25,"subjects":[{"name":"maths","grade":"A++"},{"name":"maths","grade":"B--"}]}
{"name":"Pete3","age":26,"subjects":[{"name":"maths","grade":"A+++"},{"name":"maths","grade":"B--"}]}
Please help me rectify this issue.
Ingestion job output (no error):
bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /home/sas/apache-pinot-incubating-0.7.1-bin/examples/batch/jsontype/ingestionJobSpec.yaml
SegmentGenerationJobSpec:
!!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
cleanUpOutputDir: false
excludeFileNamePattern: null
executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner,
segmentMetadataPushJobRunnerClassName: null, segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner,
segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
includeFileNamePattern: glob:**/*.json
inputDirURI: examples/batch/jsontype/rawdata
jobType: SegmentCreationAndTarPush
outputDirURI: examples/batch/jsontype/segments
overwriteOutput: true
pinotClusterSpecs:
- {controllerURI: 'http://localhost:9000'}
pinotFSSpecs:
- {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, scheme: file}
pushJobSpec: {pushAttempts: 2, pushParallelism: 1, pushRetryIntervalMillis: 1000,
segmentUriPrefix: null, segmentUriSuffix: null}
recordReaderSpec: {className: org.apache.pinot.plugin.inputformat.json.JSONRecordReader,
configClassName: null, configs: null, dataFormat: json}
segmentCreationJobParallelism: 0
segmentNameGeneratorSpec: null
tableSpec: {schemaURI: 'http://localhost:9000/tables/myTable/schema', tableConfigURI: 'http://localhost:9000/tables/myTable',
tableName: myTable}
tlsSpec: null
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Creating an executor service with 1 threads(Job parallelism: 0, available cores: 40.)
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Submitting one Segment Generation Task for file:/home/sas/apache-pinot-incubating-0.7.1-bin/examples/batch/jsontype/rawdata/test.json
Initialized FunctionRegistry with 119 functions: [fromepochminutesbucket, arrayunionint, codepoint, mod, sha256, year, yearofweek, upper, arraycontainsstring, arraydistinctstring, bytestohex, tojsonmapstr, trim, timezoneminute, sqrt, togeometry, normalize, fromepochdays, arraydistinctint, exp, jsonpathlong, yow, toepochhoursrounded, lower, toutf8, concat, ceil, todatetime, jsonpathstring, substr, dayofyear, contains, jsonpatharray, arrayindexofint, fromepochhoursbucket, arrayindexofstring, minus, arrayunionstring, toepochhours, toepochdaysrounded, millisecond, fromepochhours, arrayreversestring, dow, doy, min, toepochsecondsrounded, strpos, jsonpath, tosphericalgeography, fromepochsecondsbucket, max, reverse, hammingdistance, stpoint, abs, timezonehour, toepochseconds, arrayconcatint, quarter, md5, ln, toepochminutes, arraysortstring, replace, strrpos, jsonpathdouble, stastext, second, arraysortint, split, fromepochdaysbucket, lpad, day, toepochminutesrounded, fromdatetime, fromepochseconds, arrayconcatstring, base64encode, ltrim, arraysliceint, chr, sha, plus, base64decode, month, arraycontainsint, toepochminutesbucket, startswith, week, jsonformat, sha512, arrayslicestring, fromepochminutes, remove, dayofmonth, times, hour, rpad, arrayremovestring, now, divide, bigdecimaltobytes, floor, toepochsecondsbucket, toepochdaysbucket, hextobytes, rtrim, length, toepochhoursbucket, bytestobigdecimal, toepochdays, arrayreverseint, datetrunc, minute, round, dayofweek, arrayremoveint, weekofyear] in 942ms
Using class: org.apache.pinot.plugin.inputformat.json.JSONRecordReader to read segment, ignoring configured file format: AVRO
Finished building StatsCollector!
Collected stats for 4 documents
Using fixed length dictionary for column: subjects_grade, size: 20
Created dictionary for STRING column: subjects_grade with cardinality: 5, max length in bytes: 4, range: A to B--
Using fixed length dictionary for column: subjects_name, size: 5
Created dictionary for STRING column: subjects_name with cardinality: 1, max length in bytes: 5, range: maths to maths
Using fixed length dictionary for column: name, size: 20
Created dictionary for STRING column: name with cardinality: 4, max length in bytes: 5, range: Pete to Pete3
Created dictionary for LONG column: age with cardinality: 4, range: 23 to 26
Start building IndexCreator!
Finished records indexing in IndexCreator!
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Start pushing segments: []... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@4e31276e] for table myTable
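Note that the push step above logs “Start pushing segments: []”, an empty list, so it is worth confirming a segment was actually built before debugging the transforms. A quick sanity check, reusing the paths from the job spec above; the broker SQL endpoint on the default port 8099 is an assumption, adjust to your setup:
# Was a segment tarball produced?
ls /home/sas/apache-pinot-incubating-0.7.1-bin/examples/batch/jsontype/segments
# Spot-check the ingested values through the broker
curl -s -H "Content-Type: application/json" -X POST \
  -d '{"sql":"SELECT name, subjects_name, subjects_grade FROM myTable LIMIT 10"}' \
  http://localhost:8099/query/sql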
Mayank
Jay Desai
04/27/2021, 9:37 PM
I raised a PR a couple of days back against the apache:master branch, and the “Checks” are not running for me. Do I need to enable some settings?
PR reference: https://github.com/apache/incubator-pinot/pull/6842

Phúc Huỳnh
04/28/2021, 2:54 AM
segmentUriPush and segmentMetadataPush?
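For context, the push mode is selected by the jobType field of the batch ingestion job spec. A sketch of the variants, with behavior summarized from the docs, so verify against your version:
# Uploads the whole segment tarball to the controller
jobType: SegmentCreationAndTarPush
# Sends only the segment URI; the controller downloads the segment from that URI
jobType: SegmentCreationAndUriPush
# Sends only the segment metadata; the segment itself stays in deep store
jobType: SegmentCreationAndMetadataPush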
Alon Burg
04/28/2021, 8:34 AM
Unrecognized field at: whitelistDatasets
Opened an issue
Would love to submit a PR … not sure where to start.

Mohamed Sultan
04/28/2021, 8:49 AM

Alon Burg
04/28/2021, 12:30 PM
docker container exec -it pinot-quickstart bin/generator.sh complexWebsite
Pinot just exits, without any error in the log. Trying to follow the ThirdEye quickstart.

Syed Akram
04/29/2021, 8:50 AM

Pedro Silva
04/30/2021, 4:53 PM

Mayank
Pedro Silva
05/03/2021, 5:15 PM

Akash
05/05/2021, 10:25 PM

Jay Desai
05/05/2021, 10:31 PM

Pedro Silva
05/06/2021, 2:42 PM
code: 500
error: "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory"
Does anyone know what this means?
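When the error body is just that factory class name, one common cause (an assumption here, not a confirmed diagnosis) is that the Kafka plugin jar is not loadable, or the factory is misreferenced in streamConfigs; the plugin normally ships under plugins/pinot-stream-ingestion/pinot-kafka-2.0 in the distribution. A minimal sketch of the relevant table config section, with topic and broker values as placeholders:
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.consumer.type": "lowlevel",
  "stream.kafka.topic.name": "my-topic",
  "stream.kafka.broker.list": "localhost:9092",
  "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
  "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder"
}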
ayush sharma
05/06/2021, 5:07 PM
We installed ZooKeeper with helm -n my-pinot-kube install pinot-zookeeper incubator/zookeeper --set replicaCount=1 and disabled the bundled ZooKeeper in helm/pinot/values.yaml by setting zookeeper.enabled to false.
But we now face an error from _helpers.tpl indicating it is not able to fetch zookeeper.url, related to configurationOverride:
Error: template: pinot/templates/server/statefulset.yml:63:27: executing "pinot/templates/server/statefulset.yml" at <include "zookeeper.url" .>: error calling include: template: pinot/templates/_helpers.tpl:79:33: executing "zookeeper.url" at <index .Values "configurationOverrides" "zookeeper.connect">: error calling index: index of nil pointer
Any help is appreciated!
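Judging from the template lookup in the error (index .Values "configurationOverrides" "zookeeper.connect"), the chart expects the external ZooKeeper address under configurationOverrides. A sketch for helm/pinot/values.yaml; the service host is an assumption based on the release name used above:
zookeeper:
  enabled: false
configurationOverrides:
  "zookeeper.connect": "pinot-zookeeper:2181"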
Arun Vasudevan
05/06/2021, 10:00 PM
Upload the schema and Table Config - https://docs.pinot.apache.org/basics/getting-started/pushing-your-streaming-data-to-pinot#uploading-your-schema-and-table-config
I am getting the following error…
Sending request: http://pinot-quickstart:9000/schemas to controller: ea8d7bfc16ea, version: Unknown
{"code":500,"error":"org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata"}
However, I am able to log in to my Kafka docker container and describe the topic, and that works fine…
bash-4.4# bin/kafka-topics.sh --bootstrap-server kafka:9092 --topic transcript-topic --describe
Topic: transcript-topic PartitionCount: 1 ReplicationFactor: 1 Configs: segment.bytes=1073741824
Topic: transcript-topic Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Any idea what I am missing?
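Since the describe works from inside the Kafka container, one thing to rule out is whether the pinot-quickstart container can resolve and reach kafka:9092 at all. A sketch; the network name is a placeholder, and getent availability depends on the base image:
# Confirm both containers are attached to the same Docker network
docker network inspect my-network --format '{{range .Containers}}{{.Name}} {{end}}'
# Confirm the hostname "kafka" resolves from inside the Pinot container
docker exec -it pinot-quickstart getent hosts kafka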
Ambika
05/07/2021, 2:42 PM

Pedro Silva
05/07/2021, 3:20 PM
{
  "columnName": "audioLength",
  "transformFunction": "JSONPATH(result,'$.audioLength')"
}
But the result field, configured as:
{
  "name": "result",
  "dataType": "STRING"
}
Most of the time it will have the following content:
{
  "metadata": {
    "isMatch": "Y"
  }
}
But sometimes it can be something like:
{
  "AudioCreated": "2021-05-06T23: 40: 28.6629486",
  "AudioLength": "00: 04: 02.1800000",
  "BlobPath": "068fd3f0-e5d6-499a-bfb0-94491499aba6/9db5efb9-4a72-44ae-a570-8647e1ac896a/33d3c59d-b8e1-4818-be60-124e637fb02b.wav",
  "isValid": true,
  "feedback": "",
  "otherFeedback": "",
  "result": 1,
  "crowdMemberId": "90c97d94-91c3-4587-8c91-26f6e971d52c",
  "tags": null,
  "scriptToExecute": null
}
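Two details stand out in the samples above: the second payload spells the key AudioLength while the transform reads $.audioLength (JsonPath is case sensitive), and the first payload has no such key at all, so the extraction yields null for a STRING column. A more defensive sketch, assuming the jsonPathString variant with a default value is available in your release:
{
  "columnName": "audioLength",
  "transformFunction": "jsonPathString(result, '$.AudioLength', 'null')"
}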
Aaron Wishnick
05/07/2021, 8:29 PM

RK
05/09/2021, 4:18 PM

Pedro Silva
05/10/2021, 10:32 AM
2021/05/10 10:29:48.876 ERROR [ServerSegmentCompletionProtocolHandler] [HitExecutionView__13__6__20210510T1029Z] Could not send request http://pinot-controller-0.pinot-controller-headless.dc-pinot.svc.cluster.local:9000/segmentConsumed?name=HitExecutionView__13__6__20210510T1029Z&offset=952660&instance=Server_pinot-server-1.pinot-server-headless.dc-pinot.svc.cluster.local_8098&reason=rowLimit&memoryUsedBytes=7330344&rowCount=12500&streamPartitionMsgOffset=952660
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_282]
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_282]
at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_282]
at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_282]
at shaded.org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at shaded.org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at shaded.org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at shaded.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at shaded.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at shaded.org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at shaded.org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at shaded.org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at shaded.org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at shaded.org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at shaded.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at shaded.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at shaded.org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at shaded.org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at shaded.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at org.apache.pinot.common.utils.FileUploadDownloadClient.sendRequest(FileUploadDownloadClient.java:383) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at org.apache.pinot.common.utils.FileUploadDownloadClient.sendSegmentCompletionProtocolRequest(FileUploadDownloadClient.java:675) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at org.apache.pinot.server.realtime.ServerSegmentCompletionProtocolHandler.sendRequest(ServerSegmentCompletionProtocolHandler.java:207) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at org.apache.pinot.server.realtime.ServerSegmentCompletionProtocolHandler.segmentConsumed(ServerSegmentCompletionProtocolHandler.java:174) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.postSegmentConsumedMsg(LLRealtimeSegmentDataManager.java:949) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:559) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
2021/05/10 10:29:48.877 ERROR [LLRealtimeSegmentDataManager_HitExecutionView__13__6__20210510T1029Z] [HitExecutionView__13__6__20210510T1029Z] Holding after response from Controller: {"streamPartitionMsgOffset":null,"buildTimeSec":-1,"isSplitCommitType":false,"status":"NOT_SENT","offset":-1}
This seems related: the server not being able to contact the controller?
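A quick way to test exactly that path, reusing the namespace and pod names from the log above (assumes curl is available in the server image):
kubectl -n dc-pinot exec pinot-server-1 -- \
  curl -sS http://pinot-controller-0.pinot-controller-headless.dc-pinot.svc.cluster.local:9000/health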
Tamás Nádudvari
05/10/2021, 8:21 PM
We enabled RealtimeToOfflineSegmentsTask for our hybrid table and ran into a problem in our dev environment. We have time gaps in our data ingest, and when a gap is larger than the bucket time period, the minion task runs into an error creating an empty segment. After the exception, the minion fails to update the watermark, so we end up with a stuck task (trying to create an empty segment over and over again for this specific period). While it’s unlikely we’ll hit this empty-segment problem in production, we’re wondering what’s the recommended way to overcome this issue in a dev environment?

Aaron Wishnick
05/10/2021, 9:26 PM
Assembled and processed 7990100 records from 17 columns in 121766 ms: 65.618484 rec/ms, 1115.5142 cell/ms
time spent so far 0% reading (237 ms) and 99% processing (121766 ms)
at row 7990100. reading next block
block read in memory in 44 ms. row count = 1418384
Finished building StatsCollector!
Collected stats for 9408484 documents
Created dictionary for INT column: ...
...
RecordReader initialized will read a total of 9408484 records.
at row 0. reading next block
Got brand-new decompressor [.gz]
block read in memory in 133 ms. row count = 7990100
Start building IndexCreator!
Assembled and processed 7990100 records from 17 columns in 127060 ms: 62.884464 rec/ms, 1069.0359 cell/ms
time spent so far 0% reading (133 ms) and 99% processing (127060 ms)
at row 7990100. reading next block
block read in memory in 26 ms. row count = 1418384
Finished records indexing in IndexCreator!
Finished segment seal!
...
Generated 25884 star-tree records from 9408484 segment records
Finished constructing star-tree, got 1228 tree nodes and 2058 records under star-node
Finished creating aggregated documents, got 1227 aggregated records
Finished building star-tree in 276631ms
Starting building star-tree with config: StarTreeV2BuilderConfig[...]
Generated 9408484 star-tree records from 9408484 segment records
It's been stuck at that last line for a long time but still seems to be doing something. All the time measurements only add up to a few minutes. Any idea where it's spending time?
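One detail in the log: the earlier tree generated 25884 star-tree records from 9408484 segment records, while the one it is stuck on generated 9408484 from 9408484, meaning the chosen dimensions are effectively unique per record and pre-aggregation collapses nothing, so that build has far more work to do. For reference, the config shape involved; column names here are placeholders:
"tableIndexConfig": {
  "starTreeIndexConfigs": [
    {
      "dimensionsSplitOrder": ["dimA", "dimB"],
      "skipStarNodeCreationForDimensions": [],
      "functionColumnPairs": ["SUM__metricCol"],
      "maxLeafRecords": 10000
    }
  ]
}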
Arun Vasudevan
05/10/2021, 11:26 PM
Avro schema:
{
  "type": "record",
  "name": "Clickstream",
  "namespace": "com.acme.event.clickstream.business",
  "fields": [
    {
      "name": "event_header",
      "type": {
        "type": "record",
        "name": "EventHeader",
        "namespace": "com.acme.event",
        "fields": [
          {
            "name": "event_uuid",
            "type": {
              "type": "string",
              "avro.java.string": "String",
              "logicalType": "uuid"
            },
            "doc": "Universally Unique Identifier for this event"
          },
          {
            "name": "published_timestamp",
            "type": {
              "type": "long",
              "logicalType": "timestamp-millis"
            },
            "doc": "Timestamp in milliseconds since the epoch that the event occurred on its producing device, e.g. System.currentTimeMillis()."
          }
        ]
      }
    }
  ]
}
The corresponding Pinot schema I have is:
{
  "schemaName": "user_clickstream_v1",
  "dimensionFieldSpecs": [
    {
      "name": "event_header.event_uuid",
      "dataType": "STRING"
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "event_header.published_timestamp",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
In the created Pinot table I see all the values as null… I suspect the issue is in the schema.
Any idea?
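If the Avro record extractor is surfacing event_header as a single nested object rather than as flattened columns, schema columns named event_header.event_uuid will not be populated automatically. One possible workaround, a sketch assuming flat target column names and the documented Groovy transform syntax:
"ingestionConfig": {
  "transformConfigs": [
    {
      "columnName": "event_uuid",
      "transformFunction": "Groovy({event_header.event_uuid}, event_header)"
    },
    {
      "columnName": "published_timestamp",
      "transformFunction": "Groovy({event_header.published_timestamp}, event_header)"
    }
  ]
}
The schema columns would then be renamed to event_uuid and published_timestamp to match.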
Jonathan Meyer
05/11/2021, 3:14 PM
Hello
What is the recommended approach to getting the “last non-null value”?
Use a UDF?
Charles
05/12/2021, 6:20 AM
Grpc port is not set for instance: Controller_10.252.125.84_9000
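That line is typically a warning logged when an instance has no gRPC port configured, and should be harmless unless you rely on the gRPC query server. To enable it on servers, a sketch of the relevant server config keys; the port value is an assumption:
pinot.server.grpc.enable=true
pinot.server.grpc.port=8090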
RK
05/12/2021, 7:55 AM