Chetan Anand
04/06/2022, 8:25 AM
Can the outputDirURI in the ingestionJobSpec be the same as the controller.data.dir? We are assuming yes. Config files, table config, and job spec attached.
André Siefken
04/06/2022, 8:58 AM
We run SegmentGenerationAndPushTask via the Minions, and all jobs keep failing with:
"INFO": "java.lang.RuntimeException: Failed to execute SegmentGenerationAndPushTask"
Can you help me find a means to trace and debug the issue? AFAIK it is not easy to monitor Minion tasks, but do you know of any hints as to where it may fail: at the config level and/or while parsing the files (although each individually seems to be valid)?
Aparna Razdan
04/06/2022, 11:28 AM
{
"name": "sql_date_entered",
"dataType": "INT",
"format": "1:DAYS:EPOCH",
"granularity": "1:DAYS"
},
{
"name": "sql_date_entered_str",
"dataType": "STRING",
"format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
"granularity": "1:DAYS"
}
Another way to handle this is using query transformations:
select sql_date_entered,
DATETIMECONVERT(dateTrunc('week', sql_date_entered, 'DAYS'), '1:DAYS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd', '1:DAYS') as week
from kepler_product_pipegen
Is there any way that I can load the date in 'yyyy-MM-dd' format and still run transformations like dateTrunc on top of it?
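For reference, what dateTrunc('week', …) computes over an epoch-days column can be sketched outside Pinot. This is a rough Python emulation, not Pinot code; it assumes ISO weeks starting on Monday, and the sample date is illustrative:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def to_epoch_days(date_str):
    # parse a 'yyyy-MM-dd' string into days since the Unix epoch
    return (datetime.strptime(date_str, "%Y-%m-%d") - EPOCH).days

def trunc_to_week(epoch_days):
    # truncate to the Monday of that week (ISO week semantics)
    day = EPOCH + timedelta(days=epoch_days)
    monday = day - timedelta(days=day.weekday())
    return (monday - EPOCH).days

def to_date_str(epoch_days):
    # render epoch days back as 'yyyy-MM-dd'
    return (EPOCH + timedelta(days=epoch_days)).strftime("%Y-%m-%d")

# a Wednesday truncates to the preceding Monday
print(to_date_str(trunc_to_week(to_epoch_days("2022-04-06"))))  # 2022-04-04
```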
Pinot version = 0.7.1
Ali Atıl
04/06/2022, 1:41 PM
select fieldA, fieldB, fieldC, messageTime from mytable where messageTime >= 1648813002065 and messageTime <= 1649245002065 order by messageTime limit 3000000, 1000000 option(timeoutMs=120000)
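As a side note, the two epoch-millisecond bounds in the query above can be decoded to see the window being paged over; a small Python sketch (rendering in UTC is my assumption about the intended timezone):

```python
from datetime import datetime, timezone

def ms_to_utc(ms):
    # render epoch milliseconds as a UTC timestamp string
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# the bounds used in the messageTime filter above: a five-day window
print(ms_to_utc(1648813002065))  # 2022-04-01 11:36:42
print(ms_to_utc(1649245002065))  # 2022-04-06 11:36:42
```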
Grace Lu
04/06/2022, 7:34 PM
Luis Fernandez
04/06/2022, 9:03 PM
SELECT product_id, COUNT(*) as views
FROM table
ORDER BY views, product_id DESC
in a regular RDBMS I would expect it to order by views first and then by product_id, so something like
views, product_id
3 3
3 2
3 1
but Pinot does something completely different; it is doing
views, product_id
1 10
1 9
1 8
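For reference, in standard SQL `ORDER BY views, product_id DESC` sorts views ascending (the default direction) and applies DESC only to product_id; a small Python sketch of that two-key ordering, using hypothetical rows:

```python
# hypothetical (views, product_id) rows
rows = [(1, 8), (3, 1), (1, 9), (3, 3), (1, 10), (3, 2)]

# views ascending (the SQL default), product_id descending
ordered = sorted(rows, key=lambda r: (r[0], -r[1]))
print(ordered)  # [(1, 10), (1, 9), (1, 8), (3, 3), (3, 2), (3, 1)]
```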
anyone have an idea why this may be?
Ali Atıl
04/07/2022, 6:39 AM
[Consumer clientId=consumer-null-808, groupId=null] Seeking to offset 59190962 for partition mytopic-0
Ali Atıl
04/07/2022, 8:45 AM
Fizza Abid
04/07/2022, 12:07 PM
Luis Fernandez
04/07/2022, 8:25 PM
Diogo Baeder
04/07/2022, 9:13 PM
Alice
04/08/2022, 1:43 AM
Mayank
francoisa
04/11/2022, 8:20 AM
java.lang.RuntimeException: Exception getting FSM for segment candidates__0__0__20220408T1410Z
at org.apache.pinot.controller.helix.core.realtime.SegmentCompletionManager.lookupOrCreateFsm(SegmentCompletionManager.java:175) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.controller.helix.core.realtime.SegmentCompletionManager.segmentConsumed(SegmentCompletionManager.java:202) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.controller.api.resources.LLCSegmentCompletionHandlers.segmentConsumed(LLCSegmentCompletionHandlers.java:144) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at jdk.internal.reflect.GeneratedMethodAccessor128.invoke(Unknown Source) ~[?:?]
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:391) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:80) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:253) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.internal.Errors.process(Errors.java:292) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.internal.Errors.process(Errors.java:274) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.internal.Errors.process(Errors.java:244) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:232) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:679) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:353) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:200) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at java.lang.Thread.run(Thread.java:829) [?:?]
My controller looks for a non-existing segment 😕 Is there an API way to tell it the segment does not exist anymore? 😄 The table is still accessible and everything is OK, but I found it weird.
Alice
04/11/2022, 11:42 AM
Eduardo Cusa
04/11/2022, 12:59 PM
Caught exception in state transition from OFFLINE -> ONLINE for resource: adv1_OFFLINE, partition: adv1_OFFLINE_2022-03-01_2022-03-01_0
Could this be related to the data itself, or to something like OOMs/resources as Mayank mentioned in the thread? Any suggestions on how to debug it?
Thanks
Shailesh Jha
04/11/2022, 1:05 PM
erik bergsten
04/11/2022, 2:19 PM
{
"tableName": "environment",
"tableType": "OFFLINE",
"tenants": {
"broker": "DefaultTenant",
"server": "DefaultTenant"
},
"segmentsConfig": {
"schemaName": "environment",
"timeColumnName": "ts",
"replication": "1",
"replicasPerPartition": "1",
"retentionTimeUnit": null,
"retentionTimeValue": null,
"segmentPushType": "APPEND",
"segmentPushFrequency": "DAILY",
"crypterClassName": null,
"peerSegmentDownloadScheme": null
},
"tableIndexConfig": {
"loadMode": "MMAP",
"invertedIndexColumns": [],
"createInvertedIndexDuringSegmentGeneration": false,
"rangeIndexColumns": [],
"sortedColumn": [],
"bloomFilterColumns": [],
"bloomFilterConfigs": null,
"noDictionaryColumns": [],
"onHeapDictionaryColumns": [],
"varLengthDictionaryColumns": [],
"enableDefaultStarTree": false,
"starTreeIndexConfigs": null,
"enableDynamicStarTreeCreation": false,
"segmentPartitionConfig": null,
"columnMinMaxValueGeneratorMode": null,
"nullHandlingEnabled": false
},
"metadata": {},
"ingestionConfig": {
"filterConfig": null,
"transformConfigs": [
{
"columnName": "ts",
"transformFunction": "FromDateTime(\"DepartureDate\", 'yyyy-MM-dd''T''HH:mm:ss.SSSZ')"
}
]
},
"quota": {
"storage": null,
"maxQueriesPerSecond": null
},
"task": null,
"routing": {
"segmentPrunerTypes": null,
"instanceSelectorType": null
},
"instanceAssignmentConfigMap": null,
"query": {
"timeoutMs": null
},
"fieldConfigList": null,
"upsertConfig": null,
"tierConfigs": [
{
"name": "tierA",
"segmentSelectorType": "time",
"segmentAge": "5m",
"storageType": "pinot_server",
"serverTag": "DefaultTenant_a_OFFLINE"
}
]
}
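For what it's worth, the FromDateTime transform in the config above turns DepartureDate strings into epoch milliseconds for the ts time column. A rough Python equivalent (the sample value is hypothetical, and the pattern is translated from Joda syntax to strptime):

```python
from datetime import datetime

def from_date_time(value):
    # parse a "yyyy-MM-dd'T'HH:mm:ss.SSSZ" string (Joda syntax) into epoch
    # milliseconds, roughly what the FromDateTime transform produces for ts
    return int(datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z").timestamp() * 1000)

# hypothetical sample value
print(from_date_time("2022-03-01T00:00:00.000+0000"))  # 1646092800000
```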
We expected to see segments moved from DefaultTenant_OFFLINE to the other server 5 mins after ingestion (using batch ingestion) but nothing seems to happen. Is there anything obviously wrong in the config?
How should we pursue solving this problem? We cannot find any errors or interesting messages in any log.
Luis Fernandez
04/11/2022, 7:19 PM
For the health endpoint, what do you check for the livenessProbe there?
Grace Lu
04/11/2022, 7:37 PM
yelim yu
04/12/2022, 1:26 AM
Sumit Lakra
04/12/2022, 11:29 AM
francoisa
04/12/2022, 12:08 PM
Saumya Upadhyay
04/13/2022, 4:24 AM
Kevin Liu
04/13/2022, 5:55 AM
erik bergsten
04/13/2022, 12:47 PM
Caused by: java.nio.file.FileSystemException: /var/pinot/server/data/index/environment_OFFLINE/environment_OFFLINE_1618208070664_1649743939567_7/v3/.nfs000000000134004000000058: Device or resource busy
in the logs from server-b. Could this be a problem with how the server is implemented, or is it strictly an NFS problem on our end? The end result is that some or all segments go into an error state and the data goes missing during a rebalance.
Nikhil
04/14/2022, 8:34 PM
sudo spark-submit --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand --master local --deploy-mode client --conf spark.local.dir=/mnt --conf "spark.driver.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/conf/pinot-ingestion-job-log4j2.xml" --conf "spark.driver.extraClassPath=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar" /mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar -jobSpecFile /mnt/pinot/spark_job_spec_v8.yaml
the ingestion spec used is this:
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
  extraConfigs:
    stagingDir: 's3://nikhil-dw-dev/pinot/staging/'
    dependencyJarDir: 's3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins'
jobType: SegmentCreation
inputDirURI: 's3://nikhil-dw-dev/pinot/pinot_input/'
includeFileNamePattern: 'glob:**/*.parquet'
outputDirURI: 's3://nikhil-dw-dev/pinot/pinot_output3/'
overwriteOutput: true
pinotFSSpecs:
  - className: org.apache.pinot.plugin.filesystem.S3PinotFS
    scheme: s3
    configs:
      region: us-east-1
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'students'
  schemaURI: 's3://nikhil-dw-dev/pinot/students_schema.json'
  tableConfigURI: 's3://nikhil-dw-dev/pinot/students_table.json'
But when running this in cluster mode, I get the class-not-found issue. The plugins.dir is available on all the EMR nodes, and we can see that the plugins are getting successfully loaded. I have tried passing the s3 location as well as the /mnt path, and both fail with the same error. I looked at these two previous posts [1] and [2] and they did not help in resolving it.
Here is the error
22/04/14 07:06:44 INFO PluginManager: Plugins root dir is [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins]
22/04/14 07:06:44 INFO PluginManager: Trying to load plugins: [[pinot-s3, pinot-parquet]]
22/04/14 07:06:44 INFO PluginManager: Trying to load plugin [pinot-s3] from location [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3]
22/04/14 07:06:44 INFO PluginManager: Successfully loaded plugin [pinot-s3] from jar file [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar]
22/04/14 07:06:44 INFO PluginManager: Successfully Loaded plugin [pinot-s3] from dir [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3]
22/04/14 07:06:44 INFO PluginManager: Trying to load plugin [pinot-parquet] from location [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet]
22/04/14 07:06:44 INFO PluginManager: Successfully loaded plugin [pinot-parquet] from jar file [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar]
22/04/14 07:06:44 INFO PluginManager: Successfully Loaded plugin [pinot-parquet] from dir [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet]
22/04/14 07:06:45 ERROR LaunchDataIngestionJobCommand: Got exception to generate IngestionJobSpec for data ingestion job -
Can't construct a java object for tag:yaml.org,2002:org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec; exception=Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
in 'string', line 1, column 1:
executionFrameworkSpec:
^
Will thread the different commands used to submit this job.
Thank you for your help 🙇
Alice
04/15/2022, 12:31 PM
Diogo Baeder
04/15/2022, 9:51 PM
Diogo Baeder
04/16/2022, 10:12 PM