Ehsan Irshad
12/06/2022, 7:15 AM
Aaron Weiss
12/06/2022, 7:06 PM
There are 3 invalid segment/s. This usually means that they were created with an older schema. Please reload the table in order to refresh these segments to the new schema.
Is there any way to determine which segments these are?
Pyry Kovanen
12/06/2022, 9:21 PM
There is a data3.csv that matches the glob in the Job Spec. Also, based on the exception, it seems that the ADLS plugin is able to see the file there.
This is my job spec YAML:
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 'adl2://my-example-storage.blob.core.windows.net/my-beatiful-fs'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: 'adl2://my-example-storage.blob.core.windows.net/my-beatiful-fs/segments'
overwriteOutput: true
cleanUpOutputDir: true
pinotFSSpecs:
  - scheme: adl2
    className: org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
    configs:
      accountName: 'my-example-storage'
      accessKey: 'xxxx'
      fileSystemName: 'my-beatiful-fs'
      enableChecksum: true
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'foo_data'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
And this is the stack trace from the exception I get:
ADLSGen2PinotFS is initialized (accountName=my-example-storage, fileSystemName=my-beatiful-fs, dfsServiceEndpointUrl=<https://my-example-storage.dfs.core.windows.net>, blobServiceEndpointUrl=<https://my-example-storage.blob.core.windows.net>, enableChecksum=true)
Creating an executor service with 1 threads(Job parallelism: 0, available cores: 1.)
Got exception to kick off standalone data ingestion job -
java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:152) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:121) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:130) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.tools.Command.call(Command.java:33) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.tools.Command.call(Command.java:29) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at picocli.CommandLine.executeUserObject(CommandLine.java:1953) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at picocli.CommandLine.access$1300(CommandLine.java:145) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at picocli.CommandLine$RunLast.handle(CommandLine.java:2346) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at picocli.CommandLine$RunLast.handle(CommandLine.java:2311) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at picocli.CommandLine.execute(CommandLine.java:2078) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:165) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:196) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
Caused by: java.io.FileNotFoundException: /tmp/pinot-69425772-9d5a-4b3c-b9bc-1d812becb5b3/input/09e20700-f285-44a1-81f9-9914aa28e6ac/data3.csv (No such file or directory)
at java.io.FileOutputStream.open0(Native Method) ~[?:?]
at java.io.FileOutputStream.open(FileOutputStream.java:298) ~[?:?]
at java.io.FileOutputStream.<init>(FileOutputStream.java:237) ~[?:?]
at java.io.FileOutputStream.<init>(FileOutputStream.java:187) ~[?:?]
at org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS.copyToLocalFile(ADLSGen2PinotFS.java:451) ~[pinot-adls-0.11.0-shaded.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.submitSegmentGenTask(SegmentGenerationJobRunner.java:258) ~[pinot-batch-ingestion-standalone-0.11.0-shaded.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.run(SegmentGenerationJobRunner.java:224) ~[pinot-batch-ingestion-standalone-0.11.0-shaded.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:150) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
... 13 more
java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:152)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:121)
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:130)
at org.apache.pinot.tools.Command.call(Command.java:33)
at org.apache.pinot.tools.Command.call(Command.java:29)
at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
at picocli.CommandLine.access$1300(CommandLine.java:145)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
at picocli.CommandLine.execute(CommandLine.java:2078)
at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:165)
at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:196)
Caused by: java.io.FileNotFoundException: /tmp/pinot-69425772-9d5a-4b3c-b9bc-1d812becb5b3/input/09e20700-f285-44a1-81f9-9914aa28e6ac/data3.csv (No such file or directory)
at java.base/java.io.FileOutputStream.open0(Native Method)
at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:187)
at org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS.copyToLocalFile(ADLSGen2PinotFS.java:451)
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.submitSegmentGenTask(SegmentGenerationJobRunner.java:258)
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.run(SegmentGenerationJobRunner.java:224)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:150)
... 13 more
Does anyone have any idea what might be wrong here and how to move forward?
Thanks in advance!
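A hedged note on the trace above: the FileNotFoundException is thrown while opening a FileOutputStream for the local copy under /tmp, and in Java that exception at open time means the parent directory is missing, not that the remote file wasn't found — which is consistent with the ADLS listing succeeding. Something is never creating (or is removing) the per-job staging directory. A minimal repro of the condition, with the /tmp staging layout assumed from the stack trace:
```python
# Minimal repro of the failing condition, assuming the /tmp staging layout
# from the stack trace. Java's `new FileOutputStream(path)` throws
# FileNotFoundException when the parent directory is missing; Python's
# open() fails the same way, which makes the condition easy to demonstrate.
import pathlib
import tempfile
import uuid

staging = (pathlib.Path(tempfile.gettempdir())
           / f"pinot-{uuid.uuid4()}" / "input" / str(uuid.uuid4()))
target = staging / "data3.csv"

try:
    with target.open("w"):
        pass
except FileNotFoundError as e:
    print("copy step would fail:", e)   # parent directory does not exist

staging.mkdir(parents=True, exist_ok=True)  # what must exist before the copy
target.write_text("ok")
print("write succeeded once the staging dir exists:", target)
```
So checking that /tmp on the ingestion host is writable and is not being cleaned underneath the job (a tmp reaper, or a container with a read-only filesystem) is a reasonable first step.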
Caleb Shei
12/06/2022, 10:15 PM
Caused by: java.lang.ClassNotFoundException: org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
at org.apache.pinot.spi.plugin.PluginClassLoader.loadClass(PluginClassLoader.java:104)
at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:354)
at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:325)
at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:306)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:143)
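A hedged aside on this trace: PluginManager.createInstance fails with ClassNotFoundException when the named runner's plugin jar isn't under the plugins directory the process was launched with — here, the pinot-batch-ingestion-hadoop jar. A quick existence check, with the /opt/pinot layout as an assumption rather than something taken from this thread:
```python
# Hypothetical sanity check: confirm the batch-ingestion plugin jars are on
# disk under the plugins directory Pinot was launched with. The /opt/pinot
# path is an assumption -- substitute your own plugins directory.
import glob

plugins_dir = "/opt/pinot/plugins"
jars = glob.glob(f"{plugins_dir}/pinot-batch-ingestion/**/*.jar", recursive=True)
for jar in jars:
    print(jar)
if not any("hadoop" in jar for jar in jars):
    print("pinot-batch-ingestion-hadoop jar not found; the PluginClassLoader "
          "cannot load HadoopSegmentGenerationJobRunner without it")
```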
Gaurav Pant
12/07/2022, 3:58 PM
Shubham Kumar
12/07/2022, 4:49 PM
extraEnv:
  - name: AWS_ACCESS_KEY_ID
    value: XAWS_ACCESS_KEY_ID
  - name: AWS_SECRET_ACCESS_KEY
    value: XAWS_SECRET_ACCESS_KEY
  - name: LOG4J_CONSOLE_LEVEL
    value: all
extra:
  configs: |-
    pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.controller.storage.factory.s3.region=ap-south-1
    pinot.controller.segment.fetcher.protocols=file,http,s3
    pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    controller.data.dir=s3://test-data/pinot-data/pinot-default/controller-data/
    controller.local.temp.dir=/tmp/pinot-tmp-data/
Controller pod creation is failing with:
Failed to start a Pinot [CONTROLLER] at 5.321 since launch
java.lang.NullPointerException: null
at org.apache.pinot.common.utils.helix.HelixHelper.updateHostnamePort(HelixHelper.java:630) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.controller.BaseControllerStarter.updateInstanceConfigIfNeeded(BaseControllerStarter.java:623) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.controller.BaseControllerStarter.registerAndConnectAsHelixParticipant(BaseControllerStarter.java:599) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.controller.BaseControllerStarter.setUpPinotController(BaseControllerStarter.java:392) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.controller.BaseControllerStarter.start(BaseControllerStarter.java:322) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.tools.service.PinotServiceManager.startController(PinotServiceManager.java:118) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:87) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.lambda$startBootstrapServices$0(StartServiceManagerCommand.java:251) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:304) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startBootstrapServices(StartServiceManagerCommand.java:250) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.execute(StartServiceManagerCommand.java:196) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.tools.admin.command.StartControllerCommand.execute(StartControllerCommand.java:187) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.tools.Command.call(Command.java:33) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.tools.Command.call(Command.java:29) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at picocli.CommandLine.executeUserObject(CommandLine.java:1953) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at picocli.CommandLine.access$1300(CommandLine.java:145) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at picocli.CommandLine$RunLast.handle(CommandLine.java:2346) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at picocli.CommandLine$RunLast.handle(CommandLine.java:2311) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at picocli.CommandLine.execute(CommandLine.java:2078) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:165) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:196) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
Shutting down Pinot Service Manager with all running Pinot instances...
Can you please help with what I am missing here?
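A hedged observation rather than a confirmed diagnosis: the NPE comes out of HelixHelper.updateHostnamePort while the controller registers itself as a Helix participant, which points at the instance hostname/port rather than at the S3 settings. The working controller configs later in this thread also set pinot.set.instance.id.to.hostname=true, which is missing here and is worth trying. If you can exec into the pod, a quick look at what the pod resolves for itself:
```python
# Hedged diagnostic: print the names this pod resolves for itself. With
# pinot.set.instance.id.to.hostname=true Pinot derives its Helix instance id
# from the hostname, so an unresolvable or empty name here is a red flag.
import socket

print("hostname:", socket.gethostname())
print("fqdn:    ", socket.getfqdn())
```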
Leon Liu
12/07/2022, 8:12 PM
Gerrit van Doorn
12/07/2022, 9:48 PM
Abhishek Dubey
12/08/2022, 11:27 AM
Abhishek Dubey
12/08/2022, 11:28 AM
Gaurav Pant
12/08/2022, 5:06 PM
Neeraja Sridharan
12/08/2022, 9:20 PM
For offline Pinot tables, is it sufficient that the partition implementation logic (murmur) is the same on the source and in Pinot, or should the number of partitions also match?
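For context on why the count matters (a general point, not specific to this table): Pinot's partition pruning computes a partition id as hash(value) mod numPartitions from the table config, so pruning only lines up with the source when both the hash function and the partition count match. A toy illustration:
```python
# Toy illustration: the same key lands in different partitions when only the
# partition count changes, even with an identical hash function. A generic
# hash stands in for murmur here; the point is the modulo, not the hash.
def partition(key: str, num_partitions: int) -> int:
    h = sum(key.encode())  # stand-in hash; murmur behaves the same way here
    return h % num_partitions

key = "order-12345"
print(partition(key, 8))   # partition id with 8 partitions
print(partition(key, 16))  # usually a different id with 16 partitions
```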
Pratik Bhadane
12/09/2022, 5:27 AM
harnoor
12/09/2022, 7:10 PM
The check that drives numSegmentsPrunedInvalid during query execution is:
private static boolean isInvalidSegment(IndexSegment segment, QueryContext query) {
  return !segment.getColumnNames().containsAll(query.getColumns());
}
A few days ago I added a new column to my realtime table and reloaded all the segments. I am not even running queries on the new column, but I can see my query returning an incorrect response, as all the completed segments are getting pruned for the query.
I saw that numSegmentsPrunedInvalid = numSegmentsQueried - numConsumingSegmentsQueried in the query result, so it looks like the query is working fine only for the consuming segments. Shouldn't the query run fine unless the new column is being selected?
Pinot version: 0.11.0
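Reading the quoted check, a segment is pruned as invalid when the query references any column the segment does not contain. So two things are worth verifying: whether the query (for example a SELECT *) expands to the full schema including the new column, and whether the reload actually rewrote the completed segments on the servers. A hedged mirror of the containsAll logic, with made-up column names:
```python
# Hedged mirror of the isInvalidSegment check quoted above, to show why
# completed segments get pruned. Column names are made up for illustration.
segment_columns = {"event_id", "ts"}            # completed segment, pre-reload
query_columns = {"event_id", "ts", "new_col"}   # e.g. SELECT * expands to all schema columns
invalid = not query_columns.issubset(segment_columns)  # Java: !containsAll(...)
print("pruned as invalid:", invalid)            # True until the segment picks up new_col
```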
Tony Requist
12/09/2022, 11:41 PM
"retentionTimeUnit": "DAYS",
"retentionTimeValue": "90",
had two segments that were much older, one 112 days old and one ~260 days old. We have 6 tables with varying retention and this is the only case where old segments were not properly deleted when they passed the retention time. Any ideas why this might happen?
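One pattern that can produce exactly this (hedged, since the segment metadata isn't shown here): the retention manager decides expiry from each segment's end-time metadata, and a segment whose end time is missing or outside a sane range tends to be skipped rather than deleted. So pulling the ZK metadata for the two stragglers and checking their end times is a good first step. The core comparison is just:
```python
# Sketch of the retention comparison, assuming an end time read from segment
# metadata in epoch milliseconds (field names vary by Pinot version).
from datetime import datetime, timedelta, timezone

retention = timedelta(days=90)
segment_end_ms = 1647000000000  # hypothetical value from segment metadata

end = datetime.fromtimestamp(segment_end_ms / 1000, tz=timezone.utc)
expired = datetime.now(timezone.utc) - end > retention
print("past retention:", expired)
# A nonsensical end time (0, a far-future value, or one stored in the wrong
# unit) would make a segment look non-expired -- or be skipped entirely.
```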
Jatin Kumar
12/10/2022, 10:57 AM
The segment is stuck in IN_PROGRESS forever, and the next time we tried, the job failed.
1. What is the reason that the segment is in IN_PROGRESS state and not going to COMPLETED state?
2. How can we solve this issue?
cc: @Elon
Lee Wei Hern Jason
12/12/2022, 4:34 AM
Shreeram Goyal
12/12/2022, 7:01 AM
2022/12/12 12:16:11.538 INFO [LLCSegmentCompletionHandlers] [grizzly-http-server-6] Processing segmentConsumed:Offset: -1,Segment name: order_items__0__0__20221212T0636Z,Instance Id: Server_i43592-a14160_8098,Reason: timeLimit,NumRows: 49392,BuildTimeMillis: -1,WaitTimeMillis: -1,ExtraTimeSec: -1,SegmentLocation: null,MemoryUsedBytes: 18772108,SegmentSizeBytes: -1,StreamPartitionMsgOffset: 49392
2022/12/12 12:16:11.542 INFO [SegmentCompletionManager] [grizzly-http-server-6] Created FSM {order_items__0__0__20221212T0636Z,HOLDING,1670827571540,null,null,true,<http://localhost:9001>}
2022/12/12 12:16:11.542 INFO [SegmentCompletionFSM_order_items__0__0__20221212T0636Z] [grizzly-http-server-6] Processing segmentConsumed(Server_i43592-a14160_8098, 49392)
2022/12/12 12:16:11.542 INFO [SegmentCompletionFSM_order_items__0__0__20221212T0636Z] [grizzly-http-server-6] HOLDING:Picking winner time=2 size=1
2022/12/12 12:16:11.542 INFO [SegmentCompletionFSM_order_items__0__0__20221212T0636Z] [grizzly-http-server-6] HOLDING:Committer notified winner instance=Server_i43592-a14160_8098 offset=49392
2022/12/12 12:16:11.542 INFO [SegmentCompletionFSM_order_items__0__0__20221212T0636Z] [grizzly-http-server-6] HOLDING:COMMIT for instance=Server_i43592-a14160_8098 offset=49392 buldTimeSec=126
2022/12/12 12:16:11.542 INFO [LLCSegmentCompletionHandlers] [grizzly-http-server-6] Response to segmentConsumed for segment:order_items__0__0__20221212T0636Z is :{"offset":49392,"status":"COMMIT","isSplitCommitType":true,"controllerVipUrl":"<http://localhost:9001>","streamPartitionMsgOffset":"49392","buildTimeSec":126}
2022/12/12 12:16:11.543 INFO [ControllerResponseFilter] [grizzly-http-server-6] Handled request from 10.64.14.160 GET <http://i40790-a46135:9001/segmentConsumed?reason=timeLimit&streamPartitionMsgOffset=49392&instance=Server_i43592-a14160_8098&offset=-1&name=order_items__0__0__20221212T0636Z&rowCount=49392&memoryUsedBytes=18772108>, content-type null status code 200 OK
2022/12/12 12:16:12.979 INFO [LLCSegmentCompletionHandlers] [grizzly-http-server-1] Processing segmentCommitStart:Offset: -1,Segment name: order_items__0__0__20221212T0636Z,Instance Id: Server_i43592-a14160_8098,Reason: null,NumRows: 49392,BuildTimeMillis: 995,WaitTimeMillis: 0,ExtraTimeSec: -1,SegmentLocation: null,MemoryUsedBytes: 18772108,SegmentSizeBytes: 15179507,StreamPartitionMsgOffset: 49392
2022/12/12 12:16:12.981 INFO [SegmentCompletionFSM_order_items__0__0__20221212T0636Z] [grizzly-http-server-1] Processing segmentCommitStart(Server_i43592-a14160_8098, 49392)
2022/12/12 12:16:12.981 INFO [SegmentCompletionFSM_order_items__0__0__20221212T0636Z] [grizzly-http-server-1] COMMITTER_NOTIFIED:Uploading for instance=Server_i43592-a14160_8098 offset=49392
2022/12/12 12:16:12.981 INFO [LLCSegmentCompletionHandlers] [grizzly-http-server-1] Response to segmentCommitStart for segment:order_items__0__0__20221212T0636Z is:{"offset":-1,"status":"COMMIT_CONTINUE","isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1}
2022/12/12 12:16:12.981 INFO [ControllerResponseFilter] [grizzly-http-server-1] Handled request from 10.64.14.160 GET <http://i40790-a46135:9001/segmentCommitStart?segmentSizeBytes=15179507&buildTimeMillis=995&streamPartitionMsgOffset=49392&instance=Server_i43592-a14160_8098&offset=-1&name=order_items__0__0__20221212T0636Z&rowCount=49392&memoryUsedBytes=18772108>, content-type null status code 200 OK
2022/12/12 12:16:13.015 INFO [LLCSegmentCompletionHandlers] [grizzly-http-server-4] Processing segmentCommitEndWithMetadata:Offset: -1,Segment name: order_items__0__0__20221212T0636Z,Instance Id: Server_i43592-a14160_8098,Reason: null,NumRows: 49392,BuildTimeMillis: 995,WaitTimeMillis: 0,ExtraTimeSec: -1,SegmentLocation: file:/data/pinot/controller/data/order_items/order_items__0__0__20221212T0636Z.tmp.d86245bc-1c04-4b81-99fc-9becd3bea891,MemoryUsedBytes: 18772108,SegmentSizeBytes: 15179507,StreamPartitionMsgOffset: 49392
2022/12/12 12:16:13.020 INFO [SegmentCompletionFSM_order_items__0__0__20221212T0636Z] [grizzly-http-server-4] Processing segmentCommitEnd(Server_i43592-a14160_8098, 49392)
2022/12/12 12:16:13.020 INFO [SegmentCompletionFSM_order_items__0__0__20221212T0636Z] [grizzly-http-server-4] Committing segment order_items__0__0__20221212T0636Z at offset 49392 winner Server_i43592-a14160_8098
2022/12/12 12:16:13.020 INFO [PinotLLCRealtimeSegmentManager] [grizzly-http-server-4] Committing segment file for segment: order_items__0__0__20221212T0636Z
2022/12/12 12:16:13.021 WARN [BasePinotFS] [grizzly-http-server-4] Source file:/data/pinot/controller/data/order_items/order_items__0__0__20221212T0636Z.tmp.d86245bc-1c04-4b81-99fc-9becd3bea891 does not exist
2022/12/12 12:16:13.021 ERROR [SegmentCompletionFSM_order_items__0__0__20221212T0636Z] [grizzly-http-server-4] Caught exception while committing segment file for segment: order_items__0__0__20221212T0636Z
java.lang.IllegalStateException: Failed to move segment file for segment: order_items__0__0__20221212T0636Z from: file:/data/pinot/controller/data/order_items/order_items__0__0__20221212T0636Z.tmp.d86245bc-1c04-4b81-99fc-9becd3bea891 to: file:/data/pinot/controller/data/order_items/order_items__0__0__20221212T0636Z
at org.apache.pinot.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:738) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.controller.helix.core.realtime.PinotLLCRealtimeSegmentManager.commitSegmentFile(PinotLLCRealtimeSegmentManager.java:486) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.controller.helix.core.realtime.SegmentCompletionManager$SegmentCompletionFSM.commitSegment(SegmentCompletionManager.java:1085) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.controller.helix.core.realtime.SegmentCompletionManager$SegmentCompletionFSM.segmentCommitEnd(SegmentCompletionManager.java:660) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.controller.helix.core.realtime.SegmentCompletionManager.segmentCommitEnd(SegmentCompletionManager.java:326) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.controller.api.resources.LLCSegmentCompletionHandlers.segmentCommitEndWithMetadata(LLCSegmentCompletionHandlers.java:430) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at jdk.internal.reflect.GeneratedMethodAccessor146.invoke(Unknown Source) ~[?:?]
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:475) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:397) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:255) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.jersey.internal.Errors.process(Errors.java:292) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.jersey.internal.Errors.process(Errors.java:274) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.jersey.internal.Errors.process(Errors.java:244) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:234) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:356) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:200) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at java.lang.Thread.run(Thread.java:829) [?:?]
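A detail in these logs that may explain the failure (hedged, since the deployment isn't described): the segmentConsumed response advertises controllerVipUrl http://localhost:9001 with isSplitCommitType true, the segmentCommitStart response comes back with isSplitCommitType false, and the committing controller then can't find the .tmp file it expects under its local data dir. From the server's perspective, localhost is not the controller, so with that VIP URL the segment upload and the commit can land on different hosts or disks entirely; with multiple controllers behind a real VIP the same mismatch happens whenever their local data dirs aren't shared. Checking controller.vip.host / controller.host (they should resolve to a reachable controller address), or testing with a single controller, is the first thing to rule out.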
Shubham Kumar
12/12/2022, 1:03 PM
We are using the SegmentCreationAndMetadataPush jobType, but we are observing that the Spark job succeeds while the created segments are in BAD state.
This is the spec YAML file:
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SegmentUriPushJobRunner'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SegmentMetadataPushJobRunner'
# Recommended to set jobType to SegmentCreationAndMetadataPush for production environment where Pinot Deep Store is configured
jobType: SegmentCreationAndMetadataPush
inputDirURI: 's3://test-data/tpch-data/lineitem_dummy/parquet/'
includeFileNamePattern: 'glob:**/*.parquet'
excludeFileNamePattern: 'glob:**/*.tmp'
outputDirURI: 's3://test-data/pinot/segment_stage/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: ap-south-1
      accessKey: *****************
      secretKey: SdF************************HAd
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'lineitem_spark_92'
  schemaURI: 'https://pinot.np.tech.in/schemas/li_spark_append'
  tableConfigURI: 'https://pinot.np.tech.in/tables/li_spark_append'
pinotClusterSpecs:
  - controllerURI: 'https://pinot.np.tech.in/'
pushJobSpec:
  pushAttempts: 2
  pushRetryIntervalMillis: 1000
In the minion logs I am able to see:
Copied segment: li_spark_append_OFFLINE_1993-09-17_1993-11-08_5 of table: li_spark_append_OFFLINE to final location: file:/var/pinot/controller/data,s3://test-data/pinot-data/pinot-default/controller-data//li_spark_append/li_spark_append_OFFLINE_1993-09-17_1993-11-08_5
but there is no data present in the deep store controller directory, although segments do get pushed to the segment store given in the jobSpec file.
The minion log above says the segment was copied to the deep store path and the controller mount path, but upon verifying, there aren't any segment files there.
extra controller configs:
configs: |-
  pinot.set.instance.id.to.hostname=true
  pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
  pinot.controller.storage.factory.s3.region=ap-south-1
  pinot.controller.segment.fetcher.protocols=file,http,s3
  pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
  controller.data.dir=s3://test-data/pinot-data/pinot-default/controller-data/
  controller.local.temp.dir=/tmp/pinot-tmp-data/
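One clue worth chasing (an observation from the log above, not a confirmed diagnosis): the "final location" is a comma-joined pair — file:/var/pinot/controller/data plus the s3 URI — which suggests controller.data.dir is effectively being set twice, once to the chart's default local path (presumably /var/pinot/controller/data) and once to the s3 path from the extra configs. The copy then targets a malformed composite URI, which would explain why nothing lands in the deep store even though the log says "copied". If this is the official Helm chart, setting the chart's own controller data-dir value to the s3 URI, instead of overriding controller.data.dir in extra configs, is worth trying.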
Jatin Kumar
12/12/2022, 7:56 PM
The segment lineage entry is stuck in IN_PROGRESS state and it never completed; I am attaching one lineage example in the thread.
Any pointer on how to debug this further?
cc: @Xiang Fu
vishal
12/13/2022, 5:43 AM
Venkatesh Radhakrishnan
12/13/2022, 10:02 AM
Carl
12/13/2022, 10:27 PM
Apoorv
12/14/2022, 12:59 PM
"upsertConfig": {
  "mode": "PARTIAL",
  "partialUpsertStrategies": {},
  "defaultPartialUpsertStrategy": "OVERWRITE",
  "hashFunction": "NONE"
},
francoisa
12/14/2022, 3:49 PM
Padma Malladi
12/14/2022, 7:14 PM
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f64cd30b3f6, pid=1, tid=5336
#
# JRE version: OpenJDK Runtime Environment 18.9 (11.0.14.1+1) (build 11.0.14.1+1)
# Java VM: OpenJDK 64-Bit Server VM 18.9 (11.0.14.1+1, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# J 19588 c2 java.nio.DirectByteBuffer.getInt(I)I java.base@11.0.14.1 (28 bytes) @ 0x00007f64cd30b3f6 [0x00007f64cd30b3a0+0x0000000000000056]
#
# Core dump will be written. Default location: /opt/pinot/core.1
#
# An error report file with more information is saved as:
# /opt/pinot/hs_err_pid1.log
Bobby Richard
12/14/2022, 8:39 PM
Shubham Kumar
12/15/2022, 7:03 AM
We are running the ingestion job with the OverwriteOutput: true flag.
I have some fundamental doubts. Suppose I have an input s3 directory with a few partitions, let's say /1 and /2:
1. For both APPEND and REFRESH, I ingested historical data from the input path using the Spark job and then, before the next run, removed a file from partition /1. In this case, after the next run I observed:
a. older segment files were not removed from the staging directory or deep store; moreover, segments corresponding to all the remaining files in partition /1 got created again
b. this resulted in duplication of segments
2. Everything was as expected for REFRESH mode when we added a file to a partition, added a new partition, or removed an entire partition.
3. For APPEND mode, when we added a file to a partition, all the segments corresponding to that partition were refreshed, though we were expecting only one new segment. It seems segment refresh works at the partition level? Is there any documentation on how it is decided which files to pick for segment refresh/creation? (See the naming sketch after the job spec below.)
I have observed this behaviour for both APPEND and REFRESH modes. I doubt this is the expected behaviour; can somebody please explain what is happening here? Is this use case supported by Pinot?
We are using this job spec:
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentUriPushJobRunner'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentMetadataPushJobRunner'
# Recommended to set jobType to SegmentCreationAndMetadataPush for production environment where Pinot Deep Store is configured
jobType: SegmentCreationAndMetadataPush
inputDirURI: 's3://test-data/tpch-data/lineitem_dummy/parquet/'
includeFileNamePattern: 'glob:**/*.parquet'
excludeFileNamePattern: 'glob:**/*.tmp'
outputDirURI: 's3://test-data/pinot/segment_stageII/li/append/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: ap-south-1
      accessKey: *******
      secretKey: ********************+5M+EX3
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'li_spark_inc'
  schemaURI: .............
  tableConfigURI: .............
pinotClusterSpecs:
  - controllerURI: 'https://pinot.np.tech.in/'
pushJobSpec:
  pushAttempts: 2
  copyToDeepStoreForMetadataPush: true
  pushRetryIntervalMillis: 1000
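A plausible explanation for the partition-level "refresh" and the leftover duplicates (hedged, since it depends on the segment name generator in the table config, which isn't shown): with sequence-id style naming, a segment's name is derived from the table name, the time range of its file, and a running sequence id over the matched input files. When a file is added or removed, the sequence ids of the other files shift, so re-running the job writes segments under partly new names, and previously pushed segments with the old names are never cleaned up. A toy sketch of that shift (real names also embed the time range; this is deliberately simplified):
```python
# Toy sketch (hypothetical names): how sequence-id-based segment naming can
# shift when the matched input file set changes between runs.
def segment_names(table: str, files: list[str]) -> list[str]:
    return [f"{table}_{seq}" for seq, _ in enumerate(sorted(files))]

run1 = ["1/a.parquet", "1/b.parquet", "2/c.parquet"]
run2 = ["1/b.parquet", "2/c.parquet"]  # 1/a.parquet removed before the next run

print(segment_names("li_spark_inc", run1))  # ..._0, ..._1, ..._2
print(segment_names("li_spark_inc", run2))  # ..._0, ..._1 -- b and c now map
# to different names than before, so the old segments linger as duplicates.
```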
Mathieu Alexandre
12/15/2022, 2:32 PM
Pinot 0.10.0 can't pause stream ingestion like this. In your opinion, what would be the best way to get the same result (stop the Kafka broker, maybe)? I'd like to keep active reads on Pinot but stop modifications, in order to copy the data without side effects.
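For reference, Pinot releases newer than 0.10.0 expose pause/resume endpoints on the controller (pauseConsumption / resumeConsumption on the tables resource); on 0.10.0 itself, stopping the upstream producers or topic is the usual workaround, since killing the broker can leave consuming segments in awkward states. A hedged sketch of the API call for a version that has it (host and table name are placeholders):
```python
# Hedged sketch: pausing consumption via the controller REST API, available
# in Pinot releases newer than 0.10.0. Host and table name are placeholders.
import urllib.request

controller = "http://localhost:9000"   # placeholder controller URL
table = "myTable_REALTIME"             # placeholder table name
req = urllib.request.Request(
    f"{controller}/tables/{table}/pauseConsumption", method="POST"
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```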
Aaron Weiss
12/15/2022, 3:03 PM
1. Failed to download segment: errors like Failed to download segment immutable_events__1__1075__20221212T1720Z from deep store. However, as I said, these segments are for sure in the deep store.
2. Invalid segment(s) / older schema message: This one isn't horrible because you can still query the table, but that specific segment is unavailable as I understand it. Based on the schema evolution doc, reloading all segments should fix this. But after running and completing a reload of all segments, this error message persists. In addition, I have found no way in the Pinot UI or via Swagger commands to determine which segment(s) are impacted.
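Since the thread's opening question (which segments are invalid?) comes back here, one hedged approach: walk the table's segments through the controller REST API and diff each segment's column set against the current schema; segments missing schema columns are the ones the pruner will count as invalid. The endpoint paths and JSON field names below follow recent Pinot controller APIs but vary by version (and the "columns" field in per-segment metadata is an assumption), so treat this as a sketch:
```python
# Hedged sketch: find segments whose metadata lacks columns from the current
# schema, i.e. the ones likely counted as invalid. Endpoint paths and JSON
# field names follow recent Pinot controller APIs but may differ by version.
import json
import urllib.request

controller = "http://localhost:9000"   # placeholder controller URL
table = "immutable_events"             # placeholder; assumes schema name == table name

def get(path: str):
    with urllib.request.urlopen(f"{controller}{path}") as resp:
        return json.load(resp)

# Columns the current schema defines.
schema = get(f"/schemas/{table}")
schema_cols = {
    spec["name"]
    for group in ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")
    for spec in schema.get(group, [])
}

# Walk every segment and diff its column set against the schema. Depending on
# the Pinot version you may need /segments/{table}/metadata?columns=* instead.
for group in get(f"/segments/{table}"):
    for seg_type, segments in group.items():
        for seg in segments:
            meta = get(f"/segments/{table}/{seg}/metadata")
            seg_cols = set(meta.get("columns", []))
            missing = schema_cols - seg_cols
            if missing:
                print(seg_type, seg, "missing:", sorted(missing))
```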