Nick Bowles
02/18/2021, 5:50 PM
minion component to ingest data. When doing a POST at tasks/schedule, it looks like the minions are doing something (the logs mention AVRO), but they either just hang or error out. Any insights?
I also made these changes:
controller.task.scheduler.enabled=true
minion config:
pinot.set.instance.id.to.hostname=true
pinot.minion.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.minion.storage.factory.gs.projectId=REDACTED
pinot.minion.storage.factory.gs.gcpKey=REDACTED
pinot.minion.segment.fetcher.protocols=file,http,gs
pinot.minion.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
plugins.include=pinot-gcs
Added auth key to controller, server, and minion (auth worked before ssh’ing into server and running a job)
Daniel Lavoie
02/18/2021, 5:52 PM
pinotMinion.log has more verbose logs.
How big are the files you are ingesting?
Nick Bowles
02/18/2021, 5:55 PM
Daniel Lavoie
02/18/2021, 5:57 PM
pinotMinion.log in the home dir
Nick Bowles
02/18/2021, 5:59 PM
Daniel Lavoie
02/18/2021, 5:59 PM
Nick Bowles
02/18/2021, 6:00 PM
Daniel Lavoie
02/18/2021, 6:00 PM
"includeFileNamePattern": "glob:**/*.gz",
Nick Bowles
02/18/2021, 6:00 PM
Daniel Lavoie
02/18/2021, 6:01 PM
Nick Bowles
02/18/2021, 6:01 PM
Daniel Lavoie
02/18/2021, 6:02 PM
Nick Bowles
02/18/2021, 6:02 PM
Daniel Lavoie
02/18/2021, 6:02 PM
Nick Bowles
02/18/2021, 6:03 PM
Daniel Lavoie
02/18/2021, 6:03 PM
Nick Bowles
02/18/2021, 6:10 PM
Daniel Lavoie
02/18/2021, 6:14 PM
latest and IfNotPresent as a pull-policy
Nick Bowles
02/18/2021, 6:15 PM
"TaskQueue_SegmentGenerationAndPushTask_Task_SegmentGenerationAndPushTask_1613626284771_99": {
  "Minion_pinot-minion-6.pinot-minion-headless.default.svc.cluster.local_9514": "DROPPED"
}
For the other task, this is the output of context:
{
  "id": "WorkflowContext",
  "simpleFields": {
    "NAME": "TaskQueue_SegmentGenerationAndPushTask",
    "START_TIME": "1613626265328",
    "STATE": "IN_PROGRESS"
  },
  "mapFields": {
    "JOB_STATES": {
      "TaskQueue_SegmentGenerationAndPushTask_Task_SegmentGenerationAndPushTask_1613626284771": "COMPLETED"
    },
    "StartTime": {
      "TaskQueue_SegmentGenerationAndPushTask_Task_SegmentGenerationAndPushTask_1613626284771": "1613626302656"
    }
  },
  "listFields": {}
}
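For anyone reading a context dump like the one above, a short script can pull out the queue state and the per-job states. This is only a sketch: `summarize_workflow` is a hypothetical helper name, and treating `START_TIME` as epoch milliseconds is an assumption, not something confirmed in this thread.

```python
import json
from datetime import datetime, timezone

# Copy of the WorkflowContext posted above.
WORKFLOW_CONTEXT = """
{
  "id": "WorkflowContext",
  "simpleFields": {
    "NAME": "TaskQueue_SegmentGenerationAndPushTask",
    "START_TIME": "1613626265328",
    "STATE": "IN_PROGRESS"
  },
  "mapFields": {
    "JOB_STATES": {
      "TaskQueue_SegmentGenerationAndPushTask_Task_SegmentGenerationAndPushTask_1613626284771": "COMPLETED"
    },
    "StartTime": {
      "TaskQueue_SegmentGenerationAndPushTask_Task_SegmentGenerationAndPushTask_1613626284771": "1613626302656"
    }
  },
  "listFields": {}
}
"""


def summarize_workflow(raw: str) -> dict:
    """Return the queue name, queue state, start time, and job states."""
    ctx = json.loads(raw)
    simple = ctx.get("simpleFields", {})
    maps = ctx.get("mapFields", {})
    # Assumption: START_TIME is epoch milliseconds.
    started = datetime.fromtimestamp(int(simple["START_TIME"]) / 1000, tz=timezone.utc)
    return {
        "queue": simple.get("NAME"),
        "queue_state": simple.get("STATE"),
        "queue_started_utc": started.isoformat(timespec="seconds"),
        "job_states": dict(maps.get("JOB_STATES", {})),
    }


if __name__ == "__main__":
    summary = summarize_workflow(WORKFLOW_CONTEXT)
    print(summary["queue"], summary["queue_state"], summary["queue_started_utc"])
    for job, state in summary["job_states"].items():
        print(f"  {job}: {state}")
```

In this dump the queue itself reports IN_PROGRESS while its single job reports COMPLETED.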
Daniel Lavoie
02/18/2021, 6:18 PM
Nick Bowles
02/18/2021, 6:20 PM
Daniel Lavoie
02/18/2021, 6:22 PM
pinotController.log?
Nick Bowles
02/18/2021, 6:41 PM
2021/02/18 18:15:09.643 ERROR [ZkBaseDataAccessor] [grizzly-http-server-1] paths is null or empty
Tamás Nádudvari
02/18/2021, 9:20 PM
pinot.minion.
Minion: https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/utils/CommonConstants.java#L352-L354
Controller: https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/utils/CommonConstants.java#L323-L324
Ended up with a configuration like this that works with a `RealtimeToOfflineSegmentsTask`:
pinot.minion.port=9514
storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
storage.factory.s3.region=my-region
segment.fetcher.protocols=file,http,s3
segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
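The working configuration above leaves `storage.factory.*` and `segment.fetcher.*` unprefixed while keeping `pinot.minion.port` as-is, whereas the minion config earlier in the thread prefixes those keys with `pinot.minion.`. A minimal sketch of that renaming follows, assuming the same rule would carry over to the GCS keys (not confirmed in this thread). `strip_minion_prefix` and `UNPREFIXED_FAMILIES` are hypothetical names for illustration.

```python
PREFIX = "pinot.minion."
# Assumption: only these key families are written without the prefix,
# mirroring the working S3 config above; pinot.minion.port keeps it.
UNPREFIXED_FAMILIES = ("storage.factory.", "segment.fetcher.")


def strip_minion_prefix(lines):
    """Rewrite pinot.minion.storage.factory.* / segment.fetcher.* keys without the prefix."""
    out = []
    for line in lines:
        key, sep, value = line.partition("=")
        key = key.strip()
        if key.startswith(PREFIX) and key[len(PREFIX):].startswith(UNPREFIXED_FAMILIES):
            key = key[len(PREFIX):]
        out.append(f"{key}{sep}{value}")
    return out


if __name__ == "__main__":
    # The minion config posted earlier in the thread.
    original = [
        "pinot.set.instance.id.to.hostname=true",
        "pinot.minion.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS",
        "pinot.minion.storage.factory.gs.projectId=REDACTED",
        "pinot.minion.storage.factory.gs.gcpKey=REDACTED",
        "pinot.minion.segment.fetcher.protocols=file,http,gs",
        "pinot.minion.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher",
        "plugins.include=pinot-gcs",
    ]
    print("\n".join(strip_minion_prefix(original)))
```

Keys outside the two listed families pass through unchanged, so the output can be diffed against the original file before trying it on a minion.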
But I got a fairly clear error message, something about the minion not having a class factory for the s3 scheme.
Nick Bowles
02/18/2021, 9:24 PM