# troubleshooting
n
Hey team, I created a table with this in it to attempt to use the minion component to ingest data. When doing a POST at `tasks/schedule`, it looks like the minions are doing something (the logs mention using AVRO), but they’ll either just hang or error out. Any insights? I also made these changes: `controller.task.scheduler.enabled=true` on the controller, plus this minion config:
```
pinot.set.instance.id.to.hostname=true
pinot.minion.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.minion.storage.factory.gs.projectId=REDACTED
pinot.minion.storage.factory.gs.gcpKey=REDACTED
pinot.minion.segment.fetcher.protocols=file,http,gs
pinot.minion.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
plugins.include=pinot-gcs
```
I added the auth key to the controller, server, and minion (auth worked before, when ssh’ing into the server and running a job).
d
You can monitor CPU activity on the minion worker. Also, `pinotMinion.log` has more verbose logs. How big are the files you are ingesting?
n
The CPU goes up on the minions when I launch the jobs, but there’s no output after a few hours. Files range in size but the max is right under a gig. Each minion pod has 1 CPU and 5G, and the Java mem settings are `-Xms1G -Xmx4G`.
I also tried to kill the task and kick off a new one, but it doesn’t like that. I have a few logs; just combing through them to see what’s valuable.
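For context, each minion pod is deployed roughly like this (a sketch from memory; names, env var, and image tag are placeholders):
```yaml
# Relevant slice of the minion pod spec (hypothetical names/tag).
containers:
  - name: pinot-minion
    image: apachepinot/pinot:0.6.0
    env:
      # Heap can grow to 4G inside a 5G pod, leaving ~1G for off-heap and the OS.
      - name: JAVA_OPTS
        value: "-Xms1G -Xmx4G"
    resources:
      limits:
        cpu: "1"
        memory: 5Gi
```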
d
If you are running a pod, get into the minion pod and look at `pinotMinion.log` in the home dir.
Why have you configured an outputDirURI on the batch config?
n
To store the segments in deep storage?
d
The controller will do that on its own
n
Ahh ok, I was porting over what I had from the job.
I’m looking in the logs right now, one second please.
d
Your config is also missing:
`"includeFileNamePattern": "glob:**/*.gz",`
n
I’m grabbing all files in that dir, does that matter?
d
Try `"includeFileNamePattern": "glob:*.gz"` then.
How many files are there?
n
~150
d
Ok, I don’t see anything unreasonable with what you described.
n
let me add that, purge the log, and then kick it off and tail
d
Remove the outputDirURI too.
Have you configured deep store on the controller and server?
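For GCS, the controller side would look roughly like this (a sketch; bucket and paths are placeholders):
```
controller.data.dir=gs://REDACTED/pinot-segments
controller.local.temp.dir=/tmp/pinot-tmp-data
pinot.controller.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.controller.storage.factory.gs.projectId=REDACTED
pinot.controller.storage.factory.gs.gcpKey=REDACTED
pinot.controller.segment.fetcher.protocols=file,http,gs
pinot.controller.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```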
n
I have, and they have been writing there fine. All are using the same auth.
d
Ok, just try without outputDirURI. When the minion uploads a segment to Pinot, it will end up in deep store thanks to the controller.
After that, all that’s left is to run a health check on all systems (controller, servers, and ZooKeeper) and ensure their heap and off-heap are fine.
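A quick way to sanity-check those (a sketch; hostnames, ports, and pod names are placeholders, and newer ZooKeepers need `ruok` whitelisted):
```sh
# Controller liveness probe
curl http://pinot-controller:9000/health

# ZooKeeper liveness via the ruok four-letter word
echo ruok | nc pinot-zookeeper 2181

# GC/heap utilization of the JVM running as pid 1 in a pod, sampled every 5s
kubectl exec -it pinot-server-0 -- jstat -gcutil 1 5000
```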
Also, are you sure that no segments are uploaded?
n
There were some from a prior job but none had been added/modified. I just deleted a couple of segments to see if it’ll retry.
When I do the POST this is my response: `{"SegmentGenerationAndPushTask":null}`. And the controller logs this:
`2021/02/18 18:07:26.436 WARN [ZKMetadataProvider] [grizzly-http-server-1] Path: /SEGMENTS/REDACTED_OFFLINE does not exist`
I’m in one of the minions and there are no logs in the pod or from `kubectl logs` yet, other than startup.
Not sure if I need to try another minion.
If I do a GET on `tasks/SegmentGenerationAndPushTask/state`, the response is `IN_PROGRESS`.
None of the minion pods appear to have any spikes in CPU/mem utilization.
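For reference, these are roughly the calls I’m making (controller host is a placeholder, and the query params may differ by version):
```sh
# Schedule the task; this is what returns {"SegmentGenerationAndPushTask":null}
curl -X POST "http://pinot-controller:9000/tasks/schedule?taskType=SegmentGenerationAndPushTask&tableName=REDACTED_OFFLINE"

# Overall state for the task type; currently returns IN_PROGRESS
curl "http://pinot-controller:9000/tasks/SegmentGenerationAndPushTask/state"
```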
d
Can you go into the ZooKeeper explorer and look at the status of the subtasks?
Are you running the latest version of Pinot? Make sure your pods are not running with `latest` as the image tag and `IfNotPresent` as the pull policy.
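i.e. pin a concrete release and an explicit pull policy, roughly like this (a sketch; the tag is an example):
```yaml
containers:
  - name: pinot-minion
    # With latest + IfNotPresent, a node keeps serving whatever stale image
    # it already has cached, so you never actually pick up new builds.
    image: apachepinot/pinot:0.6.0
    imagePullPolicy: Always
```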
n
Correct, I changed that last night.
I have two tasks in there; one has many subtasks and its status is TASK_ERROR. I also see this in PreviousResourceAssignment:
```
"TaskQueue_SegmentGenerationAndPushTask_Task_SegmentGenerationAndPushTask_1613626284771_99": {
  "Minion_pinot-minion-6.pinot-minion-headless.default.svc.cluster.local_9514": "DROPPED"
}
```
For the other task, this is the output of context:
```
{
  "id": "WorkflowContext",
  "simpleFields": {
    "NAME": "TaskQueue_SegmentGenerationAndPushTask",
    "START_TIME": "1613626265328",
    "STATE": "IN_PROGRESS"
  },
  "mapFields": {
    "JOB_STATES": {
      "TaskQueue_SegmentGenerationAndPushTask_Task_SegmentGenerationAndPushTask_1613626284771": "COMPLETED"
    },
    "StartTime": {
      "TaskQueue_SegmentGenerationAndPushTask_Task_SegmentGenerationAndPushTask_1613626284771": "1613626302656"
    }
  },
  "listFields": {}
}
```
Thanks again for your help 🙂
d
No errors on controller and server?
n
Untitled.txt
Server has no logs since the last restart; the controller just spit this out (the attached Untitled.txt).
d
What about `pinotController.log`?
n
This is the only error, and it corresponds to when I’m doing things in the controller UI’s ZooKeeper page:
`2021/02/18 18:15:09.643 ERROR [ZkBaseDataAccessor] [grizzly-http-server-1] paths is null or empty`
The rest of the logs correspond to tasks.
If there are syntax errors etc., that’s because I just edited the snippets for readability.
t
I’m probably way off here, but I noticed that the minion segment configurations aren’t supposed to be prefixed with `pinot.minion.` (the controller’s keys are prefixed, the minion’s aren’t). Minion: https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/utils/CommonConstants.java#L352-L354 Controller: https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/utils/CommonConstants.java#L323-L324 I ended up with a configuration like this that works with a `RealtimeToOfflineSegmentsTask`:
```
pinot.minion.port=9514
storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
storage.factory.s3.region=my-region
segment.fetcher.protocols=file,http,s3
segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```
But I got a quite clear error message, something about the minion not having a factory class for the `s3` scheme.
n
That’s a great catch, I just assumed that’s what it was. Let me change that and give it a go!