Hi Team, I have included this in the OFFLINE table...
# troubleshooting
n
Hi Team, I have included this in the OFFLINE table config, and wanted to test how tasks work. Though the table and schema are getting created, I’m not seeing any tasks in Minion Task Manager.
"ingestionConfig": {
  "batchIngestionConfig": {
    "segmentIngestionType": "APPEND",
    "segmentIngestionFrequency": "HOURLY",
    "batchConfigMaps": [
      {
        "input.fs.className": "org.apache.pinot.plugin.filesystem.S3PinotFS",
        "input.fs.prop.region": "us-east-1",
        "inputDirURI": "s3://masked-bucket/dataset-sample/year=2022/month=10/day=10/",
        "includeFileNamePattern": "glob:**/*.parquet",
        "excludeFileNamePattern": "glob:**/*.tmp",
        "inputFormat": "parquet"
      }
    ]
  }
},
"tasks": {
  "taskTypeConfigsMap": {
    "SegmentGenerationAndPushTask": {
      "schedule": "0 * * * * ?"
    }
  }
},
This is what is passed to the controller through values.yaml, where the task scheduler is enabled. Can you please help me understand how the tasks get enabled?
extra:
  configs: |-
    pinot.set.instance.id.to.hostname=true
    controller.task.scheduler.enabled=true
m
Any logs in the minion instances?
n
Updating instance: Minion_pinot-minion-stateless-647f9b6557-lgtvz_9514 with hostname: pinot-minion-stateless-647f9b6557-lgtvz
No Messages to process
76 END:INVOKE CallbackHandler 0, /pinot-quickstart/INSTANCES/Minion_pinot-minion-stateless-647f9b6557-lgtvz_9514/MESSAGES listener: org.apache.helix.messaging.handling.HelixTaskExecutor@663f1ebc type: CALLBACK Took: 15ms
Updating instance: Minion_pinot-minion-stateless-647f9b6557-lgtvz_9514 with default tags: []
fallbackRoot: /pinot-quickstart/HELIX_PROPERTYSTORE doesn't exist, skip creating fallback property store
Starting minion admin application on: http://0.0.0.0:9514
Sep 28, 2022 4:45:51 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:9514]
Sep 28, 2022 4:45:51 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
Reflections took 4265 ms to scan 1 urls, producing 74333 keys and 147122 values 
Initializing health check callback
Pinot minion started
Pinot Minion instance [Minion_pinot-minion-stateless-647f9b6557-lgtvz_9514] is Started...
Started Pinot [MINION] instance [Minion_pinot-minion-stateless-647f9b6557-lgtvz_9514] at 12.728s since launch
No updates since Sept 28th.
m
Hmm, @Haitao Zhang can you take a look?
h
(1) Did you see the task type listed under the minion task manager? (2) Did you see any task generation errors? (grep -E "PinotTask|Quartz" should list something)
n
1. The task was not showing up in Minion Task Manager. 2. Where should I check? In server?
h
For 2, in the controller. (I thought you were using quickstart, where all logs are mixed together?)
n
No logs in the controller either 😞
I’m still using quickstart, but I modified a few values to install it within EKS.
h
Can you go to the /home/pinot/data/logs/pinotController/ folder to find the full Pinot controller logs?
There are many logs, so you cannot get all of them using kubectl logs.
n
Also, when I logged into the pod and manually checked conf/pinot-controller.conf, I couldn’t see this property:
controller.task.scheduler.enabled=true
I don’t see this value getting reflected in any of the controller templates. Do you think that could be the reason?
Never mind, the property is enabled in the config map. I tried looking at the logs within pinot-controller pod, but couldn’t find anything with the corresponding task name. However there are some logs related to other tables
h
Did you get anything from grep -E "PinotTask|Quartz"? We should see something like "Trying to schedule task type: <different task type>"
n
2022/10/11 01:23:05.597 INFO [PinotTaskManager] [ZkClient-EventThread-121-pinot-zookeeper:2181] Cleaning up task in scheduler for table silvermineBIServiceLogs_OFFLINE
2022/10/11 01:23:05.606 INFO [PinotTaskManager] [ZkClient-EventThread-121-pinot-zookeeper:2181] Checking task config changes in table configs
2022/10/11 01:23:30.308 INFO [PinotTaskManager] [ZkClient-EventThread-121-pinot-zookeeper:2181] Checking task config changes in table configs
2022/10/11 01:23:30.309 INFO [PinotTaskManager] [ZkClient-EventThread-121-pinot-zookeeper:2181] Trying to update task schedule for table: silvermineBIServiceLogs_OFFLINE
2022/10/11 01:23:30.311 INFO [PinotTaskManager] [ZkClient-EventThread-121-pinot-zookeeper:2181] taskConfig is null, trying to remove all the tasks for table silvermineBIServiceLogs_OFFLINE if any
This is what I got in the log file within controller
h
I guess the SegmentGenerationAndPushTask is configured for table silvermineBIServiceLogs?
n
Yes
h
"taskConfig is null, trying to remove all the tasks for table silvermineBIServiceLogs_OFFLINE if any" means no tasks are scheduled. Just to double-check: (1) do we have files in the folder? (2) did you provide an AWS key?
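A minimal sketch of how the controller log could be scanned for this message, to list every table whose task config was read as null. The helper name `tables_without_task_config` is hypothetical; the sample lines are copied from the log excerpt above.

```python
import re

# Matches the PinotTaskManager message quoted above and captures the table name.
NULL_TASK_CONFIG = re.compile(
    r"taskConfig is null, trying to remove all the tasks for table (\S+)"
)


def tables_without_task_config(log_lines):
    """Return the sorted, de-duplicated table names reported with a null task config."""
    tables = set()
    for line in log_lines:
        match = NULL_TASK_CONFIG.search(line)
        if match:
            tables.add(match.group(1))
    return sorted(tables)


sample = [
    "2022/10/11 01:23:30.309 INFO [PinotTaskManager] [ZkClient-EventThread-121-pinot-zookeeper:2181] "
    "Trying to update task schedule for table: silvermineBIServiceLogs_OFFLINE",
    "2022/10/11 01:23:30.311 INFO [PinotTaskManager] [ZkClient-EventThread-121-pinot-zookeeper:2181] "
    "taskConfig is null, trying to remove all the tasks for table silvermineBIServiceLogs_OFFLINE if any",
]

print(tables_without_task_config(sample))
```

In practice the lines would come from the controller log files under /home/pinot/data/logs/pinotController/ rather than from an in-memory list.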
n
Files are available. I didn’t provide an AWS key, since the EKS node instance role directly has access to the S3 bucket.
Is providing an access key mandatory, or is there a way to use roles to gain access to that S3 bucket?
h
if the EKS node has access, then it should be fine
oh, your configuration uses the wrong keyword 😃 The key should be "task", not "tasks".
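A rough sketch of a pre-upload check that would catch this kind of mistake. It assumes the table config is available locally as JSON; the helper name `find_task_key_problems` is hypothetical and not part of Pinot. The only Pinot-specific fact it relies on is that the task section lives under the top-level key `task`.

```python
import json


def find_task_key_problems(table_config):
    """Return human-readable problems with the task section of a table config dict."""
    problems = []
    if "tasks" in table_config:
        problems.append('found top-level key "tasks"; the key Pinot reads is "task"')
    task_section = table_config.get("task")
    if task_section is None:
        problems.append('no "task" section, so no minion tasks will be scheduled')
    elif not task_section.get("taskTypeConfigsMap"):
        problems.append('"task" section has no "taskTypeConfigsMap" entries')
    return problems


# Trimmed-down version of the config from this thread, including the misspelled key.
config = json.loads("""
{
  "tableName": "silvermineBIServiceLogs",
  "tableType": "OFFLINE",
  "tasks": {
    "taskTypeConfigsMap": {
      "SegmentGenerationAndPushTask": {"schedule": "0 * * * * ?"}
    }
  }
}
""")

for problem in find_task_key_problems(config):
    print(problem)
```

Renaming the key to `task` and re-running the check should print nothing.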
n
That’s a bummer! Thanks for pointing that out. @Haitao Zhang Is there a way we can leverage roles to access S3? Like Kinesis Role based access
h
I am not an expert on that. @Navina any thoughts?
n
catching up on this thread. hang on
@Nagendra Gautham Gondi I think you should be able to leverage roles to access S3. Is it a cross-account IAM role that you have set up? If the policies are correctly attached to your instance, then you can simply remove the accessKey and secretKey from your table config and see if it works.
n
Hi Navina, Yes! I would like to assume cross account role to access S3.
n
I have tried S3 bucket access across different accounts using "IAM policies and resource-based bucket policies" (see https://aws.amazon.com/premiumsupport/knowledge-center/cross-account-access-s3/), but not cross-account IAM role access for S3. I think that will require some changes in Pinot OSS to use the AssumeRole credential provider. From what I know, that hasn't been implemented yet.
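As an illustration of the resource-based bucket policy approach, here is a small sketch that builds such a policy document. The account ID, role name, and helper name `cross_account_read_policy` are placeholders invented for the example; only the bucket name comes from the table config above. The role in the other account would also need its own IAM policy allowing the same actions.

```python
import json


def cross_account_read_policy(bucket, reader_role_arn):
    """Build a bucket policy letting a role in another account list and read objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListBucket",
                "Effect": "Allow",
                "Principal": {"AWS": reader_role_arn},
                "Action": "s3:ListBucket",
                # ListBucket applies to the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Sid": "AllowGetObject",
                "Effect": "Allow",
                "Principal": {"AWS": reader_role_arn},
                "Action": "s3:GetObject",
                # GetObject applies to the objects inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Placeholder ARN: substitute the EKS node instance role from the reading account.
policy = cross_account_read_policy(
    "masked-bucket", "arn:aws:iam::111122223333:role/example-eks-node-role"
)
print(json.dumps(policy, indent=2))
```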
n
Got it! Yes, I am able to access with Bucket based policies in a different account. I was wondering if even assume role has been implemented. Thanks Navina
n
yeah. that has been pending. We always welcome contributions @Nagendra Gautham Gondi 🙏