# troubleshooting
a
Is anybody using Pinot with an on-prem S3-like filesystem rather than AWS' S3? I am doing this and trying to run a batch ingest, and I get this error:
```
Got exception to kick off standalone data ingestion job -
java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
        at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:144) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
        at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:113) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
        at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:132) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
        at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:164) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
        at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:184) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
Caused by: java.io.IOException: software.amazon.awssdk.services.s3.model.S3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: S3, Status Code: 403, Request ID: 0306422796023ADB, Extended Request ID: njXFdh82iDAWK78LUjRq1SCfJDgSD0Dcr9EhworrYh4CT7X0ZsPFVmHl2TUSmLK9eP/EyAwhAm8=)
        at org.apache.pinot.plugin.filesystem.S3PinotFS.mkdir(S3PinotFS.java:308) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
        at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.run(SegmentGenerationJobRunner.java:127) ~[pinot-batch-ingestion-standalone-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
        at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:142) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
        ... 4 more
```
Ok so -- it looks like the batch ingest job was loading my credentials from `~/.aws/credentials`, which 1) were not for this filer and 2) don't give me a way to specify my endpoint.
I've configured the controller and server with the right credentials and endpoint as documented here: https://docs.pinot.apache.org/basics/data-import/pinot-file-system/amazon-s3
i.e. I'm setting:
```
pinot.controller.storage.factory.s3.region=ap-southeast-1
pinot.controller.storage.factory.s3.accessKey=foo
pinot.controller.storage.factory.s3.secretKey=foo
pinot.controller.storage.factory.s3.endpoint=http://foo
```
(and s/controller/server as well for the server conf)
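Spelled out, that just means the server conf carries the same keys with the `pinot.server.` prefix (same placeholder values as above):
```
pinot.server.storage.factory.s3.region=ap-southeast-1
pinot.server.storage.factory.s3.accessKey=foo
pinot.server.storage.factory.s3.secretKey=foo
pinot.server.storage.factory.s3.endpoint=http://foo
```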
How can I pick up these settings for the batch ingest job? After deleting `~/.aws/credentials` I get this error on batch ingest:
```
Caused by: java.io.IOException: software.amazon.awssdk.core.exception.SdkClientException: Unable to load credentials from any of the providers in the chain AwsCredentialsProviderChain(credentialsProviders=[SystemPropertyCredentialsProvider(), EnvironmentVariableCredentialsProvider(), WebIdentityTokenCredentialsProvider(), ProfileCredentialsProvider(), ContainerCredentialsProvider(), InstanceProfileCredentialsProvider()]) : [SystemPropertyCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId)., EnvironmentVariableCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId)., WebIdentityTokenCredentialsProvider(): Either the environment variable AWS_WEB_IDENTITY_TOKEN_FILE or the javaproperty aws.webIdentityTokenFile must be set., ProfileCredentialsProvider(): Profile file contained no credentials for profile 'default': ProfileFile(profiles=[]), ContainerCredentialsProvider(): Cannot fetch credentials from container - neither AWS_CONTAINER_CREDENTIALS_FULL_URI or AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variables are set., InstanceProfileCredentialsProvider(): Unable to load credentials from service endpoint.]
```
Is there any way to set my own endpoint for batch ingestion?
n
When you say S3-like, can you give more detail? I don't know the low-level details of the S3 plugin, but I'm guessing you won't want to use it unless it's actually S3 you're grabbing from.
a
Oh sure -- it's literally API-compatible with S3; I just need to point the endpoint at something on-prem rather than at AWS's servers.
In other words, from the Pinot docs, if I set `pinot.controller.storage.factory.s3.endpoint` and the server equivalent, I should be good -- but somehow this doesn't seem to be working for the batch ingest?
I think the S3 plugin should work. I already do this with Trino, and Trino's built-in S3 support (which uses the AWS SDK) works fine.
Ok, I think I figured this out -- in addition to the S3PinotFS config options in the controller and server configuration files, I needed to set them in the job spec
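For reference, a sketch of the relevant part of the job spec (the values are placeholders, same as the controller/server conf above) -- the point is that the S3 settings go under the `configs` map of the `pinotFSSpecs` entry:
```
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'ap-southeast-1'
      accessKey: 'foo'
      secretKey: 'foo'
      endpoint: 'http://foo'
```
With that in place, `pinot-admin.sh LaunchDataIngestionJob -jobSpecFile <spec>` uses these credentials and endpoint instead of falling back to `~/.aws/credentials`.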
n
I had to do the same for GCP. Not sure if you've seen it, but this doc has an example job file here
a
Thanks! I wasn't aware that I could put more than `region` under `configs` -- this seems to work!
You can put `endpoint` and other settings under `configs` as well.