Hi team, I’m trying to upload a sample file using ...
# troubleshooting
g
Hi team, I’m trying to upload a sample file using ingestFromURI. I’m using the following fields:
tableNameWithType: foo_OFFLINE
batchConfigMapStr:
{
  "inputFormat": "recordio",
  "input.fs.className": "com.company.mlutils.pinot.plugin.filesystem.object_store.ObjectStorePinotFS"
}
sourceURIStr: os://DATA/day=2022-10-12/hour=10/partition-310_foo_1665569284.recordio
It fails with a 500 error:
{
  "code": 500,
  "error": "Caught exception when ingesting file into table: foo_OFFLINE. Could not create directory for downloading input file locally: s3:/pinot-deep-store/segments/upload_dir/working_dir_foo_OFFLINE_1665613948057/input_data_dir"
}
and I see the following in the controller log:
2022/10/12 22:32:28.057 INFO [FileIngestionHelper] [jersey-server-managed-async-executor-5] Starting ingestion of URI payload to table: foo_OFFLINE using working dir: /opt/pinot/s3:/pinot-deep-store/segments/upload_dir/working_dir_foo_OFFLINE_1665613948057
2022/10/12 22:32:28.058 ERROR [FileIngestionHelper] [jersey-server-managed-async-executor-5] Caught exception when ingesting file to table: foo_OFFLINE
java.lang.IllegalStateException: Could not create directory for downloading input file locally: s3:/pinot-deep-store/segments/upload_dir/working_dir_foo_OFFLINE_1665613948057/input_data_dir
        at shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:518) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.apache.pinot.controller.util.FileIngestionHelper.buildSegmentAndPush(FileIngestionHelper.java:102) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.apache.pinot.controller.api.resources.PinotIngestionRestletResource.ingestData(PinotIngestionRestletResource.java:200) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.apache.pinot.controller.api.resources.PinotIngestionRestletResource.ingestFromURI(PinotIngestionRestletResource.java:175) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
        at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:159) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.lambda$apply$0(ResourceMethodInvoker.java:381) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$2$1.run(ServerRuntime.java:819) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.glassfish.jersey.internal.Errors.process(Errors.java:292) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.glassfish.jersey.internal.Errors.process(Errors.java:274) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.glassfish.jersey.internal.Errors.process(Errors.java:244) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$2.run(ServerRuntime.java:814) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]
What could be wrong here? Note that the controller is able to upload data to the S3 deep store for the realtime table.
r
/opt/pinot/s3:/pinot-deep-store/segments/upload_dir/working_dir_foo_OFFLINE_1665613948057 seems wrong. There's an s3: in the middle of the path.
g
Yes, I noticed that. How is that path constructed?
r
This is configured by your cluster's controller.data.dir.
g
Hmm odd because that contains:
controller.data.dir=<s3://pinot-deep-store/segments>
controller.local.temp.dir=/tmp/pinot/apache_pinot/controller/temp
r
This looks like you had /opt/pinot/ configured somewhere else and somehow those two configs got stitched together into one.
h
I see your input data dir starts with s3:/ rather than s3://. Is this related?
g
controller.data.dir=<s3://pinot-deep-store/segments>
looks like s3:// to me
h
I mean this:
java.lang.IllegalStateException: Could not create directory for downloading input file locally: s3:/pinot-deep-store/segments/upload_dir/working_dir_foo_OFFLINE_1665613948057/input_data_dir
it contains s3:/
g
Isn’t that where I configure the deep store for the controller?
h
It seems to be the URI you configured to download data? But let me check the code.
something seems not right to me:
controller.data.dir=<s3://pinot-deep-store/segments>
the data dir is a s3 folder?
it should be a local dir
g
hmm interesting. Doesn’t the controller also do some operations on the deep store? How would the controller know the URI for the deep store?
h
Please ignore my comment, I was wrong
g
np. But yes, I did get the feeling it somehow combines 2 variables. None of my configs contains /opt/pinot though, which is the pinot directory inside the docker image.
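For reference, a path of this shape is what java.io.File produces on Linux when a URI string is treated as a local directory. The sketch below is an assumption about the mechanism, not Pinot's actual code: the class WorkingDirDemo and the helper workingDir are hypothetical names, and only the data dir value, table name, and timestamp are taken from this thread.

```java
import java.io.File;

class WorkingDirDemo {
    // Hypothetical helper (not Pinot's implementation): builds a working
    // directory by treating the configured data dir as a local file path.
    static File workingDir(String dataDir, String tableNameWithType, long timestampMs) {
        File uploadDir = new File(dataDir, "upload_dir");
        return new File(uploadDir, "working_dir_" + tableNameWithType + "_" + timestampMs);
    }

    public static void main(String[] args) {
        File dir = workingDir("s3://pinot-deep-store/segments", "foo_OFFLINE", 1665613948057L);

        // On a Unix-like system, File collapses "//" into "/", so the
        // scheme separator "s3://" becomes "s3:/".
        System.out.println(dir.getPath());

        // The path does not start with "/", so File treats it as relative.
        System.out.println(dir.isAbsolute());

        // A relative path is resolved against the JVM working directory
        // (the "user.dir" system property), which gets prepended here.
        System.out.println(dir.getAbsolutePath());
    }
}
```

If the controller process runs with /opt/pinot as its working directory, which seems likely inside the Docker image, the last line would print /opt/pinot/s3:/pinot-deep-store/segments/upload_dir/working_dir_foo_OFFLINE_1665613948057, matching the working dir in the controller log. That would account for both the s3:/ with a single slash and the /opt/pinot prefix that appears in none of the configs.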
h
it seems like a bug to me
let me double check
h
@Gerrit van Doorn this is indeed a bug, thanks! here is a quick fix: https://github.com/apache/pinot/pull/9591
g
Thanks @Haitao Zhang! Would this bug also manifest itself if I were to use Spark for ingestion?
h
If you are not using ingestFromURI, you will not hit this bug.
If you still want to use ingestFromURI, you can work around the problem by changing the permissions of the /opt/pinot folder so that Pinot can create folders and files there.
g
I was working towards using Spark but wanted to test ingestFromURI first. Might just jump to Spark.