Hi folks, I'm having issues running batch ingestio...
# troubleshooting
d
Hi folks, I'm having issues running batch ingestion in my local experiments with Pinot using docker-compose. More in this thread.
Here's what I get, with the command I run at the top:
Copy code
$ docker-compose -f docker-compose-databases.yml exec pinot-controller bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /config/ingestion/weights.yaml
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/pinot/lib/pinot-all-0.10.0-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-environment/pinot-azure/pinot-azure-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-file-system/pinot-s3/pinot-s3-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-metrics/pinot-dropwizard/pinot-dropwizard-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-metrics/pinot-yammer/pinot-yammer-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See <http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.codehaus.groovy.reflection.CachedClass (file:/opt/pinot/lib/pinot-all-0.10.0-jar-with-dependencies.jar) to method java.lang.Object.finalize()
WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.reflection.CachedClass
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
and here are the ingestion jobs files visible inside the controller:
Copy code
$ docker-compose -f docker-compose-databases.yml exec pinot-controller ls -lh /config/ingestion/
total 12K
-rw-r--r-- 1 1000 1000 956 Apr 15 21:34 brands_metrics.yaml
-rw-r--r-- 1 1000 1000 935 Apr 15 21:37 filters.yaml
-rw-r--r-- 1 1000 1000 935 Apr 15 21:31 weights.yaml
any ideas what's going on?
x
The warning itself is not fatal so it does not help finding what is happening. Can you try ssh into that pinot controller container and find out the Pinot logs?
d
Sure! Hold on...
These are the only non-INFO log lines I found when trying to run the batch jobs:
Copy code
2022/04/15 22:02:59.975 INFO [ControllerPeriodicTask] [pool-7-thread-7] Processing 3 tables in task: OfflineSegmentIntervalChecker
2022/04/15 22:02:59.981 WARN [ZKMetadataProvider] [pool-7-thread-7] Path: /SEGMENTS/filters_OFFLINE does not exist
2022/04/15 22:02:59.991 WARN [ZKMetadataProvider] [pool-7-thread-7] Path: /SEGMENTS/brands_metrics_OFFLINE does not exist
2022/04/15 22:02:59.994 WARN [ZKMetadataProvider] [pool-7-thread-7] Path: /SEGMENTS/weights_OFFLINE does not exist
but I'm not sure what's going on - shouldn't Pinot just create the segments automatically, when running those jobs?
Here's my job spec for `weights`:
Copy code
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/sensitive-data/outputs/'
includeFileNamePattern: 'glob:weights/**/*.json'
outputDirURI: '/opt/pinot/data'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'json'
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
  tableName: 'weights'
  schemaURI: '<http://localhost:9000/tables/weights/schema>'
pinotClusterSpecs:
  - controllerURI: '<http://localhost:9000>'
and the data I have in that directory is a .json file, but which contains a list of JSON dictionaries/maps/objects (not JSONL, just a simple JSON file whose top-level element is a list/array)
x
We can try look through the log lines and see what the job has progressed into. For example, is the log line in https://github.com/apache/pinot/blob/734d203a50726318b8f8a6de40c51165429868ba/pino[…]/org/apache/pinot/spi/ingestion/batch/IngestionJobLauncher.java seen?
d
Here are the full Controller logs for the time when I try to run the load job:
x
Right now based on the code, it seems if the directory/path points to a place where there is no input file, Pinot does not print out warning log https://github.com/apache/pinot/blob/734d203a50726318b8f8a6de40c51165429868ba/pino[…]ugin/ingestion/batch/standalone/SegmentGenerationJobRunner.java If that’s the case, it could be a good chance for you to submit a code change for Pinot too
The log seems to be cut off short.
Is very unlikely but is the pinot process alive? Someone else killed the process?
d
But here are the relevant configs for the location of the input files:
Copy code
inputDirURI: '/sensitive-data/outputs/'
includeFileNamePattern: 'glob:weights/**/*.json'
and here are the files, for example:
Copy code
root@d7ae8b63b8e1:/opt/pinot/logs# find /sensitive-data/outputs/weights/ -name *2013041*.json
/sensitive-data/outputs/weights/br/20130415.json
/sensitive-data/outputs/weights/br/20130418.json
/sensitive-data/outputs/weights/br/20130416.json
/sensitive-data/outputs/weights/br/20130419.json
/sensitive-data/outputs/weights/br/20130417.json
/sensitive-data/outputs/weights/br/20130414.json
/sensitive-data/outputs/weights/br/20130413.json
Lemme check
All instances alive, yes
x
The pinot logs stopped producing (if you copied the file properly), it has only 86 lines.
If even logs are not written, something is wrong that Pinot has not even started yet
d
Yeah, I noticed the truncation there, but that's what shows up in my terminal, not sure why
I'll check the RAM usage
x
My past experience with such truncation of logs is the process was killed, so that’s why I brought it up. Other case could be the Pinot process is stuck in a dead loop, which is very unlikely. Or maybe the error log is redirected somewhere else? maybe checking the log4j2.xml file?
d
Using ~16G, I have 32 total. So not that much.
Let me try taking a look at those, hold on.
The process hasn't been killed, it's been running for more than 1h. Dunno if a thread has been killed though.
Is it normal for the Controller to spawn hundreds of threads?
Alright, I truncated the whole log fine, checked the log4j configs and they seem just fine, and I ran the batch jobs again, same issue, the logs are somehow truncated. Let me restart Pinot from scratch with everything clean and try to run the jobs again. I did notice a stupid bug in my ingestable data that basically invalidates many of the data I have, but at least some of it should have been processed.
I started from scratch, cleaned up all directories related to those instances, created the tables, truncated the log file so that I could only have the stuff for the ingestion job, and I still get truncated logs after running it, and no useful lines telling me what went wrong
Anyway, I'll fix my issue with the dates I'm generating, maybe that by itself fixes the issue, but I have low confidence that it will.
Thanks for the help!
@User found the issue! It was with the configuration of the input, this is the configuration that works for me now:
Copy code
inputDirURI: '/sensitive-data/outputs/weights'
includeFileNamePattern: 'glob:**/*.json'
I think Pinot was unhappy with me starting the pattern with a subdirectory (
weights
), so what I did was to have the
inputDirURI
contain up to the deepest subdirectory level possible, and then just made the glob pattern start from there, with
**
. I'm not sure why the other approach didn't work though.
x
Awesome! Glad that it works now.