Diogo Baeder
04/15/2022, 9:51 PM
Diogo Baeder
04/15/2022, 9:52 PM
```
$ docker-compose -f docker-compose-databases.yml exec pinot-controller bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /config/ingestion/weights.yaml
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/pinot/lib/pinot-all-0.10.0-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-environment/pinot-azure/pinot-azure-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-file-system/pinot-s3/pinot-s3-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-metrics/pinot-dropwizard/pinot-dropwizard-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-metrics/pinot-yammer/pinot-yammer-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.codehaus.groovy.reflection.CachedClass (file:/opt/pinot/lib/pinot-all-0.10.0-jar-with-dependencies.jar) to method java.lang.Object.finalize()
WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.reflection.CachedClass
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
```
and here are the ingestion job files visible inside the controller:
```
$ docker-compose -f docker-compose-databases.yml exec pinot-controller ls -lh /config/ingestion/
total 12K
-rw-r--r-- 1 1000 1000 956 Apr 15 21:34 brands_metrics.yaml
-rw-r--r-- 1 1000 1000 935 Apr 15 21:37 filters.yaml
-rw-r--r-- 1 1000 1000 935 Apr 15 21:31 weights.yaml
```
any ideas what's going on?
Xiaoman Dong
04/15/2022, 10:00 PM
Diogo Baeder
04/15/2022, 10:01 PM
Diogo Baeder
04/15/2022, 10:05 PM
```
2022/04/15 22:02:59.975 INFO [ControllerPeriodicTask] [pool-7-thread-7] Processing 3 tables in task: OfflineSegmentIntervalChecker
2022/04/15 22:02:59.981 WARN [ZKMetadataProvider] [pool-7-thread-7] Path: /SEGMENTS/filters_OFFLINE does not exist
2022/04/15 22:02:59.991 WARN [ZKMetadataProvider] [pool-7-thread-7] Path: /SEGMENTS/brands_metrics_OFFLINE does not exist
2022/04/15 22:02:59.994 WARN [ZKMetadataProvider] [pool-7-thread-7] Path: /SEGMENTS/weights_OFFLINE does not exist
```
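The warnings say that ZooKeeper has no segment metadata for any of the three offline tables, i.e. nothing was pushed. One way to double-check from outside the controller is to ask its REST API for the table's segments. This is a minimal Java 11 sketch, assuming the controller is reachable at http://localhost:9000 (the controller URI used in this thread's job spec) and that it serves GET /segments/{tableName}; confirm the endpoint in your controller's Swagger UI. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SegmentLister {

    // Builds the controller URL that should list a table's segments.
    // ASSUMPTION: the controller serves GET /segments/{tableName}.
    static URI segmentsUri(String controllerUrl, String tableName) {
        // Drop a trailing slash so we never produce "//segments".
        String base = controllerUrl.endsWith("/")
                ? controllerUrl.substring(0, controllerUrl.length() - 1)
                : controllerUrl;
        return URI.create(base + "/segments/" + tableName);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // ASSUMPTION: controller at localhost:9000, table named "weights".
        URI uri = segmentsUri("http://localhost:9000", "weights");
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // Print the raw status and body; if the ingestion job never pushed
        // anything, no segment names for "weights" should appear here.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The URL building is split into its own method so it can be checked without a running cluster; only main needs a live controller.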
but I'm not sure what's going on. Shouldn't Pinot just create the segments automatically when running those jobs?
Diogo Baeder
04/15/2022, 10:11 PM
```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/sensitive-data/outputs/'
includeFileNamePattern: 'glob:weights/**/*.json'
outputDirURI: '/opt/pinot/data'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'json'
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
  tableName: 'weights'
  schemaURI: 'http://localhost:9000/tables/weights/schema'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
```
and the data I have in that directory is in .json files, each containing a list of JSON dictionaries/maps/objects (not JSONL, just a plain JSON file whose top-level element is a list/array)
Xiaoman Dong
04/15/2022, 10:25 PM
Xiaoman Dong
04/15/2022, 10:27 PM
Diogo Baeder
04/15/2022, 10:30 PM
Xiaoman Dong
04/15/2022, 10:30 PM
Xiaoman Dong
04/15/2022, 10:33 PM
Xiaoman Dong
04/15/2022, 10:33 PM
Diogo Baeder
04/15/2022, 10:34 PM
```yaml
inputDirURI: '/sensitive-data/outputs/'
includeFileNamePattern: 'glob:weights/**/*.json'
```
and here are the files, for example:
```
root@d7ae8b63b8e1:/opt/pinot/logs# find /sensitive-data/outputs/weights/ -name '*2013041*.json'
/sensitive-data/outputs/weights/br/20130415.json
/sensitive-data/outputs/weights/br/20130418.json
/sensitive-data/outputs/weights/br/20130416.json
/sensitive-data/outputs/weights/br/20130419.json
/sensitive-data/outputs/weights/br/20130417.json
/sensitive-data/outputs/weights/br/20130414.json
/sensitive-data/outputs/weights/br/20130413.json
```
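The listing shows the files sit two directory levels below inputDirURI. The sketch below reproduces file selection with the JDK's glob PathMatcher, under an assumption that is not confirmed in this thread: that the standalone job runner walks inputDirURI recursively and matches includeFileNamePattern against each file's absolute path, not its path relative to inputDirURI. In JDK glob syntax `*` does not cross directory boundaries while `**` does, and the pattern must match the whole path. So under that assumption a pattern beginning with `weights/` can never match an absolute path like /sensitive-data/outputs/weights/br/20130415.json, while one beginning with `**` can. InputFileSelector and selectInputFiles are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class InputFileSelector {

    // Returns regular files under inputDir (recursively) whose ABSOLUTE path
    // matches the "glob:..." pattern.
    // ASSUMPTION: this approximates how the job runner applies
    // includeFileNamePattern; it is a sketch, not Pinot's code.
    static List<Path> selectInputFiles(Path inputDir, String includeFileNamePattern)
            throws IOException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(includeFileNamePattern);
        try (Stream<Path> paths = Files.walk(inputDir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .map(Path::toAbsolutePath)
                    .filter(matcher::matches)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Throwaway copy of the layout from the thread: <root>/weights/br/20130415.json
        Path root = Files.createTempDirectory("outputs");
        Path file = root.resolve("weights/br/20130415.json");
        Files.createDirectories(file.getParent());
        Files.writeString(file, "[]");

        // Original spec: inputDirURI = outputs/, pattern anchored at "weights/".
        System.out.println(selectInputFiles(root, "glob:weights/**/*.json").size()); // prints 0 (Linux/macOS)
        // Spec that starts the pattern with **, from inputDirURI = outputs/weights.
        System.out.println(selectInputFiles(root.resolve("weights"), "glob:**/*.json").size()); // prints 1
        // Hypothetical alternative keeping inputDirURI = outputs/.
        System.out.println(selectInputFiles(root, "glob:**/weights/**/*.json").size()); // prints 1
    }
}
```

The third pattern is only an illustration of the same glob rules; it has not been tried against Pinot itself.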
Diogo Baeder
04/15/2022, 10:34 PM
Diogo Baeder
04/15/2022, 10:35 PM
Xiaoman Dong
04/15/2022, 10:37 PM
Xiaoman Dong
04/15/2022, 10:38 PM
Diogo Baeder
04/15/2022, 10:50 PM
Diogo Baeder
04/15/2022, 10:50 PM
Xiaoman Dong
04/15/2022, 10:52 PM
Diogo Baeder
04/15/2022, 10:52 PM
Diogo Baeder
04/15/2022, 10:52 PM
Diogo Baeder
04/15/2022, 10:53 PM
Diogo Baeder
04/15/2022, 10:55 PM
Diogo Baeder
04/15/2022, 11:00 PM
Diogo Baeder
04/15/2022, 11:18 PM
Diogo Baeder
04/15/2022, 11:19 PM
Diogo Baeder
04/15/2022, 11:19 PM
Diogo Baeder
04/16/2022, 4:33 AM
```yaml
inputDirURI: '/sensitive-data/outputs/weights'
includeFileNamePattern: 'glob:**/*.json'
```
I think Pinot was unhappy with me starting the pattern with a subdirectory (`weights`), so I set `inputDirURI` to the deepest subdirectory level possible and then started the glob pattern from there, with `**`. I'm not sure why the other approach didn't work, though.
Xiaoman Dong
04/16/2022, 4:41 AM