Diogo Baeder
05/08/2022, 3:11 AM
/ingestFromURI endpoint from the Controller API to ingest a file as a segment, but defining the segment name myself? I tried passing segment.name to the batchConfigMapStr parameter JSON, but it didn't work; the Controller ends up creating the segment name by itself. I'd like to have more control over this, because I want to be able to replace segments more easily.
Diogo Baeder
05/08/2022, 3:13 AM
/ingestFromURI?tableNameWithType=cases_OFFLINE&batchConfigMapStr={\"inputFormat\":\"json\",\"input.fs.className\":\"org.apache.pinot.spi.filesystem.LocalPinotFS\",\"segment.name\":\"cases_br_2015_07\"}&sourceURIStr=file:///sensitive-data/outputs/cases/br/2015_07.json
Mayank
Mayank
Diogo Baeder
05/08/2022, 3:24 AM
segmentNameGeneratorSpec then...
I don't want to use the usual batch ingestion approach because I want to be able to manually upload segments, to be able to replace them later on demand. Batch ingestion doesn't ingest files that have already been marked as processed, right?
Diogo Baeder
05/08/2022, 4:17 AM
segmentNameGenerator.type and segmentNameGenerator.configs.segment.name (with the literal dots) as keys for the batchConfigMapStr JSON and it's now generating segments as I want 🙂
Diogo Baeder
05/08/2022, 5:02 AM
Mark Needham
Mark Needham
Mark Needham
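For readers following along, the working configuration Diogo describes in his 4:17 AM message can be sketched roughly as below. This is a minimal sketch under stated assumptions, not a confirmed recipe: the controller address (localhost:9000), the helper name build_ingest_url, and the generator type value "fixed" are assumptions that do not appear in the thread. Only the two key names, the table name, the file URI, and the segment name come from the messages above.

```python
import json
from urllib.parse import urlencode

# Assumed controller address (not from the thread); adjust for your cluster.
CONTROLLER = "http://localhost:9000"


def build_ingest_url(table, source_uri, segment_name, controller=CONTROLLER):
    """Build an /ingestFromURI URL that asks for a specific segment name.

    Hypothetical helper for illustration only.
    """
    batch_config = {
        "inputFormat": "json",
        "input.fs.className": "org.apache.pinot.spi.filesystem.LocalPinotFS",
        # Flat keys with literal dots, as described in the thread.
        # "fixed" is an ASSUMED generator type; the thread does not give the value.
        "segmentNameGenerator.type": "fixed",
        "segmentNameGenerator.configs.segment.name": segment_name,
    }
    query = urlencode({
        "tableNameWithType": table,
        # Serialize the config map compactly; urlencode percent-escapes it.
        "batchConfigMapStr": json.dumps(batch_config, separators=(",", ":")),
        "sourceURIStr": source_uri,
    })
    return f"{controller}/ingestFromURI?{query}"


url = build_ingest_url(
    "cases_OFFLINE",
    "file:///sensitive-data/outputs/cases/br/2015_07.json",
    "cases_br_2015_07",
)
print(url)
```

The resulting URL would then be sent to the controller (as a POST request, assuming the endpoint behaves as in the attempt shown earlier). Building the query with urlencode avoids the hand-escaped quotes in the original attempt.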
Diogo Baeder
05/08/2022, 11:14 PM
Neha Pawar
/ingestFromFile and /ingestFromURI endpoints aren’t the recommended ways to do ingestion in production setups. The reason is that they download the entire file onto the controller and build the segment on the controller. If your files are large, controllers often aren’t provisioned to perform such operations.
Diogo Baeder
05/12/2022, 9:51 PM
Diogo Baeder
05/12/2022, 9:52 PM
Neha Pawar
I don't want to use the usual batch ingestion approach because I want to be able to manually upload segments, to be able to replace them later on demand. Batch ingestion doesn't ingest files that have already been marked as processed, right?
Having said that, with your file size and controller spec, you should be good.
Diogo Baeder
05/12/2022, 10:15 PM
Neha Pawar
/ingestFrom APIs. Whatever you are doing from those APIs, you can definitely do using that command. You can specify the file name and also the segment name in this command as well. The only difference is that the compute happens on the system of your choice, and not on the controller.
Diogo Baeder
05/12/2022, 11:02 PM