# troubleshooting
d
Hi folks! How can I use the `/ingestFromURI` endpoint from the Controller API to ingest a file as a segment, but define the segment name myself? I tried passing `segment.name` in the `batchConfigMapStr` parameter JSON, but it didn't work: the Controller ends up creating the segment name by itself. I'd like to have more control over this, because I want to be able to replace segments more easily.
Just for reference, this is my full URI:
```
/ingestFromURI?tableNameWithType=cases_OFFLINE&batchConfigMapStr={\"inputFormat\":\"json\",\"input.fs.className\":\"org.apache.pinot.spi.filesystem.LocalPinotFS\",\"segment.name\":\"cases_br_2015_07\"}&sourceURIStr=file:///sensitive-data/outputs/cases/br/2015_07.json
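For reference, a request like this can also be built programmatically; a minimal Python sketch (the controller address is an assumption, adjust for your deployment) that serializes `batchConfigMapStr` as JSON and percent-encodes the whole query string:

```python
import json
from urllib.parse import urlencode

# Assumed controller address; adjust for your deployment.
CONTROLLER = "http://localhost:9000"

# batchConfigMapStr is a JSON object serialized to a string; it should be
# percent-encoded when placed in the query string.
batch_config = {
    "inputFormat": "json",
    "input.fs.className": "org.apache.pinot.spi.filesystem.LocalPinotFS",
    "segment.name": "cases_br_2015_07",  # note: this key did not take effect
}

params = {
    "tableNameWithType": "cases_OFFLINE",
    "batchConfigMapStr": json.dumps(batch_config),
    "sourceURIStr": "file:///sensitive-data/outputs/cases/br/2015_07.json",
}

url = f"{CONTROLLER}/ingestFromURI?{urlencode(params)}"
print(url)
# The request itself would then be sent with an HTTP POST to this URL.
```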
m
You can use batch ingestion, which gives you full control: https://docs.pinot.apache.org/basics/data-import/batch-ingestion
d
Ah, it's `segmentNameGeneratorSpec` then... I don't want to use the usual batch ingestion approach because I want to be able to manually upload segments, so I can replace them later on demand. Batch ingestion doesn't re-ingest files that have already been marked as processed, right?
Worked now! I just used `segmentNameGenerator.type` and `segmentNameGenerator.configs.segment.name` (with the literal dots) as keys in the `batchConfigMapStr` JSON, and it's now generating segments the way I want 🙂
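A sketch of the working configuration described above, again in Python; the `"fixed"` generator type is an assumption based on Pinot's fixed segment-name generator, which takes a literal name via `segment.name`:

```python
import json
from urllib.parse import urlencode

# The segment name is controlled through the segmentNameGenerator.* keys
# (with literal dots), not through a bare "segment.name" key.
batch_config = {
    "inputFormat": "json",
    "input.fs.className": "org.apache.pinot.spi.filesystem.LocalPinotFS",
    "segmentNameGenerator.type": "fixed",  # assumption: "fixed" = literal-name generator
    "segmentNameGenerator.configs.segment.name": "cases_br_2015_07",
}

query = urlencode({
    "tableNameWithType": "cases_OFFLINE",
    "batchConfigMapStr": json.dumps(batch_config),
    "sourceURIStr": "file:///sensitive-data/outputs/cases/br/2015_07.json",
})
print(f"/ingestFromURI?{query}")
```

Re-posting the same request with the same fixed segment name (and updated data) replaces the existing segment, as noted below.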
Aaaand I could also replace the segment by just ingesting the same segment again, with changed data! Yay!
It sounds like you did something similar to what I did in this one
maybe I can update it to show how to use the HTTP API
d
Yeah, I used the API just because I didn't want to depend on the admin CLI on the server that will manage loading data into Pinot. In any case, it worked quite well for me once I finally understood how to properly configure the segment name.
n
one thing to note here - the `/ingestFromFile` and `/ingestFromURI` endpoints aren't the recommended way to do ingestion in production setups. The reason being, they download the entire file onto the controller and build the segment there; if your files are large, controllers often aren't provisioned to perform such operations
d
Got it. But then what's the recommended way to do ad-hoc ingestion of files with ad-hoc names for the segments? I want to be able to both ingest at the time I want and generate the segment with the name I want. I mean, if the Controller servers are beefy enough, they should handle this fine, provided the data volume doesn't go through the roof, right?
I'm talking about JSON files of 500MB at most, or around that, so I'm assuming this could be handled by a Controller with, say, 16GB RAM.
n
The ingestion job from pinot-admin would be the best approach for such ad-hoc operations: https://docs.pinot.apache.org/basics/data-import/batch-ingestion#ingestion-jobs

> I don't want to use the usual batch ingestion approach because I want to be able to manually upload segments, to be able to replace them later on demand. Batch ingestion doesn't ingest files that have already been marked as processed, right?

This won't be a problem. Having said that, with your file size and controller spec, you should be good
d
Got it. If I use the usual pinot-admin approach, though, I would have to upload the job spec file somewhere visible to Pinot, and give the input file a unique name so that Pinot doesn't skip filenames that have been processed before, right? I'm asking because I also need to be able to replace segments with updated ones.
n
LaunchDataIngestionJob from pinot-admin is a more comprehensive version of the `/ingestFrom` APIs. Whatever you are doing with those APIs, you can definitely do with that command. You can specify the file name and also the segment name in this command as well. The only difference is that the compute happens on a system of your choice, not on the controller.
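A hedged sketch of what such a job spec might look like for the file discussed above, following the standalone-execution examples in the Pinot batch ingestion docs (the output directory and glob pattern are illustrative assumptions):

```yaml
executionFrameworkSpec:
  name: standalone
  segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
  segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
jobType: SegmentCreationAndTarPush
inputDirURI: 'file:///sensitive-data/outputs/cases/br/'
includeFileNamePattern: 'glob:**/2015_07.json'   # illustrative pattern
outputDirURI: 'file:///tmp/pinot-segments/cases/' # illustrative path
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: json
  className: org.apache.pinot.plugin.inputformat.json.JSONRecordReader
tableSpec:
  tableName: cases
segmentNameGeneratorSpec:
  type: fixed           # assumption: fixed generator pins the segment name
  configs:
    segment.name: cases_br_2015_07
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'      # assumed controller address
```

The job could then be launched with something like `pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /path/to/job-spec.yaml`; re-running it with the same fixed segment name and updated input data replaces the existing segment.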
d
Ah, got it... I'll probably use that approach then, and use the Minion server, to avoid overloading the Controller 🙂