#troubleshooting

Diogo Baeder

05/08/2022, 3:11 AM
Hi folks! How can I use the `/ingestFromURI` endpoint from the Controller API to ingest a file as a segment, but defining the segment name myself? I tried passing `segment.name` to the `batchConfigMapStr` parameter JSON, but it didn't work, the Controller ends up creating the segment name by itself. I'd like to have more control over this, because I want to be able to more easily replace segments.
Just for reference, this is my full URI:
/ingestFromURI?tableNameWithType=cases_OFFLINE&batchConfigMapStr={\"inputFormat\":\"json\",\"input.fs.className\":\"org.apache.pinot.spi.filesystem.LocalPinotFS\",\"segment.name\":\"cases_br_2015_07\"}&sourceURIStr=file:///sensitive-data/outputs/cases/br/2015_07.json

Mayank

05/08/2022, 3:13 AM
You can use batch ingestion that allows you full control: https://docs.pinot.apache.org/basics/data-import/batch-ingestion

Diogo Baeder

05/08/2022, 3:24 AM
Ah, it's `segmentNameGeneratorSpec` then... I don't want to use the usual batch ingestion approach because I want to be able to manually upload segments, to be able to replace them later on demand. Batch ingestion doesn't ingest files that have already been marked as processed, right?
Worked now! I just used `segmentNameGenerator.type` and `segmentNameGenerator.configs.segment.name` (with the literal dots) as keys for the `batchConfigMapStr` JSON and it's now generating segments as I want 🙂
Aaaand I could also replace the segment by just ingesting the same segment again, with changed data! Yay!
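For anyone landing here later, a minimal Python sketch of how the working request path could be built. The table name, source URI, and segment name come from the messages above; the `fixed` value for `segmentNameGenerator.type` is an assumption (the thread doesn't say which type was used), and `build_ingest_uri` is a hypothetical helper, not part of Pinot. It only builds the path and query string; sending it to the Controller is left out.

```python
import json
from urllib.parse import urlencode


def build_ingest_uri(table_name_with_type, source_uri, segment_name, input_format="json"):
    """Build the /ingestFromURI path and query string (hypothetical helper)."""
    batch_config = {
        "inputFormat": input_format,
        "input.fs.className": "org.apache.pinot.spi.filesystem.LocalPinotFS",
        # Assumption: "fixed" is the generator type that takes an explicit name.
        "segmentNameGenerator.type": "fixed",
        # Key with literal dots, as described in the message above.
        "segmentNameGenerator.configs.segment.name": segment_name,
    }
    query = urlencode({
        "tableNameWithType": table_name_with_type,
        "batchConfigMapStr": json.dumps(batch_config),  # JSON gets URL-encoded
        "sourceURIStr": source_uri,
    })
    return f"/ingestFromURI?{query}"


path = build_ingest_uri(
    "cases_OFFLINE",
    "file:///sensitive-data/outputs/cases/br/2015_07.json",
    "cases_br_2015_07",
)
print(path)
```

Letting `urlencode` handle the escaping avoids the hand-escaped quotes in the original URI.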
It sounds like you did something similar to what I did in this one
maybe I can update it to show how to use the HTTP API

Diogo Baeder

05/08/2022, 11:14 PM
Yeah, I used the API just because I didn't want to depend on the admin CLI on the server that will be managing the loading of data into Pinot. In any case, it worked quite well for me, after I finally understood how to properly configure the segment name.

Neha Pawar

05/12/2022, 9:43 PM
One thing to note here: the `/ingestFromFile` and `/ingestFromURI` endpoints aren't the recommended way to do ingestion in production setups. The reason is that they download the entire file onto the controller and build the segment on the controller. If your files are large, that can be a problem, because controllers often aren't provisioned for such operations.

Diogo Baeder

05/12/2022, 9:51 PM
Got it. But then what's the recommended way to do ad-hoc ingestion of files with ad-hoc names for the segments? I want to be able to both ingest at the time I want, and generate the segment with the name I want. I mean, if the Controller servers are beefy enough, they should handle these fine, provided that the data doesn't go through the roof, right?
I'm talking about JSON files of 500MB at most, or around that, so I'm assuming this could be handled by a Controller with, say, 16GB RAM.

Neha Pawar

05/12/2022, 10:12 PM
The ingestion job from pinot-admin would be the best approach for such ad hoc operations: https://docs.pinot.apache.org/basics/data-import/batch-ingestion#ingestion-jobs
As for your earlier concern, quoted here, this won't be a problem:
"I don't want to use the usual batch ingestion approach because I want to be able to manually upload segments, to be able to replace them later on demand. Batch ingestion doesn't ingest files that have already been marked as processed, right?"
Having said that, with your file size and controller spec, you should be good

Diogo Baeder

05/12/2022, 10:15 PM
Got it. If I use the usual pinot-admin approach, though, I would have to upload the job spec file somewhere visible to Pinot, and give the file a unique name so that Pinot doesn't just discard filenames that have been processed before, right? I'm asking this because I also need to be able to replace the segments with updated segments.

Neha Pawar

05/12/2022, 10:19 PM
`LaunchDataIngestionJob` from pinot-admin is a more comprehensive version of the `/ingestFrom` APIs. Whatever you are doing from those APIs, you can definitely do using that command. You can specify the file name and also the segment name in this command as well. The only difference is that the compute happens on the system of your choice, not on the controller.
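To make that concrete, here is a rough Python sketch that renders a standalone job spec with a fixed segment name and builds the matching `LaunchDataIngestionJob` command. Treat it as an illustration under assumptions: the output directory, controller URI, spec path, and `bin/pinot-admin.sh` location are placeholders, `render_job_spec` is a hypothetical helper, and the field names should be checked against the batch ingestion docs for your Pinot version.

```python
from string import Template

# Standalone job spec; the segmentNameGeneratorSpec section sets the name.
JOB_SPEC_TEMPLATE = Template("""\
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '$input_dir'
includeFileNamePattern: 'glob:**/$file_name'
outputDirURI: '$output_dir'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'json'
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
  tableName: '$table'
segmentNameGeneratorSpec:
  type: fixed
  configs:
    segment.name: '$segment_name'
pinotClusterSpecs:
  - controllerURI: '$controller_uri'
""")


def render_job_spec(**values):
    """Fill in the template; raises KeyError if a placeholder is missing."""
    return JOB_SPEC_TEMPLATE.substitute(**values)


spec = render_job_spec(
    input_dir="file:///sensitive-data/outputs/cases/br/",
    file_name="2015_07.json",
    output_dir="file:///tmp/pinot-segments/cases/",  # placeholder path
    table="cases",
    segment_name="cases_br_2015_07",
    controller_uri="http://localhost:9000",  # placeholder controller address
)

# Command to run on the machine that should do the segment build.
command = ["bin/pinot-admin.sh", "LaunchDataIngestionJob",
           "-jobSpecFile", "/tmp/cases_br_2015_07.yaml"]  # placeholder spec path
print(spec)
print(" ".join(command))
```

Re-running the job with the same `segment.name` and updated input data should cover the replace-on-demand case discussed above, since the pushed segment keeps the same name.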

Diogo Baeder

05/12/2022, 11:02 PM
Ah, got it... I'll probably use that approach then, and use the Minion server, to avoid overloading the Controller 🙂