#general

Saoirse Amarteifio

09/28/2021, 11:22 AM
Hello - is there a way to use the controller's REST interface to submit OFFLINE table ingestion tasks? Details:
• Parquet files on S3
• Schema and table spec already created on Pinot (which is deployed on K8s via Argo/Helm)
I have seen there is an ingestion task that can be triggered e.g. using the scripts/utils in the Pinot distribution, but I would like to do this directly using REST commands. I would like to apply the `SegmentCreationAndUriPush` strategy, and I would either set up something that runs on a daily or hourly schedule OR just trigger one-off tasks myself. Either works.
```
curl -X POST "http://localhost:9000/ingestFromURI?tableNameWithType=foo_OFFLINE
&batchConfigMapStr={
  "inputFormat":"json",
  "input.fs.className":"org.apache.pinot.plugin.filesystem.S3PinotFS",
  "input.fs.prop.region":"us-central",
  "input.fs.prop.accessKey":"foo",
  "input.fs.prop.secretKey":"bar"
}
&sourceURIStr=s3://test.bucket/path/to/json/data/data.json"
```
🙏 1
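[Editor's note: the `SegmentCreationAndUriPush` strategy asked about above is normally driven by a batch ingestion job spec rather than the `ingestFromURI` endpoint. A minimal sketch of such a spec for Parquet files on S3 follows, reusing the bucket, region, and table name from the thread; the output directory, file pattern, and credential setup are placeholder assumptions, not from the conversation:]

```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndUriPush
inputDirURI: 's3://test.bucket/path/to/parquet/'          # assumed input location
includeFileNamePattern: 'glob:**/*.parquet'
outputDirURI: 's3://test.bucket/pinot-segments/foo/'      # assumed segment output location
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'us-central'
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'foo'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
```

Such a spec can then be launched with `bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile jobSpec.yaml`, which is the scripted path the original question refers to.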

Saoirse Amarteifio

09/28/2021, 5:54 PM
Thank you @User - the docs suggest this is for small files or testing, because files need to be downloaded - or am I reading this wrong?

Mayank

09/28/2021, 6:07 PM
The `ingestFromURI` endpoint is for a quickstart kind of setup.
Do you mean you want to schedule and control the job that generates and pushes segments?

Saoirse Amarteifio

09/28/2021, 6:35 PM
Yes - that is exactly what I want to do.

Kishore G

09/28/2021, 9:01 PM
We don't have this now, but we are thinking of a simple solution. Can you please file a ticket? I will add my thoughts to it.

Saoirse Amarteifio

09/29/2021, 4:27 PM
Will do - do I add that as a GitHub issue, or?

Mayank

09/29/2021, 7:01 PM
Yes, GitHub issue @User
👍 1

Saoirse Amarteifio

10/11/2021, 4:26 PM
Hi @User @User - I did not add an issue, as I wanted to see what made sense as a pattern first. I have successfully launched this in a way I am happy with from Argo Workflows. I attached the Argo Workflow and will explain why this is useful. I want a "client" to be able to post requests and run Pinot tasks on K8s. Argo Workflows is a nice way to manage K8s deployments; I can post via an event framework or run as cron, and I can add workflow steps that run scripts, Python code, etc., giving me full control.
• The Kubernetes Pinot examples were limiting for my use case (arguably) because dealing with volume mounts is a little cumbersome
• With Argo Workflows I can post a YAML file or reference it as an artifact, and then run the Pinot docker image in a container referencing that file
• I can also easily attach a cron job to the workflow, with or without parameters, to run this ingestion job on some interval
Putting this all together, I can create an Argo service/supervisor that launches tasks via the Pinot controller on that cluster in a way I'm comfortable with - hence no issue from my perspective.
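[Editor's note: the attached workflow did not survive in this log. A hypothetical sketch of the pattern described - an Argo Workflow step that runs the official Pinot image against a job spec - might look like the following; the image tag, spec path, and mount arrangement are assumptions, not the author's actual file:]

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pinot-ingest-
spec:
  entrypoint: ingest
  templates:
    - name: ingest
      container:
        image: apachepinot/pinot:latest      # official Pinot image (tag assumed)
        command: ["bin/pinot-admin.sh"]
        args:
          - LaunchDataIngestionJob
          - -jobSpecFile
          - /config/jobSpec.yaml             # spec provided as a mounted file or Argo artifact
```

For the scheduled variant the author mentions, Argo's `CronWorkflow` kind wraps the same template with a cron schedule.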