Thomas Steinholz
10/24/2022, 7:37 PM
My batch ingestion job keeps failing with the error: "The node was low on resource: ephemeral-storage. Container pinot-job-batch-ingestion was using 1944724Ki, which exceeds its request of 0." I have actually mounted a persistent volume to the pod executing this job, but it does not seem to be using it. I am currently mounting it at /var/pinot/minion/data and /var/pinot/server/data, but neither is working. What directory should this volume be mounted to so that the BatchIngestJob uses the volume instead of the ephemeral storage?

As a secondary question, is there a simpler way to do this within the Kubernetes cluster running Pinot? Or is the standard way to utilize an external Spark cluster with a custom-compiled Pinot image?

Thomas Steinholz
10/24/2022, 7:41 PM
apiVersion: v1
kind: ConfigMap
metadata:
  name: batch-job-metadata-configmap
  namespace: datalake
data:
  batch-ingest-job-spec.yaml: |-
    executionFrameworkSpec:
      name: 'standalone'
      segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
      segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
      segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
    jobType: SegmentCreationAndTarPush
    inputDirURI: 's3://<omitted>'
    outputDirURI: 's3://<omitted>'
    overwriteOutput: true
    pinotFSSpecs:
      - scheme: s3
        className: org.apache.pinot.plugin.filesystem.S3PinotFS
        configs:
          region: '<omitted>'
    recordReaderSpec:
      dataFormat: 'parquet'
      className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
    tableSpec:
      tableName: 'uplinkpayloadevent'
    pinotClusterSpecs:
      - controllerURI: '<omitted>'
    pushJobSpec:
      pushParallelism: 2
      pushAttempts: 2
      pushRetryIntervalMillis: 1000
      segmentUriPrefix: 's3://'
      segmentUriSuffix: pinot-offline/
---
apiVersion: batch/v1
kind: Job
metadata:
  name: pinot-batch-ingest-job
  namespace: <omitted>
spec:
  template:
    spec:
      containers:
        - name: pinot-batch-ingestion
          image: apachepinot/pinot:latest
          args:
            - "LaunchDataIngestionJob"
            - "-jobSpecFile"
            - "/var/linklabs/batch/batch-ingest-job-spec.yaml"
          env:
            - name: JAVA_OPTS
              value: "-Xms32G -Xmx64G -Dpinot.admin.system.exit=true"
          resources:
            requests:
              memory: "32Gi"
            limits:
              memory: "64Gi"
          envFrom:
            - secretRef:
                name: <omitted>
          volumeMounts:
            - name: batch-job-metadata
              mountPath: /var/linklabs/batch
            - name: data
              mountPath: /var/pinot/server/data
      restartPolicy: OnFailure
      volumes:
        - name: batch-job-metadata
          configMap:
            name: batch-job-metadata-configmap
        - name: data
          persistentVolumeClaim:
            claimName: task-pv-pinot-etl-claim
  backoffLimit: 10
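The "exceeds its request of 0" part of the eviction message corresponds to the missing ephemeral-storage entry in the resources block above. A minimal sketch of how the container could also reserve node-local scratch space (the sizes here are assumptions, not from the thread):

          resources:
            requests:
              memory: "32Gi"
              # assumption: reserve scratch space so the kubelet does not
              # evict the pod for using unrequested ephemeral storage
              ephemeral-storage: "4Gi"
            limits:
              memory: "64Gi"
              ephemeral-storage: "8Gi"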
Mayank
[message not captured]

Haitao Zhang
10/24/2022, 8:33 PM
[message not captured]

Mayank
[message not captured]
Ken Krugler
10/25/2022, 3:49 AM
The standalone job generates segments in the directory returned by System.getProperty("java.io.tmpdir"). So if you can set that property to your permanent volume (or maybe a /tmp dir on that volume) then I think it would work as you want.
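A minimal sketch of how that could look in the Job above, assuming the PVC stays mounted at /var/pinot/server/data and that a tmp subdirectory exists there before the job starts (the JVM does not create it):

          env:
            - name: JAVA_OPTS
              # assumption: point the JVM temp dir at the PVC so segment
              # generation scratch files land on the persistent volume
              # instead of node-local ephemeral storage
              value: "-Xms32G -Xmx64G -Dpinot.admin.system.exit=true -Djava.io.tmpdir=/var/pinot/server/data/tmp"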
10/25/2022, 3:52 AM
[message not captured]

Thomas Steinholz
10/25/2022, 7:03 PM
The /tmp dir is working; the job is still running, but it hasn't gotten this far before yet, so that is a good sign!