#getting-started

Amit Chopra

12/11/2020, 4:50 PM
Then I changed the config as described in https://docs.pinot.apache.org/operators/operating-pinot/decoupling-controller-from-the-data-path, and now segments are not being written to S3. I do see segments being created, since they show up in the query browser, but they show up with status BAD. Can someone help point out what is wrong with the configuration?

controller.conf
------------------------------
controller.helix.cluster.name=pinot-quickstart
controller.port=9000
controller.enable.split.commit=true
controller.allow.hlc.tables=false
controller.data.dir=/tmp/pinot-tmp-data/
controller.local.temp.dir=/tmp/pinot-tmp-data/
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
controller.zk.str=pinot-zookeeper:2181
pinot.set.instance.id.to.hostname=true

server.conf
------------------
pinot.server.netty.port=8098
pinot.server.instance.enable.split.commit=true
pinot.server.adminapi.port=8097
pinot.server.instance.dataDir=/tmp/pinot-tmp/server/index
pinot.server.instance.segment.store.uri=s3://pinot-quickstart-s3/pinot-data/pinot-s3-example/controller-data
pinot.server.instance.segmentTarDir=/tmp/pinot-tmp/server/segmentTars
pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=us-west-2
pinot.server.segment.fetcher.protocols=file,http,s3
pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.set.instance.id.to.hostname=true
pinot.server.instance.realtime.alloc.offheap=true

table conf
----------------------
{
  "REALTIME": {
    "tableName": "demo1_REALTIME",
    "tableType": "REALTIME",
    "segmentsConfig": {
      "timeType": "MILLISECONDS",
      "schemaName": "demo1",
      "timeColumnName": "mergedTimeMillis",
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "60",
      "replication": "1",
      "replicasPerPartition": "1",
      "completionConfig": {
        "completionMode": "DOWNLOAD"
      },
      "peerSegmentDownloadScheme": "http"
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "streamConfigs": {
        "streamType": "kafka",
        "stream.kafka.consumer.type": "lowlevel",
        "stream.kafka.topic.name": "demo1",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "stream.kafka.zk.broker.url": "z-1.pinot-quickstart-msk-d.9sahwk.c7.kafka.us-west-2.amazonaws.com:2181,z-3.pinot-quickstart-msk-d.9sahwk.c7.kafka.us-west-2.amazonaws.com:2181,z-2.pinot-quickstart-msk-d.9sahwk.c7.kafka.us-west-2.amazonaws.com:2181",
        "stream.kafka.broker.list": "b-2.pinot-quickstart-msk-d.9sahwk.c7.kafka.us-west-2.amazonaws.com:9092,b-1.pinot-quickstart-msk-d.9sahwk.c7.kafka.us-west-2.amazonaws.com:9092",
        "realtime.segment.flush.threshold.time": "10m",
        "realtime.segment.flush.threshold.size": "10000",
        "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
      },
      "enableDefaultStarTree": false,
      "enableDynamicStarTreeCreation": false,
      "loadMode": "MMAP",
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": false
    },
    "metadata": {
      "customConfigs": {}
    }
  }
}

Xiang Fu

12/11/2020, 5:04 PM
I think this controller.data.dir=/tmp/pinot-tmp-data/ should be on s3?
oic, this is for split commit
have you seen any logs on pinot server for not able to write to s3?

Amit Chopra

12/11/2020, 5:30 PM
BTW, if I change controller.data.dir to an S3 path, things start to work and segments get created in S3. But then how do I know whether it is the controller or the server that creates and uploads the segments to S3?
@Xiang Fu - basically I am trying to understand: if pinot.server.instance.segment.store.uri is set to an S3 path in the server config, does the controller also need the S3 path set via controller.data.dir? And if it does, why?

Xiang Fu

12/11/2020, 5:38 PM
no need
in your case it's separation
I think your config is fine
can you check server log and see if there is any exception about saving segment to s3
👍 1

Amit Chopra

12/11/2020, 5:39 PM
I see. So the next step is for me to remove the S3 path from the controller conf and then check the server logs.

Xiang Fu

12/11/2020, 5:40 PM
yes, controller is not on data path

Amit Chopra

12/11/2020, 8:15 PM
@Ting Chen @Xiang Fu - I see the following in the server logs (with controller.data.dir pointing to the local temp dir):
2020/12/11 20:13:46.613 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__1__125__20201211T1757Z] CommitEnd failed with response {"isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"status":"FAILED","offset":-1}
2020/12/11 20:13:47.851 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__4__126__20201211T1757Z] CommitEnd failed with response {"isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"status":"FAILED","offset":-1}
2020/12/11 20:13:48.052 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__3__126__20201211T1757Z] CommitEnd failed with response {"isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"status":"FAILED","offset":-1}
2020/12/11 20:13:49.012 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__0__125__20201211T1757Z] CommitEnd failed with response {"isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"status":"FAILED","offset":-1}
2020/12/11 20:13:49.331 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__2__126__20201211T1757Z] CommitEnd failed with response {"isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"status":"FAILED","offset":-1}
2020/12/11 20:13:49.695 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__1__125__20201211T1757Z] CommitEnd failed with response {"isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"status":"FAILED","offset":-1}
2020/12/11 20:13:50.931 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__4__126__20201211T1757Z] CommitEnd failed with response {"isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"status":"FAILED","offset":-1}
2020/12/11 20:13:51.115 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__3__126__20201211T1757Z] CommitEnd failed with response {"isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"status":"FAILED","offset":-1}
2020/12/11 20:13:52.082 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__0__125__20201211T1757Z] CommitEnd failed with response {"isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"status":"FAILED","offset":-1}
2020/12/11 20:13:52.389 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__2__126__20201211T1757Z] CommitEnd failed with response {"isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"status":"FAILED","offset":-1}
2020/12/11 20:13:52.752 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__1__125__20201211T1757Z] CommitEnd failed with response {"isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"status":"FAILED","offset":-1}
2020/12/11 20:13:54.014 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__4__126__20201211T1757Z] CommitEnd failed with response {"isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"status":"FAILED","offset":-1}
2020/12/11 20:13:54.206 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__3__126__20201211T1757Z] CommitEnd failed with response {"isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"status":"FAILED","offset":-1}
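With this many repeated warnings it can help to see which segments keep failing and how often. The following is only a rough sketch, not anything shipped with Pinot: the function name `count_commit_end_failures` is made up for illustration, and it assumes log lines shaped like the ones pasted in this thread, with the controller response as valid JSON (the form quoted later in the thread).

```python
import json
import re
from collections import Counter

# Matches "[segment] CommitEnd failed with response {json}" in a server log line.
# The bracketed group directly before "CommitEnd" is taken as the segment name.
COMMIT_END_FAILED = re.compile(
    r"\[(?P<segment>[^\]]+)\] CommitEnd failed with response (?P<response>\{.*\})"
)


def count_commit_end_failures(log_lines):
    """Return a Counter mapping segment name -> number of FAILED CommitEnd responses."""
    failures = Counter()
    for line in log_lines:
        match = COMMIT_END_FAILED.search(line)
        if not match:
            continue
        response = json.loads(match.group("response"))
        if response.get("status") == "FAILED":
            failures[match.group("segment")] += 1
    return failures


# Sample lines copied from the thread (hypothetical input; normally read from the server log file).
RESPONSE = '{"isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"status":"FAILED","offset":-1}'
PREFIX = "WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z]"
sample = [
    f"2020/12/11 20:13:46.613 {PREFIX} [demo1__1__125__20201211T1757Z] CommitEnd failed with response {RESPONSE}",
    f"2020/12/11 20:13:47.851 {PREFIX} [demo1__4__126__20201211T1757Z] CommitEnd failed with response {RESPONSE}",
    f"2020/12/11 20:13:49.695 {PREFIX} [demo1__1__125__20201211T1757Z] CommitEnd failed with response {RESPONSE}",
]

for segment, count in count_commit_end_failures(sample).most_common():
    print(segment, count)
```

A segment that shows up again every few seconds is being retried in a loop, which matches the advice in the thread to look at the controller log for the reason the commit is rejected.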

Ting Chen

12/11/2020, 8:44 PM
can you check the controller log to find why CommitEnd failed?

Mahesh Yeole

12/14/2020, 7:20 PM
@Ting Chen @Xiang Fu I see a lot of files being written to S3 under the same timestamp, but I also see errors on both the controller and the server. On the cluster manager console, the segment keeps showing as consuming. We are trying to use the split commit feature, but even with split commit set to true for both controller and server, we still see "isSplitCommitType":false in the server error.

Error in server logs:
[LLRealtimeSegmentDataManager_pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z] [pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z] CommitEnd failed with response {"isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"status":"FAILED","offset":-1}

Error in controller logs:
[SegmentCompletionFSM_pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z] [grizzly-http-server-1] Caught exception while committing segment file for segment: pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z
java.io.IOException: software.amazon.awssdk.services.s3.model.NoSuchKeyException: The specified key does not exist. (Service: S3, Status Code: 404, Request ID: E62169F11317304B, Extended Request ID: 3dlRY25FjPWIVJsA82PfQnhwlyp/26Nw1VM2xZCzlqEUvNSIXpFSexbvMewbLTR3ZuaDSHE6rq8=)

This is my controller.conf:
controller.helix.cluster.name=pinot-cluster
controller.port=9000
controller.local.temp.dir=/var/pinot/controller/data
controller.data.dir=s3://pinot-cluster-segment-s3/pinot-data/pinot-s3-example/controller-data/
controller.zk.str=pinot-zookeeper:2181
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
controller.allow.hlc.tables=false
controller.enable.split.commit=true
pinot.set.instance.id.to.hostname=true

This is my server.conf:
pinot.server.netty.port=8098
pinot.server.adminapi.port=8097
pinot.server.instance.dataDir=/var/pinot/server/data/index
pinot.server.instance.segmentTarDir=/var/pinot/server/data/segment
pinot.set.instance.id.to.hostname=true
pinot.server.instance.realtime.alloc.offheap=true
pinot.server.instance.segment.store.uri=s3://pinot-cluster-segment-s3/pinot-data/pinot-s3-example/controller-data/
pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=us-west-2
pinot.server.segment.fetcher.protocols=file,http,s3
pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
@Ting Chen @Xiang Fu Are we missing any configuration ?

Xiang Fu

12/14/2020, 9:09 PM
java.io.IOException: software.amazon.awssdk.services.s3.model.NoSuchKeyException: The specified key does not exist. (Service: S3, Status Code: 404, Request ID: E62169F11317304B, Extended Request ID: 3dlRY25FjPWIVJsA82PfQnhwlyp/26Nw1VM2xZCzlqEUvNSIXpFSexbvMewbLTR3ZuaDSHE6rq8=)
do we set the credential correctly for aws

Mahesh Yeole

12/14/2020, 9:35 PM
@Xiang Fu yes we did set env variables for AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN in templates/controller/statefulset.yaml and templates/server/statefulset.yml
I am able to access S3 from server pod
myeole-a01:pinot myeole$ aws s3 ls pinot-cluster-segment-s3/pinot-data/pinot-s3-example/controller-data/pullRequestMergedEventsAwsMskDemo/
2020-12-14 13:47:07          0
2020-12-14 10:26:19     855564 pullRequestMergedEventsAwsMskDemo__0__0__20201214T1817Z00d97c80-cad9-48d8-ab08-4bea626f30a3
2020-12-14 10:48:49     855564 pullRequestMergedEventsAwsMskDemo__0__0__20201214T1817Z00da9537-502a-4f56-b6cb-85656dc743e8

Xiang Fu

12/14/2020, 9:49 PM
ic
have you checked the pinotServer.log and pinotController.log for ERROR
that should have more logs than kubectl console log

Mahesh Yeole

12/14/2020, 9:52 PM
yes I see these errors in logs
Error in server logs:
[LLRealtimeSegmentDataManager_pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z] [pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z] CommitEnd failed with response {"isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"status":"FAILED","offset":-1}
Error in controller logs:
[SegmentCompletionFSM_pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z] [grizzly-http-server-1] Caught exception while committing segment file for segment: pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z
java.io.IOException: software.amazon.awssdk.services.s3.model.NoSuchKeyException: The specified key does not exist. (Service: S3, Status Code: 404, Request ID: E62169F11317304B, Extended Request ID: 3dlRY25FjPWIVJsA82PfQnhwlyp/26Nw1VM2xZCzlqEUvNSIXpFSexbvMewbLTR3ZuaDSHE6rq8=)
I don't see any other errors

Xiang Fu

12/14/2020, 9:52 PM
for s3://pinot-quickstart-s3/pinot-data/pinot-s3-example/controller-data, is the bucket pinot-quickstart-s3? or is it s3://<bucket>/pinot-quickstart-s3/pinot-data/pinot-s3-example/controller-data?
can you try to fill in the bucket name there as well

Mahesh Yeole

12/14/2020, 9:59 PM
I am using this pinot.server.instance.segment.store.uri=s3://pinot-cluster-segment-s3/pinot-data/pinot-s3-example/controller-data/

Xiang Fu

12/14/2020, 10:02 PM
right, i mean do you have a bucket with this s3 url?
i somehow feel that segment is not accessible inside pinot
so pinot-cluster-segment-s3 is the s3 bucket name?
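For readers unsure which part of the URI is the bucket: in an s3:// URI the first component after the scheme is the bucket and the rest is the key prefix. A small sketch with the standard library shows the split for the value quoted in this thread; the helper name `split_s3_uri` is made up for illustration.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI with a bucket: {uri!r}")
    # netloc is the bucket; the path (without its leading slash) is the key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_uri(
    "s3://pinot-cluster-segment-s3/pinot-data/pinot-s3-example/controller-data/"
)
print("bucket:", bucket)
print("prefix:", prefix)
```

If the printed bucket is not a bucket that actually exists in the account, the URI is missing its bucket component, which is the case being asked about here.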

Mahesh Yeole

12/14/2020, 10:06 PM
yes

Xiang Fu

12/14/2020, 10:07 PM
ic
can you check segment metadata in swagger ui
for the failed segment
you can give table name/ segment name

Mahesh Yeole

12/14/2020, 10:08 PM
{
  "segment.realtime.endOffset": "9223372036854775807",
  "segment.time.unit": null,
  "segment.start.time": "-1",
  "segment.flush.threshold.size": "10000",
  "segment.realtime.startOffset": "10000",
  "segment.end.time": "-1",
  "segment.total.docs": "-1",
  "segment.table.name": "pullRequestMergedEventsAwsMskDemo_REALTIME",
  "segment.realtime.numReplicas": "1",
  "segment.creation.time": "1607971877305",
  "segment.realtime.download.url": null,
  "segment.name": "pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z",
  "segment.index.version": null,
  "custom.map": null,
  "segment.flush.threshold.time": null,
  "segment.type": "REALTIME",
  "segment.crc": "-1",
  "segment.realtime.status": "IN_PROGRESS"
}
this is on cluster manager UI
from swagger API
I am getting this
{
  "code": 404,
  "error": "Segment: pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z of table: pullRequestMergedEventsAwsMskDemo_REALTIME not found at: s3://pinot-cluster-segment-s3/pinot-data/pinot-s3-example/controller-data//pullRequestMergedEventsAwsMskDemo_REALTIME/pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z"
}
Objects in S3 are in this format. The files look like they are created every few seconds; I am not sure why.
pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z000af865-ade9-4d23-844e-e1e3c2a0d36d - December 14, 2020, 11:02 (UTC-08:00) 875.0 KB Standard
pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z00266be0-75b2-4f9d-bd22-a0a41a0632a7 - December 14, 2020, 11:03 (UTC-08:00) 875.0 KB Standard
pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z00278389-5ccb-4641-99a1-bb8a54170c7e - December 14, 2020, 12:33 (UTC-08:00) 875.0 KB Standard
pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z003590a9-6c21-4707-8b76-22c24f2b577a - December 14, 2020, 14:16 (UTC-08:00) 875.0 KB Standard
pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z0048e9d0-c701-4840-a017-7e482d422243 - December 14, 2020, 13:45 (UTC-08:00) 875.0 KB Standard
pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z00515765-c3e2-447b-bff7-0b2f4320a8f3 - December 14, 2020, 13:11 (UTC-08:00) 875.0 KB Standard
pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z0061a14d-e1fa-4caf-8a97-7d5ff12c47cf - December 14, 2020, 11:25 (UTC-08:00) 875.0 KB Standard
pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z0073c589-929a-4c45-ac96-7ed2b012a288
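The listing suggests two kinds of object names: the plain segment name that the 404 above is looking for, and the segment name followed by a UUID. A quick way to check which kind a bucket holds is sketched below. The naming rule (segment name plus UUID for uploads that were never committed) is an assumption read off this listing, not something taken from Pinot's documentation, and `classify_objects` is a made-up helper.

```python
import re

# Canonical 8-4-4-4-12 lowercase hex UUID, as seen at the end of the listed object names.
UUID_SUFFIX = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")


def classify_objects(segment_name, object_names):
    """Split object names into (committed, temporary) lists for one segment.

    committed: objects named exactly like the segment.
    temporary: objects named <segment_name><uuid> (assumed to be uncommitted uploads).
    """
    committed, temporary = [], []
    for name in object_names:
        if name == segment_name:
            committed.append(name)
        elif name.startswith(segment_name) and UUID_SUFFIX.fullmatch(name[len(segment_name):]):
            temporary.append(name)
    return committed, temporary


segment = "pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z"
listing = [
    segment + "000af865-ade9-4d23-844e-e1e3c2a0d36d",
    segment + "00266be0-75b2-4f9d-bd22-a0a41a0632a7",
    segment + "00278389-5ccb-4641-99a1-bb8a54170c7e",
]
committed, temporary = classify_objects(segment, listing)
print("committed:", len(committed), "temporary:", len(temporary))
```

A listing with many temporary names and no committed one is consistent with the server uploading successfully on each retry while the controller-side commit keeps failing.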

Xiang Fu

12/14/2020, 10:21 PM
hmmm, I feel those segments are created and uploaded from server
but the controller commit failed
@Ting Chen do u have any idea?

Amit Chopra

12/15/2020, 6:15 PM
@Ting Chen @Xiang Fu Just to close out this thread: I deleted my k8s namespace and redeployed, this time passing the S3 path to both the server and the controller, and things are working as expected. Thanks for all the help 🙂 I see the following logs on the server instance:
2020/12/15 17:02:25.113 INFO [LLRealtimeSegmentDataManager_demo2__0__0__20201215T1652Z] [demo2__0__0__20201215T1652Z] Successfully built segment in 312 ms, after lockWaitTime 0 ms
2020/12/15 17:02:25.139 INFO [FileUploadDownloadClient] [demo2__0__0__20201215T1652Z] Sending request: http://pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local:9000/segmentCommitStart?name=demo2__0__0__20201215T1652Z&offset=298&instance=Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098&buildTimeMillis=312&memoryUsedBytes=6327504&segmentSizeBytes=132903&rowCount=298&streamPartitionMsgOffset=298 to controller: pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local, version: Unknown
2020/12/15 17:02:25.140 INFO [ServerSegmentCompletionProtocolHandler] [demo2__0__0__20201215T1652Z] Controller response {"streamPartitionMsgOffset":null,"buildTimeSec":-1,"isSplitCommitType":false,"status":"COMMIT_CONTINUE","offset":-1} for http://pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local:9000/segmentCommitStart?name=demo2__0__0__20201215T1652Z&offset=298&instance=Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098&buildTimeMillis=312&memoryUsedBytes=6327504&segmentSizeBytes=132903&rowCount=298&streamPartitionMsgOffset=298
2020/12/15 17:02:25.767 INFO [S3PinotFS] [pool-9-thread-1] Copy /tmp/pinot-tmp/server/index/demo2_REALTIME/demo2__0__0__20201215T1652Z.tar.gz from local to s3://pinot-quickstart-s3/pinot-data/pinot-s3-example/controller-data/demo2/demo2__0__0__20201215T1652Zb58f082e-07be-4928-af1e-a941d09101b2
2020/12/15 17:02:25.883 INFO [PinotFSSegmentUploader] [demo2__0__0__20201215T1652Z] Successfully upload segment demo2__0__0__20201215T1652Z to s3://pinot-quickstart-s3/pinot-data/pinot-s3-example/controller-data/demo2/demo2__0__0__20201215T1652Zb58f082e-07be-4928-af1e-a941d09101b2.
2020/12/15 17:02:26.410 INFO [FileUploadDownloadClient] [demo2__0__0__20201215T1652Z] Sending request: http://pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local:9000/segmentCommitEndWithMetadata?name=demo2__0__0__20201215T1652Z&offset=298&instance=Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098&buildTimeMillis=312&memoryUsedBytes=6327504&segmentSizeBytes=132903&rowCount=298&location=s3://pinot-quickstart-s3/pinot-data/pinot-s3-example/controller-data/demo2/demo2__0__0__20201215T1652Zb58f082e-07be-4928-af1e-a941d09101b2&streamPartitionMsgOffset=298 to controller: pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local, version: Unknown
2020/12/15 17:02:26.411 INFO [ServerSegmentCompletionProtocolHandler] [demo2__0__0__20201215T1652Z] Controller response {"streamPartitionMsgOffset":null,"buildTimeSec":-1,"isSplitCommitType":false,"status":"COMMIT_SUCCESS","offset":-1} for http://pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local:9000/segmentCommitEndWithMetadata?name=demo2__0__0__20201215T1652Z&offset=298&instance=Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098&buildTimeMillis=312&memoryUsedBytes=6327504&segmentSizeBytes=132903&rowCount=298&location=s3://pinot-quickstart-s3/pinot-data/pinot-s3-example/controller-data/demo2/demo2__0__0__20201215T1652Zb58f082e-07be-4928-af1e-a941d09101b2&streamPartitionMsgOffset=298
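The request URLs in these logs carry the useful details as query parameters, including the `location` the server reports to the controller in `segmentCommitEndWithMetadata`. The sketch below pulls them out with the standard library. The helper name `commit_request_fields` is made up, the hostnames in the sample are shortened placeholders, and the HTML unescape step is there only because chat exports sometimes show `&` as `&amp;`.

```python
import html
from urllib.parse import parse_qs, urlparse


def commit_request_fields(url):
    """Extract endpoint, segment name, and reported location from a commit request URL."""
    parsed = urlparse(html.unescape(url))  # tolerate "&amp;" from HTML-escaped pastes
    params = parse_qs(parsed.query)
    return {
        "endpoint": parsed.path.lstrip("/"),
        "segment": params["name"][0],
        # segmentCommitStart has no location parameter, so this may be None.
        "location": params.get("location", [None])[0],
    }


# Shortened sample: controller-host and server-host are placeholders, not the real hostnames.
sample_url = (
    "http://controller-host:9000/segmentCommitEndWithMetadata"
    "?name=demo2__0__0__20201215T1652Z&amp;offset=298&amp;instance=Server_server-host_8098"
    "&amp;location=s3://pinot-quickstart-s3/pinot-data/pinot-s3-example/controller-data/demo2/"
    "demo2__0__0__20201215T1652Zb58f082e-07be-4928-af1e-a941d09101b2"
    "&amp;streamPartitionMsgOffset=298"
)
fields = commit_request_fields(sample_url)
print(fields["endpoint"])
print(fields["location"])
```

In the successful run the reported location is the S3 object the server uploaded a moment earlier, which shows the segment data went from server to S3 rather than through the controller.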
My final config, in case it helps in the future:

Server
-------------------------
/var/pinot/server/config/pinot-server.conf
pinot.server.netty.port=8098
pinot.server.instance.enable.split.commit=true
pinot.server.adminapi.port=8097
pinot.server.instance.dataDir=/tmp/pinot-tmp/server/index
pinot.server.instance.segment.store.uri=s3://pinot-quickstart-s3/pinot-data/pinot-s3-example/controller-data
pinot.server.instance.segmentTarDir=/tmp/pinot-tmp/server/segmentTars
pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=us-west-2
pinot.server.segment.fetcher.protocols=file,http,s3
pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.set.instance.id.to.hostname=true
pinot.server.instance.realtime.alloc.offheap=true

controller
-----------------------
/var/pinot/controller/config/pinot-controller.conf
controller.helix.cluster.name=pinot-quickstart
controller.port=9000
controller.enable.split.commit=true
controller.allow.hlc.tables=false
controller.data.dir=s3://pinot-quickstart-s3/pinot-data/pinot-s3-example/controller-data
controller.local.temp.dir=/tmp/pinot-tmp-data/
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
controller.zk.str=pinot-zookeeper:2181
pinot.set.instance.id.to.hostname=true

Table config
---------------------
{
  "REALTIME": {
    "tableName": "demo2_REALTIME",
    "tableType": "REALTIME",
    "segmentsConfig": {
      "timeColumnName": "mergedTimeMillis",
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "60",
      "completionConfig": {
        "completionMode": "DOWNLOAD"
      },
      "peerSegmentDownloadScheme": "http",
      "timeType": "MILLISECONDS",
      "schemaName": "demo2",
      "replication": "1",
      "replicasPerPartition": "1"
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "autoGeneratedInvertedIndex": false,
      "loadMode": "MMAP",
      "createInvertedIndexDuringSegmentGeneration": false,
      "streamConfigs": {
        "streamType": "kafka",
        "stream.kafka.consumer.type": "lowlevel",
        "stream.kafka.topic.name": "demo2",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "stream.kafka.zk.broker.url": "z-1.pinot-quickstart-msk-d.9sahwk.c7.kafka.us-west-2.amazonaws.com:2181,z-3.pinot-quickstart-msk-d.9sahwk.c7.kafka.us-west-2.amazonaws.com:2181,z-2.pinot-quickstart-msk-d.9sahwk.c7.kafka.us-west-2.amazonaws.com:2181",
        "stream.kafka.broker.list": "b-2.pinot-quickstart-msk-d.9sahwk.c7.kafka.us-west-2.amazonaws.com:9092,b-1.pinot-quickstart-msk-d.9sahwk.c7.kafka.us-west-2.amazonaws.com:9092",
        "realtime.segment.flush.threshold.time": "10m",
        "realtime.segment.flush.threshold.size": "10000",
        "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
      },
      "enableDefaultStarTree": false,
      "enableDynamicStarTreeCreation": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": false
    },
    "metadata": {
      "customConfigs": {}
    }
  }
}
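In the configuration that ended up working here, `controller.data.dir` and `pinot.server.instance.segment.store.uri` hold the same S3 path. A rough pre-deploy check for that is sketched below. It only mirrors what worked in this thread and is not a statement of what Pinot requires; `parse_properties` and `same_deep_store` are made-up helpers, and the check ignores a trailing slash by choice.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def same_deep_store(controller_conf, server_conf):
    """True if both configs point at the same non-empty location (trailing slash ignored)."""
    controller_dir = parse_properties(controller_conf).get("controller.data.dir", "")
    server_uri = parse_properties(server_conf).get(
        "pinot.server.instance.segment.store.uri", ""
    )
    return controller_dir != "" and controller_dir.rstrip("/") == server_uri.rstrip("/")


controller_conf = """
controller.port=9000
controller.data.dir=s3://pinot-quickstart-s3/pinot-data/pinot-s3-example/controller-data
"""
server_conf = """
pinot.server.netty.port=8098
pinot.server.instance.segment.store.uri=s3://pinot-quickstart-s3/pinot-data/pinot-s3-example/controller-data
"""
print(same_deep_store(controller_conf, server_conf))
```

Run against the first configuration posted in the thread, where `controller.data.dir` was a local /tmp path, the same check would report a mismatch.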

Ting Chen

12/15/2020, 6:19 PM
awesome. glad to see you get it working!
thanks for sharing the final config. one minor optimization: you can remove this optional table config:
"completionConfig": {
        "completionMode": "DOWNLOAD"
      }
Doing so can reduce the number of downloads from S3.
I will update our Apache doc on this too.
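For anyone applying this suggestion to a saved table config file, the change amounts to deleting one key under `segmentsConfig` and leaving everything else alone. A small sketch, assuming the config is wrapped by table type (`"REALTIME": {...}`) as in the configs pasted in this thread; `drop_completion_config` is a made-up helper, and the trimmed dict stands in for the full config.

```python
import json


def drop_completion_config(table_config):
    """Return a copy of the config without segmentsConfig.completionConfig."""
    updated = json.loads(json.dumps(table_config))  # deep copy via JSON round trip
    for type_config in updated.values():
        type_config.get("segmentsConfig", {}).pop("completionConfig", None)
    return updated


# Trimmed-down stand-in for the table config shared above.
table_config = {
    "REALTIME": {
        "tableName": "demo2_REALTIME",
        "segmentsConfig": {
            "completionConfig": {"completionMode": "DOWNLOAD"},
            "peerSegmentDownloadScheme": "http",
            "replication": "1",
        },
    }
}
print(json.dumps(drop_completion_config(table_config), indent=2))
```

The original dict is left unmodified, so the two versions can be compared before the new one is uploaded.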

Amit Chopra

12/15/2020, 6:22 PM
Got it, thanks. Will change and try. BTW, I should remove the completion config but keep "peerSegmentDownloadScheme": "http"?
@Ting Chen trying to understand the effect of this config better. As per https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+f[…]145724933/Screen%20Shot%202020-02-05%20at%204.45.39%20PM.png, with this setting the replica server (the non-committing server) will download the segment instead of catching up. Hence your suggestion to remove it, so that downloads from S3 are reduced. Am I understanding it correctly?

Ting Chen

12/15/2020, 6:36 PM
yes.

Amit Chopra

12/15/2020, 6:37 PM
ok, cool. Thanks 🙂

Jatin Yadav

11/15/2023, 7:24 AM
Hi @Ting Chen, I am also facing the same issue. I followed the conversation above and created the table. Temp files are created under the controller data path in S3, but the segment status stays at consuming even though "realtime.segment.flush.threshold.time" is "03m". Can you please help me figure out what the issue is? Thanks!

Ting Chen

11/20/2023, 9:06 PM
Hmm.. this is a thread closed 3 years ago. What problem are you facing exactly?