
Pedro Silva

06/08/2021, 5:03 PM
Hello, does Pinot support some sort of auto-scaling to deal with increasingly larger and heavier workloads? I have a single real-time table consuming events from Kafka (a wide table, but with not many fields; currently there are 39, mostly strings, one of which has a max length of 2147483647 (Integer.MAX_VALUE) since it holds a JSON blob). My Pinot cluster is deployed in Kubernetes (hosted in Azure): 2 Pinot server instances with 5GB heap + 3GB direct memory and a 100GB persistent volume (segment deep storage is configured), with a k8s memory limit of 10G; 1 controller instance with 1GB JVM heap, k8s memory limit 2G; 1 broker instance with 4GB heap, k8s memory limit 5G. My servers are crashing with segfaults & OOMs, as follows: Server 1:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00007f4b79052422, pid=8, tid=0x00007f4ae8739700
#
# JRE version: OpenJDK Runtime Environment (8.0_292-b10) (build 1.8.0_292-b10)
# Java VM: OpenJDK 64-Bit Server VM (25.292-b10 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# v  ~StubRoutines::jbyte_disjoint_arraycopy
#
# Core dump written. Default location: /opt/pinot/core or core.8
#
[thread 139959708407552 also had an error]
# An error report file with more information is saved as:
# /opt/pinot/hs_err_pid8.log
#
# If you would like to submit a bug report, please visit:
#   <http://bugreport.java.com/bugreport/crash.jsp>
#
Aborted (core dumped)
Server 2:
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Start a Pinot [SERVER]-SendThread(pinot-zookeeper:2181)"
2021/06/08 16:35:05.338 ERROR [LLRealtimeSegmentDataManager_HitExecutionView_3mo__1__3__20210608T1552Z] [HitExecutionView_3mo__1__3__20210608T1552Z] Could not build segment
java.lang.IllegalArgumentException: Self-suppression not permitted
	at java.lang.Throwable.addSuppressed(Throwable.java:1072) ~[?:1.8.0_292]
	at org.apache.pinot.segment.local.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:132) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-f15225f9c8abe8d9efa52c31c00f0d7418b368eb]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentInternal(LLRealtimeSegmentDataManager.java:783) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-f15225f9c8abe8d9efa52c31c00f0d7418b368eb]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentForCommit(LLRealtimeSegmentDataManager.java:717) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-f15225f9c8abe8d9efa52c31c00f0d7418b368eb]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:628) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-f15225f9c8abe8d9efa52c31c00f0d7418b368eb]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
Caused by: java.lang.OutOfMemoryError: Java heap space
AsyncLogger error handling event seq=1, value='null': java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
Exception in thread "HitExecutionView_3mo__3__3__20210608T1552Z" java.lang.OutOfMemoryError: Java heap space
2021/06/08 16:35:05.395 ERROR [LLRealtimeSegmentDataManager_HitExecutionView_3mo__7__3__20210608T1553Z] [HitExecutionView_3mo__7__3__20210608T1553Z] Could not build segment
java.lang.IllegalArgumentException: Self-suppression not permitted
	at java.lang.Throwable.addSuppressed(Throwable.java:1072) ~[?:1.8.0_292]
	at org.apache.pinot.segment.local.segment.index.converter.SegmentV1V2ToV3FormatConverter.copyIndexData(SegmentV1V2ToV3FormatConverter.java:160) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-f15225f9c8abe8d9efa52c31c00f0d7418b368eb]
	at org.apache.pinot.segment.local.segment.index.converter.SegmentV1V2ToV3FormatConverter.convert(SegmentV1V2ToV3FormatConverter.java:86) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-f15225f9c8abe8d9efa52c31c00f0d7418b368eb]
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.convertFormatIfNecessary(SegmentIndexCreationDriverImpl.java:370) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-f15225f9c8abe8d9efa52c31c00f0d7418b368eb]
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:303) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-f15225f9c8abe8d9efa52c31c00f0d7418b368eb]
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:256) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-f15225f9c8abe8d9efa52c31c00f0d7418b368eb]
	at org.apache.pinot.segment.local.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:131) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-f15225f9c8abe8d9efa52c31c00f0d7418b368eb]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentInternal(LLRealtimeSegmentDataManager.java:783) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-f15225f9c8abe8d9efa52c31c00f0d7418b368eb]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentForCommit(LLRealtimeSegmentDataManager.java:717) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-f15225f9c8abe8d9efa52c31c00f0d7418b368eb]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:628) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-f15225f9c8abe8d9efa52c31c00f0d7418b368eb]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
Caused by: java.lang.OutOfMemoryError: Java heap space

Mayank

06/08/2021, 5:04 PM
Pinot does support scaling, but it is not fully automatic at the moment. You can add server nodes and invoke the rebalance API
You seem to be running into
java.lang.OutOfMemoryError: Java heap space
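For reference, the rebalance Mayank mentions is exposed on the Pinot controller REST API as `POST /tables/{tableName}/rebalance`. A sketch of the call after adding servers (the controller host below is an assumption; the table name comes from this thread):

```shell
# Sketch: after adding new Pinot server pods, ask the controller to
# redistribute segments across the enlarged cluster.
# CONTROLLER is an assumed service address; adjust for your deployment.
CONTROLLER="pinot-controller:9000"
TABLE="HitExecutionView_3mo"
URL="http://${CONTROLLER}/tables/${TABLE}/rebalance?type=REALTIME"
# Print the command to run (add dryRun=true to preview the plan first):
echo "curl -X POST \"${URL}\""
```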

Pedro Silva

06/08/2021, 5:05 PM
I'm ingesting 33M records into the table, which is something like 15GB in Parquet. However, the 100GB of disk in server 1 + 76GB in server 2 are used up. With 4.6M records, Pinot reports 56GB of uncompressed table data. Is this normal?

Mayank

06/08/2021, 5:06 PM
56GB of heap?

Pedro Silva

06/08/2021, 5:07 PM
In "Reported Size", as reported by the table summary UI in Pinot. I don't know if it is heap or disk.

Mayank

06/08/2021, 5:07 PM
I am guessing you used raw index for the string columns?

Pedro Silva

06/08/2021, 5:08 PM
The field with maxLength: 2147483647 is JSON-indexed. Everything else is the default
This is the table indexing config:
"tableIndexConfig": {
  "invertedIndexColumns": [],
  "rangeIndexColumns": [],
  "jsonIndexColumns": [
    "inputForUiControls"
  ],
  "autoGeneratedInvertedIndex": false,
  "createInvertedIndexDuringSegmentGeneration": false,
  "bloomFilterColumns": [],
  "loadMode": "MMAP",
  "streamConfigs": {
    "streamType": "kafka",
    "stream.kafka.topic.name": "test.data.hitexecutionview_3mo",
    "stream.kafka.broker.list": "dckafka.dc-kafka.svc.cluster.local:9092",
    "stream.kafka.consumer.type": "lowlevel",
    "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
    "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
    "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
    "realtime.segment.flush.threshold.rows": "1600000"
  },
  "noDictionaryColumns": [],
  "onHeapDictionaryColumns": [],
  "varLengthDictionaryColumns": [],
  "enableDefaultStarTree": false,
  "enableDynamicStarTreeCreation": false,
  "segmentPartitionConfig": {
    "columnPartitionMap": {
      "externalHitExecutionId": {
        "functionName": "Murmur",
        "numPartitions": 16
      }
    }
  }
}

Mayank

06/08/2021, 5:09 PM
By default, string columns are padded to make them equal length in the dictionary. If you did not explicitly set a no-dict column, then my guess is that is where the space is being spent. Do you have access to the metadata.properties file inside the segment directory on the server? If so, can you share?
yeah
"noDictionaryColumns": [],
can you share metadata.properties file?

Pedro Silva

06/08/2021, 5:09 PM
let me check
I can't seem to find the metadata.properties file, where would it be found by default?

Mayank

06/08/2021, 5:11 PM
On the server's dataDir (where it stores segments), go inside one of the segment dirs

Pedro Silva

06/08/2021, 5:12 PM
any segment will do?

Mayank

06/08/2021, 5:12 PM
Sure. Try to pick the biggest one

Pedro Silva

06/08/2021, 5:14 PM
Server 1 has crashed and I can't recover it, but on server 2 all metadata.properties files are 35k. Here is an example of the content: https://pastebin.com/nKEWvn28

Mayank

06/08/2021, 5:15 PM
which column has the INT_MAX length?

Pedro Silva

06/08/2021, 5:15 PM
inputForUiControls

Mayank

06/08/2021, 5:16 PM
column.inputForUiControls.lengthOfEachEntry = 33655
Your segment has 200k rows. With this padded string, the dictionary for this column becomes 6.7GB
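Mayank's arithmetic can be checked directly. A rough sketch, assuming (worst case) that every row contributes a distinct dictionary entry, so cardinality ≈ row count:

```python
# Rough estimate of the dictionary size for a padded string column.
# Pinot's fixed-width dictionary stores every entry at the length of the
# longest value, so the dictionary costs ~cardinality * max_entry_length
# bytes. The numbers below are from the metadata.properties in this thread.

def padded_dictionary_bytes(cardinality: int, length_of_each_entry: int) -> int:
    """Approximate size: every entry stored at the padded (max) length."""
    return cardinality * length_of_each_entry

rows = 200_000                 # rows in the segment (~cardinality, worst case)
length_of_each_entry = 33_655  # column.inputForUiControls.lengthOfEachEntry

size = padded_dictionary_bytes(rows, length_of_each_entry)
print(f"{size / 1e9:.1f} GB")  # ~6.7 GB, matching the estimate above
```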

Pedro Silva

06/08/2021, 5:18 PM
This is the schema: https://pastebin.com/saDFwEu7 Did I configure something wrong?
Is that 6.7G per segment?

Mayank

06/08/2021, 5:18 PM
6.7G for this one segment
what's the disk size you see for this segment? (du -sh)

Pedro Silva

06/08/2021, 5:19 PM
4.0G

Mayank

06/08/2021, 5:19 PM
hmm

Pedro Silva

06/08/2021, 5:19 PM
root@pinot-server-1:/var/pinot/server/data# du -sh ./index/HitExecutionView_3mo_REALTIME/HitExecutionView_3mo__5__3__20210608T1551Z/v3/
4.0G	./index/HitExecutionView_3mo_REALTIME/HitExecutionView_3mo__5__3__20210608T1551Z/v3/

Mayank

06/08/2021, 5:21 PM
There are a few things to do, since you are running out of heap

Pedro Silva

06/08/2021, 5:22 PM
Also, disk space on server 0:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb         93G   93G   99M 100% /var/pinot/server/data

Mayank

06/08/2021, 5:22 PM
Where is the space being used? All in segments?

Pedro Silva

06/08/2021, 5:23 PM
Supposedly, yes; I can't access the pod, it's crashed

Mayank

06/08/2021, 5:23 PM
How did you get the metadata then?

Pedro Silva

06/08/2021, 5:23 PM
I accessed the other server; I have 2 (server 0 is crashed, server 1 is OK)
they don't share the same volumes

Subbu Subramaniam

06/08/2021, 8:54 PM
From the stack trace, it seems that they are running out of heap during segment build
m

Mayank

06/08/2021, 9:10 PM
Closing the thread here - one string column had huge variations in the length of its entries, and almost 99% of the 4GB segment size was dictionary padding. Converting it to a no-dictionary column reduced the size by an order of magnitude.
✔️ 1
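For anyone landing here later: the fix would look something like the fragment below in the table's tableIndexConfig (a sketch based on this thread's config, not the poster's final config). Storing the column raw means each value is written at its actual length instead of being padded to the longest entry in the segment:

```json
"tableIndexConfig": {
  "noDictionaryColumns": [
    "inputForUiControls"
  ],
  "jsonIndexColumns": [
    "inputForUiControls"
  ],
  "loadMode": "MMAP"
}
```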