# general
m
Hi, I know we can flush a segment based on the size, number of rows or time since creation. I wonder if there is a way to only trigger a flush at a certain time of the day, say midnight? I am asking because I notice it can take minutes to flush a segment, during which Pinot stops consuming new messages and hence there would be a delay of minutes. This may throw the users off. We might be doing it totally wrong and any suggestions would be appreciated!
j
Currently we don't have such support, but this is a great feature to add. Basically, we can add an API on the controller to trigger the segment commit. This requires some segment commit protocol changes, though. This feature would also help with schema evolution, since a new schema can only be picked up by the next consuming segment.
Can you please file a github issue about this feature request? Thanks
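(Editor's note: later Pinot releases did add an endpoint along these lines, `POST /tables/{tableName}/forceCommit` on the controller; it did not exist at the time of this thread. A sketch of a midnight cron setup, assuming a controller at `localhost:9000` and a table named `temp_REALTIME` — verify the exact path against your version's Swagger UI:)

```shell
# Sketch: build the URL for the (later-added) forceCommit controller endpoint.
# The controller address and table name below are assumptions for illustration.
CONTROLLER="localhost:9000"
TABLE="temp_REALTIME"
URL="http://${CONTROLLER}/tables/${TABLE}/forceCommit"

# In crontab, to trigger a commit of the consuming segments at midnight:
#   0 0 * * * curl -s -X POST "http://localhost:9000/tables/temp_REALTIME/forceCommit"
echo "$URL"
```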
m
👍 2
In addition, may I ask: in your experience, how long does it take to flush a 5M-row segment? I don't have the size info at hand but can find it tomorrow if you need it. I'd like to see if there is anything we can do to make it faster in the meantime.
j
It depends. With different columns and indexes, the segment creation time could be very different. Normally it might take ~3 minutes to build the segment.
You should be able to find the segment creation log on the consuming server
s
Please run the realtime provisioning helper. It is documented. Paste the output here, and we may be able to suggest something
m
@User I did before but didn't find it very helpful. I can try redoing it later. That said, how fast could it be after tuning it to the bone, in your experience? Like Jackie said above, normally it takes about 3 minutes, but an ideal state for us would be a few seconds at the 99th percentile.
s
Please re-do it and send us the output.

There is no standard for the time taken to build a segment. It is a function of how large your segment is: number of rows, cardinality of columns, number of columns, length of string columns, the type and number of indices you have in there, etc.

In general, DO NOT use number of rows as the limit. It is better to use segment size as the limit. I hope you are doing that. If so, can you share what the size is, and can you reduce it?

It is also possible that all partitions on the machine are completing their segments at the same time. How many partitions do you have on any one machine? It is possible that your machine is (a) swapping or (b) GC-ing, or both. What is your heap size?

Also, what is your QPS? How much delay can you tolerate in the freshness of incoming data? Do you know that Pinot provides a way to measure the freshness of data in your query response?

There are knobs that you can tune before starting to propose protocol changes (which can cause unnecessary race conditions in the code and issues for other installations). What is your ingestion rate per partition, in both number of messages and number of bytes?
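(Editor's note: the size-based limit mentioned above is set in `streamConfigs`; a minimal fragment, with the `200M` value purely illustrative — setting the rows threshold to `0` is what switches Pinot to the size-based limit:)

```json
"streamConfigs": {
  "realtime.segment.flush.threshold.rows": "0",
  "realtime.segment.flush.threshold.segment.size": "200M"
}
```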
m
Thanks for the suggestions. Will send you the output when I have a chance. We are aware of all these aspects you summarized (it's a really good summary; perhaps you want to put it up here https://docs.pinot.apache.org/operators/operating-pinot/tuning for others!). An ideal state for us would be several seconds at the 99th percentile. Do you think this is achievable for a single table with an ingestion rate of 1k msg/s and a message size of 1 KB?
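(Editor's note: a quick back-of-envelope on the stated rate. These are raw Kafka bytes; Pinot's columnar encoding and indexes change the on-disk segment size, so treat the results as rough bounds only:)

```shell
# 1k msg/s at ~1 KB/msg, as stated in the thread.
MSGS_PER_SEC=1000
MSG_BYTES=1000
BYTES_PER_SEC=$((MSGS_PER_SEC * MSG_BYTES))        # 1,000,000 B/s, i.e. ~1 MB/s
MB_PER_HOUR=$((BYTES_PER_SEC * 3600 / 1000000))    # ~3600 MB/h of raw data
SECS_TO_50MB=$((50000000 / BYTES_PER_SEC))         # ~50 s of data per 50 MB of raw input
echo "${MB_PER_HOUR} MB/h raw, ${SECS_TO_50MB}s to accumulate 50 MB"
```

So at this rate a 50 MB raw-data threshold would be reached in under a minute, which is why the flush frequency and build time dominate the freshness discussion below.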
Oh, by the way, this link to the tuning page, https://pinot.readthedocs.io/en/latest/tuning_realtime_performance.html, is now broken.
s
1k msgs/s per partition? How many partitions? We have use cases ingesting at 50k msgs/s per partition.
m
In this test case, only 1 partition. It's not the ingestion latency but the segment build/flush latency we are discussing here. How often do you flush, and how long does the segment build take, when ingesting from a 50k msg/s partition? Some benchmark numbers would be really helpful.
s
@User (1) What kind of ingestion latency can you tolerate? (I am assuming here that the reason you don't want segment builds to take long is that we stop consuming during that time. Are there other reasons? If so, let us know.) (2) I have tried to explain in my previous comment that there is no universal benchmark. It depends on the variety of factors I mentioned, so it will be best to know some answers from you: ingestion rate, schema, indexing techniques used, etc. Output from the realtime provisioning helper will also be useful to tune your use case.
m
It would be great if we could do it within a second, but we can tolerate a latency of 2 or 3 seconds. Here is the output from the helper, though honestly I don't know how it can help us decide the flush settings in this case.
RealtimeProvisioningHelper -tableConfigFile ./temp_schema.yml -numPartitions 1 -pushFrequency null -numHosts 1 -numHours 1,2,3 -sampleCompletedSegmentDir ./temp__0__4__20210728T1624Z/ -ingestionRate 1000 -maxUsableHostMemory 128G -retentionHours 1
Note:

* Table retention and push frequency ignored for determining retentionHours since it is specified in command
* See <https://docs.pinot.apache.org/operators/operating-pinot/tuning/realtime>

Memory used per host (Active/Mapped)

numHosts --> 1               |
numHours
 1 --------> 3.61G/70.62G    |
 2 --------> NA              |
 3 --------> NA              |

Optimal segment size

numHosts --> 1               |
numHours
 1 --------> 1.43G           |
 2 --------> NA              |
 3 --------> NA              |

Consuming memory

numHosts --> 1               |
numHours
 1 --------> 3.61G           |
 2 --------> NA              |
 3 --------> NA              |

Total number of segments queried per host (for all partitions)

numHosts --> 1               |
numHours
 1 --------> 1               |
 2 --------> NA              |
 3 --------> NA              |
Currently each flushed segment takes 10+ mins to build
s
I had asked a bunch of questions along with this (please see my earlier message for the full list): How many partitions do you have? What is your table config? How many servers do you have hosting this table? (You have given 1 as input to the helper. Do you have only one host?) In any case, I have never seen segment completion happen within 2 to 3 seconds. I am curious why you have a limit of 3 seconds, and how you can tolerate more than 3 seconds if you complete segments at midnight.
m
Yes, 1 partition and 1 server right now. We are in the process of evaluating whether Pinot can satisfy our requirements. We can tolerate more at midnight, but we would probably run out of memory if we don't flush intraday…
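(Editor's note: the memory concern checks out on a back-of-envelope basis. At the rate stated earlier in the thread, one partition accumulates this much raw data per day:)

```shell
# ~1k msg/s at ~1 KB/msg for a full day, for a single partition.
# Raw Kafka bytes; actual in-memory consuming-segment size differs.
RAW_GB_PER_DAY=$((1000 * 1000 * 86400 / 1000000000))   # integer GB
echo "~${RAW_GB_PER_DAY} GB/day of raw data"
```

Roughly 86 GB of raw input per day, which is why flushing only at midnight is not viable for a single host with limited memory.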
I will manually test with a smaller segment size (50MB or 100MB) and see how fast Pinot can get. It would be nice if the helper could print some stats on it.
And there is nothing special about our table config:
"segmentsConfig": {
    "schemaName": "temp",
    "timeColumnName": "MSGTIME",
    "replicasPerPartition": "1",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "2",
    "segmentPushType": "APPEND",
    "completionConfig": {
      "completionMode": "DOWNLOAD"
    }
  },
  "tableIndexConfig": {
    "invertedIndexColumns": [
      "AAA","BBB"
    ],
    "loadMode": "MMAP",
    "nullHandlingEnabled": false,
    "streamConfigs": {
      "realtime.segment.flush.threshold.rows": "0",
"realtime.segment.flush.threshold.segment.size": "50M",
"streamType": "kafka",
      "stream.kafka.consumer.type": "lowLevel",
      "stream.kafka.topic.name": "temp",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.broker.list": "kafka:9092",
      "stream.kafka.consumer.prop.auto.offset.reset": "largest"
    }
s