# general
m
Hi, I know we can flush a segment based on the size, number of rows or time since creation. I wonder if there is a way to only trigger a flush at a certain time of the day, say midnight? I am asking because I notice it can take minutes to flush a segment, during which Pinot stops consuming new messages and hence there would be a delay of minutes. This may throw the users off. We might be doing it totally wrong and any suggestions would be appreciated!
j
Currently we don't have such support, but this is a great feature to add. Basically, we can add an API on the controller to trigger the segment commit. This requires some segment commit protocol changes, though. This feature would also help with schema evolution, since a new schema can only be picked up by the next consuming segment.
Can you please file a github issue about this feature request? Thanks
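(Editor's note: later Pinot releases did add an endpoint along these lines, `POST /tables/{tableName}/forceCommit` on the controller; it did not exist at the time of this thread. A sketch of a midnight cron setup, assuming a controller at `localhost:9000` and a table named `temp_REALTIME` — verify the exact path against your version's Swagger UI:)

```shell
# Sketch: build the URL for the (later-added) forceCommit controller endpoint.
# The controller address and table name below are assumptions for illustration.
CONTROLLER="localhost:9000"
TABLE="temp_REALTIME"
URL="http://${CONTROLLER}/tables/${TABLE}/forceCommit"

# In crontab, to trigger a commit of the consuming segments at midnight:
#   0 0 * * * curl -s -X POST "http://localhost:9000/tables/temp_REALTIME/forceCommit"
echo "$URL"
```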
m
👍 2
In addition, may I ask: in your experience, how long does it take to flush a 5M-row segment? I don't have the size info at hand but can find it tomorrow if you need it. I'd like to see if there is anything we can do to make it faster in the meantime.
j
It depends. With different columns and indexes, the segment creation time could be very different. Normally it might take ~3 minutes to build the segment.
You should be able to find the segment creation log on the consuming server
s
Please run the realtime provisioning helper. It is documented. Paste the output here, and we may be able to suggest something
m
@User I did before but didn't find it very helpful. I can try redoing it later. That said, how fast could it be after tuning it to the bone, in your experience? Like Jackie said above, normally it takes about 3 minutes, but an ideal state for us would be a few seconds at the 99th percentile.
s
Please re-do it and send us the output.

There is no standard for the time taken to build a segment. It is a function of how large your segment is: number of rows, cardinality of columns, number of columns, length of string columns, the type and number of indices you have in there, etc.

In general, DO NOT use number of rows as the limit. It is better to use segment size as the limit. I hope you are doing that. If so, can you share what the size is, and can you reduce it?

It is also possible that all partitions on the machine are completing their segments at the same time. How many partitions do you have on any one machine? It is possible that your machine is (a) swapping or (b) GC-ing, or both. What is your heap size?

Also, what is your QPS? How much delay can you tolerate in the freshness of incoming data? Do you know that Pinot provides a way to measure the freshness of data in your query response?

There are knobs that you can tune before starting to propose protocol changes (which can cause unnecessary race conditions in the code and issues for other installations). What is your ingestion rate per partition, in both number of messages and number of bytes?
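(Editor's note: the size-based limit mentioned above is set in `streamConfigs`; a minimal fragment, with the `200M` value purely illustrative — setting the rows threshold to `0` is what switches Pinot to the size-based limit:)

```json
"streamConfigs": {
  "realtime.segment.flush.threshold.rows": "0",
  "realtime.segment.flush.threshold.segment.size": "200M"
}
```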
m
Thanks for the suggestions. Will send you the output when I have a chance. We are aware of all these aspects you summarized (it's a really good summary; perhaps you want to put it up here https://docs.pinot.apache.org/operators/operating-pinot/tuning for others!). An ideal state for us would be several seconds at the 99th percentile. Do you think this is achievable for a single table with an ingestion rate of 1k msg/s and a message size of 1 KB?
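(Editor's note: a quick back-of-envelope on the stated rate. These are raw Kafka bytes; Pinot's columnar encoding and indexes change the on-disk segment size, so treat the results as rough bounds only:)

```shell
# 1k msg/s at ~1 KB/msg, as stated in the thread.
MSGS_PER_SEC=1000
MSG_BYTES=1000
BYTES_PER_SEC=$((MSGS_PER_SEC * MSG_BYTES))        # 1,000,000 B/s, i.e. ~1 MB/s
MB_PER_HOUR=$((BYTES_PER_SEC * 3600 / 1000000))    # ~3600 MB/h of raw data
SECS_TO_50MB=$((50000000 / BYTES_PER_SEC))         # ~50 s of data per 50 MB of raw input
echo "${MB_PER_HOUR} MB/h raw, ${SECS_TO_50MB}s to accumulate 50 MB"
```

So at this rate a 50 MB raw-data threshold would be reached in under a minute, which is why the flush frequency and build time dominate the freshness discussion below.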
Oh, by the way, this link to the tuning page, https://pinot.readthedocs.io/en/latest/tuning_realtime_performance.html, is now broken.
s
1k msgs/s per partition? How many partitions? We have use cases ingesting at 50k msgs/s per partition.
m
In this test case, only 1 partition. It's not the ingestion latency but the segment build/flush latency we are discussing here. How often do you flush, and how long does the segment build take, when ingesting from a 50k msg/s partition? Some benchmark numbers would be really helpful.
s
@User (1) What kind of ingestion latency can you tolerate? (I am assuming here that the reason you don't want segment builds to take long is that we stop consuming during that time. Are there other reasons? If so, let us know.) (2) I have tried to explain in my previous comment that there is no universal benchmark. It depends on the variety of factors I mentioned, so it will be best to know some answers from you: ingestion rate, schema, indexing techniques used, etc. Output from the realtime provisioning helper will also be useful to tune your use case.
m
It would be great if we could do it within a second, but we can tolerate a latency of 2 or 3 seconds. Here is the output from the helper, though honestly I don't know how it can help us decide the flush settings in this case.
RealtimeProvisioningHelper -tableConfigFile ./temp_schema.yml -numPartitions 1 -pushFrequency null -numHosts 1 -numHours 1,2,3 -sampleCompletedSegmentDir ./temp__0__4__20210728T1624Z/ -ingestionRate 1000 -maxUsableHostMemory 128G -retentionHours 1
Note:

* Table retention and push frequency ignored for determining retentionHours since it is specified in command
* See <https://docs.pinot.apache.org/operators/operating-pinot/tuning/realtime>

Memory used per host (Active/Mapped)

numHosts --> 1               |
numHours
 1 --------> 3.61G/70.62G    |
 2 --------> NA              |
 3 --------> NA              |

Optimal segment size

numHosts --> 1               |
numHours
 1 --------> 1.43G           |
 2 --------> NA              |
 3 --------> NA              |

Consuming memory

numHosts --> 1               |
numHours
 1 --------> 3.61G           |
 2 --------> NA              |
 3 --------> NA              |

Total number of segments queried per host (for all partitions)

numHosts --> 1               |
numHours
 1 --------> 1               |
 2 --------> NA              |
 3 --------> NA              |
Currently each flushed segment takes 10+ mins to build
s
I had asked a bunch of questions along with this (please see my earlier message for the full list): How many partitions do you have? What is your table config? How many servers do you have hosting this table? (You have given 1 as input to the helper. Do you have only one host?) In any case, I have never seen segment completion happen within 2 to 3 seconds. I am curious why you have a limit of 3 seconds, and how you can tolerate more than 3 seconds if you complete segments at midnight.
m
Yes, 1 partition and 1 server right now. We are in the process of evaluating whether Pinot can satisfy our requirements. We can tolerate more at midnight, but we would probably run out of memory if we don't flush intraday…
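(Editor's note: the memory concern checks out on a back-of-envelope basis. At the rate stated earlier in the thread, one partition accumulates this much raw data per day:)

```shell
# ~1k msg/s at ~1 KB/msg for a full day, for a single partition.
# Raw Kafka bytes; actual in-memory consuming-segment size differs.
RAW_GB_PER_DAY=$((1000 * 1000 * 86400 / 1000000000))   # integer GB
echo "~${RAW_GB_PER_DAY} GB/day of raw data"
```

Roughly 86 GB of raw input per day, which is why flushing only at midnight is not viable for a single host with limited memory.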
I will manually test with a smaller segment size (50MB or 100MB) and see how fast Pinot can get. It would be nice if the helper could print some stats on it.
And there is nothing special about our table config:
"segmentsConfig": {
    "schemaName": "temp",
    "timeColumnName": "MSGTIME",
    "replicasPerPartition": "1",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "2",
    "segmentPushType": "APPEND",
    "completionConfig": {
      "completionMode": "DOWNLOAD"
    }
  },
  "tableIndexConfig": {
    "invertedIndexColumns": [
      "AAA","BBB"
    ],
    "loadMode": "MMAP",
    "nullHandlingEnabled": false,
    "streamConfigs": {
      "realtime.segment.flush.threshold.rows": "0",
"realtime.segment.flush.threshold.segment.size": "50M",
"streamType": "kafka",
      "stream.kafka.consumer.type": "lowLevel",
      "stream.kafka.topic.name": "temp",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.broker.list": "kafka:9092",
      "stream.kafka.consumer.prop.auto.offset.reset": "largest"
    }
s