# random
j
Hello all, I was wondering if someone can provide some insights on this. We are building a low-latency processor using Flink, reading from Kinesis streams and running on the managed Apache Flink service in AWS. We are using the Python DataStream API, and it seems there is some buffering of events happening in Flink that adds a few seconds of latency. Based on the Flink documentation, STREAMING mode should process events immediately, and we are not using any window functions, so we are a bit lost as to why the events are not processed and streamed to the sink immediately. This is a sample log that records the event time in Flink by injecting a timestamp via a map operator on each event.
event1 - data event_time=2023-10-05T07:28:18.985605, latency=6505, flink_received_latency=1184, flink_to_kcl_latency=5321
event2 - data event_time=2023-10-05T07:28:20.046061, latency=5444, flink_received_latency=123, flink_to_kcl_latency=5321
event3 - data event_time=2023-10-05T07:28:21.102405, latency=4388, flink_received_latency=1066, flink_to_kcl_latency=3322
event4 - data event_time=2023-10-05T07:28:22.166583, latency=3324, flink_received_latency=2003, flink_to_kcl_latency=1321
> event_time - the event time when the Kinesis record is added via a client script.
> flink_received_latency - the latency between the Flink event time (added via a map operator) and the data event time.
> flink_to_kcl_latency - the latency between the Flink event time (added via a map operator) and the time for the record to be sent to the sink and read by a KCL app.
> latency - overall latency.
We used enhanced fan-out as well, but it didn't help with the latency. Setting the max batch size on the Kinesis sink seems to help reduce the latency to around 1.5-3 seconds, but this is still not enough for our use case. A full sample script is on GitHub for your reference - https://github.com/jp6rt/pyflink1-15-kinesis-latency/tree/main/app
I hope someone can provide some insights on this.
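For reference, this is roughly the idea behind the time-stamping map operator - a simplified sketch, not the exact code from the repo above; the field names and the `source_stream` variable are just for illustration:

```python
import json
from datetime import datetime, timezone

from pyflink.common import Types


def stamp_flink_time(raw: str) -> str:
    # Record the wall-clock time at which Flink's map operator saw the record,
    # so a downstream consumer can compute flink_received_latency and
    # flink_to_kcl_latency against the original event_time.
    record = json.loads(raw)
    record["flink_event_time"] = datetime.now(timezone.utc).isoformat()
    return json.dumps(record)


# Assuming `source_stream` is the DataStream read from the Kinesis source:
# stamped = source_stream.map(stamp_flink_time, output_type=Types.STRING())
```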
r
I think you might want to read through this:
• https://www.alibabacloud.com/blog/everything-you-need-to-know-about-pyflink_599959
  ◦ Across the stack - checkpointing, Python needing to buffer records to read from the stream or write to the sink, UDFs - latency might be introduced at several points in exchange for better reliability and throughput.
  ◦ The link also talks about thread mode, which might help for your scenario.
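If it helps, a minimal sketch of what enabling thread mode could look like - assuming PyFlink 1.15+, where `get_execution_environment` accepts a `Configuration`; thread mode support also varies by Flink version and operator type, so check the docs for your release:

```python
from pyflink.common import Configuration
from pyflink.datastream import StreamExecutionEnvironment

config = Configuration()
# Run Python functions embedded in the JVM (threads) instead of separate
# Python worker processes, avoiding inter-process data transfer and buffering.
config.set_string("python.execution-mode", "thread")

env = StreamExecutionEnvironment.get_execution_environment(config)
```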
j
Thanks for the tips @RootedLabs, I've yet to try the thread mode. What seemed to help for us was reducing python.fn-execution.bundle.time to less than a second. :)
r
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/python/python_config/#python-fn-execution-bundle-time
• You are on point, thanks - this is good to know.
• The example on that page also seems to suggest setting it as part of the config in code (see the snippet below).
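Something like this is what I had in mind - a sketch assuming PyFlink 1.15, where a `Configuration` can be passed to `get_execution_environment`; the 100 ms value is just illustrative:

```python
from pyflink.common import Configuration
from pyflink.datastream import StreamExecutionEnvironment

config = Configuration()
# Cap how long elements are buffered in a bundle before being processed
# (default is 1000 ms), trading some throughput for lower latency.
config.set_integer("python.fn-execution.bundle.time", 100)

env = StreamExecutionEnvironment.get_execution_environment(config)
```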
I think this config may also help: python.fn-execution.bundle.size (quick example below)
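On the same config object as in the snippet above, that would be something like this - the exact value is just for illustration, and if I remember right the default is 100000 elements:

```python
# Cap the number of elements buffered per bundle; smaller bundles are
# flushed to and from the Python workers sooner, reducing latency.
config.set_integer("python.fn-execution.bundle.size", 1000)
```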
j
Will also check this. Thanks a lot 🙌