windwheel
08/30/2024, 1:12 AM
flink-connector-clickhouse
Due to historical reasons in our company's framework, we have to use PyFlink. In batch mode, PyFlink 1.16.1 only supports writing UDFs as pandas UDFs. I wrote a multi-column conversion function that ran stably in SQL, and I did two rounds of memory tuning. Unfortunately, the parameters from my first round were lost, but I roughly remember adjusting:
taskmanager.memory.process.size: 4gb
taskmanager.memory.network.fraction: 0.1
taskmanager.memory.managed.fraction: 0.4
and those settings worked.
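For context, Flink derives the managed and network pools from these knobs roughly as follows. This is a back-of-the-envelope sketch assuming Flink's defaults for `taskmanager.memory.jvm-metaspace.size` (256mb) and the JVM overhead min/max bounds (192mb/1gb); the exact sizing is done by Flink itself and depends on the full configuration:

```python
# Rough TaskManager memory breakdown for the first tuning attempt.
# Assumes Flink defaults for jvm-metaspace (256mb) and jvm-overhead
# bounds (min 192mb, max 1gb); values are in MB.
process_mb = 4 * 1024                                    # taskmanager.memory.process.size: 4gb
jvm_overhead_mb = min(max(process_mb * 0.1, 192), 1024)  # jvm-overhead.fraction default 0.1
metaspace_mb = 256                                       # jvm-metaspace default
total_flink_mb = process_mb - jvm_overhead_mb - metaspace_mb
managed_mb = total_flink_mb * 0.4                        # taskmanager.memory.managed.fraction: 0.4
network_mb = total_flink_mb * 0.1                        # taskmanager.memory.network.fraction: 0.1
print(round(managed_mb), round(network_mb))              # → 1372 343
```

So under the first configuration roughly 1.3 GB of managed memory was available, part of which PyFlink carves out for the Python workers.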
But during my second round of tuning, when PyFlink executed the over-window aggregation operator, the operator stayed in the INITIALIZING state and no data flowed in. The parameters were as follows:
taskmanager.memory.process.size: 4gb
taskmanager.memory.network.fraction: 0.3
taskmanager.memory.managed.fraction: 0.45
taskmanager.memory.jvm-overhead.fraction: 0.1
taskmanager.memory.framework.off-heap.size: 128mb
taskmanager.memory.managed.consumer-weights: OPERATOR:60,STATE_BACKEND:60,PYTHON:40
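Repeating the same back-of-the-envelope arithmetic for this second configuration, and applying the consumer weights to estimate the Python workers' slice of managed memory (a sketch only; Flink counts only the consumers actually active in the job, so the real split can differ):

```python
# Rough estimate of the managed-memory slice the Python workers may claim
# under the second configuration. Assumes the Flink default jvm-metaspace
# of 256mb; values are in MB.
process_mb = 4 * 1024
jvm_overhead_mb = min(max(process_mb * 0.1, 192), 1024)  # jvm-overhead.fraction: 0.1
total_flink_mb = process_mb - jvm_overhead_mb - 256
managed_mb = total_flink_mb * 0.45                       # managed.fraction: 0.45
weights = {"OPERATOR": 60, "STATE_BACKEND": 60, "PYTHON": 40}
python_mb = managed_mb * weights["PYTHON"] / sum(weights.values())
print(round(managed_mb), round(python_mb))               # → 1544 386
```

For comparison, the 536870920 bytes in the log below is almost exactly 512 MiB (536870920 / 2**20 ≈ 512.0), which is larger than the ~386 MB this naive split suggests.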
Since there is very little information about PyFlink online, I read the source code and then judged from the logs.
Log: Obtained shared Python process of size 536870920 bytes
It may be that the Python interpreter process memory PyFlink estimates from managed memory is too large, and the machine does not have enough memory to start the Python interpreter process.
What puzzles me is that the TaskManager's total process memory is configured as only 4 GB, so why is the estimated memory so large?
What configuration do I need so that the operator receives data properly and sends it downstream?
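One avenue worth trying (a sketch, not a verified fix for this job): take the Python workers out of managed memory entirely, so their footprint becomes an explicit, bounded number rather than a size derived from the managed fraction and consumer weights. Flink exposes this via `python.fn-execution.memory.managed`; when it is `false`, the Python processes draw from task off-heap memory, which then has to be sized explicitly (the 512mb below is an illustrative placeholder, not a recommendation):

```
# Sketch only: decouple Python worker memory from managed memory.
python.fn-execution.memory.managed: false
# Explicit off-heap reservation for the Python processes;
# 512mb is an illustrative value, tune to the machine.
taskmanager.memory.task.off-heap.size: 512mb
```

This at least makes the Python workers' memory demand visible and independent of the OPERATOR/STATE_BACKEND/PYTHON weight split.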
Slack Conversation