Hey everyone,
we are using the
iceberg sink to write data to an iceberg table(using hive catalog, storage S3) in
table-format v2, upsert enabled. We have written ~25 million records on a table (daily partitioned) and the Table sort order is on 3 fields(all string, having max 32 char length).
We are in the POC phase only, and the data is written successfully(only 25m records in 1 day partition, no other data). We have checkpoint of 60s, and around ~350 files are written in the S3. Each file size range from ~10kb to ~3mb.
Issue: When we are trying to read data from this table using trino, our trino workers occupy whole memory(4 workers, each 20gb) and Major GC trigger frequently and eventually they die. Earlier, I thought we might have a problem with trino infra.
So I wrote a flink job to read this data simple query like (select a, b, c, count(1) from table group by a,b,c; This query will have max 5-6 rows). And surprisingly, we face the same issue of heap OOM in TM.
Then I came across this doc,
https://iceberg.apache.org/docs/latest/flink-actions/ to rewrite small files into larger one's. So I wrote another flink job with 4 parallelism, TM (memory 16 GB, 8 core CPU) to perform this re-write action. This job runs for some time and eventually, the TM dies. I can see TM CPU going 100%, heap also ~95%.
I went through a bunch of github issues like
https://github.com/apache/iceberg/issues/6104 but not really able to figure out what is causing the issue.
Flink Version: 1.16.0
Iceberg version: 1.4.1