What's the relationship between Paimon and other unified batch/streaming frameworks? (Apache Flink #troubleshooting)

Sharath Gururaj

09/12/2023, 2:38 PM

What's the relationship between Paimon and other unified batch/streaming frameworks like Flink, Spark, etc.? Am I correct in saying that Paimon seems to offer a unified API which can use any runtime engine like Flink, Spark, etc.? Just to give an analogy, Paimon relates to Flink and Spark in the same way SLF4J relates to Log4j and Logback. Is that accurate to say?

Giannis Polyzos

09/12/2023, 2:45 PM

Hi @Sharath Gururaj 👋 Paimon started as a streaming storage layer for Apache Flink (previously known as Flink Table Store). Since then it has evolved a lot, and its implementation is open, so other frameworks like Spark, StarRocks, Trino, etc. can leverage it. You can think of it as similar to Apache Hudi and Apache Iceberg, but designed specifically for streaming-first architectures. It's already used heavily in production, it's quite stable, and the project aspires to graduate from the incubator within the next year.

Giannis Polyzos

09/12/2023, 2:48 PM

There is a strong integration with Flink, and the project offers amazing features for CDC, upserts, joins, etc. You can even use it to replace some Kafka workloads when second-level latency is not required.

Sharath Gururaj

09/12/2023, 2:51 PM

the project offers amazing features for cdc, upserts, joins etc.

But this is where it gets a bit confusing. Even Apache Flink is supposed to offer these out of the box: upserts are offered as part of dynamic tables, and joins are already well supported in Flink. Disclaimer: sorry if these seem to be noob questions. I'm new to Flink. I've been reading the Flink documentation extensively for the past few weeks, but I've only been familiar with Paimon for the past couple of days.

Giannis Polyzos

09/12/2023, 2:53 PM

Flink offers all of the above, but doesn't really offer storage. The features mentioned above aim to leverage cheap storage like S3 and perform these kinds of operations there, like building a streaming data warehouse on cheap storage. A dynamic table materializes data and performs computations, but the intermediate results are not actually stored anywhere and are not directly queryable. What Apache Paimon offers is the missing storage layer, which you can leverage to directly query dynamic tables.

Sharath Gururaj

09/12/2023, 2:56 PM

I see, got it. So it is in that sense that Paimon is similar to Hudi and Iceberg. Just to confirm my understanding: Flink dynamic tables materialize the data in Flink's shared RocksDB storage and store the metadata in Hive Metastore, whereas Paimon directly uses S3 etc. for storage. Is that correct?

🙏 1

🎯 1

Giannis Polyzos

09/12/2023, 2:57 PM

correct

🙇 1
