# troubleshooting
s
What's the relationship between Paimon and other unified batch/streaming frameworks like Flink, Spark etc.? Am I correct in saying that Paimon seems to offer a unified API which can use any runtime engine like Flink, Spark, etc.? Just to give an analogy: Paimon relates to Flink and Spark in the same way SLF4J relates to Log4j and Logback. Is that accurate to say?
g
Hi @Sharath Gururaj 👋 Paimon started as a streaming storage layer for Apache Flink (previously known as Flink Table Store). Since then it has evolved a lot, and its implementation is open so other frameworks like Spark, StarRocks, Trino etc. can leverage it. You can think of it as similar to Apache Hudi and Apache Iceberg, but designed specifically for streaming-first architectures. It's already used heavily in production, it's quite stable, and the project aspires to graduate from the incubator within the next year.
There is a strong integration with Flink, and the project offers amazing features for CDC, upserts, joins etc. You can even use it to replace some Kafka workloads when second-level latency is not required.
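To make the CDC/upsert point concrete, here is a minimal toy sketch in plain Python of how a changelog stream can be merged into a primary-key table. It is not Paimon's API or implementation. The row kinds `+I`, `-U`, `+U`, `-D` follow Flink's changelog notation; the function `apply_changelog` and the sample rows are hypothetical and exist only for illustration.

```python
from typing import Dict, Iterable, Tuple

# One changelog record: (row kind, primary key, row payload).
Change = Tuple[str, int, dict]


def apply_changelog(table: Dict[int, dict], changes: Iterable[Change]) -> Dict[int, dict]:
    """Merge changelog records into a primary-key table (toy model)."""
    for kind, key, row in changes:
        if kind in ("+I", "+U"):
            # Insert or update-after: the latest row for a key wins (upsert).
            table[key] = row
        elif kind == "-D":
            # Delete: drop the key if it is present.
            table.pop(key, None)
        elif kind == "-U":
            # Update-before: the matching +U carries the new value, so skip it.
            continue
        else:
            raise ValueError(f"unknown row kind: {kind}")
    return table


# Hypothetical CDC events for an "orders" table keyed by order id.
changes = [
    ("+I", 1, {"status": "created"}),
    ("+I", 2, {"status": "created"}),
    ("-U", 1, {"status": "created"}),
    ("+U", 1, {"status": "paid"}),
    ("-D", 2, {"status": "created"}),
]

orders = apply_changelog({}, changes)
print(orders)  # {1: {'status': 'paid'}}
```

The point of the sketch is only that a table with a primary key keeps one current row per key, so replaying the changelog yields the latest state rather than an append-only history.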
s
the project offers amazing features for cdc, upserts, joins etc.
But this is where it gets a bit confusing. Even Apache Flink is supposed to offer these out of the box: upserts are offered as part of dynamic tables, and joins are already well supported in Flink. Disclaimer: sorry if these seem to be noob questions. I'm new to Flink. I've been reading the Flink documentation extensively for the past few weeks, but I've only been familiar with Paimon for the past couple of days.
g
Flink offers all of the above, but it doesn't really offer storage. The features mentioned above aim to leverage cheap storage like S3 to perform these kinds of operations, like building a streaming data warehouse on cheap storage. A dynamic table materializes data and performs computations, but the intermediate results are not actually stored anywhere and are not directly queryable. What Apache Paimon offers is the missing storage layer, which you can leverage to directly query dynamic tables.
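As a rough illustration of the "missing storage layer" idea, the toy sketch below has a writer persist table state as numbered snapshot files in a shared location, which an independent reader can then query. This is an assumption-laden model, not Paimon's actual file layout or API: the names `commit_snapshot` and `read_latest`, the JSON format, and the temporary directory standing in for an S3 prefix are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def commit_snapshot(warehouse: Path, table: dict) -> Path:
    """Writer side: persist the current table contents as a new snapshot file."""
    snapshot_id = len(list(warehouse.glob("snapshot-*.json"))) + 1
    # Zero-padded ids keep lexicographic order equal to commit order.
    path = warehouse / f"snapshot-{snapshot_id:05d}.json"
    path.write_text(json.dumps(table))
    return path


def read_latest(warehouse: Path) -> dict:
    """Reader side: any process that can see the warehouse can query it."""
    latest = max(warehouse.glob("snapshot-*.json"))
    return json.loads(latest.read_text())


# A local temp directory stands in for cheap shared storage such as S3.
warehouse = Path(tempfile.mkdtemp())

# The "streaming job" keeps updating its result and commits it periodically.
state = {}
state["order-1"] = "created"
commit_snapshot(warehouse, state)
state["order-1"] = "paid"
commit_snapshot(warehouse, state)

# A separate "query engine" only needs the storage location, not the job.
latest = read_latest(warehouse)
print(latest)  # {'order-1': 'paid'}
```

Without the commit step, the result would live only inside the running job's state; writing it to shared storage is what lets other engines read it.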
s
I see, got it. So it is in that sense that Paimon is similar to Hudi and Iceberg. Just to confirm my understanding: Flink dynamic tables materialize the data in its shared RocksDB storage and store the metadata in the Hive metastore, whereas Paimon directly uses S3 etc. for storage. Is that correct?
πŸ™ 1
🎯 1
g
Correct.
🙇 1