:question: [ PyFlink vs Flink Deltas ] :wave: Hey ...
# troubleshooting
d
[ PyFlink vs Flink Deltas ] 👋 Hey team, quick question! Can anyone help me understand the current differences between pyflink and javaflink (flink), especially in the Table API and DataStream APIs? If there's any documentation that tracks the evolution or roadmap, I'd appreciate a link! Thanks! 🙏
👍 1
m
Where are you getting javaflink from?
d
sorry, should not call it javaflink, should just be pyflink vs flink.
m
I guess the difference is that PyFlink is a Python implementation, which is wrapped around Flink's Java APIs
d
Thank you for your prompt response. We are evaluating pyflink vs flink adoption for our company, and I have a few more questions to clarify. One important factor is how far PyFlink lags behind the Java APIs. For example, we are not seeing the DataStream join API in the PyFlink doc (correct me if I'm wrong).
• How far behind is the PyFlink DataStream API compared to the Flink Java DataStream API?
• How far behind is the PyFlink Table API compared to the Flink Java Table API?
We would greatly value your response, as your insights are crucial in guiding our decision-making process. It would be great if there are any pointers to the official documentation around those 2 essential questions.
m
There is no pySpark DataStream or Table API
I’m assuming you mean PyFlink vs Flink’s Java implementation
I think that the majority of all APIs are available in PyFlink. They are basically wrappers around the existing Java APIs
d
yes, yes, yes, sorry for the typo, I mean PyFlink
awesome!!! Glad to hear that the majority of all APIs are available in PyFlink!!
May I ask, what are the things that are only available in the Java APIs, but not in PyFlink yet? I assume that even if that is the case, PyFlink is actively closing the gap? Is there any reason that ppl would choose the Java APIs over PyFlink, given that they are in a relatively large company with various kinds of streaming pipeline needs?
m
Because the Java APIs existed earlier
d
If that is the only reason, then it can be ignored. We are just starting to adopt, and nothing exists yet for most of the use cases.
I think your answer makes sense! Maybe it is the documentation that makes ppl doing prior research think PyFlink lags far behind the Java API 🤔. But it seems PyFlink has already caught up with most of the Java APIs as of today. So PyFlink should be a good choice, given that it can integrate seamlessly with ML, existing Python UDFs, and pandas
For example, this DataStream Join API documentation lists Java and Scala as language examples, but does not list PyFlink as a choice, which makes ppl think PyFlink does not support DataStream joining. This detailed API doc does not list join either
m
That’s because the documentation for Python is separate from the Java API
Like I said, the Python implementation is a wrapper around Java, so the functionality is there
🙌 1
d
awesome!!! That is my understanding as well..
I will take it that this is just due to missing documentation, and the actual functionality is already there.
The following seem to be clearly declared in the doc as not supported in PyFlink, though:
• Keyed window join in a common window, interval join on keyed stream
• Async IO
• No Testing Support
• Data types - byteString, etc
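For context on the interval join item above: in Flink's Java API, an interval join pairs elements of two keyed streams that share a key and whose timestamps satisfy `left_ts + lower <= right_ts <= left_ts + upper`. Below is a minimal plain-Python sketch of that semantics only. It is not PyFlink or Flink code, it works on in-memory lists rather than streams, and the record shape and the names `interval_join`, `orders`, and `payments` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical record shape: (key, timestamp_ms, payload)


def interval_join(left, right, lower_ms, upper_ms):
    """Pair same-key events where
    left_ts + lower_ms <= right_ts <= left_ts + upper_ms (bounds inclusive)."""
    # Index the right-hand events by key so each left event only scans its own key.
    right_by_key = defaultdict(list)
    for key, ts, payload in right:
        right_by_key[key].append((ts, payload))

    joined = []
    for key, left_ts, left_payload in left:
        for right_ts, right_payload in right_by_key[key]:
            if left_ts + lower_ms <= right_ts <= left_ts + upper_ms:
                joined.append((key, left_payload, right_payload))
    return joined


# Illustrative data only.
orders = [("user-1", 1_000, "order-A"), ("user-2", 5_000, "order-B")]
payments = [
    ("user-1", 1_500, "pay-A"),
    ("user-1", 9_000, "pay-late"),
    ("user-2", 4_000, "pay-early"),
]

# Match payments that arrive up to 2 seconds after the order.
result = interval_join(orders, payments, lower_ms=0, upper_ms=2_000)
print(result)
```

A real streaming implementation also has to buffer events in keyed state and expire them as watermarks advance, which this sketch leaves out.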
m
There's still a lot of development happening on this CC @Dian Fu @Xingbo Huang
🙌 1
🎉 1
d
This is our current research doc on delta between PyFlink vs Flink (PyFlink != Flink doc). Feel free to comment on it. Would be happy to hear about the roadmap, etc., if possible. : )
🙌 1
b
Added comment:
Minor: No ConfluentRegistryAvroDeserializationSchema support
We need to provide the schema manually.
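A rough sketch of what "providing the schema manually" might look like, assuming PyFlink's `AvroRowDeserializationSchema` with its `avro_schema_string` argument. The `User` schema and the helper name `build_avro_deserializer` are hypothetical, the import location differs between PyFlink releases (both paths below are assumptions to verify against your version), and the flink-avro jar has to be on the classpath. Records written by the Confluent serializer also carry a 5-byte prefix (magic byte plus schema id) that this sketch does not handle.

```python
import json

# Hypothetical record schema, written out by hand because it is not fetched from a registry.
USER_SCHEMA = json.dumps({
    "type": "record",
    "name": "User",
    "namespace": "example",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
})


def build_avro_deserializer(schema_string):
    """Build a PyFlink Avro deserializer from an explicit schema string.

    PyFlink is imported lazily so the schema above can be inspected
    without a Flink installation.
    """
    try:
        # Assumed location in newer PyFlink releases.
        from pyflink.datastream.formats.avro import AvroRowDeserializationSchema
    except ImportError:
        # Assumed location in older PyFlink releases.
        from pyflink.common.serialization import AvroRowDeserializationSchema
    return AvroRowDeserializationSchema(avro_schema_string=schema_string)
```

The trade-off is that the schema string has to be kept in sync with the producers by hand, which is the work a registry-aware deserializer would otherwise do.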