# troubleshooting
j
Hi, is there any support for PubSub in the Python DataStream API? I see that there is a Java connector however I don’t know if cross-language usage exists in Flink as it does in Beam.
d
@Jakub Janowski PubSub is still not supported in the Python API. Connectors share the same underlying implementation between the Java and Python APIs: the Python connector is just a thin wrapper around the Java implementation. To use PubSub in the Python DataStream API, you could do the following:
• Provide a wrapper in the Python API; you can refer to how the other connectors are supported for more details: https://github.com/apache/flink/tree/master/flink-python/pyflink/datastream/connectors
• When submitting the job, specify the PubSub connector JAR file (and also its transitive dependencies if there are any; an alternative is to build a fat JAR that contains the dependencies) per the following documentation: https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/python/dependency_management/#jar-dependencies
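For illustration, a minimal sketch of both steps. This is not an existing PyFlink API: the `PubSubSource` Python class, the Java package path, and the builder calls are assumptions based on the Java GCP PubSub connector's documented API, so adjust them to match the connector JAR you actually ship.
```python
# Minimal sketch, assuming the Java GCP PubSub connector's documented builder API.
# The PubSubSource wrapper class below is hypothetical; it does not exist in PyFlink.
from pyflink.common.serialization import DeserializationSchema, SimpleStringSchema
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.functions import SourceFunction
from pyflink.java_gateway import get_gateway


class PubSubSource(SourceFunction):
    """Hypothetical thin Python wrapper around the Java PubSubSource."""

    def __init__(self, project: str, subscription: str,
                 deserialization_schema: DeserializationSchema):
        jvm = get_gateway().jvm
        # Build the underlying Java source via Py4J and hand it to the
        # SourceFunction wrapper, the same pattern the existing connectors use.
        # The Java package path and builder methods are assumptions.
        j_source = jvm.org.apache.flink.streaming.connectors.gcp.pubsub.PubSubSource \
            .newBuilder() \
            .withDeserializationSchema(deserialization_schema._j_deserialization_schema) \
            .withProjectName(project) \
            .withSubscriptionName(subscription) \
            .build()
        super().__init__(j_source)


env = StreamExecutionEnvironment.get_execution_environment()
# Register the connector JAR (or a fat JAR containing its transitive dependencies)
# so it is shipped with the job, per the dependency management docs above.
env.add_jars("file:///path/to/flink-connector-gcp-pubsub.jar")
# The PubSub source acknowledges messages on checkpoints, so enable checkpointing.
env.enable_checkpointing(10000)

stream = env.add_source(
    PubSubSource("my-project", "my-subscription", SimpleStringSchema()),
    source_name="PubSubSource")
stream.print()
env.execute("pubsub-demo")
```
The wrapper pattern mirrors how the existing connectors hand a Java source object to `SourceFunction`; only the Java class names and the JAR path would need to change for a real setup.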
j
Thank you @Dian Fu, do you know if PubSub support in the Python API is planned?
d
@Jakub Janowski I’m not aware of any plans; it depends on whether there are contributors who’d like to work on it. The implementation itself should be very easy.
j
Just checked and there is even a PR for it: https://github.com/apache/flink/pull/20627
d
Yes, the PR hasn’t been updated. It seems the contributor isn’t working on it anymore.
m
The PR is also outdated, since the GCP PubSub connector isn't in the Flink main repo anymore
j
And where should the Python implementations live now? I checked the Kafka and RabbitMQ connector repos and I don’t see any Python code there.
m
I think there's still an open question in the PyFlink implementation about how PyFlink should be decoupled from the actual Flink connector releases. But I need @Dian Fu's input on that 🙂
d
@Jakub Janowski @Martijn Visser The implementations are currently still in the PyFlink repo instead of the separate connector repos. I’d prefer to keep it that way for the following reasons:
• The Python implementation of a connector is a very thin layer (just a wrapper around the Java API).
• Moving them to separate connector repos may introduce a lot of overhead, e.g. setting up Python tests for each connector, publishing each connector to PyPI, etc.
m
@Dian Fu As long as you’re aware that external connector changes can potentially break the Flink build, I have no hard objections. Let’s see how things work out over time.
d
@Martijn Visser Many thanks for the reminder :) Let’s just run with it for now and see if there are problems. We can improve this later if needed.
👍 1