ERROR: Cannot install apache-flink-ml==2.2.0 an...
# troubleshooting
a
ERROR: Cannot install apache-flink-ml==2.2.0 and apache-flink==1.17.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested apache-flink==1.17.0
    apache-flink-ml 2.2.0 depends on apache-flink==1.15.1

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit <https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts>
I'd like to use these together, but I cannot. (I could use newer ones.) (I'd like to get the KBinsDiscretizer.)
m
I don't think that Flink ML has yet been made compatible with Flink 1.17
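The conflict pip reports above comes down to two exact pins on the same package. A minimal Python sketch of that check, assuming requirement strings of the simple `name==version` form; the helper `find_pin_conflicts` is hypothetical and is not part of pip, whose resolver handles far more cases:

```python
import re

# Matches simple exact pins such as "apache-flink==1.15.1".
_PIN = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*==\s*([^\s;]+)")


def find_pin_conflicts(requirements):
    """Return {package: sorted versions} for packages pinned to more than one version.

    Hypothetical helper for illustration only.
    """
    pins = {}
    for req in requirements:
        match = _PIN.match(req)
        if match is None:
            continue  # not an exact pin; ignored in this sketch
        name, version = match.group(1).lower(), match.group(2)
        pins.setdefault(name, set()).add(version)
    return {name: sorted(found) for name, found in pins.items() if len(found) > 1}


# The two pins quoted in the error message.
requested = ["apache-flink==1.17.0"]             # what the user asked for
declared_by_flink_ml = ["apache-flink==1.15.1"]  # what apache-flink-ml 2.2.0 depends on

conflicts = find_pin_conflicts(requested + declared_by_flink_ml)
print(conflicts)
```

Because both pins are exact (`==`), no single version of `apache-flink` can satisfy them, which is why pip gives up with `ResolutionImpossible`.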
a
Meh. Thanks for the answer anyway.
m
Let me ping some of the Flink ML maintainers on this
👍 1
a
Flink ML is something of the 'try if that works' for me. I would much rather have https://datasketches.apache.org if possible. 🙂
I created my own UDFs for percentiles, but I have a feeling their performance is low because of how they interact with the Flink APIs. Their calculation takes roughly 3-5x as long as the standard Flink AVG(), and they use much more memory for state storage.
They're using the Java t-digest reference implementation.
d
@Ari Huttunen The most recent release, Flink ML 2.2.0, depends on Flink 1.15. The latest master branch of Flink ML depends on Flink 1.17. There are instructions to build Flink ML from source [1] and then run it in Python [2] or Java [3]. Would that work for you? [1] https://nightlies.apache.org/flink/flink-ml-docs-master/docs/development/build-and-install/ [2] https://nightlies.apache.org/flink/flink-ml-docs-master/docs/try-flink-ml/python/quick-start/ [3] https://nightlies.apache.org/flink/flink-ml-docs-master/docs/try-flink-ml/java/quick-start/
a
I can see about that in the coming days.
d
Regarding the performance, you are right that Flink ML 2.2 can be several times slower than a simple UDF for use cases with very low computation overhead. This is primarily due to unnecessary overhead in the Flink runtime, the overhead of converting between Table and DataStream, and the overhead of Flink Table’s Row. Just FYI, we have investigated the performance overhead and fixed most of it. For example, we optimized the runtime overhead in Flink 1.17 with [1] [2] and the overhead of conversion between Table and DataStream in Flink 1.18 with [3]. These PRs can increase Flink ML throughput by 3x for simple algorithms; the PR descriptions have more details. Once Flink 1.18 is released, we will upgrade Flink ML to depend on Flink 1.18. Then Flink ML’s throughput should be much higher than it is now, probably more than 70% of the throughput of using a Java UDF directly. The only remaining reason Flink ML can still be slower than a simple UDF is the overhead of Flink Table’s Row. Optimizing that would require considerably more involved work. [1] https://github.com/apache/flink/pull/21576 [2] https://github.com/apache/flink/pull/21579 [3] https://github.com/apache/flink/pull/22262
👀 1
And regarding the version, there is indeed a need for each Flink ML release to support multiple recent Flink versions (e.g. Flink 1.15, 1.16, 1.17). Each Apache Paimon release also supports multiple recent Flink versions: https://paimon.apache.org/docs/0.4/engines/flink/. We will follow Apache Paimon’s practice and support this in Flink ML, then make another Flink ML release. This will likely be done in the coming month.
Hi @Ari Huttunen, FYI, Flink ML 2.3 has been released. The Python package of Flink ML 2.3 supports Flink 1.17, and the Maven artifacts of Flink ML 2.3 support Flink 1.15/1.16/1.17. If you want to build a Java project using the Flink ML Maven artifacts, you might need to specify the target Flink version in the Maven artifactId. Feel free to check out https://nightlies.apache.org/flink/flink-ml-docs-master/docs/try-flink-ml/java/build-your-own-project/ for an example.
🙌 2
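The version pairings mentioned in this thread can be collected into a small lookup. This is a rough Python sketch, assuming only what was stated above; the table `SUPPORTED_FLINK`, its keys, and the helper `is_supported` are hypothetical names for illustration, not an official compatibility matrix:

```python
# Compatibility facts as stated in this thread; not an official matrix.
# Keys are (Flink ML release as written in the thread, distribution).
SUPPORTED_FLINK = {
    ("2.2.0", "python"): ("1.15",),
    ("2.3", "python"): ("1.17",),
    ("2.3", "maven"): ("1.15", "1.16", "1.17"),
}


def is_supported(flink_ml, distribution, flink):
    """True if the thread states this Flink ML release supports the Flink minor version."""
    minor = ".".join(flink.split(".")[:2])  # "1.17.0" -> "1.17"
    return minor in SUPPORTED_FLINK.get((flink_ml, distribution), ())


print(is_supported("2.2.0", "python", "1.17.0"))  # the combination that failed to install
print(is_supported("2.3", "python", "1.17.0"))    # the combination announced as supported
```

Unknown combinations return `False` here simply because the thread says nothing about them, not because they are known to fail.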