Hi team, I have a question I didn't manage to find the answer to | Apache Flink #troubleshooting


Ivan Burmistrov

04/04/2023, 3:14 PM

Hi team, I have a question I didn't manage to find the answer to. Assume we have a Flink SQL job in streaming mode that defines a table on top of Kafka and then uses this table multiple times in the same query, for instance like this:


SELECT * FROM myKafkaStream WHERE col1 = 'a'
UNION ALL
SELECT * FROM myKafkaStream WHERE col1 = 'b'

Is this valid usage? I was under the impression that it isn't, because it would effectively move the Kafka pointer (consumer offset) twice. However, I tried it recently and it seems to work, so I'm a bit confused.

Martijn Visser

04/04/2023, 3:17 PM

Does https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/set-ops/ help?

Ivan Burmistrov

04/04/2023, 3:18 PM

It doesn't answer my question: all the examples use different tables, not the same one. And UNION ALL is just an example. The question is about using the same stream multiple times in subqueries of the same query.

Ivan Burmistrov

04/04/2023, 3:19 PM

I mean, of course I can create multiple tables on top of Kafka (this is what we're doing now). The question is what happens when we use the same table.
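For reference, the workaround described here (one table definition per use, each with its own consumer group) might look roughly like the sketch below. The schema (col1, payload), topic name, broker address, and group IDs are hypothetical placeholders; only the option keys are taken from Flink's Kafka SQL connector.

```sql
-- Hypothetical table A: reads the topic under its own consumer group.
CREATE TABLE myKafkaStreamA (
  col1    STRING,
  payload STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'my-topic',                                -- placeholder topic
  'properties.bootstrap.servers' = 'localhost:9092',   -- placeholder broker
  'properties.group.id' = 'my-group-a',                -- placeholder group A
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

-- Hypothetical table B: same topic and schema, separate consumer group.
CREATE TABLE myKafkaStreamB (
  col1    STRING,
  payload STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'my-topic',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'my-group-b',                -- placeholder group B
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

-- Each branch of the union reads from its own table definition.
SELECT * FROM myKafkaStreamA WHERE col1 = 'a'
UNION ALL
SELECT * FROM myKafkaStreamB WHERE col1 = 'b';
```

Each table then runs its own Kafka source and reads the full topic independently, which is the duplication the question is asking whether it can avoid.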

Martijn Visser

04/04/2023, 3:21 PM

If you run an EXPLAIN PLAN, you will see the generated query plan. My expectation is that it will include both filters and then union these results.
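A minimal way to do this for the query from the question, assuming the myKafkaStream table is already registered in the current catalog (for example in the SQL client), is to prefix it with EXPLAIN PLAN FOR. The exact sections printed differ between Flink versions.

```sql
-- Prints the plan(s) the planner generates for the query without running it.
EXPLAIN PLAN FOR
SELECT * FROM myKafkaStream WHERE col1 = 'a'
UNION ALL
SELECT * FROM myKafkaStream WHERE col1 = 'b';
```

In the output, look for how many source scan nodes appear for myKafkaStream and where the two filters and the union sit relative to them.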

Ivan Burmistrov

04/04/2023, 4:09 PM

Hmmm, interesting. So this is a snippet of the query plan. These represent the 2 SELECTs from the same Kafka table: the first one is TableSourceScan, but the second one is Reused(reference_id=[1]). So it looks like it scans the stream once, but passes the events to both subqueries independently. That means referencing the same table in the subqueries is correct. Is that the right understanding, @Martijn Visser?

I don't remember why, but I have a strong memory that it didn't work before, and that's why we ended up creating multiple table definitions on top of the same source (and, correspondingly, using multiple independent consumer groups). Could it be that this behavior was fixed or changed in recent Flink versions?
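As far as I know, the Reused(reference_id=[1]) node comes from the planner's plan-reuse optimization, which Flink controls with two table options: table.optimizer.reuse-sub-plan-enabled and table.optimizer.reuse-source-enabled, both defaulting to true. A hedged sketch for checking this on your own job: in the SQL client, turn off source reuse and rerun EXPLAIN PLAN on the same UNION ALL query. With source reuse disabled, one would expect two separate TableSourceScan nodes, meaning the topic is read twice.

```sql
-- Sub-plan reuse stays on (this is the default).
SET 'table.optimizer.reuse-sub-plan-enabled' = 'true';
-- Disable reuse of duplicated table sources (default is 'true').
-- Afterwards, EXPLAIN should list one TableSourceScan per reference
-- to myKafkaStream instead of a TableSourceScan plus a Reused node.
SET 'table.optimizer.reuse-source-enabled' = 'false';
```

Setting the option back to 'true' restores the single shared scan seen in the plan above. This sketch only shows the knob; it does not say whether the default behavior differed in older Flink versions.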

