# troubleshooting

    Sucheth Shivakumar

    05/11/2023, 3:22 AM
I was checking the CoProcessFunction example mentioned here -- https://nightlies.apache.org/flink/flink-docs-stable/docs/learn-flink/etl/#connected-streams For some reason it doesn't filter for me and prints everything out. But if I set env.setParallelism(1); it works as expected. Can someone point out what I am missing here? Why doesn't it work if parallelism is not 1?
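A likely explanation, assuming the job mirrors the docs example: Flink gives no ordering guarantee between the two inputs of a connected stream. With parallelism 1 the small control stream happens to be read first, so the blocked state is already set when the words arrive; with higher parallelism, data records can race ahead of the control record for the same key and slip through. A sketch of the docs-style function with the race called out in comments (class and field names follow the docs example):

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
import org.apache.flink.util.Collector;

// Keyed co-flatmap that drops words previously seen on the control stream.
public class ControlFunction extends RichCoFlatMapFunction<String, String, String> {
    private ValueState<Boolean> blocked;

    @Override
    public void open(Configuration config) {
        blocked = getRuntimeContext()
                .getState(new ValueStateDescriptor<>("blocked", Boolean.class));
    }

    @Override
    public void flatMap1(String controlValue, Collector<String> out) throws Exception {
        blocked.update(Boolean.TRUE); // block this key from now on
    }

    @Override
    public void flatMap2(String dataValue, Collector<String> out) throws Exception {
        // RACE: with parallelism > 1, this can run before flatMap1 has seen the
        // control record for the same key, so the word is emitted anyway.
        if (blocked.value() == null) {
            out.collect(dataValue);
        }
    }
}
```
If the filtering has to be deterministic, the ordering must be made explicit, e.g. by buffering data records in keyed state until the control input is exhausted, or by distributing the rules via the broadcast state pattern.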

    Abhinav Ittekot

    05/11/2023, 5:46 AM
    hello, is it possible to change the compression algorithm for RocksDB state via flink config? I believe it's snappy by default.
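As far as I know there is no single flink-conf key for this; compression is a RocksDB column-family option, set through an options factory. A sketch, assuming the RocksDB state backend is in use (LZ4 here is just an example choice):

```java
import java.util.Collection;

import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.CompressionType;
import org.rocksdb.DBOptions;

// Sketch: change RocksDB's per-column-family compression (Snappy by default).
public class CompressionOptionsFactory implements RocksDBOptionsFactory {

    @Override
    public DBOptions createDBOptions(
            DBOptions current, Collection<AutoCloseable> handlesToClose) {
        return current; // leave DB-level options untouched
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(
            ColumnFamilyOptions current, Collection<AutoCloseable> handlesToClose) {
        return current.setCompressionType(CompressionType.LZ4_COMPRESSION);
    }
}
```
Then point `state.backend.rocksdb.options-factory` in flink-conf.yaml at the class's fully qualified name.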

    Michael Kreis

    05/11/2023, 11:06 AM
Hi everyone, I'm trying to set up Flink in standalone mode on OpenShift with the k8s operator. However, I'm getting errors on the job manager when starting a job:
....is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on
According to this documentation an additional permission on
deployments/finalizers
is needed, which is not mentioned in your documentation. Is that just missing from the documentation, or am I doing something wrong?
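For reference, the Role rule implied by that error would look roughly like this (a sketch derived from the error message; the apiGroup and verbs may need adjusting for your operator version):

```yaml
# RBAC fragment granting the finalizers subresource the error complains about
- apiGroups: ["apps"]
  resources: ["deployments/finalizers"]
  verbs: ["update"]
```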

    Amenreet Singh Sodhi

    05/11/2023, 11:20 AM
Hi team, is Flink compatible with JDK 17?

    Kirill Lyashko

    05/11/2023, 11:33 AM
Hi everyone. While investigating the Flink Operator's functionality I've run into the following issue: the upgrade mode is set to
last-state
. However, when I changed the parallelism, the job master failed once during startup because it couldn't reschedule state from the latest checkpoint, and the job was then started by the operator without any state. I don't see anything in the logs that would explain this behaviour. Has anyone faced anything similar? I would be curious to know whether it's a feature or a bug. And can the behaviour be configured to fail the job in such cases?

    Adam Fleishaker

    05/11/2023, 1:52 PM
Hi all! My team has some M1 Macs for our developers, and we're using Flink 1.13 for our applications. We use the Flink Docker images for testing, but they crash intermittently on the M1s, which have to run the AMD64 image through emulation. 1.15 is the ultimate goal since it ships ARM images among other improvements, but it would be a lift to cut our applications over. Is there any way to get a native ARM image for 1.13, or has anyone made their own custom one that we can use?
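One possible workaround, entirely untested: the official images are built from the apache/flink-docker repo, so a self-built linux/arm64 image might work. The directory name below is an assumption; pick the 1.13 variant you actually use:

```shell
# Sketch: cross-build a linux/arm64 Flink 1.13 image from the official Dockerfiles
git clone https://github.com/apache/flink-docker.git
cd flink-docker/1.13/scala_2.12-java11-debian   # hypothetical path, check the repo
docker buildx build --platform linux/arm64 -t my-registry/flink:1.13-arm64 --load .
```
Whether all 1.13 native libraries (e.g. RocksDB) work on arm64 is a separate question worth verifying.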

    El Houssine Talab

    05/11/2023, 2:35 PM
Hey folks, we are looking into performing deduplication + over aggregation (Flink v1.15). We tried the query below but ran into this error:
    StreamPhysicalOverAggregate doesn't support consuming update and delete changes which is produced by node Deduplicate(keep=[LastRow], key=[home, room, sn], order=[ROWTIME])
    Any thoughts? Thanks.
    WITH t1 AS (
        /* DEDUPLICATION */
        SELECT 
            home, 
            room, 
            sn, 
            temp,
            event_ts, 
            arrival_ts
        FROM ( 
            SELECT 
                *, 
                ROW_NUMBER() OVER (PARTITION BY home, room, sn ORDER BY event_ts DESC) AS row_num
            FROM telemetry
            WHERE 
                event_ts >= CURRENT_WATERMARK(event_ts)
                AND ABS(TIMESTAMPDIFF(SECOND, event_ts, arrival_ts)) <= 60
        )
        WHERE row_num = 1
    )
    
    /* OVER AGGREGATION */
    SELECT 
        home, 
        room, 
        sn, 
        event_ts, 
        arrival_ts, 
        AVG(temp) OVER ( 
            PARTITION BY home, room
            ORDER BY event_ts
            RANGE BETWEEN INTERVAL '1' MINUTE PRECEDING AND CURRENT ROW
        ) as avg_temp
    FROM t1
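The error arises because a last-row deduplication ordered by event time emits updates and retractions, which OVER aggregation cannot consume. One workaround suggested in the Flink SQL deduplication docs (a sketch, not verified on 1.15, and note the semantic change: it keeps the first arrival per key rather than the latest event) is a first-row deduplication ordered by processing time, which produces an append-only stream:

```sql
-- Assumes `proctime AS PROCTIME()` is declared on the telemetry table.
WITH t1 AS (
    SELECT home, room, sn, temp, event_ts, arrival_ts
    FROM (
        SELECT *,
            ROW_NUMBER() OVER (
                PARTITION BY home, room, sn ORDER BY proctime ASC) AS row_num
        FROM telemetry
    )
    WHERE row_num = 1
)
SELECT home, room, sn, event_ts, arrival_ts,
    AVG(temp) OVER (
        PARTITION BY home, room
        ORDER BY event_ts
        RANGE BETWEEN INTERVAL '1' MINUTE PRECEDING AND CURRENT ROW
    ) AS avg_temp
FROM t1
```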

    Sucheth Shivakumar

    05/11/2023, 2:56 PM
https://apache-flink.slack.com/archives/C03G7LJTS2G/p1683775361922889 Can someone please help me understand what is happening here?

    Conor McGovern

    05/11/2023, 3:02 PM
Hi everyone, the Flink docs for Kafka state the following regarding upgrading the connector:
    Do not upgrade Flink and the Kafka Connector version at the same time.
    What is the problem with upgrading both at the same time? Thanks for your help. (Link to the docs in question: https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/kafka/#upgrading-to-the-latest-connector-version)

    Mingliang Liu

    05/11/2023, 11:09 PM
    Hi team, currently the
    Job Manager -> Configuration
page shows different configurations under different sections. This is good for categorizing them, but the height of each section is too short, so we need to scroll a lot to find a specific configuration. Also, searching does not show all of the matches unless we hit
    Enter
multiple times or scroll. Would it be a good idea to put the different sections of the JM configuration on different tabs, similar to
    Running Jobs -> Checkpoints
    ?

    Adam Richardson

    05/12/2023, 4:45 AM
Hi there -- I was wondering if there are any roadmap items, design docs, or general thinking around support for Iceberg merge-on-read positional deletes in Flink. If I understand correctly, Flink supports only equality deletes for Iceberg merge-on-read updates (whereas Spark supports only positional deletes). I'm trying to understand whether that's just a feature gap in Flink's current state or a more fundamental platform-level limitation. I'm investigating a solution for streaming ingestion into an Iceberg-based data lake, but have concerns about the read-side performance of equality deletes.

    El Houssine Talab

    05/12/2023, 8:06 AM
    Any insights on how to do this?

    Dheeraj Panangat

    05/12/2023, 10:47 AM
Hi Team, getting the following error when trying to join two different tables with the same table.
    Hash collision on user-specified ID "uid_split_monitor_tableRoot". Most likely cause is a non-unique ID. Please check that all IDs specified via `uid(String)` are unique.
    	at org.apache.flink.streaming.api.graph.StreamGraphHasherV2.generateNodeHash(StreamGraphHasherV2.java:185)
    	at org.apache.flink.streaming.api.graph.StreamGraphHasherV2.traverseStreamGraphAndGenerateHashes(StreamGraphHasherV2.java:110)
Our example:
    table1 = tableEnv.sqlQuery("");
    table2 = tableEnv.sqlQuery("");
    
    tableRoot = tableEnv.sqlQuery("");
    
table3 = table1.join(tableRoot);
table4 = table2.join(tableRoot);
Has anyone faced this? Is there anything we are doing wrong here? Appreciate any help. Thanks

    Sumit Singh

    05/12/2023, 11:11 AM
How do I register a JDBC MySQL catalog? I am using the Flink SQL Client.
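If the flink-connector-jdbc jar plus the MySQL driver are on the SQL Client's classpath, the JDBC catalog can be registered with DDL like this (host/credentials are placeholders; MySQL support in JdbcCatalog requires Flink 1.15+ as far as I know):

```sql
CREATE CATALOG my_mysql WITH (
    'type' = 'jdbc',
    'default-database' = 'mydb',
    'username' = 'user',
    'password' = 'secret',
    'base-url' = 'jdbc:mysql://localhost:3306'
);

USE CATALOG my_mysql;
SHOW TABLES;
```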

    aswin jose

    05/12/2023, 11:59 AM
Apache Flink's tumbling window (or any window) always applies grouping and sub-grouping as part of aggregation, but I have a situation where I need to collect data in a tumbling window without any grouping or aggregation, in the Flink SQL client. Is this possible in Apache Flink?
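One option worth trying (a sketch; assumes a recent Flink, since in older versions window TVFs could only be used with aggregation, and assumes a table named `telemetry` with a watermarked `event_ts` column): window TVFs can be used without GROUP BY, simply tagging each row with its window:

```sql
-- Assigns every row to a 1-minute tumbling window, no aggregation involved;
-- window_start / window_end appear as extra columns on each row.
SELECT *
FROM TABLE(
    TUMBLE(TABLE telemetry, DESCRIPTOR(event_ts), INTERVAL '1' MINUTE));
```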

    Kaushalya Samarasekera

    05/12/2023, 2:06 PM
Hi, this is a very novice question; a basic operation is failing for me, so I'm seeking guidance. I'm creating the following table in Flink with Kafka as a sink. I'm running the SQL statements in sql-client in interactive mode. The topic
    transactions
    (1 partition, 1 replica) exists in kafka, running on a wurstmeister/kafka docker container.
    CREATE TABLE transactions
    (
        `account_id` BIGINT,
        `amount`     BIGINT
    ) WITH (
          'connector' = 'kafka',
          'topic' = 'transactions',
          'properties.bootstrap.servers' = 'localhost:59092',
          'properties.group.id' = 'testGroup',
          'scan.startup.mode' = 'earliest-offset',
          'format' = 'json');
    Then I run
    insert into transactions  values (1, 23);
    After the above insert, I expected the record to appear as an event in the
    transactions
topic, but it does not. If I run select * from transactions, I can see the record in the table, just not in Kafka. Any pointers would be much appreciated.

    Simon Dahlbacka

    05/12/2023, 2:13 PM
I have a problem with the Flink SQL function
    to_date
    , docs say
    TO_DATE(string1[, string2])
    , I am calling it using
    TO_DATE(CAST(`OAORDT` as string), 'yyyyMMdd') AS `orderDate`,
Invalid function call:\nTO_DATE(STRING, STRING NOT NULL) Expected signatures are:\nTO_DATE(*). Any ideas what is wrong? Flink 1.16.1

    Nizar Hejazi

    05/14/2023, 6:38 AM
Hello, I have a few questions about Flink's dynamic tables:
• Flink's dynamic tables can be converted into an upsert stream (requires a unique key).
  ā—¦ But the docs mention: ā€œPlease note that only append and retract streams are supported when converting a dynamic table into a `DataStream`ā€.
• Flink's dynamic tables can be written to an external system.
  ā—¦ From slide 17 here, it looks like we can save ~50% of traffic if the downstream system supports upserting.
  ā—¦ WITH ('connector'='kafka', 'format' = 'debezium-json') supports insertion, update before, update after, and delete.
  ā—¦ What does the community use as an external system for dynamic tables?
• There are query restrictions (state size for continuous queries evaluated on unbounded streams that need to update previously emitted results).
  ā—¦ Does Flink have to maintain this state even when the dynamic table is written to an external system?
  ā—¦ Any benchmark for state size for different queries?
Thanks

    Nicholas Erasmus

    05/15/2023, 8:11 AM
Hi everyone, I have a stateless application (Flink 1.16.1) that consumes about 12 records per second and uses a flatMap to write n records to one (Delta Table) sink and m records to another (Delta Table) sink. My Task Manager's heap memory usage seems incredibly high for such a simple task. Does anyone know why this would be, or how I can go about decreasing it? I would think that a stateless application would have a much smaller memory footprint. Here's a pic of the memory usage

    Barisa Obradovic

    05/15/2023, 9:21 AM
We are having problems upgrading from Flink 1.13 to Flink 1.17, since we are running an older version of ZooKeeper (3.4.x). Starting with Flink 1.15, Flink upgraded its Apache Curator library ("a Java/JVM client library for Apache ZooKeeper") from 2.x to 5.2. What that means is described in https://curator.apache.org/breaking-changes.html, but the short version is that ZooKeeper 3.4.x is no longer supported from Flink 1.15. Did anyone manage to use an older version of Apache Curator in the latest Flink, so we can still use our old ZooKeeper (which we can't upgrade at this time)?

    Ron Ben Arosh

    05/15/2023, 10:47 AM
Hi šŸ™‚ I'm running Flink 1.17 and trying to use both reactive mode and unaligned checkpoints. When Flink restarts itself, it fails to start with a job manager exception:
    Cannot rescale the given pointwise partitioner.\nDid you change the partitioner to forward or rescale?
The Flink job currently contains only 'forward' edges between operators. The job contains no broadcasts and no explicit shuffles. How can this issue be solved? Or more broadly: how can Flink be both autoscaled and keep stable checkpoints?
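Not a full answer, but since the exception is specifically about rescaling with unaligned checkpoints, one experiment (a sketch; key name as I recall it from the Flink docs) is to fall back to aligned checkpoints, which support rescaling of pointwise connections:

```yaml
# flink-conf.yaml sketch: disable unaligned checkpoints so reactive-mode
# rescaling restores from an aligned (rescalable) checkpoint
execution.checkpointing.unaligned: false
```
A middle ground may be keeping unaligned checkpoints enabled but configuring an aligned-checkpoint timeout, so barriers only become unaligned under backpressure.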

    Abhishek Joshi

    05/15/2023, 11:27 AM
Hi Team, we are using the Flink Table API to create a temporary table in Flink using the JDBC connector for the connection with Postgres. We came across a scenario where we want to update that temporary Flink table at runtime as soon as we add a new entry into the underlying Postgres table, and then perform some operations on that table. Can anyone help us with this? Thanks

    Rashmin Patel

    05/15/2023, 12:09 PM
Hi all, I want to understand savepoints vs. checkpoints in a little more detail. What extra things are stored in a savepoint compared to a checkpoint? I can see a significant difference in their sizes.

    ē”°ę˜Žåˆš

    05/15/2023, 12:11 PM
    Hi all

    ē”°ę˜Žåˆš

    05/15/2023, 12:14 PM
Hi all: I checkpoint data to HDFS; after running for 2 months: Caused by: org.apache.flink.util.SerializedThrowable: The directory item limit of /flink/checkpoints/da0e3d92f6eb46a18190463abafe1af2/9ea0a05029b62b35e2b98eebb53e4bd9/shared is exceeded: limit=1048576 items=1048576 at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:1307)

    Jean-Baptiste PIN

    05/15/2023, 1:42 PM
Hi, is it possible that the flink-connector-mongodb is not published on the Maven repo?

    Jean-Baptiste PIN

    05/15/2023, 1:42 PM
    I got 404 https://repo.maven.apache.org/maven2/org/apache/flink/flink-connector-mongodb/1.0.0-1.17/flink-connector-mongodb-1.0.0-1.17.pom

    David Wisecup

    05/15/2023, 3:07 PM
Is doing local development with Kafka possible? When I have the Flink dependencies in my pom marked as provided, I get this runtime error:
    Caused by: java.lang.NoClassDefFoundError: org/apache/flink/api/common/serialization/DeserializationSchema
When I change the Flink deps to normal scope I get further, but then hit this runtime error:
    Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make field private final java.lang.Object[] java.util.Arrays$ArrayList.a accessible: module java.base does not "opens java.util" to unnamed module @6366ebe0
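Two separate issues, most likely. The provided-scope error is expected when running from an IDE: provided dependencies are not on the runtime classpath, so either enable your IDE's "add provided dependencies to classpath" option for the run configuration or keep a dev profile with compile scope. The InaccessibleObjectException is the JDK 16+ module system blocking reflective access during serialization; the usual workaround (a sketch, flags illustrative rather than an exhaustive list) is to open the offending packages to the unnamed module via JVM args:

```
# JVM options for running the job locally on a modular JDK (16+)
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/java.lang=ALL-UNNAMED
```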

    Jiahao Wang Zhuo

    05/15/2023, 4:12 PM
    Hi, I am trying to create a
    FileSink
using the parquet BulkFormat (on 1.16). I wanted to create a custom rolling policy based on file size and processing time. I see in FLINK-13027 that this has been supported for
    StreamingFileSink
    since Flink-1.10. However, when I try to extend the
    CheckpointRollingPolicy
    , it still rolls only on checkpoint times or
    org.apache.flink.util.SerializedThrowable: java.lang.UnsupportedOperationException: Bulk Part Writers do not support "pause and resume" operations.
    (if I set
    shouldRollOnCheckpoint
    to false). Would I be correct to say that custom rolling policies are not supported for
    FileSink
    but only for
    StreamingFileSink
instead (which the docs mark as deprecated)? If so, is there any plan to support that? Is there any blocker/challenge to making that possible?

    Trystan

    05/15/2023, 5:29 PM
    if i’m creating a source for use in batch mode, do i need a
    Split
    as well as a
    SplitState
    ? if i understand correctly, the
    SplitState
    would be the thing maintaining state as the
    SourceReader
    reads through the split… but if we’re in Batch execution mode, would it matter? if the source is not checkpointed (batch mode), wouldn’t the
    SourceReader
    need to resume from zero and reread the entire split? i feel like i’m missing something. concrete example: a dynamo scan. segments map to splits, but within a segment the results may be paginated. i could store that pagination key… but does it matter? if the task fails, wouldn’t all data in the immediate downstream persistence layer be wiped anyway? and if so i’d need to resume from zero, not the partially paginated result. the connection between batch recovery and the persisted intermediate storage isn’t super clear to me in this mode