# troubleshooting
  • d

    Donatien Schmitz

    05/17/2023, 8:34 AM
    Hi, anyone having trouble downloading archives from https://archive.apache.org/dist/flink (or even https://archive.apache.org/dist)? The download is drastically slower than usual (it was fine when I tested yesterday). I'm in the EU-West zone.
  • i

    Iat Chong Chan

    05/17/2023, 8:36 AM
    Hi Team, I have a question about using JDBC as a source in my Flink SQL/Table application. Does the JDBC connector support streaming mode? The results from it seem to be unordered by the partition column, and checking the implementation on GitHub I don't see any ordering mechanism there. Does that mean the JDBC connector only supports batch jobs?
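    For reference, a minimal Java Table API sketch of the kind of JDBC scan source being discussed; the connection details, table schema and scan.partition.* bounds are placeholders, not taken from the thread:
    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class JdbcScanSourceSketch {
        public static void main(String[] args) {
            // The JDBC connector is a bounded scan source: even in streaming mode it
            // reads a snapshot of the table once and then finishes, and rows coming
            // from parallel partition scans arrive in no guaranteed order.
            TableEnvironment tEnv =
                    TableEnvironment.create(EnvironmentSettings.inStreamingMode());

            // Placeholder connection details, shown only to illustrate the options.
            tEnv.executeSql(
                    "CREATE TABLE orders (" +
                    "  id BIGINT," +
                    "  amount DECIMAL(10, 2)" +
                    ") WITH (" +
                    "  'connector' = 'jdbc'," +
                    "  'url' = 'jdbc:mysql://localhost:3306/mydb'," +
                    "  'table-name' = 'orders'," +
                    "  'scan.partition.column' = 'id'," +
                    "  'scan.partition.num' = '4'," +
                    "  'scan.partition.lower-bound' = '0'," +
                    "  'scan.partition.upper-bound' = '1000000'" +
                    ")");

            tEnv.executeSql("SELECT * FROM orders").print();
        }
    }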
  • i

    Iat Chong Chan

    05/17/2023, 8:39 AM
    The second question is whether temporal joins are supported in batch mode. I tried to execute a job with a temporal join in batch mode, but the SQL planner reported: "unexpected correlate variable". I probed around and the most related issue seems to be this one: https://issues.apache.org/jira/browse/FLINK-18825 Just want to confirm with the experts here that it is not supported yet!
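    For context, a sketch of the temporal-join shape in question, issued through the Java Table API (table and column names are hypothetical); the question is whether the same statement is accepted when the TableEnvironment is created in batch mode:
    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class TemporalJoinSketch {
        public static void main(String[] args) {
            // Switching inBatchMode() <-> inStreamingMode() is what exposes the
            // planner difference being asked about.
            TableEnvironment tEnv =
                    TableEnvironment.create(EnvironmentSettings.inBatchMode());

            // Assumes 'orders' and the versioned table 'currency_rates' were
            // registered beforehand; shown only to illustrate the join syntax.
            tEnv.executeSql(
                    "SELECT o.id, o.amount, r.rate " +
                    "FROM orders AS o " +
                    "JOIN currency_rates FOR SYSTEM_TIME AS OF o.order_time AS r " +
                    "ON o.currency = r.currency");
        }
    }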
  • e

    Elizaveta Batanina

    05/17/2023, 1:28 PM
    Hi Team! Can someone clarify how to correctly use Flink 1.17 operator session deployments? If I deploy a session deployment as in the example and then submit a job using the Flink CLI, I get this error:
    "system:serviceaccount:my-namespace:flink" cannot get resource "services" in API group "" in the namespace "my-namespace".
    I figured out that if I use serviceAccount: flink-operator, I don't have this problem. However, the flink-operator docs say that session deployments should use the flink-operator service account. Thanks!
  • m

    Michał Fijołek

    05/17/2023, 2:00 PM
    Hello :) We have a Flink job (v1.17) on k8s (using the official flink-kubernetes-operator) that reads data from Kafka and writes it to S3 via Flink SQL with compaction enabled. The job sometimes fails and continues from a checkpoint just fine, but once every couple of days we hit a crash loop: the job cannot continue from the latest checkpoint and fails with this exception:
    java.util.NoSuchElementException
    	at java.base/java.util.ArrayList$Itr.next(Unknown Source)
    	at org.apache.flink.connector.file.table.stream.compact.CompactOperator.initializeState(CompactOperator.java:114)
    	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.initializeOperatorState(StreamOperatorStateHandler.java:122)
    	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:274)
    	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
    	at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:734)
    	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
    	at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:709)
    	at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:675)
    	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952)
    	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:921)
    	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745)
    	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
    	at java.base/java.lang.Thread.run(Unknown Source)
    Anyone experienced something like this? Here's the relevant code: https://github.com/apache/flink/blob/release-1.17/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/table/stream/compact/CompactOperator.java#L114 It looks like CompactOperator is calling next() on the iterator without checking hasNext() first - anyone know why? Why does context.getOperatorStateStore().getListState(metaDescriptor) return an empty iterator? Is the latest checkpoint broken in that case? We have an identical job, but without compaction, and it has been working smoothly for a couple of weeks now. Here's what the table looks like; the whole job is just a select from Kafka and an insert into S3.
    CREATE EXTERNAL TABLE IF NOT EXISTS hive.`foo`.`bar` (
      `foo_bar1` STRING,
      `foo_bar2` STRING,
      `foo_bar3` STRING,
      `foo_bar4` STRING
      )
      PARTITIONED BY (`foo_bar1` STRING, `foo_bar2` STRING, `foo_bar3` STRING)
      STORED AS parquet
      LOCATION 's3a://my/bucket/'
      TBLPROPERTIES (
        'auto-compaction' = 'true',
        'compaction.file-size' = '128MB',
        'sink.parallelism' = '8',
        'format' = 'parquet',
        'parquet.compression' = 'SNAPPY',
        'sink.rolling-policy.rollover-interval' = '1 h',
        'sink.partition-commit.policy.kind' = 'metastore'
      )
    Any help appreciated, thanks :)
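    Not the actual CompactOperator code, just a minimal Java sketch (with assumed names) of the restore pattern in question, showing where an unguarded next() on restored operator list state produces exactly the NoSuchElementException in the trace above:
    import java.util.Collections;
    import java.util.Iterator;

    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.runtime.state.FunctionInitializationContext;
    import org.apache.flink.runtime.state.FunctionSnapshotContext;
    import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

    public class GuardedRestoreSketch implements CheckpointedFunction {

        private transient ListState<Long> metaState;
        private long restoredValue = -1L;

        @Override
        public void initializeState(FunctionInitializationContext context) throws Exception {
            // Same style of access as CompactOperator#initializeState: read a
            // single element back out of operator list state after a restore.
            metaState = context.getOperatorStateStore()
                    .getListState(new ListStateDescriptor<>("compaction-meta", Long.class));

            Iterator<Long> it = metaState.get().iterator();
            // Calling it.next() unconditionally throws java.util.NoSuchElementException
            // when the restored state is empty; guarding it avoids the crash loop,
            // though it doesn't explain why the state came back empty.
            if (it.hasNext()) {
                restoredValue = it.next();
            }
        }

        @Override
        public void snapshotState(FunctionSnapshotContext context) throws Exception {
            metaState.update(Collections.singletonList(restoredValue));
        }
    }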
  • t

    Trystan

    05/17/2023, 7:41 PM
    When testing in Docker, you used to be able to just add another localhost line to conf/workers and it would spin up a second TM. I know this worked ~2 years ago; now it seems the second TM runs into a port collision and dies:
    ERROR: transport error 202: bind failed: Address already in use
    Is this no longer an expected way to spin up two TMs in the same Docker container?
  • d

    Dave Voutila

    05/17/2023, 10:48 PM
    I'm trying to write a PyFlink (1.17) pipeline with a Kafka source that uses SlidingEventTimeWindows, but I can't get it to properly use event time (the windows never seem to trigger). Processing time works fine. I even tried setting a custom TimestampAssigner to assign a timestamp from data in my events, which I know is being called because I put logging in it, but it still never triggers on event time. Any ideas on what to try or how to debug this?
  • r

    Rashmin Patel

    05/18/2023, 9:37 AM
    Hi all, I have a basic doubt about watermark generation. Let's say I have a Kafka topic with 10 partitions and I set the source parallelism to 15, with an idleTimeout of 30 minutes. In this case, how are the watermarks generated?
    1. Does it hold back the watermark because of the 5 (15-10) idle subtasks until the 30 minutes have passed?
    2. Or does it recognize that the Kafka partition count is 10 and progress the watermark anyway?
    I am observing that it shows No Watermark (Watermarks are only available if EventTime is used) for quite some time and then starts generating watermarks.
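    For reference, a minimal Java sketch of the setup being described (brokers, topic and timestamps are placeholders, and the flink-connector-kafka dependency is assumed): a Kafka source run at parallelism 15 over 10 partitions, with withIdleness() so the 5 subtasks that never get a partition stop holding the watermark back after the timeout:
    import java.time.Duration;

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class IdleSourceWatermarkSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            KafkaSource<String> source = KafkaSource.<String>builder()
                    .setBootstrapServers("broker:9092")   // placeholder
                    .setTopics("events")                  // placeholder topic, 10 partitions assumed
                    .setStartingOffsets(OffsetsInitializer.latest())
                    .setValueOnlyDeserializer(new SimpleStringSchema())
                    .build();

            // Subtasks without an assigned partition (5 of the 15 here) emit no
            // watermarks; withIdleness() marks them idle after the timeout so the
            // overall watermark can advance.
            WatermarkStrategy<String> watermarks =
                    WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(30))
                            .withIdleness(Duration.ofMinutes(30));

            env.fromSource(source, watermarks, "kafka-source")
                    .setParallelism(15)
                    .print();

            env.execute("idle-source-watermark-sketch");
        }
    }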
  • k

    kingsathurthi

    05/18/2023, 11:11 AM
    Hi All, I have a doubt: what is the upgrade process for a FlinkDeployment without impacting the currently running job?
  • y

    Yaroslav Bezruchenko

    05/18/2023, 1:42 PM
    Does anyone have an example in PyFlink (1.17) of how to send a dict to Kafka using AvroRowSerializationSchema?
  • a

    Amir Hossein Sharifzadeh

    05/18/2023, 2:48 PM
    Hi guys, I keep getting this error while trying to run Maven:
    The following artifacts could not be resolved: org.apache.flink:flink-statebackend-rocksdb:jar:1.18-SNAPSHOT (absent): Could not find artifact org.apache.flink:flink-statebackend-rocksdb:jar:1.18-SNAPSHOT
  • a

    Amir Hossein Sharifzadeh

    05/18/2023, 2:48 PM
    where:
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-statebackend-rocksdb</artifactId>
        <version>1.18-SNAPSHOT</version>
        <scope>compile</scope>
    </dependency>
  • k

    Kevin Lam

    05/18/2023, 5:27 PM
    Has anyone encountered issues with watermark alignment coming out of a restart? E.g. watermark alignment is working well and the sources are not drifting, then after a restart we observe data skew as if watermark alignment is no longer active.
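    For reference, a minimal Java sketch of the kind of alignment configuration being referred to (the group name, drift bound and update interval are placeholders), applied to whichever sources share the group:
    import java.time.Duration;

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;

    public class WatermarkAlignmentSketch {
        // Sources given this strategy join "alignment-group" and are throttled once
        // their watermark drifts more than 1 minute ahead of the group minimum;
        // the drift is re-evaluated every second.
        public static WatermarkStrategy<String> alignedStrategy() {
            return WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(30))
                    .withWatermarkAlignment("alignment-group", Duration.ofMinutes(1), Duration.ofSeconds(1));
        }
    }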
  • a

    André Casimiro

    05/18/2023, 8:25 PM
    Hi, the documentation for ContinuousEventTimeTrigger says:
    A Trigger that continuously fires based on a given time interval. This fires based on Watermarks.
    Does that depend on the stream actually receiving data? I was hoping this trigger would fire even if it doesn't, but no luck. What's the difference between ContinuousEventTimeTrigger and EventTimeTrigger?
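    For illustration, a minimal Java sketch comparing the two triggers on an event-time window (the input, key selector and types are placeholders); both triggers are watermark-driven, ContinuousEventTimeTrigger just adds periodic firings at the given interval on top of the final firing at the window end:
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.api.windowing.triggers.ContinuousEventTimeTrigger;
    import org.apache.flink.streaming.api.windowing.triggers.EventTimeTrigger;

    public class TriggerComparisonSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Placeholder input with event-time timestamps taken from the second field.
            DataStream<Tuple2<String, Long>> events =
                    env.fromElements(Tuple2.of("a", 1_000L), Tuple2.of("a", 2_000L))
                            .assignTimestampsAndWatermarks(
                                    WatermarkStrategy.<Tuple2<String, Long>>forMonotonousTimestamps()
                                            .withTimestampAssigner((event, ts) -> event.f1));

            // EventTimeTrigger (the default for event-time windows) fires once, when
            // the watermark passes the end of the window.
            events.keyBy(value -> value.f0)
                    .window(TumblingEventTimeWindows.of(Time.hours(1)))
                    .trigger(EventTimeTrigger.create())
                    .sum(1);

            // ContinuousEventTimeTrigger also fires every 5 minutes of event time,
            // but those firings still only happen when watermarks advance, i.e. when
            // incoming data (or idleness handling) moves event time forward.
            events.keyBy(value -> value.f0)
                    .window(TumblingEventTimeWindows.of(Time.hours(1)))
                    .trigger(ContinuousEventTimeTrigger.of(Time.minutes(5)))
                    .sum(1);

            env.execute("trigger-comparison-sketch");
        }
    }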
  • g

    Guruguha Marur Sreenivasa

    05/18/2023, 10:22 PM
    Hi All, I'm trying to figure out Flink custom metrics and want to know if it's possible to register a metric, say a counter, with tags. For example: `avgflink.operator.currentEmitEventTimeLag{cluster_name:<CLUSTER>}` - here, cluster_name behaves like a grouping tag. How do I register my own custom metric the same way?
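    For reference, a minimal Java sketch of registering a user-scoped counter under a key/value group (the group and metric names are placeholders); whether the group surfaces as a tag like in the example above depends on the metrics reporter in use:
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Counter;

    public class TaggedCounterMapper extends RichMapFunction<String, String> {

        private transient Counter eventsProcessed;

        @Override
        public void open(Configuration parameters) {
            // addGroup(key, value) creates a key/value user scope; tag-based
            // reporters expose the key as a variable/tag on the metric.
            eventsProcessed = getRuntimeContext()
                    .getMetricGroup()
                    .addGroup("cluster_name", "my-cluster")
                    .counter("events_processed");
        }

        @Override
        public String map(String value) {
            eventsProcessed.inc();
            return value;
        }
    }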
  • s

    slowratatoskr

    05/19/2023, 12:17 AM
    Anyone here using AWS Kinesis Data Analytics with Flink SQL?
  • s

    slowratatoskr

    05/19/2023, 1:50 AM
    @Jeremy Ber are you using it through the console via Studio notebooks?
  • m

    Marco Villalobos

    05/19/2023, 4:19 AM
    I want to write data in Parquet format from a streaming job and then read it with the SQL API via the SQL remote gateway or SQL Client (a different job/session). The SQL API has no discussion of Avro or Protobuf when dealing with Parquet, but the DataStream API requires it in the File Sink connector. Does this mean I must convert the stream to a table and sink it with the SQL API? Please advise.
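    For reference, a minimal Java sketch of the convert-to-table route mentioned in the question (the path and schema are placeholders, and the flink-parquet format dependency is assumed): register the DataStream as a view and let the SQL filesystem connector write the Parquet files, so no Avro/Protobuf-specific writer is needed on the DataStream side:
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

    public class StreamToParquetSketch {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

            // Placeholder stream; in practice this would be the Kafka-backed stream.
            DataStream<Tuple2<Long, String>> stream =
                    env.fromElements(Tuple2.of(1L, "a"), Tuple2.of(2L, "b"));

            // Expose the DataStream to SQL; tuple fields become columns f0, f1.
            tEnv.createTemporaryView("events", stream);

            // Filesystem connector + 'format' = 'parquet' handles the Parquet writing.
            tEnv.executeSql(
                    "CREATE TABLE parquet_sink (" +
                    "  id BIGINT," +
                    "  name STRING" +
                    ") WITH (" +
                    "  'connector' = 'filesystem'," +
                    "  'path' = 's3a://my-bucket/events'," +
                    "  'format' = 'parquet'" +
                    ")");

            tEnv.executeSql("INSERT INTO parquet_sink SELECT f0 AS id, f1 AS name FROM events");
        }
    }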
  • s

    Sumit Singh

    05/19/2023, 11:01 AM
    Running the SQL Client, the job is submitted and running, but I'm also getting the warning below. Does anyone have any idea how to resolve this?
    WARNING: An illegal reflective access operation has occurred
    WARNING: Illegal reflective access by org.apache.flink.api.java.ClosureCleaner (file:/home/ubuntu/flink-1.17.0/lib/flink-dist-1.17.0.jar) to field java.lang.String.value
    WARNING: Please consider reporting this to the maintainers of org.apache.flink.api.java.ClosureCleaner
    WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
    WARNING: All illegal access operations will be denied in a future release
  • h

    Hussain Abbas

    05/19/2023, 11:36 AM
    Hello guys, we are facing issues with the metaspace getting full, which causes the task manager to restart. It's happening every 5 minutes; we increased the metaspace size as well, but it had no effect. Do you know what the issue could be?
  • s

    Sergey Postument

    05/19/2023, 11:42 AM
    I use the flink-kubernetes-operator 1.5.0 but am getting the following error:
    Error: FlinkDeployment.flink.apache.org "my-app-name" is invalid: spec.flinkVersion: Unsupported value: "v1_17": supported values: "v1_13", "v1_14", "v1_15", "v1_16"
    Helm doesn't support updating CRDs, so when updating the Flink operator via Helm, don't forget to update the CRDs as well.
  • s

    slowratatoskr

    05/19/2023, 12:55 PM
    Anyone here using the Flink SQL Gateway in production? How do you use it with the Flink k8s operator?
  • a

    Amir Hossein Sharifzadeh

    05/19/2023, 5:48 PM
    From this tutorial (please look at the screenshot), the program never reaches the second if block:
  • a

    Amir Hossein Sharifzadeh

    05/19/2023, 5:48 PM
  • a

    Amir Hossein Sharifzadeh

    05/19/2023, 5:53 PM
    Does anybody know how and where to implement the second block to iterate over checkpointedState? Because ListState does not have a getSize() method to figure out when all the data has been consumed…
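    A minimal Java sketch (field and state names are assumed, following the shape of the docs' CheckpointedFunction example): ListState#get() returns an Iterable, so a plain for-each loop consumes the restored state and ends exactly when it is exhausted, with no size needed:
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.runtime.state.FunctionInitializationContext;
    import org.apache.flink.runtime.state.FunctionSnapshotContext;
    import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

    public class BufferingSinkSketch implements CheckpointedFunction {

        private transient ListState<Long> checkpointedState;
        private final List<Long> bufferedElements = new ArrayList<>();

        @Override
        public void initializeState(FunctionInitializationContext context) throws Exception {
            checkpointedState = context.getOperatorStateStore()
                    .getListState(new ListStateDescriptor<>("buffered-elements", Long.class));

            if (context.isRestored()) {
                // The "second block": iterate the restored state. The loop finishes
                // exactly when the Iterable is exhausted, so no getSize() is required.
                for (Long element : checkpointedState.get()) {
                    bufferedElements.add(element);
                }
            }
        }

        @Override
        public void snapshotState(FunctionSnapshotContext context) throws Exception {
            checkpointedState.update(bufferedElements);
        }
    }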
  • i

    Ishan

    05/19/2023, 6:34 PM
    Hi All, I am new to Flink and am trying to create an Iceberg table using Flink. The doc here says you can create an Iceberg table managed by a Hadoop catalog. However, Flink doesn't document the hadoop or rest catalogs anywhere, am I missing something? The reason I ask is that I get a validation exception when trying to create a hadoop catalog.
    Catalog options are:
    'catalog-type'='hadoop'
    'type'='iceberg'
    'warehouse'='.abfs://....'
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:98)
    at org.apache.flink.client.deployment.application.DetachedApplicationRunner.tryExecuteJobs(DetachedApplicationRunner.java:84)
    ... 4 more
    Caused by: org.apache.flink.table.api.ValidationException: Unable to create catalog 'hadoop_catalog'.
    Catalog options are:
    'catalog-type'='hadoop'
    'type'='iceberg'
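    For comparison, a sketch of the Hadoop-catalog DDL as the Iceberg Flink docs describe it, issued through the Java Table API (the warehouse path is a placeholder, and the iceberg-flink-runtime jar is assumed to be on the classpath):
    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class IcebergHadoopCatalogSketch {
        public static void main(String[] args) {
            TableEnvironment tEnv =
                    TableEnvironment.create(EnvironmentSettings.inStreamingMode());

            // The ValidationException above is thrown while these options are
            // resolved, so this is the statement shape being validated.
            tEnv.executeSql(
                    "CREATE CATALOG hadoop_catalog WITH (" +
                    "  'type' = 'iceberg'," +
                    "  'catalog-type' = 'hadoop'," +
                    "  'warehouse' = 'abfs://container@account.dfs.core.windows.net/warehouse'" +
                    ")");

            tEnv.executeSql("USE CATALOG hadoop_catalog");
        }
    }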
  • r

    Ron Ben Arosh

    05/21/2023, 12:01 PM
    Hi 🙂 I just added the following line to the config file:
    execution.savepoint-restore-mode: CLAIM
    But when running the service (Flink 1.17), the restore mode was not applied (as seen under the Job Manager section). How can I set the restore mode to CLAIM?
  • u

    田明刚

    05/22/2023, 2:49 AM
    Hi all: Flink 1.15 writing to StarRocks 3.0 (exactly-once) throws an exception:
    55037,857 ERROR com.starrocks.data.load.stream.DefaultStreamLoader [] - Exception happens when sending data, thread: I/O client dispatch - 5d8de405-3ce5-4cc3-90b8-3b727f77c1b3
    com.starrocks.data.load.stream.exception.StreamLoadFailException: Stream load failed because of unknown exception, db: example_db, table: detailDemo2UNIQUE, label: dcd8ec78-fb0d-4fc1-9cb7-066f44c9c61b
    	at com.starrocks.data.load.stream.DefaultStreamLoader.send(DefaultStreamLoader.java:292) ~[flink-connector-starrocks-1.2.6_flink-1.15.jar:?]
    	at com.starrocks.data.load.stream.DefaultStreamLoader.lambda$send$2(DefaultStreamLoader.java:118) ~[flink-connector-starrocks-1.2.6_flink-1.15.jar:?]
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_252]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
    	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
    Caused by: com.starrocks.streamload.shade.org.apache.http.client.ClientProtocolException
    How can I solve this exception? Thanks in advance.
  • s

    Season

    05/22/2023, 7:40 AM
    I want to unit test a ProcessWindowFunction, but I cannot find an example. When I try WindowOperator, it requires a ReducingStateDescriptor, which means my window is left with only one event, but I want to see all events in the ProcessWindowFunction logic. Does anyone know how to test a ProcessWindowFunction so that I can test windows with all their events?
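    A minimal Java sketch of one way to do this (the function under test and its types are assumed): invoke process() directly with the full list of window elements, so the whole Iterable is visible to the logic. If the function uses its Context, a stub Context or the window operator test harness is needed instead of the null passed here:
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    public class CountingWindowFunctionTest {

        // Hypothetical function under test: emits the element count per window.
        static class CountingWindowFunction
                extends ProcessWindowFunction<Long, Long, String, TimeWindow> {
            @Override
            public void process(String key, Context ctx, Iterable<Long> elements, Collector<Long> out) {
                long count = 0;
                for (Long ignored : elements) {
                    count++;
                }
                out.collect(count);
            }
        }

        public static void main(String[] args) throws Exception {
            CountingWindowFunction function = new CountingWindowFunction();
            List<Long> output = new ArrayList<>();
            Collector<Long> collector = new Collector<Long>() {
                @Override
                public void collect(Long record) {
                    output.add(record);
                }

                @Override
                public void close() {}
            };

            // All three events are handed to process() at once, unlike the
            // ReducingStateDescriptor setup where only the reduced value is left.
            function.process("key-1", null, Arrays.asList(1L, 2L, 3L), collector);

            if (output.size() != 1 || output.get(0) != 3L) {
                throw new AssertionError("expected a single count of 3, got " + output);
            }
            System.out.println("ProcessWindowFunction saw all 3 events: count = " + output.get(0));
        }
    }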