# troubleshooting
g
hi all... trying to write from Apache Flink 1.18.1 to Paimon on Hadoop HDFS (3.3.5). the reference/metadata objects (schemas) are created, and the bucket-0 directories are created, but never any data... and if I look at the docker compose logs, I see the following:
```
flink-taskmanager-2      | 2024-08-05 18:50:07,618 INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Source: t_f_avro_salescompleted_x[16] (1/1)#251 (e2c7bb3077c1d5c22e0bccba65bfc32b_bc764cd8ddf7a0cff126f51c16239658_0_251) switched from CREATED to DEPLOYING.
flink-taskmanager-2      | 2024-08-05 18:50:07,618 INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Loading JAR files for task Source: t_f_avro_salescompleted_x[16] (1/1)#251 (e2c7bb3077c1d5c22e0bccba65bfc32b_bc764cd8ddf7a0cff126f51c16239658_0_251) [DEPLOYING].
flink-taskmanager-2      | 2024-08-05 18:50:07,620 INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Source: t_f_avro_salescompleted_x[16] (1/1)#251 (e2c7bb3077c1d5c22e0bccba65bfc32b_bc764cd8ddf7a0cff126f51c16239658_0_251) switched from DEPLOYING to INITIALIZING.
devlab-flink-jobmanager  | 2024-08-05 18:50:07,621 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: t_f_avro_salescompleted_x[16] (1/1) (e2c7bb3077c1d5c22e0bccba65bfc32b_bc764cd8ddf7a0cff126f51c16239658_0_251) switched from DEPLOYING to INITIALIZING.
devlab-flink-jobmanager  | 2024-08-05 18:50:07,633 INFO  org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source Source: t_f_avro_salescompleted_x[16] registering reader for parallel task 0 (#251) @ 172.19.0.12
flink-taskmanager-2      | 2024-08-05 18:50:07,648 INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Source: t_f_avro_salescompleted_x[16] (1/1)#251 (e2c7bb3077c1d5c22e0bccba65bfc32b_bc764cd8ddf7a0cff126f51c16239658_0_251) switched from INITIALIZING to RUNNING.
devlab-flink-jobmanager  | 2024-08-05 18:50:07,648 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: t_f_avro_salescompleted_x[16] (1/1) (e2c7bb3077c1d5c22e0bccba65bfc32b_bc764cd8ddf7a0cff126f51c16239658_0_251) switched from INITIALIZING to RUNNING.
devlab-flink-jobmanager  | 2024-08-05 18:50:07,997 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: t_f_avro_salescompleted_x[16] (1/1) (e2c7bb3077c1d5c22e0bccba65bfc32b_bc764cd8ddf7a0cff126f51c16239658_0_251) switched from RUNNING to CANCELING.
devlab-flink-jobmanager  | 2024-08-05 18:50:07,997 INFO  org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Removing registered reader after failure for subtask 0 (#251) of source Source: t_f_avro_salescompleted_x[16].
flink-taskmanager-2      | 2024-08-05 18:50:07,997 INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Attempting to cancel task Source: t_f_avro_salescompleted_x[16] (1/1)#251 (e2c7bb3077c1d5c22e0bccba65bfc32b_bc764cd8ddf7a0cff126f51c16239658_0_251).
flink-taskmanager-2      | 2024-08-05 18:50:07,997 INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Source: t_f_avro_salescompleted_x[16] (1/1)#251 (e2c7bb3077c1d5c22e0bccba65bfc32b_bc764cd8ddf7a0cff126f51c16239658_0_251) switched from RUNNING to CANCELING.
flink-taskmanager-2      | 2024-08-05 18:50:07,997 INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Triggering cancellation of task code Source: t_f_avro_salescompleted_x[16] (1/1)#251 (e2c7bb3077c1d5c22e0bccba65bfc32b_bc764cd8ddf7a0cff126f51c16239658_0_251).
flink-taskmanager-2      | 2024-08-05 18:50:07,998 INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Source: t_f_avro_salescompleted_x[16] (1/1)#251 (e2c7bb3077c1d5c22e0bccba65bfc32b_bc764cd8ddf7a0cff126f51c16239658_0_251) switched from CANCELING to CANCELED.
flink-taskmanager-2      | 2024-08-05 18:50:07,998 INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Freeing task resources for Source: t_f_avro_salescompleted_x[16] (1/1)#251 (e2c7bb3077c1d5c22e0bccba65bfc32b_bc764cd8ddf7a0cff126f51c16239658_0_251).
flink-taskmanager-2      | 2024-08-05 18:50:08,001 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Un-registering task and sending final execution state CANCELED to JobManager for task Source: t_f_avro_salescompleted_x[16] (1/1)#251 e2c7bb3077c1d5c22e0bccba65bfc32b_bc764cd8ddf7a0cff126f51c16239658_0_251.
devlab-flink-jobmanager  | 2024-08-05 18:50:08,002 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: t_f_avro_salescompleted_x[16] (1/1) (e2c7bb3077c1d5c22e0bccba65bfc32b_bc764cd8ddf7a0cff126f51c16239658_0_251) switched from CANCELING to CANCELED.
```
@D. Draco O'Brien.
d
Ok, so the logs indicate that the task responsible for reading from the source t_f_avro_salescompleted_x (and presumably writing to Paimon) is being cancelled right away after it starts running.
it switches from RUNNING to CANCELING and then CANCELED almost immediately
I don’t see an error in this log excerpt that caused it, but you might check the preceding log entries for errors or exceptions
I would check the Flink job configuration for timeouts, to make sure they are long enough to prevent early cancellation.
Also check the job's restart strategy and failure threshold settings.
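For example, something like this from the SQL client (just a sketch; the fixed-delay values are illustrative, not recommendations):
```sql
-- raise the restart tolerance so one transient failure doesn't cancel the job
SET 'restart-strategy' = 'fixed-delay';
SET 'restart-strategy.fixed-delay.attempts' = '10';
SET 'restart-strategy.fixed-delay.delay' = '10 s';
```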
g
ok, will google. these are the same Flink job and task managers I was using to write to Iceberg on S3/MinIO. I can't imagine it being permissions, as the various table definition objects are created..
d
How about the availability of t_f_avro_salescompleted_x - are you sure data can be read ok from it?
beyond these factors, you should do the standard checks on resource constraints (CPU, memory, etc.), as insufficient resources could also cause this type of failure
g
will go look... the difference between this one and the previous is the output stream/leg; the inputs, aka the topics and the flink virtual tables, are all the same.
d
I will take a quick look at Flink 1.18.1 to see if there are any known issues with HDFS 3.3.5, but I think that’s supposed to be compatible.
g
that seems to be the sweet combination of versions.
d
Probably worth a quick check whether there are any known issues. Resource constraints are a bit unlikely given it's more or less the same setup as before
g
the diff of this build is pushing out to a hadoop dfs cluster, so that's extra resources; the previous version was to a single node/container MinIO S3 service. but can't see it being resources, this MBP is not exactly slow.
d
source is Kafka, right?
g
yes, Kafka topic => Flink virtual table => paimon on hdfs
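for context, the virtual table is roughly this shape (a sketch, not my actual DDL; columns, topic, hosts, and the schema-registry option are all placeholders, assuming Avro via a schema registry):
```sql
CREATE TABLE t_f_avro_salescompleted_x (
  invoice_id STRING,        -- placeholder columns
  store_id   STRING,
  total      DOUBLE,
  sale_ts    TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'salescompleted',                      -- placeholder topic
  'properties.bootstrap.servers' = 'broker:9092',  -- placeholder broker
  'properties.group.id' = 'flink-paimon-test',
  'scan.startup.mode' = 'earliest-offset',
  'value.format' = 'avro-confluent',
  'value.avro-confluent.url' = 'http://schema-registry:8081'  -- placeholder registry
);
```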
d
So you're quite sure that the sales_completed_source is available and producing data, right?
if not, you could write a job just to test this aspect out
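e.g. a minimal smoke test from the SQL client:
```sql
-- if this prints rows, the source side is fine and the problem is on the Paimon leg
SELECT * FROM t_f_avro_salescompleted_x LIMIT 10;
```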
g
interesting, originally i thought something was wrong with the CTAS, so tried the CTAS with WHERE 1=2, followed by the INSERT INTO <>
but now thinking nothing wrong with CTAS... it's something else
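i.e. roughly these two variants (sink table name is a placeholder):
```sql
-- variant 1: plain CTAS, create the Paimon table and populate it in one go
CREATE TABLE t_p_salescompleted AS
SELECT * FROM t_f_avro_salescompleted_x;

-- variant 2: WHERE 1=2 copies only the schema (no rows), then an explicit insert
CREATE TABLE t_p_salescompleted AS
SELECT * FROM t_f_avro_salescompleted_x WHERE 1=2;

INSERT INTO t_p_salescompleted
SELECT * FROM t_f_avro_salescompleted_x;
```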
busy rebuilding the stack... docker had an issue... the resource saver is not our friend. will try and break the steps down a bit and see... this is all a 1000-foot view into a project/PoV
d
Btw, do you have Paimon logging set up as well?
Maybe it’s worth a quick look at the Paimon configuration as well, to double-check this setup …
g
paimon is just an open table format, stored on hdfs, not much to enable.
d
well … are you using the Paimon connector?
I guess it’s the Hadoop connector, or what?
g
no... using the jar file in the flink lib, then the catalog definition as per the above sql file, and then the table creates as per above.
d
I think you might need the Flink connector for Hadoop…
```xml
<dependencies>
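    <!-- note: Paimon also ships Flink-version-specific artifacts
         (e.g. paimon-flink-1.18); check which one matches your Flink -->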
    <dependency>
        <groupId>org.apache.paimon</groupId>
        <artifactId>paimon-flink</artifactId>
        <version>${paimon.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-hadoop-fs</artifactId>
        <version>${flink.version}</version>
    </dependency>
</dependencies>
```
when using Paimon
g
will need to dig; the paimon quick start for flink did not imply anything like that.
some serious google time...
prob have to "recognise" i've been scanning over the paimon bits and haven't read them in detail...
d
ok, are you using Paimon also as the Catalog store for Flink?
g
yes..
as the first try...
for paimon i'm using the hdfs catalog
for flink i'm using the previously proven hms
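the paimon side is roughly this (a sketch; the warehouse path/host are placeholders):
```sql
-- filesystem-backed Paimon catalog: metadata and data both live on HDFS
CREATE CATALOG c_paimon WITH (
  'type' = 'paimon',
  'warehouse' = 'hdfs://namenode:8020/paimon'
);
USE CATALOG c_paimon;
```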
d
I see. According to https://www.alibabacloud.com/help/en/flink/developer-reference/apache-paimon-connector it would appear that you only need the connector in the case where you are using another catalog.
g
this could be the cause, but it's a far-fetched one.
d
Yeah, well, breaking it up a bit into separate jobs will show what’s happening
g
figured; get it to work like this first before i try and move the catalog into hms.
been building up the stack slowly, making sure one bit works before it's used in a 2nd bit
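the later hms variant would be something like this (a sketch; the thrift uri and warehouse path are placeholders):
```sql
-- same Paimon catalog, but with the Hive metastore holding the metadata
CREATE CATALOG c_paimon_hms WITH (
  'type' = 'paimon',
  'metastore' = 'hive',
  'uri' = 'thrift://hive-metastore:9083',
  'warehouse' = 'hdfs://namenode:8020/paimon'
);
```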
d
Yep, I am curious to find out how well it supports catalogs.
g
i've got about 90% of the work done to create a 2-node hive environment: a hive server and a meta server backed by postgres. once i get the simple hdfs catalog working i will try the standalone hms, and if that works, then try my hive environment.
that way i know if it's me (most prob) or the tech...
especially as a lot of this is brand new to me.
d
Looking forward to seeing the results on that.
g
heheheheh, 😉 my small little blog has turned into a multipart monster.
right now trying to automate more of the deployment/building steps...
will post a link to the project git here shortly, if anyone is bored.