Lukasz Krawiec
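09/09/2024, 8:47 PM
My S3 source is built like this:
```java
// Format is the job's own StreamFormat implementation; path points at the S3 directory
FileSource.forRecordStreamFormat(new Format(), new Path(path))
    .monitorContinuously(Duration.ofMinutes(15))
    .build();
```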
In production, my job fails to start even after I increased the timeouts to 10 minutes. One of the last lines in the logs is:
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Starting split enumerator for source Source: S3.
After some local debugging with a much smaller dataset, I believe the problem is that the SplitEnumerator first lists every existing path under the S3 directory before it starts, and there are many of them (100k+ subdirectories).
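A rough back-of-the-envelope estimate shows why a full listing could exceed a 10-minute timeout. The numbers are assumptions, not measurements from this thread: directories are listed one at a time, each directory costs at least one S3 LIST request, and each request takes about 20 ms.

```latex
T_{\text{enum}} \approx N_{\text{dirs}} \times t_{\text{LIST}}
  = 100{,}000 \times 0.02\,\text{s} = 2{,}000\,\text{s} \approx 33\,\text{min}

N_{\max} = \frac{600\,\text{s}}{0.02\,\text{s}} = 30{,}000 \text{ directories within a 10-minute budget}
```

Under these assumptions the listing alone would need more than three times the 10-minute timeout, and pagination of large directories would only add requests.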
Is it possible to configure the enumerator to discover new S3 files incrementally, i.e. to let it start without first discovering the entire S3 structure?
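A minimal, hypothetical sketch of the incremental discovery described above, written in plain Java rather than against Flink's API. An in-memory map stands in for S3 LIST calls (an assumption), and `IncrementalLister` and `nextBatch` are illustrative names, not Flink classes. The lister walks the tree breadth-first and stops expanding directories once a batch has enough files, so the first files are available after a few LIST calls instead of after the whole tree has been listed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: breadth-first directory walk that lists only as many
 * directories as needed to return about maxFiles files per call.
 * Map keys are directories; child entries ending in "/" are subdirectories,
 * all other entries are files.
 */
class IncrementalLister {
    private final Map<String, List<String>> tree; // stand-in for an object-store LIST call
    private final Deque<String> pendingDirs = new ArrayDeque<>(); // discovered but not yet listed
    private int listCalls = 0;

    IncrementalLister(Map<String, List<String>> tree, String root) {
        this.tree = tree;
        this.pendingDirs.add(root);
    }

    /**
     * Lists pending directories until at least maxFiles files were collected
     * or no directories remain. A directory's files are returned together, so
     * a batch can exceed maxFiles. An empty result means the walk is finished.
     */
    List<String> nextBatch(int maxFiles) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxFiles && !pendingDirs.isEmpty()) {
            String dir = pendingDirs.poll();
            listCalls++; // one simulated LIST request per directory
            for (String child : tree.getOrDefault(dir, List.of())) {
                if (child.endsWith("/")) {
                    pendingDirs.add(child); // list it in a later iteration or call
                } else {
                    batch.add(child);
                }
            }
        }
        return batch;
    }

    int listCalls() {
        return listCalls;
    }
}
```

On a tree `root/ → {root/a/, root/b/}`, `nextBatch(1)` returns the two files in `root/a/` after two simulated LIST calls and leaves `root/b/` unlisted until the next call. Whether Flink's `FileSource` can be made to behave this way (for example through a custom, stateful `FileEnumerator`) is not confirmed in this thread.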
Alternatively, has anyone worked around this problem and can share their experience?
Lukasz Krawiec
10/02/2024, 3:18 AM
D. Draco O'Brien
10/02/2024, 6:46 AM
D. Draco O'Brien
10/02/2024, 6:52 AM
D. Draco O'Brien
10/02/2024, 6:54 AM
D. Draco O'Brien
10/02/2024, 6:57 AM
D. Draco O'Brien
10/02/2024, 6:59 AM
D. Draco O'Brien
10/02/2024, 7:02 AM