Alice Suo
07/22/2024, 6:44 AM
Danny Cranmer
07/22/2024, 7:51 AM
With keyBy you are grouping data based on the key and sending it to downstream operators. If the parallelism of the downstream operator is >1 you will lose ordering globally (in the file in S3), but retain it for each key/group. Retaining order across parallel processes does not scale well.
Alice Suo
07/22/2024, 7:53 AM
D. Draco O'Brien
07/22/2024, 10:39 AM
D. Draco O'Brien
07/22/2024, 10:40 AM
D. Draco O'Brien
07/22/2024, 10:41 AM
D. Draco O'Brien
07/22/2024, 10:41 AM
D. Draco O'Brien
07/22/2024, 10:42 AM
D. Draco O'Brien
07/22/2024, 10:43 AM
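
To illustrate Danny's keyBy point, here is a minimal plain-Python sketch, not Flink code. It assumes a toy hash (byte sum modulo parallelism) as a stand-in for Flink's key-to-subtask assignment and a round-robin merge as one possible arrival order at a shared sink. The names `key_by`, `merge_interleaved`, and `per_key_sequences` are hypothetical helpers made up for this example.

```python
from collections import defaultdict
from itertools import zip_longest


def key_by(records, parallelism):
    """Route each (key, seq) record to a subtask chosen from its key."""
    subtasks = [[] for _ in range(parallelism)]
    for key, seq in records:
        # Toy deterministic hash; an assumption, not Flink's real key hashing.
        index = sum(key.encode()) % parallelism
        subtasks[index].append((key, seq))
    return subtasks


def merge_interleaved(subtasks):
    """Interleave subtask outputs round-robin (one possible arrival order)."""
    merged = []
    for row in zip_longest(*subtasks):
        merged.extend(record for record in row if record is not None)
    return merged


def per_key_sequences(records):
    """Collect the sequence numbers seen for each key, in arrival order."""
    grouped = defaultdict(list)
    for key, seq in records:
        grouped[key].append(seq)
    return dict(grouped)


# Input order as it might appear in the source file.
records = [("a", 1), ("b", 1), ("a", 2), ("c", 1), ("b", 2), ("a", 3)]

subtasks = key_by(records, parallelism=2)
merged = merge_interleaved(subtasks)

print("global order kept: ", merged == records)
print("per-key order kept:", per_key_sequences(merged) == per_key_sequences(records))
```

With a parallelism of 1 the merged output matches the input order exactly. With 2 or more subtasks only the per-key sequences are guaranteed to match.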