Sandrine Bédard
08/26/2024, 4:34 PM
trackable_job). It then triggers another workflow, called Sync (also in Cadence), which sends data to downstream services.
3. The Sync workflow receives a store ID as input (among other things) and waits 5 minutes for an aggregation window. After 5 minutes, it reads all items from trackable_job with the given store ID and calls external APIs to sync the data.
The issues with the current architecture are:
1. The 5-minute aggregation window results in a lot of items being read at once from trackable_job (up to 170K), making the DB query to fetch items very long and inefficient
2. Multiple Cadence workflows read from trackable_job, making things worse from a DB perspective
3. We track success/failed syncs in the trackable_job table, but we don't do anything with it (e.g., publishing events to clients), so this table is used as a queue only
In my design, I'm considering 2 options:
• Option 1: Kafka + Cadence
◦ Replace trackable_job with a proper Kafka queue (partitioned by store ID)
◦ Modify the Sync workflow to pull from that queue. I've read Cadence isn't super easy to set up with Kafka. For example, if queues are partitioned, we need to define which Cadence worker reads from which queue. Is that true?
◦ We must have grouping logic to group items by store ID, and then make the API call downstream (requirement from external dependencies)
• Option 2: Kafka + Flink
◦ Replace trackable_job with a proper Kafka queue (partitioned by store ID)
◦ Move the business logic of the Sync workflow into Flink. Set up Flink tasks to maximize parallelism with Kafka
◦ Add a time/count window in Flink to control the aggregation window
◦ Group events in Flink by store ID
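[Editor's aside: the per-store grouping in Option 2 can be illustrated with plain Python standing in for Flink's keyBy(store_id) plus a 5-minute tumbling event-time window. SyncItem, window_start, and group_into_windows are illustrative names, not from the actual codebase.]

```python
from collections import defaultdict
from dataclasses import dataclass, field

WINDOW_SECONDS = 300  # the 5-minute aggregation window


@dataclass
class SyncItem:
    store_id: str
    event_time: float  # seconds since epoch
    payload: dict = field(default_factory=dict)


def window_start(ts: float) -> float:
    # Align a timestamp to the start of its tumbling window
    return ts - (ts % WINDOW_SECONDS)


def group_into_windows(items):
    # One batch per (window, store): each batch would become a single
    # downstream API call, per the external-dependency requirement
    batches = defaultdict(list)
    for item in items:
        batches[(window_start(item.event_time), item.store_id)].append(item)
    return batches
```

In real Flink this grouping is what keyBy + TumblingEventTimeWindows gives you for free, including firing on watermarks rather than waiting a fixed 5 minutes of wall-clock time.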
Do you think my problem is a good use-case for Flink, and what do you think of Option 2? Thanks a lot!

Ken Krugler
08/26/2024, 4:41 PM

Sandrine Bédard
08/26/2024, 5:59 PM

Sandrine Bédard
08/26/2024, 6:07 PM

Ken Krugler
08/26/2024, 10:08 PM
trackable_job table, and feed that stream of updates to a Flink workflow that calculates the 5-minute tumbling window aggregation (keyed by store). This assumes the DB you’re using for that table is supported by Flink CDC. Since you mentioned “Multiple Cadence workflows read from” that same table, this seems easiest.
If you wanted to get rid of the trackable_job table, you could write all inventory updates directly to a Paimon table, and have a Flink job that handles the “decoration” (I assume enrichment) currently being done by the Cadence InventoryUpdate workflow. This could write results to another Paimon table that’s the source for the aggregation Flink workflow, or this same job could also do the aggregation work if you didn’t need an enriched version of the inventory updates stored for other workflows.
There is some latency between when a Paimon table is updated and when that update becomes visible to a Flink workflow that’s reading from this same table, and that’s the “1 minute” I was referring to. You can go lower; that just feels like a reasonably safe value to use.

Sandrine Bédard
08/26/2024, 11:36 PM

Ken Krugler
08/27/2024, 2:07 AM

Sandrine Bédard
08/27/2024, 4:30 PM

Ken Krugler
08/27/2024, 4:31 PM