Hey Ido, Eduardo's method works great! We went with option 2: we ran Argo Workflows with a shared attached disk to store the intermediate artefacts.
Running workflow pipelines is the first step; orchestrating them with the right resources is the next challenge. We're using these pipelines to serve users, so at traffic spikes, for every 1 unit of CPU/RAM we have, the pending workflows might require 3-5 units in total. To serve all of the workflow requests, we'll need an intelligent queue control system.
The current challenge we're working through is that Argo Workflows has no built-in queue. This causes two problems for us. First, if the pods' resources are fully utilised, incoming workflows cannot be accepted. Second, assuming we add some form of queue, when a pod's resources are only partially utilised and a bigger workflow won't fit, we haven't found an "intelligent broker" that can instead pull smaller workflows (smaller datasets = smaller workflows) from the queue.
E.g.
Queue = workflows of size 1, 2, 3, 4, 5
Pod 1 = 3/5 resources used -> 2 free, so I could fit workflow 1 or 2 here, but not 3, 4, 5
Pod 2 = 2/5 resources used -> 3 free, so I could fit workflow 1, 2 or 3 here, but not 4, 5
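Just to make the "broker" idea concrete, here's a rough sketch of the selection logic I have in mind. It's plain Python and nothing Argo-specific; `Workflow`, `pick_workflow`, and the single "resource unit" are made-up simplifications, not anything we've built. It just does a best-fit scan over the queue against a pod's free capacity:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Workflow:
    name: str
    size: int  # resource units the workflow needs (CPU/RAM bundled for simplicity)

def pick_workflow(queue: List[Workflow], free_units: int) -> Optional[Workflow]:
    """Best-fit pick: among queued workflows that fit in the pod's free
    capacity, choose the largest one so small jobs don't waste the slack."""
    candidates = [wf for wf in queue if wf.size <= free_units]
    if not candidates:
        return None
    return max(candidates, key=lambda wf: wf.size)

# Example matching the queue/pod numbers above (pod capacity = 5 units)
queue = [Workflow(f"wf-{n}", n) for n in (1, 2, 3, 4, 5)]
print(pick_workflow(queue, free_units=5 - 3).name)  # Pod 1: 2 free -> "wf-2"
print(pick_workflow(queue, free_units=5 - 2).name)  # Pod 2: 3 free -> "wf-3"
```

In practice we'd probably also need some priority/ageing on top of this so the big workflows (4, 5) don't get starved while smaller ones keep arriving, but that's the shape of the gap we're trying to fill.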
Would you suggest any approach?
@Shamiul Islam Shifat for your reference