@adamant-action-27578 meet Samhita, a Flyte maintainer and dedicated engineer who helps build Flyte integrations.
@tall-lock-23197 Eric is a DLRover maintainer and we had a chat about a potential DLRover Agent for Flyte, for high performance distributed training, among other use cases.
Eric, feel free to ask questions here. I'll be partially available next week but let us know if you need help.
tall-lock-23197
10/07/2024, 7:17 AM
hi Eric! we already have a PyTorch Elastic plugin that enables distributed training on Flyte. DLRover would be an amazing addition! i believe the plugin should be a backend plugin rather than an agent. let me know if you'd like me to send over some pull requests for reference.