Besides NFS, are there any ideas or design changes that could lead to a deployment that scales horizontally? Clones should probably continue to be local and ephemeral, but a central store for plans is probably necessary. I’m not sure if Redis makes sense, but any NoSQL DB could work.
04/03/2023, 6:30 PM
Basically, once the BoltDB dependency is gone and you have a worker queue, you could scale horizontally.
Lyft forked Atlantis and uses Temporal for worker queues and such.
But wouldn’t you still have empty plans though if an apply request isn’t routed to the same place? Plans are stored on disk unless I’m mistaken.
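To make the concern concrete, here is a minimal Python sketch of two replicas. It is not Atlantis code: `Replica`, `SharedPlanStore`, and the key layout are hypothetical names for illustration, and the shared store is an in-memory dict standing in for Redis or any other central database.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key: (repo, pull request number, project dir, workspace).
PlanKey = Tuple[str, int, str, str]


class SharedPlanStore:
    """In-memory stand-in for a central plan store (e.g. Redis or a NoSQL DB)."""

    def __init__(self) -> None:
        self._plans: Dict[PlanKey, bytes] = {}

    def put(self, key: PlanKey, plan: bytes) -> None:
        self._plans[key] = plan

    def get(self, key: PlanKey) -> Optional[bytes]:
        return self._plans.get(key)


class Replica:
    """One server instance; local_plans models plan files on its own disk."""

    def __init__(self, name: str, shared: Optional[SharedPlanStore] = None) -> None:
        self.name = name
        self.local_plans: Dict[PlanKey, bytes] = {}
        self.shared = shared

    def plan(self, key: PlanKey) -> None:
        data = f"plan produced on {self.name}".encode()
        self.local_plans[key] = data
        if self.shared is not None:
            self.shared.put(key, data)  # publish so other replicas can apply

    def find_plan(self, key: PlanKey) -> Optional[bytes]:
        # Check local disk first, then fall back to the shared store if present.
        local = self.local_plans.get(key)
        if local is not None:
            return local
        return self.shared.get(key) if self.shared is not None else None


key: PlanKey = ("org/infra", 42, "envs/prod", "default")

# Local-only storage: the plan made on replica A is invisible to replica B.
a, b = Replica("a"), Replica("b")
a.plan(key)
missing = b.find_plan(key)

# With a shared store, replica B can find the plan made on replica A.
store = SharedPlanStore()
a2, b2 = Replica("a", store), Replica("b", store)
a2.plan(key)
found = b2.find_plan(key)
```

In the first case the apply on replica B has nothing to work with; in the second it picks up the plan that replica A published.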
If it’s simple enough, would we want to build a worker queue component directly inside Atlantis and shift towards that, or is Temporal preferable?
04/03/2023, 8:29 PM
I do not know if Temporal is preferable; that is just what Lyft used.
I think one of the ideas behind it was to avoid vendor lock-in,
because you could do this using SQS or EventBridge etc., but that would lock you in to AWS.
04/21/2023, 5:43 PM
I would like to chime in on this question and also ask about something I am observing in my Atlantis setup for horizontal scaling:
My scaling-POC setup:
• Atlantis runs in GKE with multiple deployments.
• The Atlantis data dir is on a persistent volume backed by GCP Filestore with ReadWriteMany access (shared).
• Atlantis locking uses GCP Memorystore for Redis (shared).
With the above, I am currently able to scale Atlantis horizontally with multiple active instances. We use workspaces and multiple GitHub TF repos too.
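For anyone who wants to reproduce this, a rough Python sketch of the server arguments each replica might be started with. The flag names are the ones I know from the Atlantis server configuration docs (worth checking against your version); the helper `atlantis_server_args`, the mount path, and the Redis address are placeholders, not values from the setup above.

```python
from typing import List


def atlantis_server_args(data_dir: str, redis_host: str, redis_port: int = 6379) -> List[str]:
    """Build the argument list for one replica; every replica gets the same values."""
    return [
        "atlantis", "server",
        f"--data-dir={data_dir}",        # path on the shared ReadWriteMany volume
        "--locking-db-type=redis",       # use Redis instead of the default BoltDB
        f"--redis-host={redis_host}",
        f"--redis-port={redis_port}",
    ]


# Placeholder values; other required flags (VCS credentials, repo allowlist) are omitted.
args = atlantis_server_args("/mnt/atlantis-data", "10.0.0.5")
print(" ".join(args))
```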
I am quite surprised that this is all possible...
CC: @PePe Amengual
I will need to stress test this setup. But so far so good...
Using Filestore and the Redis lock, Atlantis does not seem to touch the BoltDB lock file at all, which is a good thing. All of Atlantis's internal locking protection via Redis is working fine. Cross-planning is protected fine too.
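A minimal Python sketch of what "cross-planning is protected" means when the lock table is shared. This is an illustration, not the Atlantis implementation: `SharedLockTable` is a hypothetical in-memory stand-in for Redis, and the (repo, project dir, workspace) key layout is an assumption.

```python
from typing import Dict, Tuple

LockKey = Tuple[str, str, str]  # (repo, project dir, workspace): assumed key layout


class SharedLockTable:
    """In-memory stand-in for a lock table that all replicas share."""

    def __init__(self) -> None:
        self._holders: Dict[LockKey, int] = {}

    def try_lock(self, key: LockKey, pull: int) -> Tuple[bool, int]:
        # setdefault stores `pull` only if the key is free (set-if-absent).
        holder = self._holders.setdefault(key, pull)
        return holder == pull, holder

    def unlock(self, key: LockKey, pull: int) -> bool:
        # Only the pull request that holds the lock may release it.
        if self._holders.get(key) == pull:
            del self._holders[key]
            return True
        return False


locks = SharedLockTable()
key: LockKey = ("org/infra", "envs/prod", "default")

first = locks.try_lock(key, pull=101)    # replica A plans PR 101
second = locks.try_lock(key, pull=202)   # replica B plans PR 202, same project
released = locks.unlock(key, pull=101)
third = locks.try_lock(key, pull=202)    # succeeds once PR 101 releases the lock
```

Because both replicas consult the same table, the second plan is refused no matter which replica receives it, and it reports PR 101 as the current holder.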