Does anyone have strong thoughts on containers vs VMs for a multi-tenant data plane? For cases where a SaaS is running trusted code, K8s seems like a no-brainer, but if you're running user code, it's hard to ignore the security benefits of VMs. I know there are projects like gVisor, Kata, and Firecracker that can integrate with Kubernetes to make the runtime more secure. Is anyone out there intentionally using VMs instead of containers for security reasons?
Bill Tarr
07/29/2023, 2:39 PM
Hi @Lucas Stephens - interested in your use case here, there is always some "it depends" to these decisions, and some options are a bit platform specific (I'm from AWS, so I know those better, but there are usually equivalents).
I've got customers who run SaaS on VMs, but not specifically for untrusted code isolation. I could see doing it, especially for very simple cases... Without data to support it, k8s still seems like the largest platform for SaaS in my experience.
If it helps, I did a talk at re:Invent last year on
Supporting extensibility in SaaS environments
(which at its core is managing untrusted code). The image is for EKS, so the Fargate version only works if you're on AWS, but it's a strong option for isolating code. Even without it, you have a range of isolation options for the code in k8s... You can use taints to constrain it to segregated compute, and you can use namespaces with Cilium or Calico for network policy hardening at the least. vCluster is another option, as are the ones you mention. I'd see if any of those met my security requirements before I went to VMs. The EKS Best Practices Guide is useful for breaking down the tenant isolation options as well.
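To make those in-cluster isolation options concrete, here's a minimal sketch (the `gvisor` RuntimeClass name, the `workload=untrusted` taint, and the image are all hypothetical and depend on nodes having the runsc handler installed) of pinning untrusted pods to a sandboxed runtime on dedicated, tainted nodes:

```yaml
# RuntimeClass pointing at a gVisor (runsc) handler configured on the nodes
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
apiVersion: v1
kind: Pod
metadata:
  name: tenant-workload
  namespace: tenant-a        # namespace-per-tenant model
spec:
  runtimeClassName: gvisor   # containers run inside the gVisor sandbox
  tolerations:               # only pods tolerating this taint land on
  - key: workload            # nodes segregated for untrusted code
    operator: Equal
    value: untrusted
    effect: NoSchedule
  containers:
  - name: app
    image: registry.example.com/tenant-a/app:latest
```

A NetworkPolicy per namespace (enforced by Cilium or Calico) would then lock down cross-tenant traffic on top of this.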
I do think serverless containers are the ideal solution for untrusted code (Fargate being one flavor), but Lambdas (or other FaaS) are even better if they meet your compute requirements, especially if your use case includes segregating different units of contributed code from each other.
Anyway, would be happy to chat if you are interested!
Lucas Stephens
07/29/2023, 3:51 PM
Thanks for the reply! All of the things you've mentioned I've also looked into, so it's good to hear that I'm on the right track. I'm pretty confident we can isolate storage with encryption keys per tenant, and the network using network policies like you mentioned (and also a namespace-per-tenant model). With compute, I think it's most desirable to share nodes across multiple tenants in order to achieve better utilization; dedicating a node group per tenant seems too expensive. Shuffle sharding across node groups can help maintain logical isolation from a failure perspective. What I'm specifically concerned about is malicious untrusted code breaking out of a container, and it seems like the options are to use a runtime like gVisor or Firecracker to run the containers in a sandbox, or to just skip containers and use VMs altogether.
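The shuffle-sharding idea mentioned here can be sketched in a few lines; this is a minimal illustration, assuming node groups are identified by name and each tenant's shard is derived deterministically from a hash of its tenant ID (the function name and sizes are invented for the example):

```python
import hashlib
import random


def shuffle_shard(tenant_id: str, node_groups: list[str], shard_size: int) -> list[str]:
    """Pick a deterministic pseudo-random subset of node groups for a tenant.

    Seeding the RNG with a hash of the tenant ID keeps the assignment stable
    across calls, while the overlap between any two tenants' shards stays
    statistically small, limiting the blast radius of a noisy or failing tenant.
    """
    seed = int.from_bytes(hashlib.sha256(tenant_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return sorted(rng.sample(node_groups, shard_size))
```

A scheduler layer could then translate the returned shard into node affinity rules so a tenant's pods only land on its assigned node groups.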
Lucas Stephens
07/29/2023, 3:52 PM
Lambdas are interesting too, but in the case of running user code, we’d have to force users to write their applications a specific way, and there’s not a desire to do that
Bill Tarr
07/30/2023, 1:56 AM
Yup, I think you're on the right track for sure. All the options we are discussing are legit; you just need to map them to your exact requirements. I'd be curious to know what type of code your users are contributing. I talked to a lot of companies doing public-facing user-contributed code; some did Lambda, and some put pretty stringent constraints in place - Auth0 Actions are Node.js, but also cover very specific use cases... as you say, it depends what you want to achieve. I'd still consider Fargate, but it depends on your use case.
I haven't seen shuffle sharding on k8s. I have on EC2 of course, but my initial position would be to see what k8s schedulers, including Karpenter, could do for you - but it always depends 🙂
Bill Tarr
07/30/2023, 2:00 AM
Storage is a whole different can of worms. An encryption key per tenant works for some storage, like S3 natively, but can get tricky with some shared constructs - like schema-per-tenant in Postgres on RDS, for example... there are always options though.
Lucas Stephens
07/31/2023, 4:15 PM
This would be for private user code - a general PaaS that you can deploy any arbitrary container to; like another Heroku
I like Fargate a lot personally, but I don't think it's a good idea to build on a proprietary solution like that & be locked in to AWS. The next natural question for me would be "how does Fargate work under the hood?" 😆 Does it convert containers to Firecracker VMs?
Karpenter is also something I've used before and really like, but since it only works with EKS & EC2, I'd probably go for cluster-autoscaler instead
Bill Tarr
08/01/2023, 2:34 AM
yeah, I see where you are coming from. I have customers building multi-cloud SaaS solutions by necessity; they do tend to try to keep everything to the lowest common denominator, but I don't think that's always the best choice.
I'm not sure I'd draw the line at Fargate if I were you and building on AWS. It would be a bit superior to Firecracker by itself (Firecracker is in there 🙂) for isolating containers on AWS, but your container isn't sticky to it - the same container could be deployed to Kata running somewhere else later, and your infra isn't going to be identical elsewhere anyway. Kind of like the EKS vs self-managed k8s argument for me - EKS is cheaper and better for most use cases on AWS, with underlying k8s version or specialized plugin needs being the outliers, and it really doesn't create lock-in by itself IMO.
Lucas Stephens
08/01/2023, 2:44 AM
Hmm yeah, I see what you are saying - ideally the runtime on our platform is abstracted away anyway. Honestly the big draw of Kube for me is the operator model - being able to define a set of CRDs + controllers to roll out applications basically gives us a nice abstraction for our platform for free; our control plane can then just call the operator & handle all of our user management stuff. I came across Virtual Kubelet recently as well (though the Fargate provider is inactive), which seems like a really interesting project. More and more I really believe Kelsey Hightower's assertion that Kube is just becoming an API for deploying things.
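The operator model described here could be sketched with a hypothetical CRD (the `Application` kind, the `platform.example.com` group, and the fields are all invented for illustration); the control plane would submit these objects and a controller would reconcile them into Deployments, Services, NetworkPolicies, and so on:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: applications.platform.example.com
spec:
  group: platform.example.com
  scope: Namespaced
  names:
    kind: Application
    plural: applications
    singular: application
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              image:          # tenant-supplied container image
                type: string
              replicas:
                type: integer
---
# What the control plane would create per tenant app
apiVersion: platform.example.com/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: tenant-a
spec:
  image: registry.example.com/tenant-a/my-app:latest
  replicas: 2
```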
Bill Tarr
08/02/2023, 10:08 PM
I like where your head (and Kelsey Hightower's) is at from a k8s perspective - it's really just a deployment mechanism. I'm not sure of the fit, but I'm a fan of Crossplane in this space for using CRDs to deploy your cloud infra, though I'm not sure that is exactly the problem you are mapping to at this stage. Let me know how things progress for you; seems like an interesting problem you are solving for.
Lucas Stephens
08/02/2023, 10:54 PM
Crossplane is also something I really, really like - my product has both user code & cloud infrastructure, so I'm thinking I can create a set of operators to roll it all out smoothly with a single API. I'll keep you updated! Thanks for rubber-ducking with me, I'm really grateful for this Slack because sometimes it's hard to find others who know about these esoteric types of problems!
Bill Tarr
08/02/2023, 11:16 PM
Happy to have the opportunity to 🦆 (I feel like there should be a rubber duck). Thank @Gwen Shapira - she is the founder of the feast 🙂