# contributing-to-airbyte
What are you guys thinking about in terms of your Kube support? I really like the idea of applying the YAMLs individually since Airbyte uses the DB as the coordination layer. Are you thinking Helm in the future or something? I'm not using k8s right now, but I'd like to when you're off alpha there.
Personally, I find Helm charts kind of opaque, and since each Airbyte piece is nice and well-defined, applying them separately feels nice. But I haven't done it yet, so...
Happy to just read some in-progress doc you guys have.
I mention this because right now I'm totally using Airbyte on a pet EC2 instance with everything running in docker-compose there 😱
We do have individual manifest YAMLs available right now via Kustomize, with some caveats.
At some point we may look to use a tool like Helm for better packaging (especially around upgrading) but our first priority for Kube is to address the current limitations.
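For reference, applying Kustomize-based manifests is built into `kubectl`; something like the following is the usual workflow (the overlay path here is illustrative, not necessarily Airbyte's actual layout):

```shell
# Apply a Kustomize overlay of manifests in one shot.
# The directory name is illustrative; check the Airbyte repo for the real overlay paths.
kubectl apply -k kube/overlays/stable

# Watch the pods come up
kubectl get pods --watch
```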
Looking at the current limitations section on that doc page, which things would have to be resolved for you to want to try Airbyte on Kube?
• I don't want to mess around with taints and tolerations to run this, so same node is a no-go. Blocker.
• Don't want high inter-pod traffic, i.e. okay for logs, not okay for every record to go through. Blocker.
• The manifests thing I don't care about. I know how to write kube YAMLs, and I can `kubectl apply` into namespaces. Not a blocker.
• I don't understand why UI operation latency will be any higher. Not a blocker, but I don't understand this.
• Definitely don't want to overburden kube logs. Need it for other stuff. Blocker.
• I'm never going to manually delete anything, but I also don't care if a bunch of extra pods in 'completed' state are around. Not a blocker.
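The "apply into namespaces" workflow mentioned above is just the standard one; roughly (namespace and file names are made up for illustration):

```shell
# Keep Airbyte isolated in its own namespace.
kubectl create namespace airbyte

# Apply each manifest into that namespace.
# The filename is a placeholder, not an actual Airbyte manifest name.
kubectl apply -n airbyte -f airbyte-server.yaml
```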
Newer Helm lets me generate YAMLs, so I don't mind that.
In any case, I'm still using your software without k8s support so who knows if it even matters.
I think most of your concerns align with how we were thinking about it.
Why are taints/tolerations a blocker for you? Are you running on limited node sizes?
This tool is a time saver for me. At the point where I am applying excessive management to it, it is no longer a time saver. My kube cluster is likewise a time saver, in that I see it as just a pool of workers capable of executing work, plus a scheduler capable of scheduling work. I throw work units at it and it does the work. When I start having to manage my scheduler, it changes my relationship with my kube cluster. In this case, I can't even just let a node die, because I will lose syncs in that time. I don't want my kube cluster to behave like this, with de facto pet pods on a pet node pool. In that situation, I will happily stick to a pet EC2 instance with docker-compose.
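For context, the kind of per-workload scheduling bookkeeping being objected to looks something like this (taint key/value and node name are made up):

```yaml
# Hypothetical example of the taint/toleration management in question.
# First the node is tainted so only matching pods land on it:
#   kubectl taint nodes worker-3 dedicated=airbyte:NoSchedule
# ...and then every Airbyte pod spec needs a matching toleration:
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "airbyte"
    effect: "NoSchedule"
```

It is exactly this extra per-pod configuration, and the resulting pet node pool, that makes same-node scheduling a blocker in the message above.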
thanks
I understand right now that Airbyte spawns containers, right? Does your future k8s design involve longer lived pods that accept work units or does it involve spawning a pod per sync job?
I like the former only because I have the ability to tune (using k8s primitives) the max/min pod count to the deployment.
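With long-lived worker pods, the tuning described here would just use stock Kubernetes primitives, e.g. a Deployment scaled by a HorizontalPodAutoscaler. A sketch, with a hypothetical `airbyte-worker` Deployment name:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: airbyte-worker        # hypothetical name, for illustration only
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: airbyte-worker
  minReplicas: 2              # floor on worker pods
  maxReplicas: 10             # ceiling, caps total concurrent workers
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```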
I wouldn't want some sort of thundering herd after I bring up the `airbyte-scheduler`, where it decides to schedule one pod per work unit and borks my cluster.
But if I can control `max_simultaneous_worker_pods` as a param, that's fine too.
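In the pod-per-job model, such a cap would presumably surface as plain configuration. A sketch, where the variable name comes from the message above and is not a real Airbyte setting:

```yaml
# Hypothetical: capping concurrent job pods via an env var on the scheduler,
# if Airbyte exposes such a knob. MAX_SIMULTANEOUS_WORKER_PODS is made up here.
env:
  - name: MAX_SIMULTANEOUS_WORKER_PODS
    value: "10"
```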
We like the former too. However, it’s unclear whether we can do that on kube, since we need to run different containers for different sources/destinations.
A max simul. param will likely be the mid-term solution.