# contributing-to-airbyte
What are you guys thinking about in terms of your Kube support? I really like the idea of applying the YAMLs individually since Airbyte uses the DB as the coordination layer. Are you thinking Helm in the future or something? I'm not using k8s right now, but I'd like to when you're off alpha there.
Personally, I find Helm charts kind of opaque, and since each Airbyte piece is nice and well-defined, applying them separately feels nice. But I haven't done it yet, so...
Happy to just read some in-progress doc you guys have.
I mention this because right now I'm totally using Airbyte on a pet EC2 instance with everything running in docker-compose there 😱
We do have individual manifest YAMLs available right now via Kustomize, with some caveats.
At some point we may look to use a tool like Helm for better packaging (especially around upgrading) but our first priority for Kube is to address the current limitations.
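For reference, applying Kustomize-based manifests is built into `kubectl`; something like the following is the usual workflow (the overlay path here is illustrative, not necessarily Airbyte's actual layout):

```shell
# Apply a Kustomize overlay of manifests in one shot.
# The directory name is illustrative; check the Airbyte repo for the real overlay paths.
kubectl apply -k kube/overlays/stable

# Watch the pods come up
kubectl get pods --watch
```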
Looking at the current limitations section on that doc page, which things would have to be resolved for you to want to try Airbyte on Kube?
• I don't want to mess around with taints and tolerations to run this, so same node is a no-go. Blocker.
• Don't want high inter-pod traffic, i.e. okay for logs, not okay for every record to go through. Blocker.
• The manifests thing I don't care about. I know how to write kube YAMLs, and I can `kubectl apply` into namespaces. Not a blocker.
• I don't understand why UI operation latency will be any higher. Not a blocker, but I don't understand this.
• Definitely don't want to overburden kube logs. Need it for other stuff. Blocker.
• I'm never going to manually delete anything, but I also don't care if a bunch of extra pods in 'completed' state are around. Not a blocker.
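The "apply into namespaces" workflow mentioned above is just the standard one; roughly (namespace and file names are made up for illustration):

```shell
# Keep Airbyte isolated in its own namespace.
kubectl create namespace airbyte

# Apply each manifest into that namespace.
# The filename is a placeholder, not an actual Airbyte manifest name.
kubectl apply -n airbyte -f airbyte-server.yaml
```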
Newer Helm lets me generate YAMLs, so I don't mind that.
In any case, I'm still using your software without k8s support so who knows if it even matters.
I think most of your concerns align with how we were thinking about it.
Why are taints/tolerations a blocker for you? Are you running on limited node sizes?
This tool is a time saver for me. At the point where I am applying excessive management to it, it is no longer a time saver. My kube cluster is likewise a time saver, in that I see it as just a pool of workers capable of executing work, plus a scheduler capable of scheduling work. I throw work units at it and it does the work. When I start having to manage my scheduler, it changes my relationship with my kube cluster. In this case, I can't even just let a node die, because I will lose syncs in that time. I don't want my kube cluster to behave like this, with de facto pet pods on a pet node pool. In that situation, I will happily stick to a pet EC2 instance with docker-compose.
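For context, the kind of per-workload scheduling bookkeeping being objected to looks something like this (taint key/value and node name are made up):

```yaml
# Hypothetical example of the taint/toleration management in question.
# First the node is tainted so only matching pods land on it:
#   kubectl taint nodes worker-3 dedicated=airbyte:NoSchedule
# ...and then every Airbyte pod spec needs a matching toleration:
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "airbyte"
    effect: "NoSchedule"
```

It is exactly this extra per-pod configuration, and the resulting pet node pool, that makes same-node scheduling a blocker in the message above.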
thanks
I understand right now that Airbyte spawns containers, right? Does your future k8s design involve longer lived pods that accept work units or does it involve spawning a pod per sync job?
I like the former only because I have the ability to tune (using k8s primitives) the max/min pod count to the deployment.
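With long-lived worker pods, the tuning described here would just use stock Kubernetes primitives, e.g. a Deployment scaled by a HorizontalPodAutoscaler. A sketch, with a hypothetical `airbyte-worker` Deployment name:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: airbyte-worker        # hypothetical name, for illustration only
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: airbyte-worker
  minReplicas: 2              # floor on worker pods
  maxReplicas: 10             # ceiling, caps total concurrent workers
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```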
I wouldn't want some sort of thundering herd after I bring up the `airbyte-scheduler`, where it decides to schedule one pod per work unit and borks my cluster.
But if I can control `max_simultaneous_worker_pods` as a param, that's fine too.
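In the pod-per-job model, such a cap would presumably surface as plain configuration. A sketch, where the variable name comes from the message above and is not a real Airbyte setting:

```yaml
# Hypothetical: capping concurrent job pods via an env var on the scheduler,
# if Airbyte exposes such a knob. MAX_SIMULTANEOUS_WORKER_PODS is made up here.
env:
  - name: MAX_SIMULTANEOUS_WORKER_PODS
    value: "10"
```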
We like the former too. However, it’s unclear whether we can do that on kube, since we need to run different containers for different sources/destinations.
A max simul. param will likely be the mid-term solution.