Hi, we are at the moment setting up our ETL stack ...
# ask-community-for-troubleshooting
m
Hi, we are at the moment setting up our ETL stack (Airflow, Airbyte, dbt). We already have it up and running on VMs. Now we are moving everything to GKE, which raises one question: should we set up ONE GKE cluster (containing Airbyte, dbt, and Airflow) or 3 clusters (1 Airbyte cluster, 1 dbt cluster, 1 Airflow cluster)? Happy to get your ideas and best-practice advice on this... By the way, this is an awesome community! Just reading here has helped us a lot already.
✅ 1
j
Following. I set up an EKS cluster with 2 node groups, one for the airbyte-core pods and one for the airbyte worker pods, using NodeSelectors. It has been working well so far, but I would love to get more ideas!
👍 1
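For readers following along, here is a minimal sketch of the node-group split described above, written in Python so the resulting manifest fragments can be printed and inspected. It relies only on the standard Kubernetes pod spec field `nodeSelector`. The label key and values (`workload: airbyte-core`, `workload: airbyte-worker`), the container names, and the image names are all hypothetical placeholders, not taken from the poster's setup or from Airbyte's charts.

```python
import json


def pod_spec_with_node_selector(container_name, image, node_labels):
    """Return a minimal pod spec dict that only schedules onto nodes carrying node_labels."""
    return {
        # Kubernetes places the pod only on nodes whose labels include all of these pairs.
        "nodeSelector": dict(node_labels),
        "containers": [{"name": container_name, "image": image}],
    }


# Hypothetical labels: each node group must be created with the matching label.
CORE_LABELS = {"workload": "airbyte-core"}
WORKER_LABELS = {"workload": "airbyte-worker"}

# Hypothetical container and image names, used only to make the spec complete.
core_spec = pod_spec_with_node_selector("server", "example/airbyte-server:placeholder", CORE_LABELS)
worker_spec = pod_spec_with_node_selector("worker", "example/airbyte-worker:placeholder", WORKER_LABELS)

if __name__ == "__main__":
    print(json.dumps({"core": core_spec, "worker": worker_spec}, indent=2))
```

The same idea should carry over to GKE node pools, assuming each pool is given its own label; how the selectors are actually injected into the Airbyte pods depends on the deployment method in use.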
u
IMHO: 1 cluster with multiple nodes is perfect; it makes the network between Airbyte/Airflow/dbt easier to handle. If you need high availability, you can set up 2 master nodes.
👍 1
s
I would be surprised if multiple clusters are actually needed. Operationally it seems so much nicer to have one cluster, assuming no crazy noisy-neighbor issues, which might happen depending on the scale of your workloads. But generally, holding the required amount of compute fixed, I think you should be fine with one cluster? I would at least start that way and validate that it's not working, and why, before moving on to multiple clusters 🙂
👍 1
m
Thank you guys for all your tips. Very much appreciated! I have one question left: would you rather install Apache Airflow on the GKE cluster, or would you just use GCP Cloud Composer?
Hi guys, as requested by @Davin Chia (Airbyte), I'm starting a new thread as we are trying to get Airbyte running on GKE Autopilot. Any help appreciated!
u
@abhi there are some questions about Airflow/Airbyte. Should we improve our Airbyte vs Airflow docs, OR write a more advanced Operator Guide on using Airflow with best practices?