Hi everyone Using Kubernetes deployment and having increased Airbyte #feedback-and-requests


# feedback-and-requests

Clovis Masson

11/25/2021, 10:58 AM

Hi everyone! I'm using the Kubernetes deployment and, after increasing the number of workers, I noticed that performance can be affected depending on the node where the workers are started. For instance, I currently have 4 worker replicas distributed across 3 nodes (`node A : 1 worker`, `node B : 1 worker` and `node C : 2 workers`). When syncing, if my `source-worker` and `destination-worker` are correctly distributed, I'm able to process about 25M rows in one hour (with a well-distributed CPU load). However, if `source-worker` and `destination-worker` both happen to start on `node C`, then that node's CPU goes up to 190% (against 10% and 10% for the two others) and processing is much slower: I'm only able to process about 15M rows within an hour. I'm not sure this is an actual request, as I don't know whether there is an existing strategy to avoid this situation, but is there a way to force the workers to be spread across different nodes to maximize performance?

Andik Achmad

11/29/2021, 8:20 AM

Hi, we currently don’t have a way to force parallelisation. We view this as part of K8s scheduling, so this is opaque today.
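Since the reply points at K8s scheduling, here is a minimal sketch of the generic Kubernetes mechanism for spreading pods across nodes: pod anti-affinity on the `kubernetes.io/hostname` topology key. This is not an Airbyte feature. The label `app: airbyte-worker` and the helper name `spread_patch` are assumptions for illustration only, and a rule like this would apply to the pods of the patched Deployment, not necessarily to the source and destination pods launched for each sync.

```python
import json


def spread_patch(label_key: str, label_value: str, required: bool = False) -> dict:
    """Build a Deployment patch asking the scheduler to keep matching pods on separate nodes.

    label_key / label_value are hypothetical: use whatever labels your worker pods carry.
    """
    # One anti-affinity term: "avoid nodes that already run a pod with this label".
    term = {
        "labelSelector": {"matchLabels": {label_key: label_value}},
        "topologyKey": "kubernetes.io/hostname",
    }
    if required:
        # Hard rule: a pod stays Pending if no node satisfies it.
        anti = {"requiredDuringSchedulingIgnoredDuringExecution": [term]}
    else:
        # Soft rule: the scheduler tries to spread pods but may still co-locate them.
        anti = {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {"weight": 100, "podAffinityTerm": term}
            ]
        }
    return {"spec": {"template": {"spec": {"affinity": {"podAntiAffinity": anti}}}}}


if __name__ == "__main__":
    # Assumed label; check your own pods with `kubectl get pods --show-labels`.
    print(json.dumps(spread_patch("app", "airbyte-worker"), indent=2))
```

The printed JSON could be saved to a file and applied to a Deployment with `kubectl patch`. The soft (`preferred`) form is the safer default on a 3-node cluster with 4 replicas, because a hard rule would leave the fourth replica unschedulable.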

user

11/29/2021, 8:21 AM

Within the next month, we will be working on combining the source/worker containers into a single pod. This should keep networking to a single node and should make things more efficient

user

11/29/2021, 8:21 AM

We are still benchmarking and working on things.

user

11/29/2021, 9:32 AM

Great ! Thank you for the feedback 🙂