What kubernetes pods does an Airbyte sync job spin...
# ask-ai
s
What kubernetes pods does an Airbyte sync job spin up? What is their purpose?
k
In an Airbyte deployment on Kubernetes, a sync job involves spinning up several Kubernetes pods, each with a specific purpose. Here are the key pods involved in a sync job and their roles:

1. Check Job Pod: This pod is responsible for verifying the connection to the source and destination before the actual sync process begins. It ensures that the credentials and configurations are correct and that the source and destination are reachable.
2. Sync Job Pod: This is the main pod responsible for the data synchronization process. It reads data from the source and writes it to the destination. The sync job typically involves two workers:
   - Source Worker: Reads data from the source.
   - Destination Worker: Writes data to the destination.
3. Normalize Job Pod: After the data is synced, this pod is responsible for normalizing the data. Normalization involves transforming the raw data into a more structured and usable format, often aligning it with the schema expected by the destination.

These pods are created dynamically based on the job requirements and are managed by the Airbyte scheduler, which ensures that the pods are distributed across different nodes in the Kubernetes cluster.

For more detailed information, you can refer to the [Airbyte documentation on scaling](https://docs.airbyte.com/operator-guides/scaling-airbyte#what-to-scale) and the [forum discussion on Kubernetes pods](https://discuss.airbyte.io/t/1944). If you encounter issues with pods not being created or jobs failing, it might be related to resource allocation or Kubernetes node scaling, as discussed in the [GitHub issue](https://github.com/airbytehq/airbyte/issues/20014) and [Airbyte discussion](https://github.com/airbytehq/airbyte/discussions/35301).
s
@kapa.ai i see an orchestrator pod, what is that
k
An orchestrator pod in Airbyte is part of the container orchestration process that helps manage and execute job tasks. When enabled, the worker process delegates certain responsibilities to the orchestrator process. This decoupling allows for better management and tracking of job states. Here’s a brief overview of how the orchestrator pod works:

1. Delegation of Responsibilities: The worker process delegates responsibilities to the orchestrator process. This includes managing the execution state of jobs.
2. State Tracking: Workers constantly poll a shared Cloud Storage location for job state updates. The orchestrator process writes status marker files to this location, such as `NOT_STARTED`, `INITIALIZING`, `RUNNING`, `SUCCESS`, and `FAILURE`.
3. Error Handling: If the orchestrator process encounters issues, it writes a `FAILURE` marker. If it succeeds, it writes a job summary as part of the `SUCCESS` marker file.
4. Source of Truth: The Cloud Storage store is treated as the source-of-truth for the execution state.

The orchestrator pod is automatically enabled when running Airbyte on Kubernetes using Helm Charts. For users running Airbyte on Docker, there are certain pitfalls to be aware of due to the lack of this orchestration feature.

For more detailed information, you can refer to the [Airbyte documentation on Container Orchestrator](https://docs.airbyte.com/understanding-airbyte/jobs#container-orchestrator).

Relevant knowledge sources:
- [Documentation on Container Orchestrator](https://docs.airbyte.com/understanding-airbyte/jobs#container-orchestrator)
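To make the marker-file idea above more concrete, here is a rough Python sketch of the pattern: one process writes status markers to a shared store, and another reads them to work out the job state. This is not Airbyte's actual code. The function names (`write_marker`, `read_state`, `read_summary`), the one-directory-per-job layout, and the use of a local temporary directory in place of Cloud Storage are all assumptions made for illustration. The sketch also treats the absence of any marker as `NOT_STARTED`, which is a simplification.

```python
import tempfile
from enum import Enum
from pathlib import Path
from typing import Optional


class Marker(Enum):
    """Status marker names mentioned in the answer above."""
    NOT_STARTED = "NOT_STARTED"
    INITIALIZING = "INITIALIZING"
    RUNNING = "RUNNING"
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"


def write_marker(store: Path, job_id: str, marker: Marker, body: str = "") -> None:
    """Orchestrator side (hypothetical): record a state as a marker file."""
    job_dir = store / job_id
    job_dir.mkdir(parents=True, exist_ok=True)
    (job_dir / marker.value).write_text(body)


def read_state(store: Path, job_id: str) -> Marker:
    """Worker side (hypothetical): derive the state from the markers present.

    Terminal markers are checked first so that a finished job is never
    reported as still running.
    """
    job_dir = store / job_id
    for marker in (Marker.FAILURE, Marker.SUCCESS, Marker.RUNNING, Marker.INITIALIZING):
        if (job_dir / marker.value).exists():
            return marker
    # Assumption: no marker at all is treated as NOT_STARTED.
    return Marker.NOT_STARTED


def read_summary(store: Path, job_id: str) -> Optional[str]:
    """Return the job summary stored in the SUCCESS marker, if there is one."""
    path = store / job_id / Marker.SUCCESS.value
    return path.read_text() if path.exists() else None


if __name__ == "__main__":
    # A local temp directory stands in for the shared Cloud Storage location.
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        print(read_state(store, "job-1"))
        write_marker(store, "job-1", Marker.RUNNING)
        print(read_state(store, "job-1"))
        write_marker(store, "job-1", Marker.SUCCESS, "example summary")
        print(read_state(store, "job-1"), read_summary(store, "job-1"))
```

In a real deployment the worker would call something like `read_state` in a polling loop with a delay between checks. Because the shared store is the source of truth, the worker can in principle restart and resume by reading the markers again.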