# contributing-to-airbyte
Hi airbyte devs! I have been following the project for a while, but today I got the chance to do a deeper dive. Kudos to you all 💯

When I first saw that Airbyte uses Temporal, I expected all the connector code to be implemented and registered as Temporal activities (using the Temporal Python SDK) and deployed independently in containers. I would have imagined the Airbyte Python CDK wrapping the third-party connector code as a Temporal activity. A Temporal workflow could then invoke the connector activities (spec, check, discover, read), and they would be picked up by a Temporal worker, retried if they fail, be cancellable, etc.

Instead, it looks like Temporal activities are used to run containerized connector code, and the underlying connector code itself does not communicate with Temporal directly. I think for each TemporalAttemptExecution a new container is created and the connector code is executed inside that new container. If the connector code fails inside the container, Temporal does not know about it, and therefore can't retry the activity, handle cancellations, or do all the other nice things Temporal does. Some of the container/process communication also has to be re-invented even though it comes for free with Temporal; for example, there now needs to be manual logic to cancel a running connector process.

Did I get this right? This makes me think that maybe I am overestimating Temporal's benefits. Is there a backstory as to why connector code is not implemented as Temporal activities? My guess would be that the Temporal Python SDK was not mature enough (maybe still isn't) when Airbyte was built.
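To make the idea concrete, this is roughly the design I had imagined, sketched with the official `temporalio` Python SDK. It is only an illustration of the question, not Airbyte code; the activity/workflow names and argument types are made up:

```python
# Hypothetical sketch: connector operations registered directly as Temporal
# activities via the official temporalio Python SDK. Not Airbyte's actual
# architecture; names and types are illustrative only.
from datetime import timedelta

from temporalio import activity, workflow


@activity.defn
async def read(config: dict, catalog: dict, state: dict) -> dict:
    # The connector's read() would run here and return the updated state.
    # Temporal would retry this activity on failure and deliver cancellations.
    ...


@workflow.defn
class SyncWorkflow:
    @workflow.run
    async def run(self, config: dict, catalog: dict, state: dict) -> dict:
        # The workflow orchestrates the connector activities; retries and
        # timeouts come from Temporal instead of hand-rolled process logic.
        return await workflow.execute_activity(
            read,
            args=[config, catalog, state],
            start_to_close_timeout=timedelta(hours=2),
        )
```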
Python is one of the most popular languages for writing connectors, and Temporal only has Go/Java/PHP/TypeScript SDKs (there's an unofficial Python SDK and an official one in the works). The direction we went, choosing a containerized CLI as the interface, was deliberate: it allows contributors to implement connectors in any language and without needing to know any other tools or platforms. We didn't implement connectors as servers, Temporal workers, etc. specifically because we didn't want that to be prerequisite knowledge for contributing a connector.

Since all of the connector operations are containerized CLIs, the approach we've taken is to launch an external process (such as a pod on Kubernetes or just a container on Docker) to run the CLI, and have another process handle passing data between a source and a destination. We do use Temporal to handle retries, cancellations, etc. (which propagate to the source/destination for a sync), but at the level of that external process.
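Very roughly, the shape is something like the sketch below. This is simplified pseudocode in Python rather than the actual (Java) worker code, and the docker arguments and image/operation names are illustrative: the point is that the Temporal activity launches the container, and Temporal's retry/cancellation semantics apply to that activity, not to the connector code inside it.

```python
# Simplified illustration, not actual Airbyte platform code (which is Java).
# A Temporal activity launches the containerized connector CLI; the connector
# knows nothing about Temporal, it just reads its args and writes messages
# to stdout.
import asyncio

from temporalio import activity


@activity.defn
async def run_connector_operation(image: str, operation: str, config_path: str) -> int:
    proc = await asyncio.create_subprocess_exec(
        "docker", "run", "--rm",
        "-v", f"{config_path}:/config.json",
        image, operation, "--config", "/config.json",
    )
    try:
        # A real activity would also heartbeat so cancellation requests
        # from Temporal can be delivered while the container runs.
        return await proc.wait()
    except asyncio.CancelledError:
        # When Temporal cancels the activity, propagate the cancellation
        # to the running container instead of leaving it orphaned.
        proc.terminate()
        raise
```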
Thank you @Jared Rhizor (Airbyte) for taking the time to answer my question!
Temporal provides resiliency at the workflow level. The activities themselves need to handle failures and be idempotent as well. For example, the activity running the container can periodically save its progress (the state) to the DB; on replay of the workflow, even if a new container is created, replication can continue from that point. Temporal ensures the continuation of the workflow even if workers fail, in the sense that it does not re-run the previously successful activities.
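As a concrete illustration of that checkpointing/idempotency idea, here is the generic Temporal pattern of heartbeating progress so a retried activity can resume where the last attempt left off. This is only a sketch, not Airbyte's actual persistence logic (which saves state to its own DB), and `read_records` / `write_record` are hypothetical helpers:

```python
# Generic Temporal checkpointing pattern, not Airbyte code. The activity
# heartbeats a cursor; on retry, the previous attempt's last heartbeat is
# available via activity.info(), so replication can resume from there.
from temporalio import activity


@activity.defn
async def replicate(stream: str) -> None:
    # Resume from the cursor recorded by the previous (failed) attempt, if any.
    details = activity.info().heartbeat_details
    cursor = details[0] if details else None

    # read_records / write_record are hypothetical helpers; writes must be
    # idempotent since records near the checkpoint may be re-processed.
    for record, new_cursor in read_records(stream, starting_from=cursor):
        write_record(record)
        activity.heartbeat(new_cursor)
```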