Hey all, welcome to the channel! I wanted to preface with a couple of goals that led us down this path:
Our top requirements:
• Create our own connectors with an established development framework.
• Move as much data as we need, without row count limits.
• Gain organizational knowledge in GKE, Composer, and Terraform deployments.
We learned a lot from this project, and it produced some pretty amazing outcomes!
My thoughts on capacity & cost:
• Pricing (with orchestration): ~$1k/mo in cloud costs.
◦ Standalone (data movement only): ~$600/mo.
• How much data are we moving?
◦ At peak, this deployment was ingesting ~100 GiB/mo (excluding later-stage transforms).
◦ My estimate is ~200M rows/month (≈100 GiB/mo at an average of ~2M rows/GiB, based on final product table sizes and average row count per GiB).
• We observed no operational bottlenecks; at load, we were really only limited by the poll rate or response speed of 3rd-party APIs.
• We observed a performance improvement after switching from the BigQuery (denormalized) destination connector to landing raw JSON in GCS (see the sketch after this list).
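For anyone curious what the raw-JSON-to-GCS pattern looks like, here's a minimal sketch (not our actual pipeline code): land newline-delimited JSON in a bucket, then load it into BigQuery in a later stage. The bucket, object path, and table ID are placeholders.
```python
# Minimal sketch of the raw-JSON-to-GCS landing pattern (not our actual
# pipeline). Bucket, object path, and table ID are placeholders.
import json

from google.cloud import bigquery, storage

records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]  # sample rows

# Land the records as newline-delimited JSON in GCS.
ndjson = "\n".join(json.dumps(r) for r in records)
storage.Client().bucket("my-landing-bucket").blob(
    "raw/customers/2022-12-07.jsonl"
).upload_from_string(ndjson, content_type="application/json")

# Later-stage load into BigQuery from the landed file.
bq = bigquery.Client()
job = bq.load_table_from_uri(
    "gs://my-landing-bucket/raw/customers/2022-12-07.jsonl",
    "my_project.raw_dataset.customers",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,  # infer the schema from the JSON for this sketch
    ),
)
job.result()  # wait for the load job to complete
```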
I've worked with other players in the data-movement market:
• Stitchdata
• Tibco Scribe
• Azure Data Factory
• Google Cloud Fusion
• Hevo Data
I've been in pricing talks with some of the big players in 2018/2019:
• TIBCO
• Dell Boomi
Nowhere else have I seen the ability to move data at the volumes and price we achieved here!
Hit up @Yashkumar Makwana with any questions about developing connectors for Airbyte!
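To give a flavor of what connector development looks like, here's a minimal sketch of a source connector using Airbyte's Python CDK. The API endpoint and the "customers" stream are hypothetical, and a real connector also ships a spec and per-stream JSON schemas; this just shows the shape of the framework.
```python
# Minimal sketch of an Airbyte source built on the Python CDK (airbyte-cdk).
# The endpoint and stream below are hypothetical, for illustration only.
from typing import Any, Iterable, List, Mapping, Optional, Tuple

import requests
from airbyte_cdk.sources import AbstractSource
from airbyte_cdk.sources.streams import Stream
from airbyte_cdk.sources.streams.http import HttpStream


class Customers(HttpStream):
    url_base = "https://api.example.com/v1/"  # hypothetical REST API
    primary_key = "id"

    def path(self, **kwargs) -> str:
        return "customers"

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        # No pagination in this sketch; return the next-page cursor here.
        return None

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
        # Each element yielded becomes one Airbyte record.
        yield from response.json().get("data", [])


class SourceExample(AbstractSource):
    def check_connection(self, logger, config) -> Tuple[bool, Any]:
        # Validate credentials/reachability here; kept trivial for the sketch.
        return True, None

    def streams(self, config: Mapping[str, Any]) -> List[Stream]:
        return [Customers()]
```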
Zach Brak
12/08/2022, 2:44 PM
Have been getting a few requests for the source material of our talk, so here it is!