Hey all, welcome to the channel! I wanted to preface with a couple of goals that led us down this path:
Our top requirements:
• Create our own connectors with an established development framework.
• Move as much data as we need, without row count limits.
• Gain organizational knowledge in GKE, Composer, and Terraform deployments.
We learned a lot from this project, and it produced some pretty amazing outcomes!
My thoughts on capacity & cost:
• Pricing (with orchestration): ~$1k/mo in cloud costs.
◦ Standalone (data movement only): ~$600/mo.
• How much data are we moving?
◦ At peak, this deployment was ingesting ~100 GiB/mo (excluding later-stage transforms).
◦ My estimate is ~200M rows/month (≈100 GiB/mo at an average of ~2M rows/GiB, based on final product table sizes and average row count per GiB).
• We observed no operational bottlenecks; at load, we were really only limited by the poll rate or response speed of 3rd-party APIs.
• We observed a performance improvement after switching from the BigQuery (denormalized) destination connector to landing raw JSON in GCS (see the sketch after this list).
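For anyone curious what the raw-JSON-to-GCS pattern looks like, here's a minimal sketch (not our actual pipeline code): land newline-delimited JSON in a bucket, then load it into BigQuery in a later stage. The bucket, object path, and table ID are placeholders.
```python
# Minimal sketch of the raw-JSON-to-GCS landing pattern (not our actual
# pipeline). Bucket, object path, and table ID are placeholders.
import json

from google.cloud import bigquery, storage

records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]  # sample rows

# Land the records as newline-delimited JSON in GCS.
ndjson = "\n".join(json.dumps(r) for r in records)
storage.Client().bucket("my-landing-bucket").blob(
    "raw/customers/2022-12-07.jsonl"
).upload_from_string(ndjson, content_type="application/json")

# Later-stage load into BigQuery from the landed file.
bq = bigquery.Client()
job = bq.load_table_from_uri(
    "gs://my-landing-bucket/raw/customers/2022-12-07.jsonl",
    "my_project.raw_dataset.customers",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,  # infer the schema from the JSON for this sketch
    ),
)
job.result()  # wait for the load job to complete
```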
I've worked with other players in the data-movement market:
• Stitchdata
• Tibco Scribe
• Azure Data Factory
• Google Cloud Fusion
• Hevo Data
I've been in pricing talks with some of the big players in 2018/2019:
• TIBCO
• Dell Boomi
Nowhere else have I seen the ability to move data at the volumes and price we achieved here!
Hit up @Yashkumar Makwana with any questions about developing connectors for Airbyte!
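To give a flavor of what connector development looks like, here's a minimal sketch of a source connector using Airbyte's Python CDK. The API endpoint and the "customers" stream are hypothetical, and a real connector also ships a spec and per-stream JSON schemas; this just shows the shape of the framework.
```python
# Minimal sketch of an Airbyte source built on the Python CDK (airbyte-cdk).
# The endpoint and stream below are hypothetical, for illustration only.
from typing import Any, Iterable, List, Mapping, Optional, Tuple

import requests
from airbyte_cdk.sources import AbstractSource
from airbyte_cdk.sources.streams import Stream
from airbyte_cdk.sources.streams.http import HttpStream


class Customers(HttpStream):
    url_base = "https://api.example.com/v1/"  # hypothetical REST API
    primary_key = "id"

    def path(self, **kwargs) -> str:
        return "customers"

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        # No pagination in this sketch; return the next-page cursor here.
        return None

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
        # Each element yielded becomes one Airbyte record.
        yield from response.json().get("data", [])


class SourceExample(AbstractSource):
    def check_connection(self, logger, config) -> Tuple[bool, Any]:
        # Validate credentials/reachability here; kept trivial for the sketch.
        return True, None

    def streams(self, config: Mapping[str, Any]) -> List[Stream]:
        return [Customers()]
```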
Zach Brak
12/08/2022, 2:44 PM
Have been getting a few requests for the source material of our talk, so here it is!