Hi Guys,
I'm relatively new to Airbyte. I have it running on EC2, pulling data from various Oracle/Postgres/custom sources into Snowflake. Up until now this has been a POC and I haven't needed to think too much about the infrastructure side, but now that we're looking to push it into production I'd like some advice 🙂
Background: I will be deploying:
• On AWS EC2/Docker
• 50+ Source connectors (35 Oracle, 15 Custom via API/JSON, 3 Postgres)
• 1 Destination connector (Snowflake)
• Each of these connections will run on a staggered schedule (mostly so as not to overwhelm the source vCPUs); see the sketch below for the kind of staggering I mean
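For context, the staggering I have in mind looks roughly like this: spread the syncs across the night in fixed slots. The connection names and the 15-minute slot width are made up for illustration, and I've used Quartz-style cron expressions since that's what I believe Airbyte's cron schedule type expects:

```python
# Hypothetical staggering plan: one daily sync per connection, kicked off in
# 15-minute slots starting at 01:00 UTC. Connection names are invented; the
# cron strings are Quartz-format (seconds field first).

SLOT_MINUTES = 15   # gap between sync kick-offs (my guess at "gentle enough")
START_HOUR = 1      # first sync at 01:00 UTC

def staggered_cron(index: int) -> str:
    """Daily Quartz cron expression offset by `index` slots."""
    total_minutes = START_HOUR * 60 + index * SLOT_MINUTES
    hour, minute = divmod(total_minutes, 60)
    return f"0 {minute} {hour % 24} * * ?"

connections = [f"oracle_src_{i:02d}" for i in range(35)]  # hypothetical names

for i, name in enumerate(connections):
    print(f"{name}: {staggered_cron(i)}")
```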
1. Is there a preferred instance family best suited for Airbyte (currently I have it on the T family)?
2. Re resourcing, I'm unclear on how to calculate the disk and CPU required based on existing pipelines. I have some existing connections that have already run which I could extrapolate from if there's a basic formula; is there such a thing? (A rough sketch of the kind of calculation I'm imagining is below.)
3. So far I have used the in-Docker Postgres database. Is there any way to port the existing config/state across from the local machine, other than some clunky export/import? (My current clunky approach is also sketched below.)
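To make question 2 concrete, here's the back-of-the-envelope shape I was hoping a sizing formula would take. Every per-worker number below is my own guess, not an Airbyte-published figure, so please correct them:

```python
# Back-of-the-envelope sizing I'd love someone to sanity-check.
# All the per-worker numbers are guesses, not Airbyte-published figures.

CONCURRENT_SYNCS = 5     # max syncs running at once (thanks to staggering)
WORKER_MEM_GB = 2.0      # guessed peak memory per sync worker
WORKER_VCPU = 1.0        # guessed vCPU per sync worker
CORE_OVERHEAD_GB = 4.0   # server/webapp/temporal/db containers, guessed
LARGEST_SYNC_GB = 20.0   # biggest single sync's staged data on disk, guessed
LOG_RETENTION_GB = 10.0  # logs + docker images, guessed

peak_mem_gb = CORE_OVERHEAD_GB + CONCURRENT_SYNCS * WORKER_MEM_GB
peak_vcpu = 2 + CONCURRENT_SYNCS * WORKER_VCPU  # 2 vCPU for core services
disk_gb = LOG_RETENTION_GB + CONCURRENT_SYNCS * LARGEST_SYNC_GB

print(f"~{peak_mem_gb:.0f} GB RAM, ~{peak_vcpu:.0f} vCPU, ~{disk_gb:.0f} GB disk")
```

If the real drivers are different (e.g. record size or sync mode rather than concurrent worker count), I'd love to know that too.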
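And for question 3, the clunky export/import I'd like to avoid is essentially the following: pg_dump the config database out of the airbyte-db container and pg_restore it into an external Postgres. The container name, user, and database name are the docker-compose defaults on my install (yours may differ), and the target DSN is obviously a placeholder:

```python
# My current "clunky" migration: dump Airbyte's internal config DB out of the
# airbyte-db container, then restore it into an external Postgres.
# "airbyte-db" / "docker" / "airbyte" are the docker-compose defaults on my
# install; adjust for yours. The target DSN is a made-up placeholder.

import subprocess

DUMP_FILE = "airbyte_config.dump"
TARGET_DSN = "postgresql://airbyte:secret@rds-host:5432/airbyte"  # placeholder

# 1. Custom-format dump taken from inside the running container.
with open(DUMP_FILE, "wb") as f:
    subprocess.run(
        ["docker", "exec", "airbyte-db",
         "pg_dump", "-U", "docker", "-Fc", "airbyte"],
        stdout=f, check=True,
    )

# 2. Restore into the external database (pg_restore must exist on this host).
subprocess.run(
    ["pg_restore", "--no-owner", "--dbname", TARGET_DSN, DUMP_FILE],
    check=True,
)
```

After the restore I'd point the DATABASE_* settings in Airbyte's .env at the external instance and restart, assuming I've read the .env file correctly. If there's a cleaner supported path for this, that's exactly what I'm asking for.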