AWS deployment question: Where can we found in Ai...
# ask-community-for-troubleshooting
m
AWS deployment question: Where in the Airbyte docs can we find the relationship between EC2 instance type and the number of jobs that will run, so we know when to scale the machine? The doc says:
For long-running Airbyte installations, we recommend a t2.large instance
What does "long-running" mean here? The same doc says t3.medium is enough for testing purposes. We currently have a t3.medium instance and an RDS db.t3.micro. I would like to know which EC2 and RDS setup we should use to avoid issues caused by the instance type.
a
Hi @Miquel Rius, sizing your RDS database mainly depends on the number of connections you want to handle and on how frequently the jobs run. The more connections you have or the more jobs you run, the bigger the database needs to be in terms of disk size. For the EC2 instance itself, the limiting factor is currently memory. Do you have a rough estimate of the data volume you plan to handle?
m
Hi @[DEPRECATED] Augustin Lafanechere Regarding data volume: a rough, not very accurate estimate of the data in our external_data schemas is 90 GB. Some sources are not used anymore, but it could be a good estimate of what we will end up with once the whole migration to Airbyte is complete. 1. Is there anywhere I can find the relationship between data volume and instance type? Because it depends not only on volume but also on frequency and on different jobs happening to run at the same time, right? Regarding RDS, I understand that the higher the frequency and the number of connections, the bigger the RDS instance needed, but what is the minimum, or what are the thresholds for each instance?
A rough, not very accurate estimate of the data in our external_data schemas is 90 GB. Some sources are not used anymore, but it could be a good estimate of what we will end up with once the whole migration to Airbyte is complete.
a
Regarding RDS, I understand that the higher the frequency and the number of connections, the bigger the RDS instance needed, but what is the minimum, or what are the thresholds for each instance?
I'd suggest starting with an SSD of ~50 GB.
1. Is there anywhere I can find the relationship between data volume and instance type?
No, because it really depends on the shape of your data. If you have a lot of small records, the sync will have a "light" memory footprint on your instance. If you have "fat" rows, the sync buffer will grow in terms of memory. Feel free to check this documentation section for more details. I think that t3.medium could be a bit small to start with; if you want to stay on t3 types, I'd suggest you choose t3.xlarge instead.
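To make the "small records vs. fat rows" point concrete, here is a back-of-the-envelope sketch in Python. It is only an illustration: the function name, the record sizes and the number of buffered records are all hypothetical, and none of them reflect Airbyte's actual buffer settings. It simply multiplies an assumed average record size by an assumed number of records held in memory.

```python
def estimate_buffer_mib(avg_record_bytes: int,
                        buffered_records: int,
                        concurrent_syncs: int = 1) -> float:
    """Rough estimate (in MiB) of memory held by sync buffers.

    All inputs are assumptions supplied by the caller; this is not
    how Airbyte computes or limits its buffers.
    """
    total_bytes = avg_record_bytes * buffered_records * concurrent_syncs
    return total_bytes / (1024 * 1024)


# Hypothetical example: the same number of buffered records,
# but very different record sizes.
small = estimate_buffer_mib(avg_record_bytes=512, buffered_records=10_000)
fat = estimate_buffer_mib(avg_record_bytes=512 * 1024, buffered_records=10_000)

print(f"small records: ~{small:.1f} MiB")
print(f"fat rows:      ~{fat:.1f} MiB")
```

With these made-up numbers the small-record case stays within a few MiB while the fat-row case reaches several GiB, which is why record shape matters more than total volume when choosing instance memory. Several syncs running at the same time multiply the estimate again through `concurrent_syncs`.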
m
Thank you @Augustin Lafanechere (Airbyte). Regarding the doc you attached, it's impossible to open the URL. Can you resend it?