# getting-started
s
Hello friends, in my company we did an initial POC with DataHub and we are very excited. Now we are moving to a more ambitious test, so we want to set it up on an EC2 instance in AWS and start using it more broadly across the company. Is there any specification for an ideal minimum setup for production use? What is the ideal size for the EC2 instance? Is the datahub docker quickstart setup enough, or should I consider other ways?
i
Hello Victor, we do not have an EC2 deployment guide. Typically we deploy DataHub on Kubernetes. You can check our Helm chart values at https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/values.yaml to see the resources we recommend starting out with. Once the system is deployed, sizing depends heavily on your usage. For that we provision metrics and dashboarding systems such as Grafana to continuously monitor the system and scale as needed.
That said, experience has given me the intuition that if you use managed systems (say AWS RDS, MSK & OpenSearch) for the stateful systems/databases, you can be fairly conservative with the frontend (4GB, 1- cores), give GMS something like (8GB, 2-4 cores), and be quite well served.
b
+1!
GMS and the actions pods are by far the most CPU / memory intensive
But still not too much. I've seen 1 CPU and 4GB RAM be more than enough for GMS up to about 100k data assets
s
Thank you @incalculable-ocean-74010. I will be pushing to use K8s for this, although I think I will have to start with an EC2 instance and docker-compose. What worries me a bit is starting with this setup, adding a lot of information, and then not being able to migrate it if I change the setup to K8s, although I think it should be possible.
i
You can always run the databases in managed services, something like AWS RDS, MSK and OpenSearch. Deploy only the DataHub-specific containers and set their connection variables to point at those managed services. Then if you migrate to Kubernetes you can do the same: deploy only the stateless, DataHub-specific containers and benefit from having the state in the exact same place, AWS’s managed services.
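A minimal sketch of that idea, assuming the connection variable names used in the DataHub docker env files (EBEAN_DATASOURCE_*, KAFKA_BOOTSTRAP_SERVER, ELASTICSEARCH_*); please verify them against the version you deploy. The endpoints below are hypothetical placeholders, and the helper render_env is only for illustration. It writes out the settings that would be shared between a docker-compose setup and a later Kubernetes deployment:

```python
# Hypothetical endpoints: replace with your own RDS, MSK and OpenSearch addresses.
MANAGED = {
    "rds_host": "my-datahub.example.rds.amazonaws.com:3306",
    "msk_bootstrap": "b-1.example.kafka.amazonaws.com:9092",
    "opensearch_host": "search-datahub.example.es.amazonaws.com",
    "opensearch_port": "443",
}


def render_env(endpoints: dict) -> str:
    """Build the contents of an env file for the stateless DataHub containers."""
    # Variable names are assumed from the DataHub docker env files; credentials
    # and other settings are intentionally left out of this sketch.
    env = {
        "EBEAN_DATASOURCE_HOST": endpoints["rds_host"],
        "EBEAN_DATASOURCE_URL": f"jdbc:mysql://{endpoints['rds_host']}/datahub",
        "KAFKA_BOOTSTRAP_SERVER": endpoints["msk_bootstrap"],
        "ELASTICSEARCH_HOST": endpoints["opensearch_host"],
        "ELASTICSEARCH_PORT": endpoints["opensearch_port"],
        "ELASTICSEARCH_USE_SSL": "true",
    }
    return "\n".join(f"{key}={value}" for key, value in env.items()) + "\n"


if __name__ == "__main__":
    print(render_env(MANAGED), end="")
```

Because all state lives behind these endpoints, the same values can be reused unchanged when moving from docker-compose on EC2 to the Helm chart on K8s.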