# all-things-deployment
p
Hello. Does anyone have advice to share on real-world Elasticsearch cluster and index sizes? We're planning a DataHub deployment from scratch, with at most a few tens of thousands of datasets from Hive, Druid, Cassandra, etc. I'm also considering hosting the graph database on Elasticsearch rather than Neo4j. I'm looking at a 3-node Elasticsearch cluster for high availability, but I wondered if anyone could share their experience of sizing an Elasticsearch cluster for a similar workload, to make sure I'm not massively over- or under-speccing it. Thanks.
i
Hello Ben 👋 I don’t have exact sizes for you; it is always hard to spec these things out, especially since Elasticsearch or Neo4j will be used for search capabilities, so it depends a lot on how much usage DataHub is going to get. If I may make a suggestion, I would recommend starting out with a managed Elasticsearch solution and delegating scaling to expert vendors, at least initially, until you hit a steady state.
Out of curiosity, is this a theoretical question, or are you moving DataHub to a prod scenario and worried about scaling?
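For a rough first pass at the sizing question, a back-of-envelope calculation like the one below may help frame the conversation. It is only a sketch: the function name `estimate_cluster`, the average document size, and the overhead factor are all hypothetical placeholders, not measurements from any DataHub deployment, so they should be replaced with numbers observed on a test index. The one part that is not an assumption is the majority arithmetic: a cluster with 3 master-eligible nodes keeps a quorum of 2 and so tolerates the loss of 1 node.

```python
def estimate_cluster(num_docs, avg_doc_bytes, replicas=1, overhead=1.5, nodes=3):
    """Back-of-envelope storage and availability estimate.

    All inputs are assumptions to be replaced with measured values:
    - num_docs: number of indexed documents (e.g. datasets)
    - avg_doc_bytes: average size of one indexed document
    - overhead: multiplier for index structures on top of raw data
    """
    primary_bytes = num_docs * avg_doc_bytes * overhead
    # Each replica is a full extra copy of the primary data.
    total_bytes = primary_bytes * (1 + replicas)
    # Master election needs a strict majority of master-eligible nodes.
    quorum = nodes // 2 + 1
    return {
        "primary_gb": primary_bytes / 1e9,
        "total_gb": total_bytes / 1e9,
        "per_node_gb": total_bytes / nodes / 1e9,
        "quorum": quorum,
        "tolerated_node_failures": nodes - quorum,
    }


# Hypothetical inputs: 50,000 datasets at roughly 10 KB per document.
estimate = estimate_cluster(num_docs=50_000, avg_doc_bytes=10_000)
for key, value in estimate.items():
    print(f"{key}: {value}")
```

With placeholder inputs of this order the storage figures come out small, so for a catalogue of this size the 3-node choice is likely to be driven by availability rather than by disk capacity. Query load and memory are not modelled here at all.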