Are there any sizing guidelines for deployment, e.g. Kafka and DataHub? #getting-started


rough-zoo-50278

11/09/2021, 11:06 AM

Are there any sizing guidelines for deployment, e.g. for Kafka and other dependencies? And what are the differences between the options: when do I need Kafka, and when Neo4j or Elasticsearch? Any recommendations for an "evaluation"-sized setup?


big-carpet-38439

11/09/2021, 6:16 PM

@early-lamp-41924 can provide details on what we host demo.datahubproject.io with! It should suffice for an evaluation.

early-lamp-41924

11/09/2021, 6:23 PM

Hi! We created a datahub-prerequisites chart to help folks quickstart on Kubernetes. We have tested that it can handle up to 300K records. Please refer to our Kubernetes guide: https://datahubproject.io/docs/deploy/kubernetes

On the components side, Kafka and Elasticsearch are required components for DataHub. We have implemented graph storage on both Neo4j and Elasticsearch. Some community members wanted Neo4j's extensive capabilities for advanced graph operations. However, DataHub itself does not run any complicated graph queries, which is why we also offer the option to use Elasticsearch as the graph storage layer, reducing the number of required components by one.

To point DataHub to Elasticsearch instead of Neo4j, refer to this chart: https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/quickstart-values-without-neo4j.yaml#L46
If you do so, you can set neo4j-community.enabled to false in the values.yaml for the prerequisites chart.
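A minimal sketch of the Neo4j-free setup described above, expressed as Helm value overrides. The neo4j-community.enabled key comes from the thread; the global.graph_service_impl key for the datahub chart is an assumption and should be checked against the linked quickstart-values-without-neo4j.yaml. The helper names and output file names are hypothetical. The files are written as JSON, which is a subset of YAML 1.2, so Helm should accept them via -f.

```python
import json


def neo4j_free_overrides():
    """Hypothetical helper: Helm overrides for running DataHub without Neo4j."""
    # datahub-prerequisites chart: key named in the thread; disables the bundled Neo4j
    prerequisites = {"neo4j-community": {"enabled": False}}
    # datahub chart: ASSUMPTION that the graph backend is selected via
    # global.graph_service_impl; confirm in quickstart-values-without-neo4j.yaml
    datahub = {"global": {"graph_service_impl": "elasticsearch"}}
    return prerequisites, datahub


def write_values(path, values):
    # JSON is valid YAML 1.2, so Helm should be able to read this file via -f
    with open(path, "w") as f:
        json.dump(values, f, indent=2)


if __name__ == "__main__":
    prerequisites, datahub = neo4j_free_overrides()
    write_values("prerequisites-overrides.json", prerequisites)
    write_values("datahub-overrides.json", datahub)
```

Each generated file would then be passed with -f to the helm install or helm upgrade of the corresponding chart, on top of its default values.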

