Hey all, we are currently trying to deploy Datahub...
# all-things-deployment
a
Hey all, we are currently trying to deploy DataHub on an on-prem Kubernetes environment and are unsure about a few points:
(1) External MySQL size: Any suggestions on how much CPU/RAM/storage to expect? We have one DataHub instance with 4k tables, and DBeaver shows only 16 MB consumed, but we are not sure about CPU and RAM. The user base would be just a few people.
(2) Index & graph database: Is there any important metadata we would lose if we don't host Elasticsearch as an external service and instead restore it from the MySQL data?
(3) MSSQL ODBC connection: Currently we install pyodbc plus the driver through a Kubernetes postStart lifecycle hook, running the pod as root. Is there a more elegant way that avoids building a custom actions container or logging in to a shell and installing manually?
Thanks!
a
I think the docs have answers on the expected resources for each pod: https://datahubproject.io/docs/deploy/kubernetes/
As for the other questions, I'm not entirely sure. Could you elaborate on them a bit?
a
Hey Paul, thanks for your response! Sadly I can't find any information about the prerequisites, just the 7 GB of RAM needed to run the whole of DataHub in minikube, and there are no hints inside the values file for MySQL either. A bit more clarification on the other points:
(2) Since we are on-prem, we need to request machines from the IT service to host the external databases. To make this a bit less complex, we were wondering if we could externalize only MySQL and keep Elasticsearch inside Kubernetes. The documentation says that if you delete the Elasticsearch component, dataset profiles and time series will be lost. If we don't care about dataset usage or changes over time at the moment, is there any other critical information that we couldn't restore from the MySQL backup?
(3) We were struggling to get MSSQL running with an encrypted connection. Since we wanted to give the main users a better experience, we wanted to make it possible through UI ingestion. To extend the actions container, we currently add a postStart parameter to the Helm chart which installs the MSSQL driver and rewrites the script that creates the ingestion venvs. There was no other way to make pyodbc available inside the mssql venv (there is an mssql-odbc option inside setup.py, but it is currently excluded). This seems a bit dirty, and we were wondering if there is a better alternative.
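For anyone following along on the encrypted MSSQL part of (3), here is a minimal sketch of how an encrypted ODBC connection string and the matching SQLAlchemy `mssql+pyodbc` URI can be assembled. It is not taken from the DataHub code base. The helper names, the driver name "ODBC Driver 18 for SQL Server", and the host and credentials are assumptions for illustration only; adjust them to whatever driver is actually installed in the container.

```python
from urllib.parse import quote_plus


def build_odbc_connect(host, database, user, password,
                       driver="ODBC Driver 18 for SQL Server",  # assumed driver name
                       port=1433, trust_server_certificate=False):
    """Hypothetical helper: build a raw ODBC connection string with encryption on."""
    parts = {
        "DRIVER": "{" + driver + "}",
        "SERVER": f"{host},{port}",
        "DATABASE": database,
        "UID": user,
        "PWD": password,
        # Ask the driver to encrypt the connection.
        "Encrypt": "yes",
        # "no" means the server certificate is validated; "yes" skips validation.
        "TrustServerCertificate": "yes" if trust_server_certificate else "no",
    }
    # Note: values containing ';' or '}' would need extra escaping, not handled here.
    return ";".join(f"{key}={value}" for key, value in parts.items())


def build_sqlalchemy_uri(odbc_connect):
    """Hypothetical helper: wrap the ODBC string in a SQLAlchemy mssql+pyodbc URI."""
    return "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc_connect)


if __name__ == "__main__":
    # Placeholder values, not a real server.
    conn_str = build_odbc_connect("mssql.example.internal", "sales", "datahub", "secret")
    print(conn_str)
    print(build_sqlalchemy_uri(conn_str))
```

The raw string can be passed to `pyodbc.connect(...)` as a quick check that the driver and encryption settings work inside the pod, before wiring the same settings into an ingestion recipe.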