# troubleshoot
I just set up DataHub in our OpenShift cluster and tried to create 2 Ingestion Sources (MongoDB and PostgreSQL), and both of them error out here:
"ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /config (Caused by "
"NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2d5efda220>: Failed to establish a new connection: [Errno 111] "
"Connection refused'))\n"
'[2022-06-13 22:02:47,811] INFO {datahub.entrypoints:176} - DataHub CLI version: 0.8.38 at '
'/tmp/datahub/ingest/venv-0cd8b528-5f0d-4489-a7b5-c91393ca674a/lib/python3.9/site-packages/datahub/__init__.py\n'
'[2022-06-13 22:02:47,811] INFO {datahub.entrypoints:179} - Python version: 3.9.9 (main, Dec 21 2021, 10:03:34) \n'
'[GCC 10.2.1 20210110] at /tmp/datahub/ingest/venv-0cd8b528-5f0d-4489-a7b5-c91393ca674a/bin/python3 on '
'Linux-4.18.0-305.25.1.el8_4.x86_64-x86_64-with-glibc2.31\n'
'[2022-06-13 22:02:47,811] INFO {datahub.entrypoints:182} - GMS config {}\n',
"2022-06-13 22:02:48.526584 [exec_id=0cd8b528-5f0d-4489-a7b5-c91393ca674a] INFO: Failed to execute 'datahub ingest'",
'2022-06-13 22:02:48.527164 [exec_id=0cd8b528-5f0d-4489-a7b5-c91393ca674a] INFO: Caught exception EXECUTING '
'task_id=0cd8b528-5f0d-4489-a7b5-c91393ca674a, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task\n'
' self.event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 81, in run_until_complete\n'
' return f.result()\n'
' File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
' raise self._exception\n'
' File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
' result = coro.send(None)\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
I don't see any localhost:8080 anywhere in my original values.yaml Helm chart config. I configured both ingestion sources from the UI. Any thoughts on what is going on? BTW, I am using a proxy, so I had to set PIP_PROXY on the acryl-datahub-actions pod.
hey Chris! would you mind posting your ingestion recipe here for us to check out?
Sure, this is the mongo recipe:
source:
  type: mongodb
  config:
    connect_uri: 'mongodb://database1'
    enableSchemaInference: true
    useRandomSampling: true
    maxSchemaSize: 300
And Postgres:
source:
  type: postgres
  config:
    host_port: 'postgresql:5432'
    database: dagster
    username: '${dagster-postgres-secret}'
    password: '${dagster-postgres-secret}'
    include_tables: true
    include_views: true
    profiling:
      enabled: false
Note: I was confused about the secrets and what variable names to use, so that might be wrong. The secret is named dagster-postgres-secret.
Also, it seems to be constantly rerunning pip install for all the deps during every poll event. Is that expected?
Okay gotcha.. we’ve seen issues with secrets that have dashes for some reason. Would you mind updating your secret and recipe to use ‘DAGSTER_POSTGRES_SECRET’ and try? This definitely could be a red herring though so I apologize if that doesn’t help
Will do
Sorry that was indeed a red herring: result is same.
Another couple pieces of info: 1. I did not know what zookeeper is used for in DataHub and so I commented that out since I use RedPanda for my Kafka service. 2. The datahub-upgrade-job had 3 pods running but only 1 succeeded. The other 2 errored out but are now gone so I am not sure why they failed.
@bulky-soccer-26729 any more thoughts on where to look? Any logs that show where the config is coming from during ingestor launch?
shoot this must have slipped by me I'm so sorry for the delay!
thinking a bit more right now
so you don't have
explicitly set in your recipes, right?
Correct - none specified
could you try setting a
and let's try http://datahub-gms:8080 instead of localhost:8080. This might be a docker thing
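For anyone following along, here is a minimal sketch of what that could look like. It assumes the setting being discussed is the recipe's `sink` block, and that GMS is reachable inside the cluster under the service name `datahub-gms` (the name from the message above; yours may differ depending on the Helm release name). The source part is just the mongo recipe from earlier in the thread.

```yaml
# Recipe with an explicit sink, so the ingestion run does not end up
# talking to localhost:8080 (which is what the error above shows).
source:
  type: mongodb
  config:
    connect_uri: 'mongodb://database1'
    enableSchemaInference: true
    useRandomSampling: true
    maxSchemaSize: 300
sink:
  type: datahub-rest
  config:
    # Assumption: in-cluster service name and port of GMS.
    server: 'http://datahub-gms:8080'
```

The same `sink` block would go at the bottom of the Postgres recipe as well.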
Should I add one back?
will do
Ok that was it! Now it seems to be collecting data. It fails at the end of the ingestion, apparently because the process gets killed. Is there a timeout somewhere that can be set to extend the ingestion time?
okay that's good! one thing solved at least lol
would you mind posting the logs for the failure as well?
The MongoDB ingestion ends this way:
'[2022-06-14 21:33:35,623] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit normalized.XXXX-2B\n'
'/usr/local/bin/run_ingest.sh: line 26: 2819 Killed ( python3 -m datahub ingest -c "$4/$1.yml" )\n',
"2022-06-14 21:33:37.546728 [exec_id=a1cae71f-eb17-4591-be41-900a9a792f38] INFO: Failed to execute 'datahub ingest'",
'2022-06-14 21:33:37.547390 [exec_id=a1cae71f-eb17-4591-be41-900a9a792f38] INFO: Caught exception EXECUTING '
'task_id=a1cae71f-eb17-4591-be41-900a9a792f38, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task\n'
' self.event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 81, in run_until_complete\n'
' return f.result()\n'
' File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
' raise self._exception\n'
' File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
' result = coro.send(None)\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
hm so obviously something is going on with this line where it gets killed
( python3 -m datahub ingest -c "$4/$1.yml" )
and you said it's ingesting some data but gets killed part way through?
i'm wondering if this is a memory issue as well, as far as I know this shouldn't time out
Yes, you got it. Let me try increasing mem. Our instance is severely restricted by default.
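In case it helps others hitting the same `Killed` line: a rough sketch of raising the memory limit through the Helm values. This assumes the ingestion subprocess runs inside the acryl-datahub-actions pod (the one PIP_PROXY was set on earlier) and that the chart passes a standard Kubernetes `resources` block through to that container. The key layout and the sizes are assumptions and placeholders, not recommendations, so check them against your chart version.

```yaml
# values.yaml fragment (assumed layout): give the actions container
# more memory so the ingest subprocess is less likely to be OOM-killed.
acryl-datahub-actions:
  resources:
    requests:
      memory: 512Mi   # placeholder value
    limits:
      memory: 2Gi     # placeholder value; size to your ingestion workload
```

On OpenShift, a namespace-level LimitRange or quota may also cap what the pod can request, so that is worth checking if the new limit does not take effect.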
nice hopefully that does it for you!
Yay both now Succeeded!
great news!