# troubleshoot
e
I just set up DataHub in our OpenShift cluster and tried to create 2 Ingestion Sources (MongoDB and PostgreSQL), and both of them error out here:

ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /config (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2d5efda220>: Failed to establish a new connection: [Errno 111] Connection refused'))
[2022-06-13 22:02:47,811] INFO {datahub.entrypoints:176} - DataHub CLI version: 0.8.38 at /tmp/datahub/ingest/venv-0cd8b528-5f0d-4489-a7b5-c91393ca674a/lib/python3.9/site-packages/datahub/__init__.py
[2022-06-13 22:02:47,811] INFO {datahub.entrypoints:179} - Python version: 3.9.9 (main, Dec 21 2021, 10:03:34) [GCC 10.2.1 20210110] at /tmp/datahub/ingest/venv-0cd8b528-5f0d-4489-a7b5-c91393ca674a/bin/python3 on Linux-4.18.0-305.25.1.el8_4.x86_64-x86_64-with-glibc2.31
[2022-06-13 22:02:47,811] INFO {datahub.entrypoints:182} - GMS config {}
2022-06-13 22:02:48.526584 [exec_id=0cd8b528-5f0d-4489-a7b5-c91393ca674a] INFO: Failed to execute 'datahub ingest'
2022-06-13 22:02:48.527164 [exec_id=0cd8b528-5f0d-4489-a7b5-c91393ca674a] INFO: Caught exception EXECUTING task_id=0cd8b528-5f0d-4489-a7b5-c91393ca674a, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task
    self.event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 81, in run_until_complete
    return f.result()
  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute
    raise TaskError("Failed to execute 'datahub ingest'")
acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
Execution finished with errors.

I don't see localhost:8080 anywhere in my original values.yaml Helm chart config. I configured both ingestion sources from the UI. Any thoughts on what is going on? BTW, I am using a proxy, so I had to set PIP_PROXY on the acryl-datahub-actions pod.
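For reference, I pushed the proxy setting onto the actions pod through values.yaml, roughly along these lines (a sketch from memory; the extraEnvs key and the proxy URL here are assumptions, not our exact config):

```yaml
acryl-datahub-actions:
  extraEnvs:            # assumed chart key for injecting env vars; may differ by chart version
    - name: PIP_PROXY   # pip reads PIP_PROXY as its --proxy option
      value: "http://proxy.example.com:3128"   # placeholder proxy URL
```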
b
hey Chris! would you mind posting your ingestion recipe here for us to check out?
e
Sure, this is the Mongo recipe:

source:
  type: mongodb
  config:
    connect_uri: 'mongodb://database1'
    enableSchemaInference: true
    useRandomSampling: true
    maxSchemaSize: 300
And the Postgres one:

source:
  type: postgres
  config:
    host_port: 'postgresql:5432'
    database: dagster
    username: '${dagster-postgres-secret}'
    password: '${dagster-postgres-secret}'
    include_tables: true
    include_views: true
    profiling:
      enabled: false
Note I was confused about the secrets and what variable names to use, so that part might be wrong. The secret is named dagster-postgres-secret.
Also, it seems to be constantly rerunning pip install for all the deps during every poll event. Is that expected?
b
Okay, gotcha... we’ve seen issues with secrets that have dashes for some reason. Would you mind updating your secret and recipe to use 'DAGSTER_POSTGRES_SECRET' and trying again? This could definitely be a red herring though, so I apologize if that doesn’t help.
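so the recipe would reference it roughly like this, assuming you recreate the secret in the UI under the new name (just a sketch; in practice the username and password would usually be two separate secrets):

```yaml
source:
  type: postgres
  config:
    host_port: 'postgresql:5432'
    database: dagster
    # ${...} resolves to a UI-managed DataHub secret by name
    username: '${DAGSTER_POSTGRES_SECRET}'
    password: '${DAGSTER_POSTGRES_SECRET}'
    include_tables: true
    include_views: true
    profiling:
      enabled: false
```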
e
Will do
Sorry, that was indeed a red herring: the result is the same.
Another couple pieces of info: 1. I did not know what ZooKeeper is used for in DataHub, so I commented it out since I use Redpanda for my Kafka service. 2. The datahub-upgrade-job had 3 pods running but only 1 succeeded. The other 2 errored out but are now gone, so I am not sure why they failed.
@bulky-soccer-26729 any more thoughts on where to look? Any logs that show where the config is coming from during ingestor launch?
b
Shoot, this must have slipped by me. I'm so sorry for the delay!
thinking a bit more right now
so you don't have a sink explicitly set in your recipes, right?
e
Correct - none specified
b
could you try setting a sink, and let's try http://datahub-gms:8080 instead of localhost:8080? This might be a docker thing
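something along these lines at the bottom of the recipe (a sketch assuming GMS is reachable as datahub-gms inside your cluster):

```yaml
sink:
  type: datahub-rest
  config:
    # point the ingestion CLI at the GMS REST endpoint instead of the localhost default
    server: 'http://datahub-gms:8080'
```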
e
Should I add one back?
will do
👍 1
Ok, that was it! Now it seems to be collecting data, but it fails at the end of the ingestion with what looks like a killed process. Is there a timeout somewhere that can be set to extend the ingestion time?
b
okay that's good! one thing solved at least lol
would you mind posting the logs for the failure as well?
e
The MongoDB ingestion ends this way:

[2022-06-14 21:33:35,623] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit normalized.XXXX-2B
/usr/local/bin/run_ingest.sh: line 26:  2819 Killed  ( python3 -m datahub ingest -c "$4/$1.yml" )
2022-06-14 21:33:37.546728 [exec_id=a1cae71f-eb17-4591-be41-900a9a792f38] INFO: Failed to execute 'datahub ingest'
2022-06-14 21:33:37.547390 [exec_id=a1cae71f-eb17-4591-be41-900a9a792f38] INFO: Caught exception EXECUTING task_id=a1cae71f-eb17-4591-be41-900a9a792f38, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task
    self.event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 81, in run_until_complete
    return f.result()
  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute
    raise TaskError("Failed to execute 'datahub ingest'")
acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
Execution finished with errors.
b
hm, so obviously something is going on with this line where it gets killed: ( python3 -m datahub ingest -c "$4/$1.yml" )
and you said it's ingesting some data but gets killed part way through?
I'm wondering if this is a memory issue; as far as I know this shouldn't time out.
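if it is memory, bumping the limits on the actions pod in values.yaml would look roughly like this (a sketch; the exact keys depend on your chart version, and the numbers are placeholders to tune):

```yaml
acryl-datahub-actions:
  resources:
    requests:
      memory: "1Gi"   # placeholder; size for your workload
    limits:
      memory: "2Gi"   # placeholder; the ingestion subprocess gets killed when this is exceeded
```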
e
Yes, you got it. Let me try increasing memory. Our instance is severely restricted by default.
b
Nice, hopefully that does it for you!
thank you 1
e
Yay, both now Succeeded!
b
great news!