# troubleshoot
e
I just set up DataHub in our OpenShift cluster and tried to create 2 Ingestion Sources (MongoDB and PostgreSQL), and both of them error out here:

ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /config (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2d5efda220>: Failed to establish a new connection: [Errno 111] Connection refused'))
[2022-06-13 22:02:47,811] INFO {datahub.entrypoints:176} - DataHub CLI version: 0.8.38 at /tmp/datahub/ingest/venv-0cd8b528-5f0d-4489-a7b5-c91393ca674a/lib/python3.9/site-packages/datahub/__init__.py
[2022-06-13 22:02:47,811] INFO {datahub.entrypoints:179} - Python version: 3.9.9 (main, Dec 21 2021, 10:03:34) [GCC 10.2.1 20210110] at /tmp/datahub/ingest/venv-0cd8b528-5f0d-4489-a7b5-c91393ca674a/bin/python3 on Linux-4.18.0-305.25.1.el8_4.x86_64-x86_64-with-glibc2.31
[2022-06-13 22:02:47,811] INFO {datahub.entrypoints:182} - GMS config {}
2022-06-13 22:02:48.526584 [exec_id=0cd8b528-5f0d-4489-a7b5-c91393ca674a] INFO: Failed to execute 'datahub ingest'
2022-06-13 22:02:48.527164 [exec_id=0cd8b528-5f0d-4489-a7b5-c91393ca674a] INFO: Caught exception EXECUTING task_id=0cd8b528-5f0d-4489-a7b5-c91393ca674a, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task
    self.event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 81, in run_until_complete
    return f.result()
  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute
    raise TaskError("Failed to execute 'datahub ingest'")
acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
Execution finished with errors.

I don't see localhost:8080 anywhere in my original values.yaml Helm chart config. I configured both ingestion sources from the UI. Any thoughts on what is going on? BTW, I am using a proxy, so I had to set PIP_PROXY on the acryl-datahub-actions pod.
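For reference, I pushed the proxy setting onto the actions pod through values.yaml, roughly along these lines (a sketch from memory; the extraEnvs key and the proxy URL here are assumptions, not our exact config):

```yaml
acryl-datahub-actions:
  extraEnvs:            # assumed chart key for injecting env vars; may differ by chart version
    - name: PIP_PROXY   # pip reads PIP_PROXY as its --proxy option
      value: "http://proxy.example.com:3128"   # placeholder proxy URL
```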
b
hey Chris! would you mind posting your ingestion recipe here for us to check out?
e
Sure, this is the Mongo recipe:

source:
  type: mongodb
  config:
    connect_uri: 'mongodb://database1'
    enableSchemaInference: true
    useRandomSampling: true
    maxSchemaSize: 300
And the Postgres one:

source:
  type: postgres
  config:
    host_port: 'postgresql:5432'
    database: dagster
    username: '${dagster-postgres-secret}'
    password: '${dagster-postgres-secret}'
    include_tables: true
    include_views: true
    profiling:
      enabled: false
Note I was confused about the secrets and what variable names to use, so that part might be wrong. The secret is named dagster-postgres-secret.
Also, it seems to be constantly rerunning pip install for all the deps during every poll event. Is that expected?
b
Okay, gotcha... we’ve seen issues with secrets that have dashes for some reason. Would you mind updating your secret and recipe to use 'DAGSTER_POSTGRES_SECRET' and trying again? This could definitely be a red herring though, so I apologize if that doesn’t help.
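so the recipe would reference it roughly like this, assuming you recreate the secret in the UI under the new name (just a sketch; in practice the username and password would usually be two separate secrets):

```yaml
source:
  type: postgres
  config:
    host_port: 'postgresql:5432'
    database: dagster
    # ${...} resolves to a UI-managed DataHub secret by name
    username: '${DAGSTER_POSTGRES_SECRET}'
    password: '${DAGSTER_POSTGRES_SECRET}'
    include_tables: true
    include_views: true
    profiling:
      enabled: false
```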
e
Will do
Sorry, that was indeed a red herring: the result is the same.
Another couple pieces of info: 1. I did not know what ZooKeeper is used for in DataHub, so I commented it out since I use Redpanda for my Kafka service. 2. The datahub-upgrade-job had 3 pods running but only 1 succeeded. The other 2 errored out but are now gone, so I am not sure why they failed.
@bulky-soccer-26729 any more thoughts on where to look? Any logs that show where the config is coming from during ingestor launch?
b
Shoot, this must have slipped by me. I'm so sorry for the delay!
thinking a bit more right now
so you don't have a sink explicitly set in your recipes, right?
e
Correct - none specified
b
could you try setting a sink, and let's try http://datahub-gms:8080 instead of localhost:8080? This might be a docker thing
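something along these lines at the bottom of the recipe (a sketch assuming GMS is reachable as datahub-gms inside your cluster):

```yaml
sink:
  type: datahub-rest
  config:
    # point the ingestion CLI at the GMS REST endpoint instead of the localhost default
    server: 'http://datahub-gms:8080'
```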
e
Should I add one back?
will do
👍 1
Ok, that was it! Now it seems to be collecting data, but it fails at the end of the ingestion with what looks like a killed process. Is there a timeout somewhere that can be set to extend the ingestion time?
b
okay that's good! one thing solved at least lol
would you mind posting the logs for the failure as well?
e
The MongoDB ingestion ends this way:

[2022-06-14 21:33:35,623] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit normalized.XXXX-2B
/usr/local/bin/run_ingest.sh: line 26:  2819 Killed  ( python3 -m datahub ingest -c "$4/$1.yml" )
2022-06-14 21:33:37.546728 [exec_id=a1cae71f-eb17-4591-be41-900a9a792f38] INFO: Failed to execute 'datahub ingest'
2022-06-14 21:33:37.547390 [exec_id=a1cae71f-eb17-4591-be41-900a9a792f38] INFO: Caught exception EXECUTING task_id=a1cae71f-eb17-4591-be41-900a9a792f38, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task
    self.event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 81, in run_until_complete
    return f.result()
  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute
    raise TaskError("Failed to execute 'datahub ingest'")
acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
Execution finished with errors.
b
hm, so obviously something is going on with this line where it gets killed: ( python3 -m datahub ingest -c "$4/$1.yml" )
and you said it's ingesting some data but gets killed part way through?
I'm wondering if this is a memory issue; as far as I know this shouldn't time out.
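if it is memory, bumping the limits on the actions pod in values.yaml would look roughly like this (a sketch; the exact keys depend on your chart version, and the numbers are placeholders to tune):

```yaml
acryl-datahub-actions:
  resources:
    requests:
      memory: "1Gi"   # placeholder; size for your workload
    limits:
      memory: "2Gi"   # placeholder; the ingestion subprocess gets killed when this is exceeded
```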
e
Yes, you got it. Let me try increasing memory. Our instance is severely restricted by default.
b
Nice, hopefully that does it for you!
thank you 1
e
Yay, both now Succeeded!
b
great news!