Has anyone experienced problems with ingesting da...
# ingestion
a
Has anyone experienced problems ingesting data into a DataHub instance deployed on AWS/EKS (following the DataHub tutorial)? I'm having problems ingesting data from both Tableau and Okta. These flows work perfectly fine locally with the same recipes. The only difference is where DataHub is deployed. Has anyone had a similar problem?
h
Hi @adamant-rain-51672, what problem are you facing? Is there a particular error that you see?
a
For Tableau, it complains about user credentials (the same credentials work perfectly fine locally). For Okta, here's the error:
~~~~ Execution Summary ~~~~

RUN_INGEST - {'errors': [],
 'exec_id': '894f9189-bbb0-4d44-8dfb-2a7056fd6e65',
 'infos': ['2022-09-05 19:28:16.998654 [exec_id=894f9189-bbb0-4d44-8dfb-2a7056fd6e65] INFO: Starting execution for task with name=RUN_INGEST',
           '2022-09-05 19:28:40.653581 [exec_id=894f9189-bbb0-4d44-8dfb-2a7056fd6e65] INFO: stdout=Requirement already satisfied: pip in '
           '/tmp/datahub/ingest/venv-894f9189-bbb0-4d44-8dfb-2a7056fd6e65/lib/python3.9/site-packages (21.2.4)\n'

[...PACKAGE INSTALLATION...]

           '[2022-09-05 19:28:39,965] INFO     {datahub.ingestion.run.pipeline:163} - Sink configured successfully. DataHubRestEmitter: configured '
           'to talk to <http://datahub-datahub-gms:8080>\n'
           '[2022-09-05 19:28:40,128] INFO     {datahub.cli.ingest_cli:119} - Starting metadata ingestion\n'
           '[2022-09-05 19:28:40,129] INFO     {datahub.cli.ingest_cli:123} - Source (okta) report:\n'
           "{'workunits_produced': '0',\n"
           " 'workunit_ids': [],\n"
           " 'warnings': {},\n"
           " 'failures': {},\n"
           " 'cli_version': '0.8.43',\n"
           " 'cli_entry_location': '/tmp/datahub/ingest/venv-894f9189-bbb0-4d44-8dfb-2a7056fd6e65/lib/python3.9/site-packages/datahub/__init__.py',\n"
           " 'py_version': '3.9.9 (main, Dec 21 2021, 10:03:34) \\n[GCC 10.2.1 20210110]',\n"
           " 'py_exec_path': '/tmp/datahub/ingest/venv-894f9189-bbb0-4d44-8dfb-2a7056fd6e65/bin/python3',\n"
           " 'os_details': 'Linux-5.4.209-116.363.amzn2.x86_64-x86_64-with-glibc2.31',\n"
           " 'filtered': []}\n"
           '[2022-09-05 19:28:40,130] INFO     {datahub.cli.ingest_cli:126} - Sink (datahub-rest) report:\n'
           "{'records_written': '0', 'warnings': [], 'failures': [], 'gms_version': 'v0.8.43'}\n"
           '[2022-09-05 19:28:40,418] ERROR    {datahub.entrypoints:188} - Command failed with There is no current event loop in thread '
           "'asyncio_0'.. Run with --debug to get full trace\n"
           '[2022-09-05 19:28:40,418] INFO     {datahub.entrypoints:191} - DataHub CLI version: 0.8.43 at '
           '/tmp/datahub/ingest/venv-894f9189-bbb0-4d44-8dfb-2a7056fd6e65/lib/python3.9/site-packages/datahub/__init__.py\n',
           "2022-09-05 19:28:40.654203 [exec_id=894f9189-bbb0-4d44-8dfb-2a7056fd6e65] INFO: Failed to execute 'datahub ingest'",
           '2022-09-05 19:28:40.654552 [exec_id=894f9189-bbb0-4d44-8dfb-2a7056fd6e65] INFO: Caught exception EXECUTING '
           'task_id=894f9189-bbb0-4d44-8dfb-2a7056fd6e65, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
           '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 121, in execute_task\n'
           '    self.event_loop.run_until_complete(task_future)\n'
           '  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete\n'
           '    return f.result()\n'
           '  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
           '    raise self._exception\n'
           '  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
           '    result = coro.send(None)\n'
           '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
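The key line in the log above is `There is no current event loop in thread 'asyncio_0'`. As a minimal sketch of where that message comes from in plain Python (not a reproduction of the DataHub Okta source itself), `asyncio.get_event_loop()` raises a `RuntimeError` with that text when it is called from a non-main thread that has no event loop set. The thread name `asyncio_0` is copied from the log; the helper names `loop_error_in_worker`, `run_with_own_loop`, and `ping` are hypothetical and exist only for illustration.

```python
import asyncio
import threading


def loop_error_in_worker() -> str:
    """Call asyncio.get_event_loop() in a worker thread and return the error text."""
    captured = []

    def worker():
        try:
            asyncio.get_event_loop()
            captured.append("")  # no error was raised
        except RuntimeError as exc:
            captured.append(str(exc))

    thread = threading.Thread(target=worker, name="asyncio_0")
    thread.start()
    thread.join()
    return captured[0]


def run_with_own_loop(coro_factory):
    """Run a coroutine in a worker thread that creates and sets its own loop."""
    results = []

    def worker():
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)  # gives this thread a current event loop
        try:
            results.append(loop.run_until_complete(coro_factory()))
        finally:
            loop.close()
            asyncio.set_event_loop(None)

    thread = threading.Thread(target=worker, name="asyncio_0")
    thread.start()
    thread.join()
    return results[0]


async def ping():
    # Stand-in for any async call made from the worker thread.
    await asyncio.sleep(0)
    return "ok"


message = loop_error_in_worker()
print(message)
print(run_with_own_loop(ping))
```

Whether the same fix applies inside the ingestion executor is an assumption; running the failing command with `--debug`, as the log suggests, would show which call actually requests the loop.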
h
For Tableau: are you using secrets? Have you added the same secrets in the AWS/EKS deployment as well?
a
I'm using env vars defined in the UI
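Since the same recipe behaves differently locally and on EKS, one quick check is whether every variable the recipe references is actually set in the environment where ingestion runs. The sketch below assumes the recipe uses `${NAME}` placeholders; the recipe text and the variable names `TABLEAU_USERNAME` and `TABLEAU_PASSWORD` are made up for illustration, and `missing_env_vars` is a hypothetical helper, not part of the DataHub CLI.

```python
import os
import re

# Matches ${NAME} placeholders (assumed recipe syntax).
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(recipe_text, env=None):
    """Return placeholder names in the recipe that are unset or empty in env."""
    env = os.environ if env is None else env
    missing = []
    for name in PLACEHOLDER.findall(recipe_text):
        if name not in missing and not env.get(name):
            missing.append(name)
    return missing


# Illustrative recipe fragment; values are placeholders, not real settings.
recipe = """
source:
  type: tableau
  config:
    username: ${TABLEAU_USERNAME}
    password: ${TABLEAU_PASSWORD}
"""

# Simulate an environment where only one of the two variables is defined.
print(missing_env_vars(recipe, {"TABLEAU_USERNAME": "someone"}))
```

Running the same check inside the pod that executes the ingestion (with `env` left as `None`) would show whether the credentials reach that environment at all.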