# troubleshoot
c
Hi Everyone! I am running the quickstart on an Amazon Linux 2 t3.large instance. • DataHub CLI version: 0.8.41.2 • Python version: 3.7.10 (default, Jun 3 2021, 00:02:01) [GCC 7.3.1 20180712 (Red Hat 7.3.1-13)] I've confirmed my Snowflake credentials are correct, but I keep getting this error on a manual ingestion.
'Failed to configure source (snowflake) due to pipeline_name must be provided if stateful ingestion is enabled.\n',
I've never seen anything about a pipeline_name being needed. Please let me know if there is something you can think of to help me get my first ingestion to complete 🙂 More context around this error:
[2022-07-28 19:30:04,041] INFO     {datahub.cli.ingest_cli:99} - DataHub CLI version: 0.8.41
[2022-07-28 19:30:04,091] INFO     {datahub.ingestion.run.pipeline:160} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-gms:8080
[2022-07-28 19:30:06,244] INFO     {datahub.ingestion.source_config.sql.snowflake:231} - using authenticator type 'DEFAULT_AUTHENTICATOR'
[2022-07-28 19:30:06,244] ERROR    {datahub.ingestion.run.pipeline:126} - pipeline_name must be provided if stateful ingestion is enabled.
[2022-07-28 19:30:06,244] INFO     {datahub.cli.ingest_cli:115} - Starting metadata ingestion
[2022-07-28 19:30:06,244] INFO     {datahub.cli.ingest_cli:133} - Finished metadata pipeline

Failed to configure source (snowflake) due to pipeline_name must be provided if stateful ingestion is enabled.
2022-07-28 19:30:08.102905 [exec_id=70cf693c-d13f-42a5-b0fd-04ca739e33b4] INFO: Failed to execute 'datahub ingest'
2022-07-28 19:30:08.103382 [exec_id=70cf693c-d13f-42a5-b0fd-04ca739e33b4] INFO: Caught exception EXECUTING task_id=70cf693c-d13f-42a5-b0fd-04ca739e33b4, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task
    self.event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete
    return f.result()
  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 114, in execute
    raise TaskError("Failed to execute 'datahub ingest'")
acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
m
Hey Joshua, when stateful_ingestion is turned on, we require a pipeline_name to be provided. This will soon be auto-generated by the UI, but for now you can add the pipeline_name to the YAML in your ingestion recipe.
e.g.
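A minimal recipe sketch along those lines — the account, credential, and pipeline_name values below are placeholders, and exact Snowflake config keys can vary between DataHub versions, so check them against the docs for your CLI version:

```yaml
# pipeline_name sits at the top level of the recipe,
# alongside source and sink -- not inside source.config.
pipeline_name: my_snowflake_pipeline   # placeholder name; required when stateful ingestion is enabled

source:
  type: snowflake
  config:
    username: MY_USER          # placeholder credential
    password: MY_PASSWORD      # placeholder credential
    stateful_ingestion:
      enabled: true            # this is what makes pipeline_name mandatory

sink:
  type: datahub-rest
  config:
    server: http://datahub-gms:8080
```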
c
Thank you, Shirshanka! Taking a look now.
m
notice the pipeline_name is at the top-level
c
Yes. I noticed there is not a placeholder in the form for a pipeline name. Thanks again.