# troubleshoot
c
Hi Everyone! I am running the quickstart on an Amazon Linux 2 t3.large instance. • DataHub CLI version: 0.8.41.2 • Python version: 3.7.10 (default, Jun 3 2021, 00:02:01) [GCC 7.3.1 20180712 (Red Hat 7.3.1-13)] I've confirmed my Snowflake credentials are correct, but I keep getting this error on a manual ingestion.
'Failed to configure source (snowflake) due to pipeline_name must be provided if stateful ingestion is enabled.\n',
I've never seen anything about a pipeline_name being needed. Please let me know if there is something you can think of to help me get my first ingestion to complete 🙂 More context around this error:
[2022-07-28 19:30:04,041] INFO     {datahub.cli.ingest_cli:99} - DataHub CLI version: 0.8.41
[2022-07-28 19:30:04,091] INFO     {datahub.ingestion.run.pipeline:160} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-gms:8080
[2022-07-28 19:30:06,244] INFO     {datahub.ingestion.source_config.sql.snowflake:231} - using authenticator type 'DEFAULT_AUTHENTICATOR'
[2022-07-28 19:30:06,244] ERROR    {datahub.ingestion.run.pipeline:126} - pipeline_name must be provided if stateful ingestion is enabled.
[2022-07-28 19:30:06,244] INFO     {datahub.cli.ingest_cli:115} - Starting metadata ingestion
[2022-07-28 19:30:06,244] INFO     {datahub.cli.ingest_cli:133} - Finished metadata pipeline

Failed to configure source (snowflake) due to pipeline_name must be provided if stateful ingestion is enabled.
2022-07-28 19:30:08.102905 [exec_id=70cf693c-d13f-42a5-b0fd-04ca739e33b4] INFO: Failed to execute 'datahub ingest'
2022-07-28 19:30:08.103382 [exec_id=70cf693c-d13f-42a5-b0fd-04ca739e33b4] INFO: Caught exception EXECUTING task_id=70cf693c-d13f-42a5-b0fd-04ca739e33b4, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task
    self.event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete
    return f.result()
  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 114, in execute
    raise TaskError("Failed to execute 'datahub ingest'")
acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
m
Hey Joshua, when stateful_ingestion is turned on, we require a pipeline_name to be provided. This will soon be auto-generated by the UI, but for now you can add the pipeline_name to the YAML in your ingestion recipe.
e.g.
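A minimal recipe sketch along those lines — the account, credential, and pipeline_name values below are placeholders, and exact Snowflake config keys can vary between DataHub versions, so check them against the docs for your CLI version:

```yaml
# pipeline_name sits at the top level of the recipe,
# alongside source and sink -- not inside source.config.
pipeline_name: my_snowflake_pipeline   # placeholder name; required when stateful ingestion is enabled

source:
  type: snowflake
  config:
    username: MY_USER          # placeholder credential
    password: MY_PASSWORD      # placeholder credential
    stateful_ingestion:
      enabled: true            # this is what makes pipeline_name mandatory

sink:
  type: datahub-rest
  config:
    server: http://datahub-gms:8080
```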
c
Thank you, Shirshanka! Taking a look now.
m
notice the pipeline_name is at the top-level
c
Yes. I noticed there is not a placeholder in the form for a pipeline name. Thanks again.