# integrate-tableau-datahub
a
Hello there, tell me if this is not the right channel for my question. I'm currently working on Tableau Online ingestion in DataHub v0.12.0 and I have two issues. First issue: on a small part of our Tableau Cloud site, the ingestion fails when the stateful_ingestion parameter is enabled, with the error below:
Message: 'Failed to commit changes for DatahubIngestionCheckpointingProvider.'
When I disable the stateful_ingestion parameter, the ingestion succeeds, but another problem appears: dashboards are not deleted from DataHub when they are no longer present in Tableau. I see that the remove_stale_metadata parameter removes deleted dashboards, but it depends on stateful_ingestion. How can I solve this? Second issue: when running a full ingestion of our Tableau Cloud site, the ingestion fails with this warning: "embeddedDatasourcesConnection": [
"[{'locations': None, 'message': 'Showing partial results. The request exceeded the 20000 node limit. Use pagination, additional filtering, or both in the query to adjust results.', 'errorType': None, 'extensions': {'severity': 'WARNING', 'code': 'NODE_LIMIT_EXCEEDED', 'properties': {'nodeLimit': 20000}}, 'path': None}]",
and these errors appear at the beginning of the ingestion log file:
{
"error": "Unable to emit metadata to DataHub GMS",
"info": {
"message": "502 Server Error: Bad Gateway for url: <https://datahub-gms>..../aspects?action=ingestProposal",
"id": "urn:li:chart:(tableau,...)"
}
},
{
"error": "Unable to emit metadata to DataHub GMS",
"info": {
"message": "403 Client Error: Forbidden for url: <https://datahub-gms>...../entities?action=ingest",
"id": "urn:li:chart:(tableau,...)"
}
}
(The URNs have been masked, as has the URL of my DataHub site.) I don't have the exact ingestion time, but it takes at least 45 minutes. What could cause these errors, and what can I do to try to fix them?
h
Hi @able-artist-56392, for the first issue, `Failed to commit changes for DatahubIngestionCheckpointingProvider`: could you share the detailed error stack that follows the message? For the second issue, `NODE_LIMIT_EXCEEDED`: could you change the ingestion config to use
page_size: 1
and retry? For the last issue, `Unable to emit metadata to DataHub GMS`: I don't know the answer, but are you suggesting that this happens for intermittent requests only and the rest of the data gets ingested properly?
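For reference, the settings discussed so far (`page_size`, `stateful_ingestion`, `remove_stale_metadata`) could be combined in one recipe roughly as follows. This is only a sketch written as a Python dict: the helper name `build_tableau_recipe` is made up for illustration, every value in angle brackets is a placeholder, and the assumption that stateful ingestion needs a stable `pipeline_name` should be checked against the docs for your DataHub version.

```python
import json


def build_tableau_recipe(page_size: int = 1, stateful: bool = True) -> dict:
    """Build an illustrative Tableau recipe; angle-bracket values are placeholders."""
    source_config = {
        "connect_uri": "<tableau-cloud-url>",
        "site": "<site-name>",
        "token_name": "<token-name>",
        "token_value": "<token-value>",
        # Smaller pages reduce the number of nodes requested per metadata query.
        "page_size": page_size,
    }
    if stateful:
        # remove_stale_metadata only takes effect when stateful ingestion is enabled.
        source_config["stateful_ingestion"] = {
            "enabled": True,
            "remove_stale_metadata": True,
        }
    return {
        # Assumption: checkpoints are keyed on a pipeline name that stays the same across runs.
        "pipeline_name": "<stable-pipeline-name>",
        "source": {"type": "tableau", "config": source_config},
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "<gms-url>", "token": "<gms-token>"},
        },
    }


recipe = build_tableau_recipe(page_size=1)
print(json.dumps(recipe, indent=2))
```

The printed JSON maps one-to-one onto the YAML recipe used by the ingestion UI, so the same keys can be copied there.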
a
Hello Mayuri, 1 - here is the error:
Traceback (most recent call last):
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/datahub/emitter/rest_emitter.py", line 282, in _emit_generic
    response.raise_for_status()
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://datahub-gms.../aspects?action=ingestProposal

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 491, in process_commits
    committable.commit()
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/datahub/ingestion/source/state_provider/datahub_ingestion_checkpointing_provider.py", line 126, in commit
    self.graph.emit_mcp(
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/datahub/emitter/rest_emitter.py", line 261, in emit_mcp
    self._emit_generic(url, payload)
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/datahub/emitter/rest_emitter.py", line 296, in _emit_generic
    raise OperationalError(
datahub.configuration.common.OperationalError: ('Unable to emit metadata to DataHub GMS', {'message': '403 Client Error: Forbidden for url: <https://datahub-gms>.../aspects?action=ingestProposal'})

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/logging/__init__.py", line 1100, in emit
    msg = self.format(record)
  File "/usr/local/lib/python3.10/logging/__init__.py", line 943, in format
    return fmt.format(record)
  File "/usr/local/lib/python3.10/logging/__init__.py", line 678, in format
    record.message = record.getMessage()
  File "/usr/local/lib/python3.10/logging/__init__.py", line 368, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/bin/datahub", line 8, in <module>
    sys.exit(main())
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/datahub/entrypoints.py", line 188, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 397, in wrapper
    res = func(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
    return func(ctx, *args, **kwargs)
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 197, in run
    ret = loop.run_until_complete(run_ingestion_and_check_upgrade())
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/usr/local/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 131, in run_pipeline_to_completion
    pipeline.run()
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 434, in run
    self.process_commits()
  File "/tmp/datahub/ingest/venv-tableau-94b47a253025c09f/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 493, in process_commits
    logger.error(f"Failed to commit changes for {name}.", e)
Message: 'Failed to commit changes for DatahubIngestionCheckpointingProvider.'
Arguments: (OperationalError('Unable to emit metadata to DataHub GMS', {'message': '403 Client Error: Forbidden for url: <https://datahub-gms>.../aspects?action=ingestProposal'}),)
Tell me if you need more. 2 - OK, I will try it and let you know. 3 - Yes, this affects only some of the data. I can see data under the Tableau platform in DataHub.
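A note on reading the trace above: the underlying failure appears to be the 403 from GMS, while the `TypeError: not all arguments converted during string formatting` looks like a side effect of how the error is logged at `pipeline.py` line 493, where the exception is passed as an extra argument to a message that has no `%` placeholder. The small standalone sketch below reproduces that behaviour with the standard library only; the logger name `commit_demo` and the `RuntimeError` stand-in are illustrative and not taken from DataHub.

```python
import io
import logging

name = "DatahubIngestionCheckpointingProvider"
err = RuntimeError("403 Client Error: Forbidden")  # stand-in for the OperationalError

# logging formats a record as `msg % args`. With an f-string message and the
# exception as an extra argument there is no placeholder to consume it.
try:
    f"Failed to commit changes for {name}." % (err,)
    format_error = None
except TypeError as exc:
    format_error = str(exc)

# Using %s placeholders lets logging do the formatting without a secondary error.
stream = io.StringIO()
logger = logging.getLogger("commit_demo")
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False
logger.error("Failed to commit changes for %s: %s", name, err)
logged = stream.getvalue().strip()

print(format_error)
print(logged)
```

So the second traceback can be ignored when diagnosing; the 403 on `ingestProposal` is the part worth chasing.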
h
The data you see in DataHub may have been ingested by an earlier ingestion run. It is possible that the sink configuration in the current ingestion recipe is not correct. Can you share your ingestion recipe (secrets removed) and the ingestion report from the logs?
a
Hello Mayuri, thanks for your answer and sorry for the delay. Here are the log file and the recipe file from the ingestion. They have been anonymised. The recipe was downloaded from the ingestion run details, and the configuration was created using the Tableau ingestion UI. I hope this helps.
h
Hey @able-artist-56392, could you try using a more recent CLI version, say `0.12.1.5` or `0.13.0`, using advanced managed ingestion configs, and confirm whether the issue still persists?
a
Hello Mayuri, thanks for the information. Unfortunately, moving to DataHub CLI v0.13.0 doesn't change anything. However, I am continuing to investigate the 403 error, and a firewall rule is triggered at the same time. We are trying to change it between DataHub and the DataHub GMS component. I will keep you informed.