# troubleshoot
a
Hi everyone, I’m trying to use datahub ingest with Tableau, and I ran into a problem:
Internal Server Error(s) while executing query
Ingest log:
[2023-05-08 08:17:12,202] INFO     {datahub.cli.ingest_cli:173} - DataHub CLI version: 0.10.2.2
[2023-05-08 08:17:12,256] INFO     {datahub.ingestion.run.pipeline:204} - Sink configured successfully. DataHubRestEmitter: configured to talk to <http://datahub-gms.com> with token: eyJh**********SOTI
/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py:2271: ConfigurationWarning: projects is deprecated and will be removed in a future release. Please remove it from your config.
  config = TableauConfig.parse_obj(config_dict)
[2023-05-08 08:17:12,511] WARNING  {datahub.ingestion.source.tableau:342} - project_pattern is not set but projects is set. projects is deprecated, please use project_pattern instead.
[2023-05-08 08:17:12,511] INFO     {datahub.ingestion.source.tableau:345} - Initializing project_pattern from projects
[2023-05-08 08:17:12,842] INFO     {tableau.endpoint.auth:50} - Signed into <https://my-tableau.org> as user with id d6948785-5cc9-4c58-8d7f-675a4e4f168b
[2023-05-08 08:17:12,842] INFO     {datahub.ingestion.source.tableau:616} - Authenticated to Tableau server
[2023-05-08 08:17:12,842] INFO     {datahub.ingestion.run.pipeline:221} - Source configured successfully.
[2023-05-08 08:17:12,843] INFO     {datahub.cli.ingest_cli:129} - Starting metadata ingestion
[2023-05-08 08:17:12,864] INFO     {datahub.ingestion.source.tableau:596} - Initializing site project registry
[2023-05-08 08:17:12,865] INFO     {tableau.endpoint.projects:31} - Querying all projects on site
[2023-05-08 08:17:13,188] INFO     {datahub.ingestion.source.tableau:517} - project(xxxx) is not allowed as per project_pattern
[2023-05-08 08:17:13,188] INFO     {datahub.ingestion.source.tableau:517} - project(xxxx) is not allowed as per project_pattern
[2023-05-08 08:17:13,188] INFO     {datahub.ingestion.source.tableau:517} - project(xxxx) is not allowed as per project_pattern
......
[2023-05-08 08:17:13,199] INFO     {datahub.ingestion.source.tableau:517} - project(Paid Search) is not allowed as per project_pattern
[2023-05-08 08:17:13,200] INFO     {tableau.endpoint.datasources:84} - Querying all datasources on site
[2023-05-08 08:17:13,306] INFO     {tableau.endpoint.datasources:84} - Querying all datasources on site
[2023-05-08 08:17:13,416] INFO     {tableau.endpoint.workbooks:74} - Querying all workbooks on site
[2023-05-08 08:17:13,562] INFO     {tableau.endpoint.workbooks:74} - Querying all workbooks on site
[2023-05-08 08:17:13,694] INFO     {tableau.endpoint.workbooks:74} - Querying all workbooks on site
[2023-05-08 08:17:13,807] INFO     {tableau.endpoint.metadata:61} - Querying Metadata API
[2023-05-08 08:17:13,877] ERROR    {datahub.ingestion.run.pipeline:409} - Caught error
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/run/pipeline.py", line 361, in run
    self.preview_workunits if self.preview_mode else None,
  File "/usr/local/lib/python3.7/dist-packages/datahub/utilities/source_helpers.py", line 91, in auto_stale_entity_removal
    for wu in stream:
  File "/usr/local/lib/python3.7/dist-packages/datahub/utilities/source_helpers.py", line 42, in auto_status_aspect
    for wu in stream:
  File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 2305, in get_workunits_internal
    yield from self.emit_workbooks()
  File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 738, in emit_workbooks
    page_size_override=self.config.workbook_page_size,
  File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 718, in get_connection_objects
    offset,
  File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 676, in get_connection_object_page
    raise RuntimeError(f"Query {connection_type} error: {errors}")
RuntimeError: Query workbooksConnection error: [{'message': 'Internal Server Error(s) while executing query', 'extensions': None, 'path': None}]
[2023-05-08 08:17:13,895] INFO     {datahub.cli.ingest_cli:135} - Source (tableau) report:
{'aspects': {'container': {'containerProperties': 1, 'dataPlatformInstance': 1, 'status': 1, 'subTypes': 1}},
 'entities': {'container': ['urn:li:container:c6e27b6a2acce0003bc944ba693553f5']},
 'events_produced': 4,
 'events_produced_per_sec': 2,
 'failures': {},
 'running_time': '1.36 seconds',
 'soft_deleted_stale_entities': [],
 'start_time': '2023-05-08 08:17:12.531954 (1.36 seconds ago)',
 'warnings': {}}
[2023-05-08 08:17:13,895] INFO     {datahub.cli.ingest_cli:138} - Sink (datahub-rest) report:
{'current_time': '2023-05-08 08:17:13.895275 (now)',
 'failures': [],
 'gms_version': 'v0.9.5',
 'pending_requests': 0,
 'records_written_per_second': 2,
 'start_time': '2023-05-08 08:17:12.249157 (1.65 seconds ago)',
 'total_duration_in_seconds': 1.65,
 'total_records_written': 4,
 'warnings': []}
[2023-05-08 08:17:14,269] ERROR    {datahub.entrypoints:195} - Command failed: Query workbooksConnection error: [{'message': 'Internal Server Error(s) while executing query', 'extensions': None, 'path': None}]
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/datahub/entrypoints.py", line 182, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/datahub/telemetry/telemetry.py", line 379, in wrapper
    raise e
  File "/usr/local/lib/python3.7/dist-packages/datahub/telemetry/telemetry.py", line 334, in wrapper
    res = func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
    return func(ctx, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/datahub/cli/ingest_cli.py", line 198, in run
    loop.run_until_complete(run_func_check_upgrade(pipeline))
  File "/usr/lib/python3.7/asyncio/base_events.py", line 579, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.7/dist-packages/datahub/cli/ingest_cli.py", line 158, in run_func_check_upgrade
    ret = await the_one_future
  File "/usr/local/lib/python3.7/dist-packages/datahub/cli/ingest_cli.py", line 150, in run_pipeline_async
    None, functools.partial(run_pipeline_to_completion, pipeline)
  File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/dist-packages/datahub/cli/ingest_cli.py", line 140, in run_pipeline_to_completion
    raise e
  File "/usr/local/lib/python3.7/dist-packages/datahub/cli/ingest_cli.py", line 132, in run_pipeline_to_completion
    pipeline.run()
  File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/run/pipeline.py", line 361, in run
    self.preview_workunits if self.preview_mode else None,
  File "/usr/local/lib/python3.7/dist-packages/datahub/utilities/source_helpers.py", line 91, in auto_stale_entity_removal
    for wu in stream:
  File "/usr/local/lib/python3.7/dist-packages/datahub/utilities/source_helpers.py", line 42, in auto_status_aspect
    for wu in stream:
  File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 2305, in get_workunits_internal
    yield from self.emit_workbooks()
  File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 738, in emit_workbooks
    page_size_override=self.config.workbook_page_size,
  File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 718, in get_connection_objects
    offset,
  File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 676, in get_connection_object_page
    raise RuntimeError(f"Query {connection_type} error: {errors}")
RuntimeError: Query workbooksConnection error: [{'message': 'Internal Server Error(s) while executing query', 'extensions': None, 'path': None}]
Tableau ingestion recipe (YAML):
# tableau
source:
    type: tableau
    config:
        connect_uri: '${TABLEAU_ADDRESS}'
        # site:
        platform_instance: acryl_instance
        # project_pattern:
        project_pattern: ["^default$", "^Project 2$", "^/Project A/Nested Project B$"]
        # projects: ["^default$", "^Project 2$", "^/Project A/Nested Project B$"]

        username: '${TABLEAU_USER}'
        password: '${TABLEAU_PASSWD}'

        page_size: 10

        ingest_tags: True
        ingest_owner: True
        stateful_ingestion:
            enabled: True
            remove_stale_metadata: true
My DataHub version is 0.10.0.7 and my Tableau version is 2022.3.1. I want to figure out why this problem occurred and how to solve it. In fact, my Tableau server recently went through a version upgrade, and ingestion was working normally before the upgrade. Thank you very much!
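For reference, the "is not allowed as per project_pattern" lines in the log come from checking each Tableau project name against the patterns in the recipe. Here is a minimal sketch of that kind of allow-list check. It assumes plain re.match semantics, and the helper name is_project_allowed is hypothetical; DataHub's own matching may differ (for example in case handling).

```python
import re
from typing import List


def is_project_allowed(project_name: str, allow_patterns: List[str]) -> bool:
    """Return True if the project name matches at least one allow pattern.

    Hypothetical helper for illustration only, not DataHub's implementation.
    """
    return any(re.match(pattern, project_name) for pattern in allow_patterns)


# Patterns copied from the recipe above.
ALLOW = ["^default$", "^Project 2$", "^/Project A/Nested Project B$"]

for name in ["default", "Paid Search"]:
    print(name, is_project_allowed(name, ALLOW))
```

With these patterns, a project such as "Paid Search" is filtered out, so only projects whose names match one of the three patterns would be ingested.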
l
Hey there 👋 I'm The DataHub Community Support bot. I'm here to help make sure the community can best support you with your request. Let's double check a few things first: 1️⃣ There's a lot of good information on our docs site: www.datahubproject.io/docs. Have you searched there for a solution? 2️⃣ It's not uncommon that someone has run into your exact problem before in the community. Have you searched Slack for similar issues?
m
It looks like the error comes from the Tableau server.
a
Has anyone encountered a similar issue, or does anyone have suggestions on how to troubleshoot it? Viewing logs and troubleshooting issues in Tableau can be quite challenging, and I am not even sure what the error is. 😫
m
Can you run in debug mode? If you can somehow grab the GraphQL query, it is probably easier to debug.
a
Thanks for the suggestion @modern-artist-55754, but may I ask how to run in debug mode? Is it done via a DataHub action?
m
Are you using the CLI? Usually I just run datahub --debug ingest (I think; you can try datahub --help in the CLI).
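A small sketch of how the debug run could be scripted. As far as I know, --debug is a top-level option of the datahub CLI, so it goes before the ingest subcommand. The helper name build_ingest_command and the recipe file name tableau_recipe.yml are assumptions made for this example.

```python
from typing import List


def build_ingest_command(recipe_path: str, debug: bool = True) -> List[str]:
    """Build the argv list for a DataHub ingestion run.

    Hypothetical helper; --debug is placed before the subcommand because it
    is assumed to be a top-level option of the datahub CLI.
    """
    cmd = ["datahub"]
    if debug:
        cmd.append("--debug")
    cmd += ["ingest", "-c", recipe_path]
    return cmd


command = build_ingest_command("tableau_recipe.yml")
print(" ".join(command))
# To actually run it, something like this should work:
#   import subprocess; subprocess.run(command, check=True)
```

Redirecting the output to a file makes it easier to search the debug log for the failing query afterwards.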
a
After I enabled Debug and tried to ingest again, I discovered these peculiar logs.
[2023-05-08 11:28:04,888] DEBUG    {datahub.ingestion.source.tableau:607} - Tableau data-sources {}
[2023-05-08 11:28:04,888] DEBUG    {datahub.ingestion.source.tableau:610} - Tableau workbooks {}
[2023-05-08 11:28:04,889] DEBUG    {datahub.ingestion.run.pipeline:58} -  sink wrote workunit urn:li:container:c6e27b6a2acce0003bc944ba693553f5-containerProperties
[2023-05-08 11:28:04,889] DEBUG    {datahub.ingestion.run.pipeline:58} -  sink wrote workunit urn:li:container:c6e27b6a2acce0003bc944ba693553f5-status
[2023-05-08 11:28:04,889] DEBUG    {datahub.ingestion.run.pipeline:58} -  sink wrote workunit urn:li:container:c6e27b6a2acce0003bc944ba693553f5-dataPlatformInstance
[2023-05-08 11:28:04,889] DEBUG    {datahub.ingestion.run.pipeline:58} -  sink wrote workunit urn:li:container:c6e27b6a2acce0003bc944ba693553f5-subTypes
[2023-05-08 11:28:04,890] DEBUG    {datahub.ingestion.source.tableau:644} - Query workbooksConnection to get 1 objects with offset 0
[2023-05-08 11:28:04,957] ERROR    {datahub.ingestion.run.pipeline:409} - Caught error
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/run/pipeline.py", line 361, in run
    self.preview_workunits if self.preview_mode else None,
  File "/usr/local/lib/python3.7/dist-packages/datahub/utilities/source_helpers.py", line 91, in auto_stale_entity_removal
    for wu in stream:
  File "/usr/local/lib/python3.7/dist-packages/datahub/utilities/source_helpers.py", line 42, in auto_status_aspect
    for wu in stream:
  File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 2305, in get_workunits_internal
    yield from self.emit_workbooks()
  File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 738, in emit_workbooks
    page_size_override=self.config.workbook_page_size,
  File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 718, in get_connection_objects
    offset,
  File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 676, in get_connection_object_page
    raise RuntimeError(f"Query {connection_type} error: {errors}")
RuntimeError: Query workbooksConnection error: [{'message': 'Internal Server Error(s) while executing query', 'extensions': None, 'path': None}]
From reading the source code of tableau.py, this means that the datasource_project_map and workbook_project_map maps are empty. The error stack trace seems to indicate a problem with obtaining the connection object, with the code pointing to the parameters workbook_page_size and offset. What do these parameters do, and do you have any suggestions?
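To illustrate what a page size and an offset typically do in this kind of query loop, here is a minimal sketch of offset-based pagination. It is not DataHub's actual code: the function shape is only modeled on the names in the traceback, and fetch_page and fake_fetch_page are hypothetical stand-ins for the call to the Tableau Metadata API. The page size is how many objects one request asks for, and the offset is where the next page starts.

```python
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]


def get_connection_objects(
    fetch_page: Callable[[str, int, int], Page],
    connection_type: str,
    page_size: int,
) -> Iterator[Dict[str, Any]]:
    """Yield all objects of a connection, one page at a time."""
    offset = 0
    has_next_page = True
    while has_next_page:
        response = fetch_page(connection_type, page_size, offset)
        errors = response.get("errors")
        if errors:
            # Same shape of failure as in the traceback: the server returned
            # an "errors" list instead of data.
            raise RuntimeError(f"Query {connection_type} error: {errors}")
        connection = response["data"][connection_type]
        nodes = connection["nodes"]
        if not nodes:
            break  # guard against looping forever on an empty page
        yield from nodes
        has_next_page = connection["pageInfo"]["hasNextPage"]
        offset += len(nodes)


def fake_fetch_page(connection_type: str, first: int, offset: int) -> Page:
    """Stand-in for the real API call: serves three fake workbooks."""
    workbooks = [{"name": f"workbook-{i}"} for i in range(3)]
    chunk = workbooks[offset : offset + first]
    has_next = offset + first < len(workbooks)
    return {
        "data": {
            connection_type: {"nodes": chunk, "pageInfo": {"hasNextPage": has_next}}
        }
    }


names = [
    wb["name"]
    for wb in get_connection_objects(fake_fetch_page, "workbooksConnection", page_size=2)
]
print(names)
```

In the debug log the failure happens on the very first request (1 object, offset 0), which suggests the query itself is being rejected by the server rather than the paging parameters being wrong.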
m
Hm, I think you need to put the site parameter in your recipe. You will find the site is the first thing in your Tableau URL. Otherwise it connects to the site called "Default", which might not be the setup in your organisation. If you don't mind, can you paste the URL of a workbook, copied from your browser, here? You can blur out your domain and anything sensitive.
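A rough sketch of reading the site out of a browser URL. It assumes the usual Tableau Server URL layout, where a non-default site shows up as /#/site/<name>/... and the Default site has no /site/ segment at all, which corresponds to an empty site value. The function name site_content_url and the example.com URL are made up for illustration.

```python
from urllib.parse import urlsplit


def site_content_url(browser_url: str) -> str:
    """Return the site name embedded in a Tableau browser URL.

    Returns an empty string when the URL has no /site/<name>/ segment,
    which is assumed to mean the Default site.
    """
    fragment = urlsplit(browser_url).fragment  # e.g. "/site/marketing/workbooks/12"
    parts = [part for part in fragment.split("/") if part]
    if len(parts) >= 2 and parts[0] == "site":
        return parts[1]
    return ""


print(repr(site_content_url("https://tableau.example.com/#/site/marketing/workbooks/12")))
print(repr(site_content_url("https://tableau.example.com/#/projects/43")))
```

If the function returns an empty string, the workbooks most likely live in the Default site, and the site value in the recipe would then be left empty rather than set to the literal name of the site.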
a
Sure, here is my Tableau settings page URL:
<https://tableau.selfhost.org/#/server/settings>
and here are my site settings and a screenshot of the page at
<https://tableau.selfhost.org/#/site/>
Moreover, after adding site: Default to the recipe file and re-running the ingest command, I got this error message:
Unable to login (invalid/expired credentials or missing permissions)
When I add the site parameter, the system says I cannot log in. When I remove it, the system reports an internal error. In that case, debug mode does print the project information from Tableau (so I assume I am able to log in normally), but I still cannot determine where the problem is. This is really difficult for me!
m
@acoustic-kite-241 is there any workbook in the Default site? Where are all your normal workbooks located?
a
is there any workbook in the Default site
Yup. When I click the Default site, the page goes to
<https://tableau.selfhost.org/#/projects/43>
and there are many projects.
Where are all your normal workbooks located?
Just in
<https://tableau.selfhost.org/#/projects/43>
I guess. Is that all right, my friend?
m
@acoustic-kite-241 which privileges does the user have? I have never tried Tableau with a username and password. I have only used token-based authentication.
In your Tableau deployment you can access the GraphQL editor in the web browser. Try running the queries there and see if there is anything abnormal.
a
which privileges does the user have?
My user is an admin. Thank you, Steve. I will try to run the queries and let you know the result.
Hi Steve, I checked the documentation about Tableau GraphQL, but I don’t know how to convert workbook_graphql_query into something GraphiQL accepts; it doesn’t execute directly.
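One possible way to get something that can be pasted into GraphiQL, sketched under assumptions: workbook_graphql_query looks like a field-selection fragment rather than a full query, so it probably needs to be wrapped in a connection query with paging arguments. The debug line "Query workbooksConnection to get 1 objects with offset 0" hints at first and offset arguments. The helper build_connection_query and the short field list are made up for this example; the real fragment in tableau.py selects many more fields.

```python
def build_connection_query(connection_type: str, fields: str, first: int, offset: int) -> str:
    """Wrap a field-selection fragment into a full, paginated GraphQL query.

    Hypothetical helper; the wrapper shape is an assumption, not a copy of
    what DataHub sends.
    """
    return (
        "{\n"
        f"  {connection_type}(first: {first}, offset: {offset}) {{\n"
        f"    nodes {fields}\n"
        "    pageInfo { hasNextPage endCursor }\n"
        "    totalCount\n"
        "  }\n"
        "}"
    )


# A deliberately small field selection, assumed for illustration.
EXAMPLE_FIELDS = "{ id name luid projectName }"

query = build_connection_query("workbooksConnection", EXAMPLE_FIELDS, first=1, offset=0)
print(query)
```

Starting from a small field list like this and adding the fields from workbook_graphql_query back a few at a time may show which part of the query triggers the internal server error after the upgrade.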