orange-coat-2879
05/12/2022, 2:28 AM
Installing collected packages: tableauserverclient
Successfully installed tableauserverclient-0.18.0
ubuntu@ip-172-31-16-11:~$ datahub ingest -c /home/ubuntu/datahub/tableau.yml
[2022-05-12 02:07:40,616] INFO {datahub.cli.ingest_cli:96} - DataHub CLI version: 0.8.34.1
[2022-05-12 02:07:40,722] ERROR {datahub.entrypoints:165} - tableau is disabled; try running: pip install 'acryl-datahub[tableau]'
[2022-05-12 02:07:40,722] INFO {datahub.entrypoints:176} - DataHub CLI version: 0.8.34.1 at /home/ubuntu/.local/lib/python3.8/site-packages/datahub/__init__.py
[2022-05-12 02:07:40,722] INFO {datahub.entrypoints:179} - Python version: 3.8.13 (default, Apr 19 2022, 02:32:06)
[GCC 11.2.0] at /usr/bin/python3.8 on Linux-5.15.0-1005-aws-x86_64-with-glibc2.35
[2022-05-12 02:07:40,722] INFO {datahub.entrypoints:182} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': 'v0.8.34', 'commit': '5cce3acddcb46443c748bf2eb0b1e5e53994d936'}}, 'managedIngestion': {'defaultCliVersion': '0.8.34.1', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'noCode': 'true'}
fresh-napkin-5247
05/12/2022, 1:33 PM
wonderful-dream-38059
05/30/2022, 9:44 PM
Detect Deleted Entities is currently not supported. What does this mean in practice?
My reading of the docs makes me think entities just persist past deletion and are never removed. If that is the case, has anyone done any design work to allow removal of stale records post-deletion? I'd be happy to help contribute if not.
wonderful-dream-38059
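One possible workaround for the stale-entity question above (a sketch, not a supported feature): record the URNs emitted by each run and soft-delete the ones that disappeared by emitting a Status aspect with removed=True. The set-difference step below is plain Python with made-up example URNs; the actual emitter call requires a live GMS, so it is shown as a comment.

```python
def find_stale_urns(previous_run_urns, current_run_urns):
    """Return URNs that were ingested last run but are gone now."""
    return sorted(set(previous_run_urns) - set(current_run_urns))

# Illustrative URN sets from two consecutive ingestion runs:
previous_run = {
    "urn:li:dataset:(urn:li:dataPlatform:tableau,wb1.sheet1,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:tableau,wb1.sheet2,PROD)",
}
current_run = {
    "urn:li:dataset:(urn:li:dataPlatform:tableau,wb1.sheet1,PROD)",
}

stale = find_stale_urns(previous_run, current_run)

# For each stale URN, one could emit a soft delete, e.g.:
#   from datahub.emitter.mcp import MetadataChangeProposalWrapper
#   from datahub.emitter.rest_emitter import DatahubRestEmitter
#   from datahub.metadata.schema_classes import StatusClass
#   emitter = DatahubRestEmitter("http://datahub-gms:8080")
#   for urn in stale:
#       emitter.emit(
#           MetadataChangeProposalWrapper(entityUrn=urn, aspect=StatusClass(removed=True))
#       )
print(stale)
```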
06/13/2022, 12:22 PM
faint-advantage-18690
07/12/2022, 8:00 AM
purple-analyst-83660
07/18/2022, 7:53 AM
careful-insurance-60247
08/10/2022, 3:00 PM
magnificent-lawyer-97772
08/25/2022, 2:03 PM
modern-artist-55754
08/29/2022, 4:01 PM
PublishedDatasourcesConnection & CustomSQLTablesConnection don't have page_size implemented the way workbooksConnection does. https://github.com/datahub-project/datahub/blob/7e15947a372f6f627f29f5a1c783383d49[…]daf6/metadata-ingestion/src/datahub/ingestion/source/tableau.py
• The workbooksConnection is a little complex (I have some complex workbooks, and even with page_size=1 it still exceeds the node limit). I think we can refactor the EmbeddedDatasourcesConnection into a separate call like PublishedDatasourcesConnection (at least it seems to help with my issue, although I still have some issues I haven't worked out yet). https://github.com/datahub-project/datahub/blob/7e15947a372f6f627f29f5a1c783383d49[…]tadata-ingestion/src/datahub/ingestion/source/tableau_common.py
witty-butcher-82399
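For context on the discussion above: the page_size mechanism is plain offset paging over a connection, and a connection without it fetches everything in one request, which is what trips the Metadata API node limit. A generic sketch of the pattern (fetch_page is a stand-in for the real Metadata API call, not DataHub's actual code):

```python
def fetch_all(fetch_page, page_size=10):
    """Page through a connection using offset + page_size, the pattern the
    Tableau source applies to workbooksConnection.

    fetch_page(offset, limit) stands in for the real Metadata API call;
    it must return a list of at most `limit` nodes.
    """
    nodes, offset = [], 0
    while True:
        batch = fetch_page(offset, page_size)
        nodes.extend(batch)
        if len(batch) < page_size:  # a short page means we hit the end
            break
        offset += page_size
    return nodes

# Usage against a fake backend of 25 nodes, fetched in pages of 10:
data = list(range(25))
fake_fetch = lambda offset, limit: data[offset : offset + limit]
print(len(fetch_all(fake_fetch)))  # 25
```

A connection that lacks this loop asks for all nodes at once; adding page_size support to PublishedDatasourcesConnection and CustomSQLTablesConnection would mean wrapping their queries in the same kind of loop.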
08/31/2022, 6:16 AM
cuddly-butcher-39945
09/26/2022, 11:40 PM
source:
  type: tableau
  config:
    # Coordinates
    connect_uri: <https://tableautest/#/home>
    site:
    projects: ["HOSPITALS"]
    # Credentials
    token_name: JGTest
    token_value: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    # Options
    ingest_tags: True
    ingest_owner: True
    default_schema_map:
      mydatabase: public
      anotherdatabase: anotherschema
sink:
  type: datahub-rest
  config:
    server: '<http://datahub-gms:8080>'
I have read through the debug log but have not really found anything meaningful, other than the generic message at the bottom stating:
ConnectionError: HTTPConnectionPool(host='datahub-gms', port=8080): Max retries exceeded with url: /config (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f14da1cdf50>: Failed to establish a new connection: [Errno -2] Name or service not known'))
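For anyone hitting the same message: "Name or service not known" is a DNS failure, meaning the machine running the CLI cannot resolve the name datahub-gms at all (the request never reaches port 8080). A quick way to confirm, using Python's standard library; the hostnames are illustrative:

```python
import socket

def can_resolve(hostname):
    """Return True if this machine can resolve the hostname to an IP."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

# "datahub-gms" typically only resolves inside the Kubernetes/Docker network
# that defines that service name; from outside it you need the external
# address (or an /etc/hosts entry) in the recipe's sink.server instead.
print(can_resolve("datahub-gms"))
print(can_resolve("localhost"))
```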
I've also attached my debug log. Thanks!
gifted-diamond-19544
09/27/2022, 1:29 PM
I set page_size to 1, as instructed in the docs, but I am still getting the error. So what I did was, instead of trying to ingest all the Tableau projects in the same pipeline, I created several pipelines with just a subset of the projects each and scheduled them with a few minutes' offset. This seems to be working, but it is kind of cumbersome. I think it would be great to add an option to the Tableau ingestion recipe that specifies a time interval between the extraction of each Tableau project. I have tried this using the Python emitter (basically I put a sleep statement between the extraction of each project), and this solved the problem. However, since I am not using the UI, I don't see an easy way to achieve this. Does anyone have a solution for this problem when running the ingestion via the UI? Thank you!
average-dusk-91249
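The per-project workaround described above can be sketched with the Python API. Splitting the recipe is plain dict manipulation; Pipeline.create and the recipe keys are real DataHub API, but the project names and sleep interval here are illustrative, and the pipeline-run lines are comments because they need a reachable GMS:

```python
def per_project_recipes(base_recipe, projects):
    """Split one Tableau recipe into one recipe per project, so each run
    stays under the Metadata API node limit."""
    recipes = []
    for project in projects:
        recipes.append({
            **base_recipe,
            "source": {
                **base_recipe["source"],
                "config": {**base_recipe["source"]["config"], "projects": [project]},
            },
        })
    return recipes

base = {
    "source": {
        "type": "tableau",
        "config": {"connect_uri": "https://tableau.example.com", "projects": []},
    },
    "sink": {"type": "datahub-rest", "config": {"server": "http://datahub-gms:8080"}},
}

for recipe in per_project_recipes(base, ["HOSPITALS", "FINANCE"]):
    # from datahub.ingestion.run.pipeline import Pipeline
    # Pipeline.create(recipe).run()   # one project at a time...
    # import time; time.sleep(300)    # ...with a pause in between
    print(recipe["source"]["config"]["projects"])
```

This only helps with CLI/scheduler-driven ingestion; UI-managed ingestion has no equivalent knob, which is the gap the message above is asking about.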
09/30/2022, 4:46 PM
database.schema.table_name. Separately, Snowflake ingestion has a hierarchical structure of database > schema > table_name. My assumption is that connecting the lineage between something like Tableau dataset mydb.myschema.mytable and Snowflake dataset mydb > myschema > mytable would not show up in lineage graphs automatically.
For Tableau ingestion, I did see a section for default_schema_map in the YAML settings. I don't know if something would need to change there to make the connection between a scenario like this work.
source:
  type: tableau
  config:
    ingest_owner: true
    default_schema_map:
      mydatabase: public
      anotherdatabase: anotherschema
    connect_uri: '<https://tableau.site.com>'
    password: '${tableau_password}'
    ingest_tags: true
    username: tableau_username
    projects: null
pipeline_name: 'blah_blah'
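As an illustration of what default_schema_map does for the lineage question above (a sketch of the behavior, not the source's actual code): when Tableau reports a table reference without a schema, the map supplies a default schema per database, so the resulting name can line up with the three-part database.schema.table name the Snowflake source emits:

```python
def qualify_table(table_name, default_schema_map):
    """If a Tableau table reference lacks a schema (db.table), fill in the
    default schema for that database so it matches the db.schema.table
    naming produced by warehouse sources like Snowflake."""
    parts = table_name.split(".")
    if len(parts) == 2:  # database.table -> database.schema.table
        database, table = parts
        schema = default_schema_map.get(database)
        if schema:
            return f"{database}.{schema}.{table}"
    return table_name

schema_map = {"mydatabase": "public"}
print(qualify_table("mydatabase.mytable", schema_map))    # mydatabase.public.mytable
print(qualify_table("mydb.myschema.mytable", schema_map))  # already qualified: unchanged
```

So lineage should connect automatically only when the fully qualified names (and platform/env) match on both sides; default_schema_map is the knob for the missing-schema case.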
average-dusk-91249
10/03/2022, 6:12 PM
full-engineer-98290
10/24/2022, 9:31 PM
chilly-truck-63841
10/26/2022, 9:30 PM
orange-intern-2172
03/13/2023, 10:03 AM
acoustic-quill-54426
03/13/2023, 4:11 PM
acoustic-quill-54426
03/13/2023, 5:22 PM
Validation error of type FieldUndefined: Field 'projectLuid' in type 'Workbook' is undefined @ 'workbooksConnection/nodes/projectLuid'
That affects all versions prior to 2022.3: https://help.tableau.com/current/api/metadata_api/en-us/docs/meta_api_release_notes.html
lively-jackal-83760
05/24/2023, 11:11 AM
happy-belgium-57206
06/06/2023, 2:08 PM
miniature-painter-94073
07/10/2023, 1:10 PM
numerous-address-22061
07/13/2023, 12:50 AM
numerous-address-22061
07/15/2023, 12:32 AM
analytics.analytics.table1, and it can't figure out how to connect that Custom SQL to the actual Snowflake table, which is database1.analytics.table1. How can I help it along here? I'd rather it not map upstream at all than just guess and generate a dataset that sits in DataHub. Ideally I'd get it to map back to its actual Snowflake table (which is already in DataHub).
fast-xylophone-28117
08/02/2023, 7:28 PM
"tableau-login": [
"Unable to login (check your Tableau connection and credentials): HTTPSConnectionPool(host='172.25.160.82', port=443): Max retries exceeded with url: /api/2.4/auth/signin (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:997)')))"
With the same user, we can log in from the UI fine.
We also confirmed the firewall is not blocking anything; curl to the Tableau server IP address runs fine from the datahub actions pod.
On the other hand, when I use the exact same recipe on the back end and run CLI-based ingestion manually, I get past this error and get something else (some charts, tags, and projects ingested, while some failed with this error):
{
  "error": "Unable to emit metadata to DataHub GMS: java.lang.RuntimeException: Unknown aspect browsePathsV2 for entity container",
  "info": {
    "exceptionClass": "com.linkedin.restli.server.RestLiServiceException",
    "message": "java.lang.RuntimeException: Unknown aspect browsePathsV2 for entity container",
    "status": 500,
    "id": "urn:li:container:00eafb6262a384f1fc4e9582f576ba3d"
  }
}
numerous-address-22061
08/09/2023, 4:10 PM
numerous-address-22061
08/17/2023, 5:13 PM
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - [2023-08-17, 05:06:24 PDT] ERROR {datahub.entrypoints:199} - Command failed: 'NoneType' object has no attribute 'get'
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - Traceback (most recent call last):
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/entrypoints.py", line 186, in main
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - sys.exit(datahub(standalone_mode=False, **kwargs))
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - return self.main(*args, **kwargs)
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - rv = self.invoke(ctx)
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - return _process_result(sub_ctx.command.invoke(sub_ctx))
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - return _process_result(sub_ctx.command.invoke(sub_ctx))
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - return ctx.invoke(self.callback, **ctx.params)
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - return __callback(*args, **kwargs)
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - return f(get_current_context(), *args, **kwargs)
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 448, in wrapper
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - raise e
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 397, in wrapper
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - res = func(*args, **kwargs)
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - return func(ctx, *args, **kwargs)
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 198, in run
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - ret = loop.run_until_complete(run_ingestion_and_check_upgrade())
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - return future.result()
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 182, in run_ingestion_and_check_upgrade
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - ret = await ingestion_future
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 140, in run_pipeline_to_completion
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - raise e
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 132, in run_pipeline_to_completion
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - pipeline.run()
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 367, in run
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - for wu in itertools.islice(
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 119, in auto_stale_entity_removal
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - for wu in stream:
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 143, in auto_workunit_reporter
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - for wu in stream:
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 208, in auto_browse_path_v2
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - for urn, batch in _batch_workunits_by_urn(stream):
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 346, in _batch_workunits_by_urn
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - for wu in stream:
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 156, in auto_materialize_referenced_tags
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - for wu in stream:
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 70, in auto_status_aspect
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - for wu in stream:
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/tableau.py", line 2590, in get_workunits_internal
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - yield from self.emit_sheets()
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/tableau.py", line 2028, in emit_sheets
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - yield from self.emit_sheets_as_charts(
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/tableau.py", line 2107, in emit_sheets_as_charts
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - project_luid: Optional[str] = self._get_workbook_project_luid(workbook)
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/tableau.py", line 1438, in _get_workbook_project_luid
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - if wb.get(tableau_constant.LUID) and self.workbook_project_map.get(
[2023-08-17, 05:06:24 PDT] {{pod_manager.py:235}} INFO - AttributeError: 'NoneType' object has no attribute 'get'
numerous-address-22061
08/17/2023, 5:14 PM
Maybe workbook_project_map is not being set to a value?
brainy-musician-50192
08/22/2023, 8:54 AM
extract_column_level_lineage (boolean, default: true):
When enabled, extracts column-level lineage from Tableau Datasources
Does this mean lineage between a Tableau data source and a Tableau chart, and not between an external table/view and a Tableau data source? If so, are there any future plans to add column lineage between Snowflake and Tableau?
strong-author-11562
08/31/2023, 9:45 PM