clean-coat-28016
03/31/2022, 12:41 AM

incalculable-forest-10734
03/31/2022, 3:15 AM

numerous-morning-88512
03/31/2022, 8:52 AM

shy-fireman-88724
03/31/2022, 8:27 PM
spark.sql() and writes the data into another Hive table. Even though the lineage appears, it has the wrong names in the components: in the source it shows the S3 location, and in the Spark job it shows the method name, as you can see in the image below. We expected schema_name.table_name to appear instead of the S3 location. Is there something more we can configure?
Another question: is the demo source code available somewhere?
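For reference on the lineage question above, the Spark integration is enabled through a listener configured on the job. A minimal spark-submit sketch, assuming a REST-based GMS at http://localhost:8080; the package version and job file are placeholders, and the dataset URNs the listener emits come from the paths Spark itself reports for each source:

spark-submit \
  --packages io.acryl:datahub-spark-lineage:0.8.31 \
  --conf spark.extraListeners=datahub.spark.DatahubSparkListener \
  --conf spark.datahub.rest.server=http://localhost:8080 \
  my_job.py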
cold-hydrogen-10513
04/01/2022, 10:13 AM
[2022-03-30 16:24:11,842] ERROR {datahub.entrypoints:152} - File "/tmp/datahub/ingest/venv-a8e48815-7f1f-4468-958c-3c2b1fcbf48e/lib/python3.9/site-packages/datahub/entrypoints.py", line 138, in main
    135  def main(**kwargs):
    136      # This wrapper prevents click from suppressing errors.
    137      try:
--> 138          sys.exit(datahub(standalone_mode=False, **kwargs))
    139      except click.exceptions.Abort:
    ..................................................
    kwargs = {}
    datahub = <Group datahub>
    click.exceptions.Abort = <class 'click.exceptions.Abort'>
    ..................................................

File "/tmp/datahub/ingest/venv-a8e48815-7f1f-4468-958c-3c2b1fcbf48e/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    1128  def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any:
    (...)
--> 1130      return self.main(*args, **kwargs)
    ..................................................
    self = <Group datahub>
    args = ()
    t.Any = typing.Any
    kwargs = {'standalone_mode': False, 'prog_name': 'python3 -m datahub'}
    ..................................................

File "/tmp/datahub/ingest/venv-a8e48815-7f1f-4468-958c-3c2b1fcbf48e/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
File "/tmp/datahub/ingest/venv-a8e48815-7f1f-4468-958c-3c2b1fcbf48e/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
........................
File "/tmp/datahub/ingest/venv-a8e48815-7f1f-4468-958c-3c2b1fcbf48e/lib/python3.9/site-packages/snowflake/sqlalchemy/snowdialect.py", line 573, in <listcomp>
    return [self.normalize_name(row[1]) for row in cursor]
File "/tmp/datahub/ingest/venv-a8e48815-7f1f-4468-958c-3c2b1fcbf48e/lib/python3.9/site-packages/snowflake/sqlalchemy/snowdialect.py", line 204, in normalize_name
    if name.upper() == name and not self.identifier_preparer._requires_quotes(name.lower()):
File "/tmp/datahub/ingest/venv-a8e48815-7f1f-4468-958c-3c2b1fcbf48e/lib/python3.9/site-packages/sqlalchemy/sql/compiler.py", line 3613, in _requires_quotes
    or value[0] in self.illegal_initial_characters

IndexError: string index out of range
[2022-03-30 16:24:11,842] INFO {datahub.entrypoints:161} - DataHub CLI version: 0.8.31 at /tmp/datahub/ingest/venv-a8e48815-7f1f-4468-958c-3c2b1fcbf48e/lib/python3.9/site-packages/datahub/__init__.py
[2022-03-30 16:24:11,842] INFO {datahub.entrypoints:164} - Python version: 3.9.9 (main, Dec 21 2021, 10:03:34) [GCC 10.2.1 20210110] at /tmp/datahub/ingest/venv-a8e48815-7f1f-4468-958c-3c2b1fcbf48e/bin/python3 on Linux-5.4.176-91.338.amzn2.x86_64-x86_64-with-glibc2.31
[2022-03-30 16:24:11,842] INFO {datahub.entrypoints:167} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': 'v0.8.31', 'commit': '2f078c981c86b72145eebf621230ffd445948ef6'}}, 'managedIngestion': {'defaultCliVersion': '0.8.31', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'retention': 'true', 'noCode': 'true'}
2022-03-30 16:24:14.167925 [exec_id=a8e48815-7f1f-4468-958c-3c2b1fcbf48e] INFO: Failed to execute 'datahub ingest'
2022-03-30 16:24:14.168306 [exec_id=a8e48815-7f1f-4468-958c-3c2b1fcbf48e] INFO: Caught exception EXECUTING task_id=a8e48815-7f1f-4468-958c-3c2b1fcbf48e, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task
    self.event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 81, in run_until_complete
    return f.result()
  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute
    raise TaskError("Failed to execute 'datahub ingest'")
acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
chilly-oil-22683
04/01/2022, 11:45 AM
data catalog, database, table and view. So what do you mean by schema here?

chilly-oil-22683
04/02/2022, 9:26 AM

swift-breakfast-25077
04/02/2022, 11:22 AM

handsome-minister-84652
04/03/2022, 11:19 PM

mammoth-fountain-32989
04/04/2022, 9:43 AM

most-waiter-95820
04/04/2022, 10:56 AM

few-grass-66826
04/04/2022, 3:03 PM

quaint-window-7517
04/05/2022, 6:12 AM

brave-market-65632
04/05/2022, 6:41 AM
With include_table_lineage: False the ingestion works fine. However, when it is set to `True`, the ingestion logs report the following error.
[2022-04-05 11:55:31,989] WARNING {snowflake.connector.vendored.urllib3.connectionpool:780} - Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))':
When I inspected the logs in the datahub_datahub-actions_1 container, I saw the following error.
[2022-04-05 06:22:33,752] ERROR {acryl_action_fwk.source.datahub_streaming:279} - ERROR
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/acryl_action_fwk/source/datahub_streaming.py", line 268, in _handle_mae
    match=a.subscriptions()[0],
IndexError: list index out of range
In the Snowflake query history, I can see that the metadata lineage query ran fine and returned ~495K records. Any help is appreciated. Thanks.
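For reference, a minimal sketch of the recipe shape being discussed, with all names as placeholders; when include_table_lineage is true the connector also queries the snowflake.account_usage views, which requires elevated privileges and can return very large result sets, like the ~495K rows mentioned above:

source:
  type: snowflake
  config:
    host_port: my_account_id
    warehouse: MY_WAREHOUSE
    username: my_user
    password: my_password
    include_table_lineage: true
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080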
nutritious-bird-45843
04/05/2022, 6:49 PM
We are upgrading to v0.8.32, so we are doing some local tests before deploying to the stage and production environments. However, we are facing some issues regarding data indexing. For instance, Kafka topics appear inside Datasets and also show up if we query using the search bar. However, clicking on the Kafka connector, we receive No results found for "", as if the query is searching for nothing. The same occurs for the Hive connector.
The first image shows the home page and the second shows the issue that happens after clicking on Kafka.
One thing worth mentioning is that we currently have the same Kafka and Hive metadata ingested on other environments running DataHub version 28, and there the metadata is retrieved when we click on the connectors.
Thanks in advance!
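When documents exist but searches return nothing after an upgrade, one thing to try is rebuilding the search indices with the datahub-upgrade job. A sketch, assuming a docker quickstart deployment; the network name, env file path, and image tag are placeholders to adapt to your setup:

docker run --network datahub_network \
  --env-file docker/datahub-upgrade/env/docker.env \
  acryldata/datahub-upgrade:v0.8.32 -u RestoreIndices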
plain-farmer-27314
04/05/2022, 6:58 PM

plain-baker-30549
04/06/2022, 6:18 AM

nutritious-bird-77396
04/06/2022, 2:42 PM
How do you run datahub ingest -c <recipe.yaml> in debug mode locally? Is there an additional option you pass on the command line?
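In case it is useful, the CLI exposes a global debug flag that turns on verbose logging; a sketch, with the recipe path as a placeholder:

datahub --debug ingest -c recipe.yaml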
plain-farmer-27314
04/06/2022, 2:45 PM

billowy-flag-4217
04/06/2022, 3:58 PM
With acryl-datahub=0.8.31.4 and python=3.8, when attempting to ingest Looker metadata I get the following error.
TypeError: You should use `typing_extensions.TypedDict` instead of `typing.TypedDict` with Python < 3.9.2. Without it, there is no way to differentiate required and optional fields when subclassed.
Is it a requirement now to use Python 3.9.2 for Looker ingestion, or is there another workaround?
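One workaround, assuming you can switch interpreters, is to run the ingestion under Python 3.9.2 or newer, which is what the error message asks for. A sketch using pyenv; the exact Python version, venv name, and recipe path are placeholders:

pyenv install 3.9.9
pyenv shell 3.9.9
python -m venv looker-venv && . looker-venv/bin/activate
pip install 'acryl-datahub[looker]==0.8.31.4'
datahub ingest -c looker_recipe.yml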
mysterious-lamp-91034
04/06/2022, 7:46 PM
Running ./gradlew :metadata-ingestion:testQuick on v0.8.32, I am seeing
=========================== short test summary info ============================
FAILED tests/integration/looker/test_looker.py::test_looker_ingest - TypeErro...
FAILED tests/integration/looker/test_looker.py::test_looker_ingest_allow_pattern
FAILED tests/integration/lookml/test_lookml.py::test_lookml_ingest - TypeErro...
FAILED tests/integration/lookml/test_lookml.py::test_lookml_ingest_offline - ...
FAILED tests/integration/lookml/test_lookml.py::test_lookml_ingest_offline_platform_instance
FAILED tests/integration/lookml/test_lookml.py::test_lookml_ingest_api_bigquery
FAILED tests/integration/lookml/test_lookml.py::test_lookml_ingest_api_hive
FAILED tests/integration/lookml/test_lookml.py::test_lookml_bad_sql_parser - ...
FAILED tests/integration/lookml/test_lookml.py::test_lookml_github_info - Typ...
FAILED tests/integration/s3/test_s3.py::test_data_lake_local_ingest[folder_no_partition.json]
FAILED tests/integration/s3/test_s3.py::test_data_lake_local_ingest[folder_no_partition_exclude.json]
FAILED tests/integration/s3/test_s3.py::test_data_lake_local_ingest[folder_no_partition_filename.json]
FAILED tests/integration/s3/test_s3.py::test_data_lake_local_ingest[folder_no_partition_glob.json]
FAILED tests/integration/s3/test_s3.py::test_data_lake_local_ingest[folder_partition_basic.json]
FAILED tests/integration/s3/test_s3.py::test_data_lake_local_ingest[folder_partition_keyval.json]
FAILED tests/integration/s3/test_s3.py::test_data_lake_local_ingest[multiple_files.json]
FAILED tests/integration/s3/test_s3.py::test_data_lake_local_ingest[single_file.json]
==== 17 failed, 325 passed, 52 deselected, 30 warnings in 60.13s (0:01:00) =====
billions-twilight-48559
04/06/2022, 9:11 PM

orange-coat-2879
04/06/2022, 10:01 PM
localhost:1433 does not work for me. I have attached my recipe here. Can anyone help? I am not sure if I should place a real URL (http://........) in the host_port. Thanks for helping!
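For what it is worth, host_port in the SQL sources takes a bare host:port pair rather than a URL. A minimal mssql recipe sketch under that assumption, with the database name and credentials as placeholders:

source:
  type: mssql
  config:
    host_port: "localhost:1433"
    database: my_database
    username: my_user
    password: my_password
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080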
thousands-room-91010
04/07/2022, 3:10 AM

able-rain-74449
04/07/2022, 2:12 PM
Running datahub ingest -c example_to_datahub_kafka.yml --dry-run gives
ConfigurationError: datahub-kafka is disabled; try running: pip install 'acryl-datahub[datahub-kafka]'
(see thread). I tried running pip install 'acryl-datahub[datahub-kafka]' and get:
88a6420d827a/src/confluent_kafka/src/confluent_kafka.h:23:10: fatal error: 'librdkafka/rdkafka.h' file not found
#include <librdkafka/rdkafka.h>
^~~~~~~~~~~~~~~~~~~~~~
1 error generated.
error: command '/usr/bin/clang' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> confluent-kafka
note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
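The /usr/bin/clang failure suggests confluent-kafka is being built from source on macOS without the librdkafka headers. One common workaround sketch, assuming Homebrew is available; the exact paths vary by machine:

brew install librdkafka
export C_INCLUDE_PATH=$(brew --prefix librdkafka)/include
export LIBRARY_PATH=$(brew --prefix librdkafka)/lib
pip install 'acryl-datahub[datahub-kafka]'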
able-rain-74449
04/07/2022, 4:02 PM

brave-forest-5974
04/07/2022, 5:13 PM

handsome-football-66174
04/07/2022, 6:47 PM

numerous-eve-42142
04/07/2022, 7:48 PM
table_pattern:
  allow:
    - "db.schema.carrier"
...
profile_pattern:
  allow:
    - "db.schema.carrier"
But this example ingests the tables:
• carrier
• carrier_damage
• carrier_tower_team
Is there some way to strictly specify the tables I want?
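The allow entries are regular expressions matched from the start of the name, so "db.schema.carrier" also admits every table whose name merely begins with carrier. Anchoring the pattern with $ is the usual way to pin it to one table; a sketch, with db and schema as placeholders:

table_pattern:
  allow:
    - "db.schema.carrier$"
profile_pattern:
  allow:
    - "db.schema.carrier$"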
icy-piano-35127
04/07/2022, 8:05 PM