wooden-jackal-88380
09/14/2022, 11:06 AM
square-bird-94136
09/14/2022, 12:48 PM
source:
  type: "bigquery"
  config:
    project_id: ""
    credential:
      project_id: ""
      private_key_id: ""
      private_key: ""
      client_email: ""
      client_id: ""
    profiling:
      enabled: false
    include_table_lineage: false
    start_time: "2020-03-01T00:00:00Z"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
Script output:
Source (bigquery) report:
{'entities_profiled': '0',
'event_ids': ['container-info...',
'container-platforminstance...',
'container-subtypes-...'],
'events_produced': '3',
'events_produced_per_sec': '2',
'failures': {},
'filtered': [],
'include_table_lineage': 'False',
'invalid_partition_ids': {},
'log_page_size': '1000',
'partition_info': {},
'profile_table_selection_criteria': {},
'running_time': '1.26 seconds',
'selected_profile_tables': {},
'soft_deleted_stale_entities': [],
'start_time': '2022-09-14 14:47:16.617896 (1.26 seconds ago).',
'table_metadata': {},
'tables_scanned': '0',
'upstream_lineage': {},
'use_date_sharded_audit_log_tables': 'False',
'use_exported_bigquery_audit_metadata': 'False',
'use_v2_audit_metadata': 'False',
'views_scanned': '0',
'warnings': {},
'window_end_time': '2022-09-14 12:47:16.343255+00:00 (1.53 seconds ago).',
'window_start_time': '2020-03-01 00:00:00+00:00 (2 years, 28 weeks and 3 days ago).'}
Sink (datahub-rest) report:
{'current_time': '2022-09-14 14:47:17.875747 (now).',
'failures': [],
'gms_version': 'v0.8.44',
'pending_requests': '0',
'records_written_per_second': '0',
'start_time': '2022-09-14 14:46:55.391435 (22.48 seconds ago).',
'total_duration_in_seconds': '22.48',
'total_records_written': '3',
'warnings': []}
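The run above produced only three container events and scanned zero tables. If the goal is also to ingest table schemas, lineage and profiles, a minimal sketch of the same recipe with those flags switched on could look like the following (credential fields elided as in the original; whether tables actually show up also depends on the service account's BigQuery permissions and on any dataset/table patterns):

source:
  type: "bigquery"
  config:
    project_id: ""                      # same project as above
    credential:                         # service-account fields, elided as in the original
      project_id: ""
      private_key_id: ""
      private_key: ""
      client_email: ""
      client_id: ""
    include_table_lineage: true         # was false above
    start_time: "2020-03-01T00:00:00Z"
    profiling:
      enabled: true                     # was false above
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"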
quiet-smartphone-60119
09/14/2022, 2:53 PM
salmon-angle-92685
09/14/2022, 3:15 PM
billions-zebra-46597
09/14/2022, 6:04 PM
quiet-school-18370
09/14/2022, 7:10 PM
able-evening-90828
09/14/2022, 7:15 PM
cool-boots-36947
09/14/2022, 7:39 PM
'ProgrammingError: (snowflake.connector.errors.ProgrammingError) 090105 (22000): Cannot perform SELECT. This session does not have a '
"current database. Call 'USE DATABASE', or use a qualified name.\n"
'[SQL: \n'
'select table_catalog, table_schema, table_name\n'
'from information_schema.tables\n'
"where last_altered >= to_timestamp_ltz(1663086530849, 3) and table_type= 'BASE TABLE'\n"
' ]\n'
'(Background on this error at: <http://sqlalche.me/e/13/f405>)\n'
'[2022-09-14 16:28:52,024] INFO {datahub.entrypoints:187} - DataHub CLI version: 0.8.41 at '
'/tmp/datahub/ingest/venv-snowflake-0.8.41/lib/python3.9/site-packages/datahub/__init__.py\n'
'[2022-09-14 16:28:52,024] INFO {datahub.entrypoints:190} - Python version: 3.9.9 (main, Dec 21 2021, 10:03:34) \n'
'[GCC 10.2.1 20210110] at /tmp/datahub/ingest/venv-snowflake-0.8.41/bin/python3 on '
'Linux-5.4.196-108.356.amzn2.x86_64-x86_64-with-glibc2.31\n'
"[2022-09-14 16:28:52,024] INFO {datahub.entrypoints:193} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': "
"'v0.8.42', 'commit': '4f35a6c43dcd058e4e85b1ed7e4818100ab224e0'}}, 'managedIngestion': {'defaultCliVersion': '0.8.41', 'enabled': True}, "
"'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, "
"'datasetUrnNameCasing': False, 'retention': 'true', 'datahub': {'serverType': 'prod'}, 'noCode': 'true'}\n",
"2022-09-14 16:28:53.137401 [exec_id=2dc5382a-f673-489f-b9bf-4cf1328b7bf7] INFO: Failed to execute 'datahub ingest'",
'2022-09-14 16:28:53.137719 [exec_id=2dc5382a-f673-489f-b9bf-4cf1328b7bf7] INFO: Caught exception EXECUTING '
'task_id=2dc5382a-f673-489f-b9bf-4cf1328b7bf7, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
' self.event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete\n'
' return f.result()\n'
' File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
' raise self._exception\n'
' File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
' result = coro.send(None)\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 112, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
Here is the recipe.
source:
  type: snowflake
  config:
    username: xxxxx
    password: xxxxx
    role: xxxx
    warehouse: xxxxx
    check_role_grants: true
    account_id: xxxxx
    include_table_lineage: true
    include_view_lineage: true
    ignore_start_time_lineage: true
    upstream_lineage_in_report: true
    profiling:
      enabled: true
    stateful_ingestion:
      enabled: true
    database_pattern:
      allow:
        - SNOWFLAKE
    schema_pattern:
      allow:
        - ACCOUNT_USAGE
pipeline_name: 'urn:li:dataHubIngestionSource:xxxxxxxxxxxxxxxxxxxxxxxxxxx'
salmon-angle-92685
09/14/2022, 2:38 PM
proud-table-38689
09/14/2022, 8:38 PM
~~~~ Execution Summary ~~~~
RUN_INGEST - {'errors': [],
'exec_id': '194692c4-b85b-4915-afc4-f1ef0f7b7a1b',
'infos': ['2022-09-14 20:36:52.270206 [exec_id=194692c4-b85b-4915-afc4-f1ef0f7b7a1b] INFO: Starting execution for task with name=RUN_INGEST',
'2022-09-14 20:36:52.270619 [exec_id=194692c4-b85b-4915-afc4-f1ef0f7b7a1b] INFO: Caught exception EXECUTING '
'task_id=194692c4-b85b-4915-afc4-f1ef0f7b7a1b, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 121, in execute_task\n'
' self.event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete\n'
' return f.result()\n'
' File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
' raise self._exception\n'
' File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
' result = coro.send(None)\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 71, in execute\n'
' validated_args = SubProcessIngestionTaskArgs.parse_obj(args)\n'
' File "pydantic/main.py", line 521, in pydantic.main.BaseModel.parse_obj\n'
' File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__\n'
'pydantic.error_wrappers.ValidationError: 1 validation error for SubProcessIngestionTaskArgs\n'
'debug_mode\n'
' extra fields not permitted (type=value_error.extra)\n']}
Execution finished with errors.
rough-activity-61346
09/15/2022, 2:12 AM
gifted-knife-16120
09/15/2022, 4:03 AM
proud-table-38689
09/15/2022, 5:11 AM
limited-forest-73733
09/15/2022, 7:55 AM
better-dinner-64431
09/15/2022, 9:12 AM
salmon-angle-92685
09/15/2022, 9:49 AM
Regarding redshift-usage and redshift: can we use both when ingesting, in order to get both usage stats and table metadata, or is it only one or the other?
Thank you in advance!
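For what it's worth, these are two separate sources, and a common approach is to run them as two recipes against the same sink: redshift for table metadata and redshift-usage for usage statistics. A minimal sketch, with placeholder connection values (the usage source may need extra options such as email_domain; check the source docs for your CLI version):

# Recipe 1: table metadata
source:
  type: redshift
  config:
    host_port: "my-cluster.example.redshift.amazonaws.com:5439"   # placeholder
    database: mydb                                                # placeholder
    username: xxxxx
    password: xxxxx
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"

# Recipe 2 (run separately): usage statistics
source:
  type: redshift-usage
  config:
    host_port: "my-cluster.example.redshift.amazonaws.com:5439"   # placeholder
    database: mydb                                                # placeholder
    username: xxxxx
    password: xxxxx
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"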
salmon-angle-92685
09/15/2022, 11:56 AM
Does snowflake-beta gather features from both snowflake and snowflake-usage? If I use those two combined, is that the same as using the -beta one?
I am asking because, since the latter is still in "beta", I am not sure whether I should already roll it out at the company. Thanks!
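For reference, a minimal snowflake-beta recipe sketch (connection values are placeholders; whether -beta fully covers what snowflake plus snowflake-usage provide depends on the CLI version, so it is worth checking the docs for the version you would deploy):

source:
  type: snowflake-beta
  config:
    account_id: xxxxx     # placeholder
    username: xxxxx
    password: xxxxx
    role: xxxxx
    warehouse: xxxxx
    include_table_lineage: true
    profiling:
      enabled: true
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"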
square-bird-94136
09/15/2022, 12:27 PM
~~~~ Execution Summary ~~~~
RUN_INGEST - {'errors': [],
'exec_id': 'fe59f987-686e-4078-8f83-eb1ddf63fc2f',
'infos': ['2022-09-15 12:22:12.253433 [exec_id=fe59f987-686e-4078-8f83-eb1ddf63fc2f] INFO: Starting execution for task with name=RUN_INGEST',
'2022-09-15 12:22:48.478154 [exec_id=fe59f987-686e-4078-8f83-eb1ddf63fc2f] INFO: stdout=venv setup time = 0\n'
'This version of datahub supports report-to functionality\n'
'datahub ingest run -c /tmp/datahub/ingest/fe59f987-686e-4078-8f83-eb1ddf63fc2f/recipe.yml --report-to '
'/tmp/datahub/ingest/fe59f987-686e-4078-8f83-eb1ddf63fc2f/ingestion_report.json\n'
'[2022-09-15 12:22:34,221] INFO {datahub.cli.ingest_cli:182} - DataHub CLI version: 0.8.44.2\n'
'[2022-09-15 12:22:34,243] INFO {datahub.ingestion.run.pipeline:175} - Sink configured successfully. DataHubRestEmitter: configured '
'to talk to <http://datahub-datahub-gms:8080>\n'
'[2022-09-15 12:22:46,903] ERROR {datahub.entrypoints:192} - \n'
'Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 196, in __init__\n'
' self.source: Source = source_class.create(\n'
' File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/source/sql/bigquery.py", line 989, in create\n'
' config = BigQueryConfig.parse_obj(config_dict)\n'
' File "pydantic/main.py", line 521, in pydantic.main.BaseModel.parse_obj\n'
' File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/source_config/sql/bigquery.py", line 69, in __init__\n'
' super().__init__(**data)\n'
' File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__\n'
'pydantic.error_wrappers.ValidationError: 1 validation error for BigQueryConfig\n'
'include_view_lineage\n'
' extra fields not permitted (type=value_error.extra)\n'
'\n'
'The above exception was the direct cause of the following exception:\n'
'\n'
'Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 197, in run\n'
' pipeline = Pipeline.create(\n'
' File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 317, in create\n'
' return cls(\n'
' File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 202, in __init__\n'
' self._record_initialization_failure(\n'
' File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 129, in _record_initialization_failure\n'
' raise PipelineInitError(msg) from e\n'
'datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure source (bigquery)\n'
'[2022-09-15 12:22:46,903] ERROR {datahub.entrypoints:195} - Command failed: \n'
'\tFailed to configure source (bigquery) due to \n'
"\t\t'1 validation error for BigQueryConfig\n"
'include_view_lineage\n'
" extra fields not permitted (type=value_error.extra)'.\n"
'\tRun with --debug to get full stacktrace.\n'
"\te.g. 'datahub --debug ingest run -c /tmp/datahub/ingest/fe59f987-686e-4078-8f83-eb1ddf63fc2f/recipe.yml --report-to "
"/tmp/datahub/ingest/fe59f987-686e-4078-8f83-eb1ddf63fc2f/ingestion_report.json'\n",
"2022-09-15 12:22:48.478380 [exec_id=fe59f987-686e-4078-8f83-eb1ddf63fc2f] INFO: Failed to execute 'datahub ingest'",
'2022-09-15 12:22:48.478596 [exec_id=fe59f987-686e-4078-8f83-eb1ddf63fc2f] INFO: Caught exception EXECUTING '
'task_id=fe59f987-686e-4078-8f83-eb1ddf63fc2f, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
' task_event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete\n'
' return future.result()\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 168, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
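The validation error above means include_view_lineage is not an accepted option for the bigquery source in this CLI version, so the immediate fix is to remove that key from the recipe. A minimal sketch (other values as in the failing recipe, which is not shown in the thread):

source:
  type: bigquery
  config:
    project_id: ""                  # placeholder
    include_table_lineage: true     # accepted by BigQueryConfig, as in the recipe earlier in the thread
    # include_view_lineage: true   <- removed: rejected as an extra field by this version
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"   # matches the sink reported in the log above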
alert-fall-82501
09/15/2022, 1:43 PM
microscopic-table-21578
09/15/2022, 2:03 PM
bumpy-journalist-41369
09/15/2022, 3:08 PM
gifted-barista-13026
09/15/2022, 3:21 PM
green-lion-58215
09/15/2022, 5:59 PM
flat-painter-78331
09/15/2022, 6:52 PM
agreeable-farmer-44067
09/15/2022, 7:42 PM
brainy-table-99728
09/15/2022, 1:52 PM
great-account-95406
09/15/2022, 2:16 PM
busy-dream-34673
09/16/2022, 11:22 AM
proud-table-38689
09/16/2022, 6:29 PM
<https://my-datahub-url/aspects?action=ingestProposal> does not exist. Is there anything I need to install within DataHub for this to work?
proud-table-38689
09/16/2022, 7:37 PM