brave-secretary-27487 (10/13/2022, 12:16 PM)
brave-secretary-27487 (10/13/2022, 12:22 PM)
bulky-electrician-72362 (10/13/2022, 12:33 PM)
dazzling-judge-80093 (10/13/2022, 2:26 PM)
little-megabyte-1074
brave-secretary-27487 (10/14/2022, 8:44 AM)
brave-secretary-27487 (10/14/2022, 8:45 AM)
brave-secretary-27487 (10/14/2022, 8:45 AM)
brave-secretary-27487 (10/14/2022, 8:52 AM):
datahub.configuration.common.PipelineExecutionError: ('Source reported errors', BigQueryV2Report(events_produced=78, events_produced_per_sec=0, event_ids=['container-platforminstance-redacted-data-model-urn:li:container:41f24366a1728c03a192682b167e0310', 'container-subtypes-redacted-data-model-urn:li:container:41f24366a1728c03a192682b167e0310', 'container-subtypes-backoffice-urn:li:container:b15642ba529f5f3d02b15ddaaded692b', 'container-parent-container-backoffice-urn:li:container:b15642ba529f5f3d02b15ddaaded692b-urn:li:container:41f24366a1728c03a192682b167e0310', 'container-subtypes-bikepassport-urn:li:container:5641ea87d07db4da30527f1615bc75b8', 'container-parent-container-bikepassport-urn:li:container:5641ea87d07db4da30527f1615bc75b8-urn:li:container:41f24366a1728c03a192682b167e0310', 'container-platforminstance-dm_bike-urn:li:container:0b55a94dc64c5f8506592fd06e44841e', 'status-for-urn:li:container:21a9e98f22278215ac20d948e1971113', 'status-for-urn:li:container:21a9e98f22278215ac20d948e1971113', 'container-platforminstance-odoo-urn:li:container:eaa47f0af95e6bca81354fa024789d40', '... 
sampled of 78 total elements'], warnings={}, failures={'Stateful Ingestion': ['Fail safe mode triggered, entity difference percent:100.0 > fail_safe_threshold:{self.stateful_ingestion_config.fail_safe_threshold}']}, soft_deleted_stale_entities=[], tables_scanned=0, views_scanned=6, entities_profiled=0, filtered=['backoffice-datadump', 'odoo-aws-replication', 'rider-recap', 'auditlog_dataset.*', 'vm-datawarehouse-p', 'vm-datawarehouse-t'], query_combiner=None, num_total_lineage_entries={'redacted-data-model': 366}, num_skipped_lineage_entries_missing_data={'redacted-data-model': 30}, num_skipped_lineage_entries_not_allowed={}, num_lineage_entries_sql_parser_failure={'redacted-data-model': 138}, num_lineage_entries_sql_parser_success={}, num_skipped_lineage_entries_other={}, num_total_log_entries={}, num_parsed_log_entries={}, num_total_audit_entries={'redacted-data-model': 366}, num_parsed_audit_entries={'redacted-data-model': 366}, bigquery_audit_metadata_datasets_missing=None, lineage_failed_extraction=[], lineage_metadata_entries={}, lineage_mem_size={}, lineage_extraction_sec={}, usage_extraction_sec={'redacted-data-model': 2.14}, usage_failed_extraction=[], num_project_datasets_to_scan={'redacted-data-model': 8}, metadata_extraction_sec={'redacted-data-model.backoffice': 1.94, 'redacted-data-model.bikepassport': 3.02, 'redacted-data-model.bikeqc': 2.27, 'redacted-data-model.dm_bike': 1.9, 'redacted-data-model.dm_sales': 1.83, 'redacted-data-model.odoo': 4.3, 'redacted-data-model.ref_dimension': 4.93}, include_table_lineage=True, use_date_sharded_audit_log_tables=False, log_page_size=1000, use_v2_audit_metadata=None, use_exported_bigquery_audit_metadata=True, end_time=None, log_entry_start_time=None, log_entry_end_time=None, audit_start_time='2022-10-11T23:45:00Z', audit_end_time='2022-10-13T12:20:19Z', upstream_lineage={}, partition_info={}, profile_table_selection_criteria={}, selected_profile_tables={}, invalid_partition_ids={}, allow_pattern=None, 
deny_pattern=None, num_usage_workunits_emitted=0, query_log_delay=None, total_query_log_entries=None, num_read_events=None, num_query_events=None, num_filtered_read_events=None, num_filtered_query_events=None, num_operational_stats_workunits_emitted=0, read_reasons_stat=Counter(), operation_types_stat=Counter({'SELECT': 1})))
The configuration is as follows:
source:
  type: bigquery-beta
  config:
    project_id_pattern:
      allow:
        - redacted-data-model
    dataset_pattern:
      deny:
        - auditlog_dataset
    include_table_lineage: 'True'
    include_usage_statistics: 'True'
    lineage_use_sql_parser: 'True'
    use_exported_bigquery_audit_metadata: 'True'
    bigquery_audit_metadata_datasets:
      - redacted-data-model.auditlog_dataset
    profiling:
      enabled: 'True'
      profile_table_level_only: 'True'
    stateful_ingestion:
      enabled: 'True'
dazzling-judge-80093 (10/14/2022, 10:24 AM):
Can you set include_table_lineage to False and check if it still fails without lineage info?
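The suggestion above would look like this in the recipe (a sketch showing only the changed flag; all other fields stay as in the config posted earlier in the thread):

```yaml
source:
  type: bigquery-beta
  config:
    # temporarily disable lineage extraction to isolate the failure
    include_table_lineage: 'False'
```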
brave-secretary-27487 (10/14/2022, 12:38 PM)
brave-secretary-27487 (10/14/2022, 12:39 PM)
brave-secretary-27487 (10/17/2022, 7:44 AM)
able-evening-90828 (10/17/2022, 5:36 PM):
Since v0.9.0, we also started seeing all the bigquery ingestions failing due to the following error, and it wasn't clear from the logs what went wrong.
"failures": {"Stateful Ingestion": ["Fail safe mode triggered, entity difference percent:94.73684210526316 > fail_safe_threshold:{self.stateful_ingestion_config.fail_safe_threshold}"]}, "soft_deleted_stale_entities": []
dazzling-judge-80093 (10/17/2022, 5:36 PM)
dazzling-judge-80093 (10/17/2022, 5:37 PM)
able-evening-90828 (10/17/2022, 5:37 PM):
bigquery
able-evening-90828 (10/17/2022, 5:37 PM)
able-evening-90828 (10/17/2022, 7:02 PM)
brave-secretary-27487 (10/18/2022, 6:30 AM):
bigquery-beta connector
brave-secretary-27487 (10/18/2022, 6:31 AM)
dazzling-judge-80093 (10/18/2022, 6:32 AM)
brave-secretary-27487 (10/18/2022, 6:38 AM)
brave-secretary-27487 (10/18/2022, 6:39 AM)
dazzling-judge-80093 (10/18/2022, 6:39 AM)
dazzling-judge-80093 (10/18/2022, 6:40 AM):
Unable to get tables for dataset dm_bike in project redacted-data-model, skipping. The error was: 'type' object is not subscriptable
brave-secretary-27487 (10/18/2022, 6:40 AM)
brave-secretary-27487 (10/18/2022, 6:41 AM):
I checked <https://datahubproject.io/docs/debugging/> to see how to enable more logs, but didn't find a good solution on that page.
dazzling-judge-80093 (10/18/2022, 6:41 AM)
brave-secretary-27487 (10/18/2022, 6:42 AM)
dazzling-judge-80093 (10/18/2022, 1:04 PM)
brave-secretary-27487 (10/18/2022, 1:23 PM)
brave-secretary-27487 (10/18/2022, 1:24 PM)
brave-secretary-27487 (10/18/2022, 1:24 PM)
dazzling-judge-80093 (10/18/2022, 2:43 PM)
dazzling-judge-80093 (10/18/2022, 2:50 PM)
brave-secretary-27487 (10/18/2022, 2:56 PM)
shy-island-99768 (10/19/2022, 11:53 AM)
dazzling-judge-80093 (10/19/2022, 11:54 AM)
shy-island-99768 (10/19/2022, 11:57 AM)
dazzling-judge-80093 (10/19/2022, 1:30 PM)
brave-secretary-27487 (10/19/2022, 1:39 PM)
dazzling-judge-80093 (10/19/2022, 1:40 PM)
able-evening-90828 (10/19/2022, 5:44 PM):
bigquery ingestion?
dazzling-judge-80093 (10/19/2022, 5:50 PM)
able-evening-90828 (10/19/2022, 6:10 PM):
2022-10-19 17:56:42.170148 [exec_id=1538fcc1-ed5d-48f4-b43a-4e1784a1b5f3] INFO: Starting execution for task with name=RUN_INGEST
2022-10-19 18:01:37.548764 [exec_id=1538fcc1-ed5d-48f4-b43a-4e1784a1b5f3] INFO: stdout=+ task_id=1538fcc1-ed5d-48f4-b43a-4e1784a1b5f3
+ datahub_version=0.9.0.2rc1
+ plugins=bigquery
Then later:
"failures": {"Stateful Ingestion": ["Fail safe mode triggered, entity difference percent:94.73684210526316 > fail_safe_threshold:{self.stateful_ingestion_config.fail_safe_threshold}"]}, "soft_deleted_stale_entities": []
dazzling-judge-80093 (10/19/2022, 6:11 PM)
able-evening-90828 (10/19/2022, 6:12 PM):
bigquery connector
able-evening-90828 (10/19/2022, 6:13 PM):
The bigquery-beta connector didn't work for me when I tried before. It failed without giving any meaningful error. But let me try again.
dazzling-judge-80093 (10/19/2022, 6:23 PM)
able-evening-90828:
10/19/2022, 6:23 PM'/usr/local/bin/run_ingest.sh: line 40: 438 Killed ( datahub ${debug_option} ingest run -c "${recipe_file}" '
'${report_option} )\n'
'+ exit 1\n',
"2022-10-19 18:17:04.660039 [exec_id=f56c5436-ec2d-4343-b936-d5eec7aafc6e] INFO: Failed to execute 'datahub ingest'",
'2022-10-19 18:17:04.801238 [exec_id=f56c5436-ec2d-4343-b936-d5eec7aafc6e] INFO: Caught exception EXECUTING '
'task_id=f56c5436-ec2d-4343-b936-d5eec7aafc6e, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
' task_event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
' return future.result()\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 227, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
able-evening-90828 (10/19/2022, 6:24 PM):
+ datahub_version=0.9.0.2rc1
+ plugins=bigquery-beta
able-evening-90828 (10/19/2022, 6:25 PM):
[2022-10-19 18:16:57,398] WARNING {datahub.ingestion.source.bigquery_v2.bigquery_schema:449} - <redacted project id>.metrics.cloudaudit_googleapis_com_activity_20221003 contains more than 300 columns, only processing 300 columns
dazzling-judge-80093 (10/19/2022, 6:26 PM)
able-evening-90828 (10/19/2022, 6:26 PM)
dazzling-judge-80093 (10/19/2022, 6:27 PM):
438 Killed ( datahub ${debug_option} ingest run -c "${recipe_file}" )
I guess you need to grant more memory to the container, as it got killed.
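If the ingestion runs through DataHub's actions executor on Kubernetes, the memory limit for that container can be raised in the pod spec. A hypothetical fragment (the sizes here are assumptions for illustration, not values from this thread):

```yaml
# hypothetical pod-spec fragment: raise the executor container's memory
# limit so the kernel does not OOM-kill the ingestion subprocess
resources:
  requests:
    memory: "1Gi"
  limits:
    memory: "2Gi"
```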
able-evening-90828 (10/19/2022, 6:28 PM)
dazzling-judge-80093 (10/19/2022, 6:29 PM):
The "killed" message. It can be some other reason as well, but this is the most common issue.
able-evening-90828 (10/19/2022, 6:29 PM)
dazzling-judge-80093 (10/19/2022, 6:30 PM)
dazzling-judge-80093 (10/19/2022, 6:30 PM)
able-evening-90828 (10/19/2022, 6:31 PM)
dazzling-judge-80093 (10/20/2022, 7:10 AM)
able-evening-90828 (10/20/2022, 7:23 PM)
able-evening-90828 (10/20/2022, 10:06 PM):
I could run the bigquery connector with less than 256MB.
What is the reason the bigquery-beta connector needs significantly more memory to run the same ingestion? Is it because I am running stateful ingestion?
mammoth-bear-12532:
bigquery-usage connector for
able-evening-90828 (10/24/2022, 4:49 PM)
mammoth-bear-12532:
include_usage_statistics
able-evening-90828 (10/24/2022, 6:58 PM):
It is true by default. I just tried setting it to false and didn't see any memory spike any more.
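A minimal recipe fragment for turning that flag off (a sketch; assumes the bigquery-beta source config posted earlier in the thread):

```yaml
source:
  type: bigquery-beta
  config:
    # usage extraction is on by default and was driving the memory spike here
    include_usage_statistics: 'False'
```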
However, the old error about the stateful-ingestion entity difference exceeding the threshold came back again:
+ datahub_version=0.9.0.2rc1
+ plugins=bigquery-beta
...
Pipeline finished with at least 1 failures ; produced 242 events in 2 minutes and 8.31 seconds.
+ exit 1
2022-10-24 18:14:16.198997 [exec_id=18405a1c-d9c7-4c45-bbfe-5b3e2f0e9d65] INFO: Failed to execute 'datahub ingest'
2022-10-24 18:14:16.218294 [exec_id=18405a1c-d9c7-4c45-bbfe-5b3e2f0e9d65] INFO: Caught exception EXECUTING task_id=18405a1c-d9c7-4c45-bbfe-5b3e2f0e9d65, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task
    task_event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 227, in execute
    raise TaskError("Failed to execute 'datahub ingest'")
acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
...
"failures": {"Stateful Ingestion": ["Fail safe mode triggered, entity difference percent:95.65217391304348 > fail_safe_threshold:{self.stateful_ingestion_config.fail_safe_threshold}"]}
mammoth-bear-12532
mammoth-bear-12532:
config:
  stateful_ingestion:
    fail_safe_threshold: 100
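Placed in context, the workaround might look like the sketch below (assumes the bigquery-beta source from earlier in the thread; note that a threshold of 100 effectively disables the fail-safe check, so stale entities can be soft-deleted even when the entity diff is very large):

```yaml
source:
  type: bigquery-beta
  config:
    stateful_ingestion:
      enabled: 'True'
      # raise the threshold so a large entity diff no longer aborts the run
      fail_safe_threshold: 100
```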
able-evening-90828 (10/25/2022, 7:02 PM)