# troubleshoot
b
Using the new ingestion bigquery-beta version 0.9.0 (just released) I constantly get this error. The error also appeared when using the prerelease. I don't understand what the error means or how to resolve it. -edit: moved block of stacktrace/code to comments
There are some warnings during the execution of the ingestion: `Unable to get tables for dataset odoo in project vanmoof-data-model, skipping. The error was: 'type' object is not subscriptable`
b
Thanks @brave-secretary-27487 for reporting this issue. @dazzling-judge-80093 can you have a look at this?
d
Interesting can you give me the whole error message?
l
Hi @brave-secretary-27487! Gentle reminder to please stick to our Slack Guidelines & post large blocks of code/stack trace in threads; it’s a HUGE help for us to keep track of unaddressed questions across our various support channels! teamwork
b
@little-megabyte-1074 Sorry, didn't know that; will do in the future!
@dazzling-judge-80093 this is the complete stack trace
Copy code
datahub.configuration.common.PipelineExecutionError: ('Source reported errors', BigQueryV2Report(events_produced=78, events_produced_per_sec=0, event_ids=['container-platforminstance-redacted-data-model-urn:li:container:41f24366a1728c03a192682b167e0310', 'container-subtypes-redacted-data-model-urn:li:container:41f24366a1728c03a192682b167e0310', 'container-subtypes-backoffice-urn:li:container:b15642ba529f5f3d02b15ddaaded692b', 'container-parent-container-backoffice-urn:li:container:b15642ba529f5f3d02b15ddaaded692b-urn:li:container:41f24366a1728c03a192682b167e0310', 'container-subtypes-bikepassport-urn:li:container:5641ea87d07db4da30527f1615bc75b8', 'container-parent-container-bikepassport-urn:li:container:5641ea87d07db4da30527f1615bc75b8-urn:li:container:41f24366a1728c03a192682b167e0310', 'container-platforminstance-dm_bike-urn:li:container:0b55a94dc64c5f8506592fd06e44841e', 'status-for-urn:li:container:21a9e98f22278215ac20d948e1971113', 'status-for-urn:li:container:21a9e98f22278215ac20d948e1971113', 'container-platforminstance-odoo-urn:li:container:eaa47f0af95e6bca81354fa024789d40', '... sampled of 78 total elements'], warnings={}, failures={'Stateful Ingestion': ['Fail safe mode triggered, entity difference percent:100.0 > fail_safe_threshold:{self.stateful_ingestion_config.fail_safe_threshold}']}, soft_deleted_stale_entities=[], tables_scanned=0, views_scanned=6, entities_profiled=0, filtered=['backoffice-datadump', 'odoo-aws-replication', 'rider-recap', 'auditlog_dataset.*', 'vm-datawarehouse-p', 'vm-datawarehouse-t'], query_combiner=None, num_total_lineage_entries={'redacted-data-model': 366}, num_skipped_lineage_entries_missing_data={'redacted-data-model': 30}, num_skipped_lineage_entries_not_allowed={}, num_lineage_entries_sql_parser_failure={'redacted-data-model': 138}, num_lineage_entries_sql_parser_success={}, num_skipped_lineage_entries_other={}, num_total_log_entries={}, num_parsed_log_entries={}, num_total_audit_entries={'redacted-data-model': 366}, num_parsed_audit_entries={'redacted-data-model': 366}, bigquery_audit_metadata_datasets_missing=None, lineage_failed_extraction=[], lineage_metadata_entries={}, lineage_mem_size={}, lineage_extraction_sec={}, usage_extraction_sec={'redacted-data-model': 2.14}, usage_failed_extraction=[], num_project_datasets_to_scan={'redacted-data-model': 8}, metadata_extraction_sec={'redacted-data-model.backoffice': 1.94, 'redacted-data-model.bikepassport': 3.02, 'redacted-data-model.bikeqc': 2.27, 'redacted-data-model.dm_bike': 1.9, 'redacted-data-model.dm_sales': 1.83, 'redacted-data-model.odoo': 4.3, 'redacted-data-model.ref_dimension': 4.93}, include_table_lineage=True, use_date_sharded_audit_log_tables=False, log_page_size=1000, use_v2_audit_metadata=None, use_exported_bigquery_audit_metadata=True, end_time=None, log_entry_start_time=None, log_entry_end_time=None, audit_start_time='2022-10-11T23:45:00Z', audit_end_time='2022-10-13T12:20:19Z', upstream_lineage={}, partition_info={}, profile_table_selection_criteria={}, selected_profile_tables={}, invalid_partition_ids={}, allow_pattern=None, deny_pattern=None, num_usage_workunits_emitted=0, query_log_delay=None, total_query_log_entries=None, num_read_events=None, num_query_events=None, num_filtered_read_events=None, num_filtered_query_events=None, num_operational_stats_workunits_emitted=0, read_reasons_stat=Counter(), operation_types_stat=Counter({'SELECT': 1})))
The configuration is as follows
Copy code
source:
  type: bigquery-beta
  config:
    project_id_pattern:
      allow:
      - redacted-data-model
    dataset_pattern:
      deny:
      - auditlog_dataset
    include_table_lineage: 'True'
    include_usage_statistics: 'True'
    lineage_use_sql_parser: 'True'
    use_exported_bigquery_audit_metadata: 'True'
    bigquery_audit_metadata_datasets:
    - redacted-data-model.auditlog_dataset
    profiling:
      enabled: 'True'
      profile_table_level_only: 'True'
    stateful_ingestion:
      enabled: 'True'
d
Please, can you try to set `include_table_lineage` to `False` and check if it still fails without lineage info?
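For reference, a minimal sketch of the suggested recipe change against the configuration posted above; only the changed field is shown, everything else stays as-is.
Copy code
source:
  type: bigquery-beta
  config:
    # ...rest of the recipe unchanged...
    include_table_lineage: false  # temporarily disable lineage extraction to isolate the failure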
b
@dazzling-judge-80093 Also doesn't work
@dazzling-judge-80093 Disabled the stateful ingestion and the ingestion works
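The corresponding change for that test, as a sketch against the same recipe:
Copy code
source:
  type: bigquery-beta
  config:
    # ...rest of the recipe unchanged...
    stateful_ingestion:
      enabled: false  # bypasses the stale-entity fail-safe check entirely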
a
After upgrading to `v0.9.0`, we also started seeing all our bigquery ingestions failing due to the following error, and it wasn't clear from the logs what went wrong.
Copy code
"failures": '
                      '{"Stateful Ingestion": ["Fail safe mode triggered, entity difference percent:94.73684210526316 > '
                      'fail_safe_threshold:{self.stateful_ingestion_config.fail_safe_threshold}"]}, "soft_deleted_stale_entities": [], '
d
Are you using the bigquery or the bigquery-beta connector?
Which version did you use before?
a
bigquery
v0.8.45
If I update the datahub cli version for a UI ingestion to 0.8.45, then it runs fine.
b
I use the `bigquery-beta` connector
Should I adjust the stateful ingestion threshold?
d
When you disabled stateful ingestion, did it ingest anything? This fail-safe threshold is there to make sure we don't soft-delete too many items at once if there is an issue with the ingestion or the filters exclude too many datasets
b
@dazzling-judge-80093 I'm not really sure, to be honest. I ran the ingest, but only some changes are propagated to the UI; some still say the last ingest was some time ago
@dazzling-judge-80093 Here are the logs for more info
d
I will add some more debug lines to our code; if you are OK with that, it would help to identify the issue if you could try it and send me the logs
This is the error message I would like to figure out where it comes from and why:
Copy code
Unable to get tables for dataset dm_bike in project redacted-data-model, skipping. The error was: 'type' object is not subscriptable
b
@dazzling-judge-80093 Of course, I would appreciate it! Also, the lineage is picked up according to the logs but isn't ingested. Can the logs also be turned on for that? I want to check why the lineage isn't working
@dazzling-judge-80093 Let me know how to provide more logs and I will run it. I was checking <https://datahubproject.io/docs/debugging/> to see how to enable more logs, but didn't find a good solution on that page
d
I will ping you when I have it today
b
Thanks
d
@brave-secretary-27487 please, can you try to run an ingestion with this rc release, https://pypi.org/project/acryl-datahub/0.9.0.2rc0/, and send me the logs? It should capture the full stack trace of the error we saw in the logs earlier.
b
Hey @dazzling-judge-80093 here is the log
Is the problem that we pull information from a project that isn't included in the ingestion also the reason the lineage isn't working?
d
the two are connected, I’m going to provide a fix soon
this seems to be an issue with python 3.8 only
b
@dazzling-judge-80093 Thanks for the support, much appreciated. @shy-island-99768 here is some context
s
@dazzling-judge-80093 I think I spotted some logic errors in the code related to the parse_sql part of bigquery lineage. I want to open a pull request to see if this really is an issue, but cannot push my branch. Can you help me get access to the repo?
d
You should be able to open a pull request from your fork
s
Thanks!
d
@brave-secretary-27487 we cut a new rc release which should resolve the issue above, please, can you test it? -> https://pypi.org/project/acryl-datahub/0.9.0.2rc1/
b
@dazzling-judge-80093 Ran it and got everything we wanted. Much appreciated! Thank you very much
d
awesome, thanks for confirming and reporting it
a
@dazzling-judge-80093 does this also fix the issue I had with `bigquery` ingestion?
d
@able-evening-90828 might be, can you check it?
a
@dazzling-judge-80093 Still got the same error.
Copy code
'infos': ['2022-10-19 17:56:42.170148 [exec_id=1538fcc1-ed5d-48f4-b43a-4e1784a1b5f3] INFO: Starting execution for task with name=RUN_INGEST',
           '2022-10-19 18:01:37.548764 [exec_id=1538fcc1-ed5d-48f4-b43a-4e1784a1b5f3] INFO: stdout=+ task_id=1538fcc1-ed5d-48f4-b43a-4e1784a1b5f3\n'
           '+ datahub_version=0.9.0.2rc1\n'
           '+ plugins=bigquery\n'
Then later
Copy code
"failures": {"Stateful Ingestion": ["Fail safe mode triggered, entity difference percent:94.73684210526316 > '
                      'fail_safe_threshold:{self.stateful_ingestion_config.fail_safe_threshold}"]}, "soft_deleted_stale_entities": [], '
d
Did you use the bigquery connector? The issue that I fixed was with the bigquery-beta connector
a
Yes, I used the `bigquery` connector
The `bigquery-beta` connector didn't work for me when I tried before. It failed without giving any meaningful error. But let me try again.
d
We improved bigquery-beta a lot recently and it will replace the bigquery connector soon.
a
Didn't work.
Copy code
'/usr/local/bin/run_ingest.sh: line 40:   438 Killed                  ( datahub ${debug_option} ingest run -c "${recipe_file}" '
           '${report_option} )\n'
           '+ exit 1\n',
           "2022-10-19 18:17:04.660039 [exec_id=f56c5436-ec2d-4343-b936-d5eec7aafc6e] INFO: Failed to execute 'datahub ingest'",
           '2022-10-19 18:17:04.801238 [exec_id=f56c5436-ec2d-4343-b936-d5eec7aafc6e] INFO: Caught exception EXECUTING '
           'task_id=f56c5436-ec2d-4343-b936-d5eec7aafc6e, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
           '    task_event_loop.run_until_complete(task_future)\n'
           '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
           '    return future.result()\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 227, in execute\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Copy code
'+ datahub_version=0.9.0.2rc1\n'
           '+ plugins=bigquery-beta\n'
There were a lot of such warning messages:
Copy code
'[2022-10-19 18:16:57,398] WARNING  {datahub.ingestion.source.bigquery_v2.bigquery_schema:449} - '
           '<redacted project id>.metrics.cloudaudit_googleapis_com_activity_20221003 contains more than 300 columns, only processing 300 '
           'columns\n'
d
this is “normal”; these tables have more than 300 columns and we don’t process all of the columns
a
Ok, aside from these warning messages, there were no other error messages, so it is unclear what went wrong.
d
438 Killed                  ( datahub ${debug_option} ingest run -c "${recipe_file}" '
I guess you need to grant more memory to the container as it got killed
a
How do you know that it was killed due to memory?
d
If you go over the limits on k8s, it kills the process and what you see is this `Killed` message. It can be some other reason as well, but this is the most common issue.
a
This is the memory usage in the past hour for the datahub-actions pod. There was no spike.
d
Can you check the CPU as well? It is weird that it got killed
it basically got a kill signal
a
No CPU spike either
d
Can you check on k8s the reason why it killed the pod?
a
@dazzling-judge-80093 Sorry, I was looking at the wrong cluster yesterday. There was a memory and CPU spike. But the pod wasn't killed, just the datahub cli process that ran the ingestion.
I was finally able to get it working after bumping the memory to 1GB. It turned out the ingestion used 648MB of memory, whereas it ran successfully with the `bigquery` connector using less than 256MB. What is the reason the `bigquery-beta` connector needs significantly more memory to run the same ingestion? Is it because I am running stateful ingestion?
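For anyone hitting the same `Killed` message: a hypothetical sketch of where the memory bump could go, assuming a standard DataHub Helm chart deployment where the ingestion subprocess runs in the datahub-actions pod (key names may differ in your setup).
Copy code
# values.yaml -- hypothetical snippet, adjust to your own chart/deployment
acryl-datahub-actions:
  resources:
    requests:
      memory: 512Mi
    limits:
      memory: 1Gi  # the ingestion subprocess was killed at the previous, lower limit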
m
@able-evening-90828: one of the main reasons for needing more memory is that the new connector does regular table metadata extraction as well as usage metadata extraction, which previously you would have needed the `bigquery-usage` connector for
a
@mammoth-bear-12532 @dazzling-judge-80093 thank you for your response. It makes sense. Is there any way I can turn off the extraction of usage metadata? We only care about the schema right now. In the recipe I only set the stateful ingestion to true, so everything else should be the default.
m
According to https://datahubproject.io/docs/generated/ingestion/sources/bigquery#config-details it is
Copy code
include_usage_statistics
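So, roughly, the recipe change would be (a sketch; the flag name is taken from the docs page linked above):
Copy code
source:
  type: bigquery-beta
  config:
    include_usage_statistics: false  # schema-only ingestion, skip usage metadata extraction
    stateful_ingestion:
      enabled: true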
a
Thank you @mammoth-bear-12532. It makes sense that the flag is `true` by default. I just tried setting it to false and didn't see any memory spike anymore. However, the old error about the stateful ingestion change exceeding the threshold came back again.
Copy code
'+ datahub_version=0.9.0.2rc1\n'
           '+ plugins=bigquery-beta\n'

...

          ' Pipeline finished with at least 1 failures ; produced 242 events in 2 minutes and 8.31 seconds.\n'
           '+ exit 1\n',
           "2022-10-24 18:14:16.198997 [exec_id=18405a1c-d9c7-4c45-bbfe-5b3e2f0e9d65] INFO: Failed to execute 'datahub ingest'",
           '2022-10-24 18:14:16.218294 [exec_id=18405a1c-d9c7-4c45-bbfe-5b3e2f0e9d65] INFO: Caught exception EXECUTING '
           'task_id=18405a1c-d9c7-4c45-bbfe-5b3e2f0e9d65, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
           '    task_event_loop.run_until_complete(task_future)\n'
           '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
           '    return future.result()\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 227, in execute\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],

...

"failures": {"Stateful Ingestion": ["Fail safe mode triggered, entity '
                      'difference percent:95.65217391304348 > fail_safe_threshold:{self.stateful_ingestion_config.fail_safe_threshold}"]}
m
Hi there, you will need to set this threshold to 100 to allow the ingestion to issue the deletes for all the extra entities that you will no longer be ingesting (since you turned off the usage extraction)
Copy code
config:
  stateful_ingestion:
    fail_safe_threshold: 100
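Combined with the earlier change, the relevant part of the recipe would look roughly like this (a sketch, not the exact recipe used):
Copy code
source:
  type: bigquery-beta
  config:
    include_usage_statistics: false
    stateful_ingestion:
      enabled: true
      fail_safe_threshold: 100  # allow the one-time large diff from dropping usage entities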
a
It worked. Thank you very much @mammoth-bear-12532