# troubleshoot
b
Using the new ingestion bigquery-beta version 0.9.0 (just released) I constantly get this error. The error also appeared when using the prerelease. I don't understand what the error means or how to resolve it. -edit: moved block of stacktrace/code to comments
There are some warnings during the execution of the ingestion: `Unable to get tables for dataset odoo in project vanmoof-data-model, skipping. The error was: 'type' object is not subscriptable`
b
Thanks @brave-secretary-27487 for reporting this issue. @dazzling-judge-80093 can you have a look at this?
d
Interesting can you give me the whole error message?
l
Hi @brave-secretary-27487! Gentle reminder to please stick to our Slack Guidelines & post large blocks of code/stack trace in threads; it’s a HUGE help for us to keep track of unaddressed questions across our various support channels! teamwork
b
@little-megabyte-1074 Sorry, didn't know that; will do in the future!
@dazzling-judge-80093 this is the complete stack trace
Copy code
datahub.configuration.common.PipelineExecutionError: ('Source reported errors', BigQueryV2Report(events_produced=78, events_produced_per_sec=0, event_ids=['container-platforminstance-redacted-data-model-urn:li:container:41f24366a1728c03a192682b167e0310', 'container-subtypes-redacted-data-model-urn:li:container:41f24366a1728c03a192682b167e0310', 'container-subtypes-backoffice-urn:li:container:b15642ba529f5f3d02b15ddaaded692b', 'container-parent-container-backoffice-urn:li:container:b15642ba529f5f3d02b15ddaaded692b-urn:li:container:41f24366a1728c03a192682b167e0310', 'container-subtypes-bikepassport-urn:li:container:5641ea87d07db4da30527f1615bc75b8', 'container-parent-container-bikepassport-urn:li:container:5641ea87d07db4da30527f1615bc75b8-urn:li:container:41f24366a1728c03a192682b167e0310', 'container-platforminstance-dm_bike-urn:li:container:0b55a94dc64c5f8506592fd06e44841e', 'status-for-urn:li:container:21a9e98f22278215ac20d948e1971113', 'status-for-urn:li:container:21a9e98f22278215ac20d948e1971113', 'container-platforminstance-odoo-urn:li:container:eaa47f0af95e6bca81354fa024789d40', '... sampled of 78 total elements'], warnings={}, failures={'Stateful Ingestion': ['Fail safe mode triggered, entity difference percent:100.0 > fail_safe_threshold:{self.stateful_ingestion_config.fail_safe_threshold}']}, soft_deleted_stale_entities=[], tables_scanned=0, views_scanned=6, entities_profiled=0, filtered=['backoffice-datadump', 'odoo-aws-replication', 'rider-recap', 'auditlog_dataset.*', 'vm-datawarehouse-p', 'vm-datawarehouse-t'], query_combiner=None, num_total_lineage_entries={'redacted-data-model': 366}, num_skipped_lineage_entries_missing_data={'redacted-data-model': 30}, num_skipped_lineage_entries_not_allowed={}, num_lineage_entries_sql_parser_failure={'redacted-data-model': 138}, num_lineage_entries_sql_parser_success={}, num_skipped_lineage_entries_other={}, num_total_log_entries={}, num_parsed_log_entries={}, num_total_audit_entries={'redacted-data-model': 366}, num_parsed_audit_entries={'redacted-data-model': 366}, bigquery_audit_metadata_datasets_missing=None, lineage_failed_extraction=[], lineage_metadata_entries={}, lineage_mem_size={}, lineage_extraction_sec={}, usage_extraction_sec={'redacted-data-model': 2.14}, usage_failed_extraction=[], num_project_datasets_to_scan={'redacted-data-model': 8}, metadata_extraction_sec={'redacted-data-model.backoffice': 1.94, 'redacted-data-model.bikepassport': 3.02, 'redacted-data-model.bikeqc': 2.27, 'redacted-data-model.dm_bike': 1.9, 'redacted-data-model.dm_sales': 1.83, 'redacted-data-model.odoo': 4.3, 'redacted-data-model.ref_dimension': 4.93}, include_table_lineage=True, use_date_sharded_audit_log_tables=False, log_page_size=1000, use_v2_audit_metadata=None, use_exported_bigquery_audit_metadata=True, end_time=None, log_entry_start_time=None, log_entry_end_time=None, audit_start_time='2022-10-11T23:45:00Z', audit_end_time='2022-10-13T12:20:19Z', upstream_lineage={}, partition_info={}, profile_table_selection_criteria={}, selected_profile_tables={}, invalid_partition_ids={}, allow_pattern=None, deny_pattern=None, num_usage_workunits_emitted=0, query_log_delay=None, total_query_log_entries=None, num_read_events=None, num_query_events=None, num_filtered_read_events=None, num_filtered_query_events=None, num_operational_stats_workunits_emitted=0, read_reasons_stat=Counter(), operation_types_stat=Counter({'SELECT': 1})))
The configuration is as follows
Copy code
source:
  type: bigquery-beta
  config:
    project_id_pattern:
      allow:
      - redacted-data-model
    dataset_pattern:
      deny:
      - auditlog_dataset
    include_table_lineage: 'True'
    include_usage_statistics: 'True'
    lineage_use_sql_parser: 'True'
    use_exported_bigquery_audit_metadata: 'True'
    bigquery_audit_metadata_datasets:
    - redacted-data-model.auditlog_dataset
    profiling:
      enabled: 'True'
      profile_table_level_only: 'True'
    stateful_ingestion:
      enabled: 'True'
d
Please, can you try to set `include_table_lineage` to `False` and check if it still fails without lineage info?
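For reference, a minimal sketch of the suggested recipe change against the configuration posted above; only the changed field is shown, everything else stays as-is.
Copy code
source:
  type: bigquery-beta
  config:
    # ...rest of the recipe unchanged...
    include_table_lineage: false  # temporarily disable lineage extraction to isolate the failure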
b
@dazzling-judge-80093 Also doesn't work
@dazzling-judge-80093 Disabled the stateful ingestion and the ingestion works
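The corresponding change for that test, as a sketch against the same recipe:
Copy code
source:
  type: bigquery-beta
  config:
    # ...rest of the recipe unchanged...
    stateful_ingestion:
      enabled: false  # bypasses the stale-entity fail-safe check entirely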
a
After upgrading to `v0.9.0`, we also started seeing all our bigquery ingestions failing due to the following error, and it wasn't clear from the logs what went wrong.
Copy code
"failures": '
                      '{"Stateful Ingestion": ["Fail safe mode triggered, entity difference percent:94.73684210526316 > '
                      'fail_safe_threshold:{self.stateful_ingestion_config.fail_safe_threshold}"]}, "soft_deleted_stale_entities": [], '
d
Are you using the bigquery or the bigquery-beta connector?
Which version did you use before?
a
bigquery
v0.8.45
If I update the datahub cli version for a UI ingestion to 0.8.45, then it runs fine.
b
I use the `bigquery-beta` connector
Should I adjust the stateful ingestion threshold?
d
When you disabled stateful ingestion, did it ingest anything? This fail-safe threshold is there to make sure we don't soft-delete too many items at once if there is an issue with the ingestion or the filters exclude too many datasets
b
@dazzling-judge-80093 I'm not really sure, to be honest. I ran the ingest, but only some changes are propagated to the UI; some still say the last ingest was some time ago
@dazzling-judge-80093 Here are the logs for more info
d
I will add some more debug lines to our code; if you are OK with that, it would help to identify the issue if you could try it and send me the logs
This is the error message I would like to figure out where it comes from and why:
Copy code
Unable to get tables for dataset dm_bike in project redacted-data-model, skipping. The error was: 'type' object is not subscriptable
b
@dazzling-judge-80093 Of course, I would appreciate it! Also, the lineage is picked up according to the logs but isn't ingested. Can the logs also be turned on for that? I want to check why the lineage isn't working
@dazzling-judge-80093 Let me know how to provide more logs and I will run it. I was checking <https://datahubproject.io/docs/debugging/> to see how to enable more logs, but didn't find a good solution on that page
d
I will ping you when I have it today
b
Thanks
d
@brave-secretary-27487 please, can you try to run an ingestion with this rc release, https://pypi.org/project/acryl-datahub/0.9.0.2rc0/, and send me the logs? It should capture the full stack trace of the error we saw in the logs earlier.
b
Hey @dazzling-judge-80093 here is the log
Is the problem that we pull information from a project that isn't included in the ingestion also the reason the lineage isn't working?
d
the two are connected, I’m going to provide a fix soon
this seems to be an issue with python 3.8 only
b
@dazzling-judge-80093 Thanks for the support, much appreciated. @shy-island-99768 here is some context
s
@dazzling-judge-80093 I think I spotted some logic errors in the code related to the parse_sql part of bigquery lineage. I want to open a pull request to see if this really is an issue, but cannot push my branch. Can you help me get access to the repo?
d
You should be able to open a pull request from your fork
s
Thanks!
d
@brave-secretary-27487 we cut a new rc release which should resolve the issue above, please, can you test it? -> https://pypi.org/project/acryl-datahub/0.9.0.2rc1/
b
@dazzling-judge-80093 Ran it and got everything we wanted. Much appreciated! Thank you very much
d
awesome, thanks for confirming and reporting it
a
@dazzling-judge-80093 does this also fix the issue I had with `bigquery` ingestion?
d
@able-evening-90828 might be, can you check it?
a
@dazzling-judge-80093 Still got the same error.
Copy code
'infos': ['2022-10-19 17:56:42.170148 [exec_id=1538fcc1-ed5d-48f4-b43a-4e1784a1b5f3] INFO: Starting execution for task with name=RUN_INGEST',
           '2022-10-19 18:01:37.548764 [exec_id=1538fcc1-ed5d-48f4-b43a-4e1784a1b5f3] INFO: stdout=+ task_id=1538fcc1-ed5d-48f4-b43a-4e1784a1b5f3\n'
           '+ datahub_version=0.9.0.2rc1\n'
           '+ plugins=bigquery\n'
Then later
Copy code
"failures": {"Stateful Ingestion": ["Fail safe mode triggered, entity difference percent:94.73684210526316 > '
                      'fail_safe_threshold:{self.stateful_ingestion_config.fail_safe_threshold}"]}, "soft_deleted_stale_entities": [], '
d
Did you use the bigquery connector? The issue that I fixed was with the bigquery-beta connector
a
Yes, I used the `bigquery` connector
The `bigquery-beta` connector didn't work for me when I tried before. It failed without giving any meaningful error. But let me try again.
d
We improved bigquery-beta a lot recently and it will replace the bigquery connector soon.
a
Didn't work.
Copy code
'/usr/local/bin/run_ingest.sh: line 40:   438 Killed                  ( datahub ${debug_option} ingest run -c "${recipe_file}" '
           '${report_option} )\n'
           '+ exit 1\n',
           "2022-10-19 18:17:04.660039 [exec_id=f56c5436-ec2d-4343-b936-d5eec7aafc6e] INFO: Failed to execute 'datahub ingest'",
           '2022-10-19 18:17:04.801238 [exec_id=f56c5436-ec2d-4343-b936-d5eec7aafc6e] INFO: Caught exception EXECUTING '
           'task_id=f56c5436-ec2d-4343-b936-d5eec7aafc6e, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
           '    task_event_loop.run_until_complete(task_future)\n'
           '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
           '    return future.result()\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 227, in execute\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Copy code
'+ datahub_version=0.9.0.2rc1\n'
           '+ plugins=bigquery-beta\n'
There were a lot of such warning messages:
Copy code
'[2022-10-19 18:16:57,398] WARNING  {datahub.ingestion.source.bigquery_v2.bigquery_schema:449} - '
           '<redacted project id>.metrics.cloudaudit_googleapis_com_activity_20221003 contains more than 300 columns, only processing 300 '
           'columns\n'
d
this is “normal”; these tables have more than 300 columns and we don’t process all of the columns
a
Ok, aside from these warning messages, there were no other error messages, so it is unclear what went wrong.
d
438 Killed                  ( datahub ${debug_option} ingest run -c "${recipe_file}" '
I guess you need to grant more memory to the container as it got killed
a
How do you know that it was killed due to memory?
d
If you go over the limits on k8s, it kills the process and what you see is this `Killed` message. It can be some other reason as well, but this is the most common issue.
a
This is the memory usage in the past hour for the datahub-actions pod. There was no spike.
d
Can you check the CPU as well? It is weird that it got killed
it basically got a kill signal
a
No CPU spike either
d
Can you check on k8s the reason why it killed the pod?
a
@dazzling-judge-80093 Sorry, I was looking at the wrong cluster yesterday. There was a memory and CPU spike. But the pod wasn't killed, just the datahub cli process that ran the ingestion.
I was finally able to get it working after bumping the memory to 1GB. It turned out the ingestion used 648MB of memory, whereas it ran successfully with the `bigquery` connector using less than 256MB. What is the reason the `bigquery-beta` connector needs significantly more memory to run the same ingestion? Is it because I am running stateful ingestion?
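For anyone hitting the same `Killed` message: a hypothetical sketch of where the memory bump could go, assuming a standard DataHub Helm chart deployment where the ingestion subprocess runs in the datahub-actions pod (key names may differ in your setup).
Copy code
# values.yaml -- hypothetical snippet, adjust to your own chart/deployment
acryl-datahub-actions:
  resources:
    requests:
      memory: 512Mi
    limits:
      memory: 1Gi  # the ingestion subprocess was killed at the previous, lower limit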
m
@able-evening-90828: one of the main reasons for needing more memory is that the new connector does regular table metadata extraction as well as usage metadata extraction, which previously you would have needed the `bigquery-usage` connector for
a
@mammoth-bear-12532 @dazzling-judge-80093 thank you for your response. It makes sense. Is there any way I can turn off the extraction of usage metadata? We only care about the schema right now. In the recipe I only set the stateful ingestion to true, so everything else should be the default.
m
According to https://datahubproject.io/docs/generated/ingestion/sources/bigquery#config-details it is
Copy code
include_usage_statistics
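So, roughly, the recipe change would be (a sketch; the flag name is taken from the docs page linked above):
Copy code
source:
  type: bigquery-beta
  config:
    include_usage_statistics: false  # schema-only ingestion, skip usage metadata extraction
    stateful_ingestion:
      enabled: true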
a
Thank you @mammoth-bear-12532. It makes sense that the flag is `true` by default. I just tried setting it to false and didn't see any memory spike anymore. However, the old error about the stateful ingestion change exceeding the threshold came back again.
Copy code
'+ datahub_version=0.9.0.2rc1\n'
           '+ plugins=bigquery-beta\n'

...

          ' Pipeline finished with at least 1 failures ; produced 242 events in 2 minutes and 8.31 seconds.\n'
           '+ exit 1\n',
           "2022-10-24 18:14:16.198997 [exec_id=18405a1c-d9c7-4c45-bbfe-5b3e2f0e9d65] INFO: Failed to execute 'datahub ingest'",
           '2022-10-24 18:14:16.218294 [exec_id=18405a1c-d9c7-4c45-bbfe-5b3e2f0e9d65] INFO: Caught exception EXECUTING '
           'task_id=18405a1c-d9c7-4c45-bbfe-5b3e2f0e9d65, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
           '    task_event_loop.run_until_complete(task_future)\n'
           '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
           '    return future.result()\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 227, in execute\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],

...

"failures": {"Stateful Ingestion": ["Fail safe mode triggered, entity '
                      'difference percent:95.65217391304348 > fail_safe_threshold:{self.stateful_ingestion_config.fail_safe_threshold}"]}
m
Hi there, you will need to set this threshold to 100 to allow the ingestion to issue the deletes for all the extra entities that you will no longer be ingesting (since you turned off the usage extraction)
Copy code
config:
  stateful_ingestion:
    fail_safe_threshold: 100
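Combined with the earlier change, the relevant part of the recipe would look roughly like this (a sketch, not the exact recipe used):
Copy code
source:
  type: bigquery-beta
  config:
    include_usage_statistics: false
    stateful_ingestion:
      enabled: true
      fail_safe_threshold: 100  # allow the one-time large diff from dropping usage entities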
a
It worked. Thank you very much @mammoth-bear-12532