colossal-alligator-29986
03/25/2022, 3:53 PM
[2022-03-25 15:09:59,922] INFO {datahub.cli.ingest_cli:91} - Starting metadata ingestion
[2022-03-25 15:09:59,922] INFO {datahub.ingestion.source.sql.bigquery:276} - Populating lineage info via GCP audit logs
[2022-03-25 15:09:59,928] INFO {datahub.ingestion.source.sql.bigquery:369} - Start loading log entries from BigQuery start_time=2022-03-23T23:45:00Z and end_time=2022-03-26T00:15:00Z
[2022-03-25 15:19:32,800] INFO {datahub.ingestion.source.sql.bigquery:380} - Finished loading 12047 log entries from BigQuery so far
[2022-03-25 15:19:32,800] INFO {datahub.ingestion.source.sql.bigquery:462} - Parsing BigQuery log entries: number of log entries successfully parsed=12047
[2022-03-25 15:19:32,800] INFO {datahub.ingestion.source.sql.bigquery:513} - Creating lineage map: total number of entries=12047, number skipped=1.
[2022-03-25 15:19:32,800] INFO {datahub.ingestion.source.sql.bigquery:270} - Built lineage map containing 12015 entries.
colossal-alligator-29986
03/25/2022, 3:54 PM
Sink (datahub-rest) report:
{'downstream_end_time': None,
 'downstream_start_time': None,
 'downstream_total_latency_in_seconds': None,
 'failures': [],
 'gms_version': 'v0.8.31',
 'records_written': 0,
 'warnings': []}
but in the UI there’s no lineage showing up for the dataset that I pointed to …
colossal-alligator-29986
03/25/2022, 3:55 PM
square-activity-64562
03/25/2022, 4:02 PM
colossal-alligator-29986
03/25/2022, 4:02 PM
square-activity-64562
03/25/2022, 4:03 PM
Sink (datahub-rest) report:
square-activity-64562
03/25/2022, 4:03 PM
colossal-alligator-29986
03/25/2022, 4:04 PM
colossal-alligator-29986
03/25/2022, 4:05 PM
'workunits_produced': 10}
colossal-alligator-29986
03/25/2022, 4:06 PM
square-activity-64562
03/25/2022, 4:07 PM
square-activity-64562
03/25/2022, 4:07 PM
colossal-alligator-29986
03/25/2022, 4:07 PM
'lineage_metadata_entries': 12015,
'log_entry_end_time': '2022-03-26T00:15:00Z',
'log_entry_start_time': '2022-03-23T23:45:00Z',
'num_parsed_audit_entires': None,
'num_parsed_log_entires': 12047,
'num_total_audit_entries': None,
'num_total_log_entries': 12047,
'os_details': 'Linux-5.4.0-1067-gcp-x86_64-with-Ubuntu-18.04-bionic',
'py_exec_path': '/home/andreslowrie/venv/bin/python3',
'py_version': '3.6.9 (default, Dec 8 2021, 21:08:43) \n[GCC 8.4.0]',
'query_combiner': {'combined_queries_issued': 4,
                   'queries_combined': 73,
                   'query_exceptions': 0,
                   'total_queries': 115,
                   'uncombined_queries_issued': 42},
'soft_deleted_stale_entities': [],
'start_time': datetime.datetime(2022, 3, 24, 0, 0, tzinfo=datetime.timezone.utc),
'tables_scanned': 1,
'use_exported_bigquery_audit_metadata': False,
'use_v2_audit_metadata': False,
square-activity-64562
03/25/2022, 4:10 PM
workunit_ids? keys from this https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/api/source.py#L15
colossal-alligator-29986
03/25/2022, 4:10 PM
10
square-activity-64562
03/25/2022, 4:10 PM
workunits_produced = 10 then there should be at least 10 workunits_produced too
square-activity-64562
03/25/2022, 4:11 PM
workunit_ids
colossal-alligator-29986
03/25/2022, 4:11 PM
colossal-alligator-29986
03/25/2022, 4:11 PM
square-activity-64562
03/25/2022, 4:11 PM
square-activity-64562
03/25/2022, 4:12 PM
'log_entry_end_time': '2022-03-26T00:15:00Z',
'log_entry_start_time': '2022-03-23T23:45:00Z',
This tells us the time window it looked at, from start time to end time
colossal-alligator-29986
03/25/2022, 4:15 PM
colossal-alligator-29986
03/25/2022, 4:16 PM
colossal-alligator-29986
03/25/2022, 4:17 PM
'container-platforminstance-our-bq-project-id-urn:li:container:10f9ca61ed89e0b95f4fb82690bacfc1',
'container-subtypes-our-bq-project-id-urn:li:container:10f9ca61ed89e0b95f4fb82690bacfc1',
'container-info-the-name-of-the-dataset-urn:li:container:2665ba5c1eca78c8a1ce78d3f402c224',
'container-platforminstance-the-name-of-the-dataset-urn:li:container:2665ba5c1eca78c8a1ce78d3f402c224',
'container-subtypes-the-name-of-the-dataset-urn:li:container:2665ba5c1eca78c8a1ce78d3f402c224',
'container-parent-container-the-name-of-the-dataset-urn:li:container:2665ba5c1eca78c8a1ce78d3f402c224-urn:li:container:10f9ca61ed89e0b95f4fb82690bacfc1',
'container-urn:li:container:2665ba5c1eca78c8a1ce78d3f402c224-to-urn:li:dataset:(urn:li:dataPlatform:bigquery,our-bq-project-id.the-name-of-the-dataset.the-name-of-the-dataset_name_of_the_table,PROD)',
'our-bq-project-id.the-name-of-the-dataset.the-name-of-the-dataset_name_of_the_table',
'profile-our-bq-project-id.the-name-of-the-dataset.the-name-of-the-dataset_name_of_the_table'],
colossal-alligator-29986
03/25/2022, 4:17 PM
colossal-alligator-29986
03/25/2022, 4:20 PM
square-activity-64562
03/25/2022, 4:29 PM
square-activity-64562
03/25/2022, 4:29 PM
square-activity-64562
03/25/2022, 4:30 PM
square-activity-64562
03/25/2022, 4:32 PM
square-activity-64562
03/25/2022, 4:36 PM
square-activity-64562
03/25/2022, 4:36 PM
square-activity-64562
03/25/2022, 4:37 PM
colossal-alligator-29986
03/25/2022, 4:49 PM
---
source:
  type: "bigquery"
  config:
    project_id: our-project-id
    include_tables: true
    include_views: true
    include_table_lineage: true
    table_pattern:
      allow:
        - '.*name_of_the_table.*'
    schema_pattern:
      allow:
        - '.*the-name-of-the-dataset.*'
    profiling:
      enabled: true
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
square-activity-64562
03/25/2022, 4:57 PM
schema_pattern as below
source:
  type: "bigquery"
  config:
    project_id: gcp-project-name
    schema_pattern:
      allow:
        - the-name-of-the-dataset
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
square-activity-64562
03/25/2022, 4:57 PM
square-activity-64562
03/25/2022, 4:58 PM
colossal-alligator-29986
03/25/2022, 4:59 PM
.*.*.* which python regex doesn’t allow
colossal-alligator-29986
03/25/2022, 5:00 PM
square-activity-64562
03/25/2022, 5:01 PM
colossal-alligator-29986
03/25/2022, 5:01 PM
colossal-alligator-29986
03/25/2022, 5:01 PM
colossal-alligator-29986
03/25/2022, 5:02 PM
colossal-alligator-29986
03/25/2022, 5:03 PM
square-activity-64562
03/28/2022, 3:26 PM
# `schema_pattern` for BQ Datasets
schema_pattern:
  allow:
    - finance_bq_dataset
table_pattern:
  deny:
    # The exact name of the table is revenue_table_name
    # The reason we have this `.*` at the beginning is because the current implementation of table_pattern is testing
    # project_id.dataset_name.table_name
    # We will improve this in the future
    - .*revenue_table_name
colossal-alligator-29986
03/28/2022, 3:58 PM
square-activity-64562
03/28/2022, 3:59 PM
square-activity-64562
03/28/2022, 3:59 PM
colossal-alligator-29986
03/28/2022, 4:02 PM
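The `table_pattern` comment earlier in the thread says the pattern is tested against `project_id.dataset_name.table_name`, which is why the leading `.*` is needed. Below is a rough sketch of that behaviour using plain Python `re`, not DataHub's own code. It assumes start-anchored, case-insensitive matching in the style of `re.match`; the helper `is_denied` and the project id `my-gcp-project` are made up for illustration.

```python
import re
from typing import List


def is_denied(fully_qualified_name: str, deny_patterns: List[str]) -> bool:
    """Return True if any deny pattern matches from the start of the name.

    Assumption: matching is anchored at the start of the string (re.match),
    so a pattern must account for the project and dataset prefix.
    """
    return any(
        re.match(pattern, fully_qualified_name, re.IGNORECASE)
        for pattern in deny_patterns
    )


# Hypothetical project id; dataset and table names are the ones from the thread.
name = "my-gcp-project.finance_bq_dataset.revenue_table_name"

# The bare table name does not match, because the string starts with the project id.
denied_without_prefix = is_denied(name, ["revenue_table_name"])

# With a leading `.*` the pattern can skip over `project_id.dataset_name.`.
denied_with_prefix = is_denied(name, [".*revenue_table_name"])

print(denied_without_prefix)  # False
print(denied_with_prefix)     # True
```

If the real matcher behaves this way, an `allow` entry such as `'.*name_of_the_table.*'` would be matched under the same rule, so checking a pattern against the full three-part name before running ingestion may save a round trip.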