# troubleshoot
w
I have a BigQuery connector instance failing with the following error:
│ PermissionDenied: 403 request failed: the user does not have 'bigquery.readsessions.create' permission for 'projects/XXXXXXXX'
According to the docs, that permission is required only for lineage, so I tried disabling table lineage with:
include_table_lineage: False
However, I'm still getting the same error. Is there another config setting for disabling table lineage, or is this a bug in the config field? 🧵
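For reference, a minimal sketch of the kind of recipe involved (project ID redacted as above, broker address is a placeholder):

source:
  type: bigquery
  config:
    project_id: XXXXXXXX          # redacted
    include_table_lineage: False  # should disable lineage extraction
sink:
  type: datahub-kafka
  config:
    connection:
      bootstrap: broker:9092      # placeholder broker address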
This is the full log for the error with table lineage disabled:
[2022-07-19 12:38:11,217] INFO     {datahub.cli.ingest_cli:99} - DataHub CLI version: 0.8.40+0.1.0
/usr/local/lib/python3.9/site-packages/datahub/ingestion/transformer/add_dataset_browse_path.py:33: DeprecationWarning: Call to deprecated class DatasetTransformer. (Legacy transformer that supports transforming MCE-s using transform_one method. Use BaseTransformer directly and implement the transform_aspect method)
  return cls(config, ctx)
/usr/local/lib/python3.9/site-packages/datahub/ingestion/transformer/add_dataset_ownership.py:174: DeprecationWarning: Call to deprecated class DatasetTransformer. (Legacy transformer that supports transforming MCE-s using transform_one method. Use BaseTransformer directly and implement the transform_aspect method)
  return cls(config, ctx)
[2022-07-19 12:38:14,869] INFO     {datahub.cli.ingest_cli:115} - Starting metadata ingestion
[2022-07-19 12:38:15,434] INFO     {datahub.ingestion.run.pipeline:104} - sink wrote workunit container-info-mo-data-catalog-dev-rygq-urn:li:container:3cab3e00a5a582ac90f3aaae6264e914
[2022-07-19 12:38:15,506] INFO     {datahub.ingestion.run.pipeline:104} - sink wrote workunit container-platforminstance-mo-data-catalog-dev-rygq-urn:li:container:3cab3e00a5a582ac90f3aaae6264e914
[2022-07-19 12:38:15,553] INFO     {datahub.ingestion.run.pipeline:104} - sink wrote workunit container-subtypes-mo-data-catalog-dev-rygq-urn:li:container:3cab3e00a5a582ac90f3aaae6264e914
[2022-07-19 12:38:17,311] INFO     {datahub.cli.ingest_cli:119} - Source (bigquery) report:
{'workunits_produced': 3,
 'workunit_ids': ['container-info-XXXXXXXX-urn:li:container:3cab3e00a5a582ac90f3aaae6264e914',
                  'container-platforminstance-XXXXXXXX-urn:li:container:3cab3e00a5a582ac90f3aaae6264e914',
                  'container-subtypes-XXXXXXXX-urn:li:container:3cab3e00a5a582ac90f3aaae6264e914'],
 'warnings': {},
 'failures': {},
 'cli_version': '0.8.40+0.1.0',
 'cli_entry_location': '/usr/local/lib/python3.9/site-packages/datahub/__init__.py',
 'py_version': '3.9.9 (main, Dec 21 2021, 10:03:34) \n[GCC 10.2.1 20210110]',
 'py_exec_path': '/usr/local/bin/python',
 'os_details': 'Linux-5.4.92-flatcar-x86_64-with-glibc2.31',
 'tables_scanned': 0,
 'views_scanned': 0,
 'entities_profiled': 0,
 'filtered': [],
 'soft_deleted_stale_entities': [],
 'query_combiner': None,
 'num_total_lineage_entries': None,
 'num_skipped_lineage_entries_missing_data': None,
 'num_skipped_lineage_entries_not_allowed': None,
 'num_skipped_lineage_entries_sql_parser_failure': None,
 'num_skipped_lineage_entries_other': None,
 'num_total_log_entries': None,
 'num_parsed_log_entires': None,
 'num_total_audit_entries': None,
 'num_parsed_audit_entires': None,
 'bigquery_audit_metadata_datasets_missing': None,
 'lineage_metadata_entries': None,
 'include_table_lineage': False,
 'use_date_sharded_audit_log_tables': False,
 'log_page_size': 1000,
 'use_v2_audit_metadata': False,
 'use_exported_bigquery_audit_metadata': False,
 'start_time': datetime.datetime(2022, 7, 18, 0, 0, tzinfo=datetime.timezone.utc),
 'end_time': datetime.datetime(2022, 7, 20, 0, 0, tzinfo=datetime.timezone.utc),
 'log_entry_start_time': None,
 'log_entry_end_time': None,
 'audit_start_time': None,
 'audit_end_time': None,
 'upstream_lineage': {},
 'partition_info': {}}
[2022-07-19 12:38:17,312] INFO     {datahub.cli.ingest_cli:122} - Sink (datahub-kafka) report:
{'records_written': 3,
 'warnings': [],
 'failures': [],
 'downstream_start_time': None,
 'downstream_end_time': None,
 'downstream_total_latency_in_seconds': None}
    ..................................................
File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
    55   def error_remapped_callable(*args, **kwargs):
    56       try:
    57           return callable_(*args, **kwargs)
    58       except grpc.RpcError as exc:
--> 59           raise exceptions.from_grpc_error(exc) from exc
    ..................................................
     args = (
             parent: "projects/XXXXXXXX"
             read_session {
               data_format: ARROW
               table: "projects/XXXXXXXX/datasets/_f73150d601bf08c0a38c405e168e4a1391e5c632/tables/anonbade4036_44d1
             _476e_a120_33a3d83eac50"
               read_options {
                 arrow_serialization_options {
                   buffer_compression: LZ4_FRAME
                 }
               }
             }
             max_stream_count: 1
             , )
     kwargs = {'metadata': [(...), (...), ]}
     callable_ = <grpc._channel._UnaryUnaryMultiCallable object at 0x7f7c2b569c70>
     grpc.RpcError = <class 'grpc.RpcError'>
     exceptions.from_grpc_error = <function 'from_grpc_error' exceptions.py:590>
    ..................................................
---- (full traceback above) ----
File "/usr/local/lib/python3.9/site-packages/datahub/entrypoints.py", line 149, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/datahub/upgrade/upgrade.py", line 333, in wrapper
    res = func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/datahub/telemetry/telemetry.py", line 338, in wrapper
    raise e
File "/usr/local/lib/python3.9/site-packages/datahub/telemetry/telemetry.py", line 290, in wrapper
    res = func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/datahub/utilities/memory_leak_detector.py", line 102, in wrapper
    res = func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 131, in run
    raise e
File "/usr/local/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 117, in run
    pipeline.run()
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 215, in run
    for wu in itertools.islice(
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/source/sql/bigquery.py", line 905, in get_workunits
    for wu in super().get_workunits():
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/source/sql/sql_common.py", line 725, in get_workunits
    self.add_information_for_schema(inspector, schema)
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/source/sql/bigquery.py", line 757, in add_information_for_schema
    for row in result.fetchall():
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/result.py", line 1288, in fetchall
    self.connection._handle_dbapi_exception(
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1514, in _handle_dbapi_exception
    util.raise_(exc_info[1], with_traceback=exc_info[2])
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/result.py", line 1284, in fetchall
    l = self.process_rows(self._fetchall_impl())
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/result.py", line 1230, in _fetchall_impl
    return self.cursor.fetchall()
File "/usr/local/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/_helpers.py", line 494, in with_closed_check
    return method(self, *args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/cursor.py", line 382, in fetchall
    self._try_fetch()
File "/usr/local/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/cursor.py", line 256, in _try_fetch
    rows_iterable = self._bqstorage_fetch(bqstorage_client)
File "/usr/local/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/cursor.py", line 295, in _bqstorage_fetch
    read_session = bqstorage_client.create_read_session(
File "/usr/local/lib/python3.9/site-packages/google/cloud/bigquery_storage_v1/services/big_query_read/client.py", line 615, in create_read_session
    response = rpc(
File "/usr/local/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py", line 154, in __call__
    return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/google/api_core/retry.py", line 283, in retry_wrapped_func
    return retry_target(
File "/usr/local/lib/python3.9/site-packages/google/api_core/retry.py", line 190, in retry_target
    return target()
File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
PermissionDenied: 403 request failed: the user does not have 'bigquery.readsessions.create' permission for 'projects/mo-data-catalog-dev-rygq'
[2022-07-19 12:38:17,784] INFO     {datahub.entrypoints:176} - DataHub CLI version: 0.8.40+0.1.0 at /usr/local/lib/python3.9/site-packages/datahub/__init__.py
[2022-07-19 12:38:17,784] INFO     {datahub.entrypoints:179} - Python version: 3.9.9 (main, Dec 21 2021, 10:03:34) 
[GCC 10.2.1 20210110] at /usr/local/bin/python on Linux-5.4.92-flatcar-x86_64-with-glibc2.31
[2022-07-19 12:38:17,784] INFO     {datahub.entrypoints:182} - GMS config {}
Stream closed EOF for datahighway-dev/demo-ingestion-bigquery-manual-ncj-ndnv5 (crawler)
s
Sorry if the docs are not clear, but I think that comment is meant for these:
logging.logEntries.list
logging.privateLogEntries.list
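These come into play when lineage is enabled with the default extraction path, since that path reads the GCP Logging API; a rough sketch:

source:
  type: bigquery
  config:
    include_table_lineage: True  # default lineage path queries the GCP Logging API,
                                 # which is what needs the two permissions above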
w
we can fix the docs easily (I could even do it) 😅 so… it would be like this?
# basic requirements

bigquery.datasets.get
bigquery.datasets.getIamPolicy
bigquery.jobs.create
bigquery.jobs.list
bigquery.jobs.listAll
bigquery.models.getMetadata
bigquery.models.list
bigquery.routines.get
bigquery.routines.list
bigquery.tables.get
resourcemanager.projects.get
bigquery.readsessions.create
bigquery.readsessions.getData

# requirements if profiling enabled

bigquery.tables.create
bigquery.tables.getData
bigquery.tables.list

# requirements if table lineage enabled

logging.logEntries.list
logging.privateLogEntries.list
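(and as a convenience, the basic set could be bundled into a custom IAM role; a sketch of a role definition file, with a made-up role name, that could be created with gcloud iam roles create datahub_ingestion --project=XXXXXXXX --file=role.yaml):

title: DataHub BigQuery Ingestion   # hypothetical role name
description: Permissions needed by DataHub for BigQuery metadata ingestion
stage: GA
includedPermissions:
  - bigquery.datasets.get
  - bigquery.datasets.getIamPolicy
  - bigquery.jobs.create
  - bigquery.jobs.list
  - bigquery.jobs.listAll
  - bigquery.models.getMetadata
  - bigquery.models.list
  - bigquery.routines.get
  - bigquery.routines.list
  - bigquery.tables.get
  - resourcemanager.projects.get
  - bigquery.readsessions.create
  - bigquery.readsessions.getData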
s
I believe yes
w
I’m not an expert on GCP permissions, but I understand that the only permission granting access to actual data (not metadata) is bigquery.tables.getData. Is that correct?
(and I couldn’t find a reference with the definitions of those permissions 😅)
s
I believe so, but I have not tested every permission granularly.
w
s
can you please add "via GCP logging"? I don't think we need those when not using the Logging API
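i.e. when lineage comes from exported BigQuery audit metadata instead of the Logging API, something like this (dataset name is hypothetical):

source:
  type: bigquery
  config:
    include_table_lineage: True
    use_exported_bigquery_audit_metadata: True  # read audit logs from BigQuery tables instead of the Logging API
    bigquery_audit_metadata_datasets:
      - my_project.audit_logs                   # hypothetical dataset that receives the exported audit logs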
w
done
s
sorry for nagging, but:
needed for lineage generation via GCP logging
not just table lineage
will merge after this change
thanks