Wanted to bump this PR here: <https://github.com/d...
# contribute-code
p
d
Sorry, I checked this and for me with this pr it complains about STRUCT fields
Copy code
'warnings': {'calm-pagoda-323403.bigquery_demo.test': ['unable to map type STRUCT(a=ARRAY(String()), b=Boolean()) to metadata schema'],
And I saw this warning as well. I’m not sure if we should worried about this or not :):
Copy code
/Users/treff7es/shadow/datahub-linkedin/metadata-ingestion/src/datahub/ingestion/source/sql/bigquery.py:17: UserWarning: Obsolete pybigquery is installed, which is likely to
interfere with sqlalchemy_bigquery.
pybigquery should be uninstalled.
What do you think should we fully move to sqlalchemy biquery? If we do that we should upgrade great expectation as well, I’m afraid, as the version we have does not support but luckily the newer one supports this bigquery driver. My only concern are the monkey patches we have on Great Expectation but if you want I can check if the new one would work.
p
@dazzling-judge-80093 bummer probably makes sense to remove the dependency on pybigquery entirely, but that first warning you posted is slightly concerning could we maybe make it optional in the config to use pybigquery vs qlalchemy biquery? that does sound like a PITA though cc: @fresh-river-19527
d
That’s where I’m leaning towards as well to remove pybigquery fully
f
The warning for STRUCT fields means that the new library breaks the ingestion of this data type? Also not sure about what impact would have migrating to new great_expectations version, that's why I didn't want to remove the pybigquery dependency
d
I will check that impact. Let’s try to remove the old biquery driver.
f
@dazzling-judge-80093 I think I've found the problem with the ingestion of Struct columns. I added the fix in the PR
excited 1
d
great, thanks!
p
@dazzling-judge-80093 wondering if you were seeing the above errors you posted after Pedro's fix?
Sorry to keep bugging you @dazzling-judge-80093 but this PR is pretty important for us, wondering if you had a sec to take a look today Also happy to help any way I can
d
I can check it soon if it works locally
p
Thanks 🙏
d
@fresh-river-19527 I tried it out and now struct looks good but views are failing with:
Copy code
[2022-03-18 19:49:44,499] WARNING  {datahub.ingestion.source.sql.sql_common:1024} - Unable to ingest view ge_temp_views.ge-temp-ac7eb619-9959-4a66-8b63-2d6d1fc61a3b due to an exception.
 Traceback (most recent call last):
  File "/Users/treff7es/shadow/datahub/metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py", line 1016, in loop_views
    yield from self._process_view(
  File "/Users/treff7es/shadow/datahub/metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py", line 1071, in _process_view
    view_definition = inspector.get_view_definition(view, schema)
  File "/Users/treff7es/shadow/datahub/metadata-ingestion/venv/lib/python3.9/site-packages/sqlalchemy/engine/reflection.py", line 337, in get_view_definition
    return self.dialect.get_view_definition(
  File "/Users/treff7es/shadow/datahub/metadata-ingestion/venv/lib/python3.9/site-packages/pybigquery/sqlalchemy_bigquery.py", line 973, in get_view_definition
    view = client.get_table(view_name)
  File "/Users/treff7es/shadow/datahub/metadata-ingestion/venv/lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 1031, in get_table
    table_ref = _table_arg_to_table_ref(table, default_project=self.project)
  File "/Users/treff7es/shadow/datahub/metadata-ingestion/venv/lib/python3.9/site-packages/google/cloud/bigquery/table.py", line 2731, in _table_arg_to_table_ref
    value = TableReference.from_string(value, default_project=default_project)
  File "/Users/treff7es/shadow/datahub/metadata-ingestion/venv/lib/python3.9/site-packages/google/cloud/bigquery/table.py", line 252, in from_string
    ) = _helpers._parse_3_part_id(
  File "/Users/treff7es/shadow/datahub/metadata-ingestion/venv/lib/python3.9/site-packages/google/cloud/bigquery/_helpers.py", line 893, in _parse_3_part_id
    raise ValueError(
ValueError: table_id must be a fully-qualified ID in standard SQL format, e.g., "project.dataset.table_id", got ge-temp-ac7eb619-9959-4a66-8b63-2d6d1fc61a3b
I think this happens because the monkey-patching of the
get_view_definition
does not work with the new driver. I don’t know yet why.
but I’m checking
ok, it seems like pybigquery is still used and sqllalchemy still uses that driver
p
is
ge-temp-ac7eb619-9959-4a66-8b63-2d6d1fc61a3b
a valid table name to begin with?
afaik tables shouldn't contain
-
d
If I removed pybigquery from the dependencies then it is fine.
It seems profiling works as well but I would like to run a few more things to make sure nothing gets broken if we remove pybigquery completly.
I tested the pr and we should remove pybiquery completly and need to bump great expectation to the latest (0.14.11). I tested it with major dwhs and profiling worked for all platform with this setup
if we keep pybigquery then sqlalchemy picks it up with reflection instead of the new driver
@fresh-river-19527 Hey, can you do these small changes on the pr and then I think we should be fine. -> I also added comments to the pr. Thanks for the contribution it is great we can remove the deprecated driver.
f
Hi @dazzling-judge-80093, yes, sure, I can make those changes with the libraries. Thanks a lot for your support.
d
Thanks and let me know when you are done and I will do another round of testing
f
@dazzling-judge-80093 I've just commited the changes
excited 1
p
This good to go?
d
Just checking
@fresh-river-19527 Thanks, I tested and look good but there is a merge conflict and a linting error which needs to be fixed. Can you fix the merge conflict and run black formatter on the
src/datahub/ingestion/source/sql/bigquery.py
file? Now you should be able to see on the pr if tests passes.
p
Hi @dazzling-judge-80093, just wondering if this change has been released on pypi?
d
I try to cut a python release today
p
Thanks! I pulled master, ran ingestion locally, and confirmed this 100% solves our issue
d
wow, awesome, thanks for checking
Version 0.8.31.2 is out now
thanks for the patience
p
awesome!!! thanks!!