wooden-football-7175
03/14/2022, 3:54 PM
When the batch_request is created from a SQL query, it shows a strange error:
ERROR: name 'MetadataSQLParser' is not defined
While debugging, I added the parameter parse_table_names_from_sql, which is referred to in the documentation, to find out why the results are not being sent to DataHub. However, the action class does not seem to accept this parameter, so it fails:
TypeError Traceback (most recent call last)
~/F14/gitlab/great-expectations/.venv/lib/python3.7/site-packages/great_expectations/data_context/util.py in instantiate_class_from_config(config, runtime_environment, config_defaults)
114 try:
--> 115 class_instance = class_(**config_with_defaults)
116 except TypeError as e:
TypeError: __init__() got an unexpected keyword argument 'parse_table_names_from_sql'
class DataHubValidationAction(ValidationAction):
    def __init__(
        self,
        data_context: DataContext,
        server_url: str,
        env: str = builder.DEFAULT_ENV,
        platform_instance_map: Optional[Dict[str, str]] = None,
        graceful_exceptions: bool = True,
        token: Optional[str] = None,
        timeout_sec: Optional[float] = None,
        retry_status_codes: Optional[List[int]] = None,
        retry_max_times: Optional[int] = None,
        extra_headers: Optional[Dict[str, str]] = None,
    ):
        super().__init__(data_context)
        self.server_url = server_url
        self.env = env
        self.platform_instance_map = platform_instance_map
        self.graceful_exceptions = graceful_exceptions
        self.token = token
        self.timeout_sec = timeout_sec
        self.retry_status_codes = retry_status_codes
        self.retry_max_times = retry_max_times
        self.extra_headers = extra_headers
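The TypeError above is raised by Python itself: `instantiate_class_from_config` calls `class_(**config_with_defaults)`, so every key in the action config becomes a keyword argument, and an `__init__` without `**kwargs` rejects any name it does not declare. Below is a minimal sketch for spotting such keys before running the checkpoint. It uses a hypothetical stand-in class, `ExampleAction`, that copies the keyword parameters shown above. It does not import DataHub or Great Expectations, and `unsupported_kwargs` is likewise a hypothetical helper.

```python
import inspect
from typing import Any, Dict, List, Optional


class ExampleAction:
    """Hypothetical stand-in mirroring the keyword parameters of
    DataHubValidationAction.__init__ above (data_context omitted)."""

    def __init__(
        self,
        server_url: str,
        env: str = "PROD",  # placeholder; the real class uses builder.DEFAULT_ENV
        platform_instance_map: Optional[Dict[str, str]] = None,
        graceful_exceptions: bool = True,
        token: Optional[str] = None,
        timeout_sec: Optional[float] = None,
        retry_status_codes: Optional[List[int]] = None,
        retry_max_times: Optional[int] = None,
        extra_headers: Optional[Dict[str, str]] = None,
    ):
        self.server_url = server_url
        self.env = env
        self.graceful_exceptions = graceful_exceptions


def unsupported_kwargs(cls: type, config: Dict[str, Any]) -> List[str]:
    """Return the config keys that cls(**config) would reject."""
    params = inspect.signature(cls.__init__).parameters
    # An __init__ that takes **kwargs accepts any keyword, so nothing is rejected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(key for key in config if key not in params)


config = {"server_url": "http://localhost:8080", "parse_table_names_from_sql": True}
print(unsupported_kwargs(ExampleAction, config))  # ['parse_table_names_from_sql']

try:
    ExampleAction(**config)
except TypeError as exc:
    # The same kind of "unexpected keyword argument" failure as in the traceback.
    print(exc)
```

The check only surfaces the mismatch earlier. The actual fix is a release whose action class declares the parameter.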
wooden-football-7175
03/14/2022, 4:24 PM
gracefull… to false, the following error occurs:
ERROR: Error running action with name datahub_action
Traceback (most recent call last):
File "/Users/guido/F14/gitlab/great-expectations/.venv/lib/python3.7/site-packages/great_expectations/validation_operators/validation_operators.py", line 452, in _run_actions
checkpoint_identifier=checkpoint_identifier,
File "/Users/guido/F14/gitlab/great-expectations/.venv/lib/python3.7/site-packages/great_expectations/checkpoint/actions.py", line 74, in run
**kwargs,
File "/Users/guido/F14/gitlab/great-expectations/.venv/lib/python3.7/site-packages/datahub/integrations/great_expectations/action.py", line 128, in _run
datasets = self.get_dataset_partitions(batch_identifier, data_asset)
File "/Users/guido/F14/gitlab/great-expectations/.venv/lib/python3.7/site-packages/datahub/integrations/great_expectations/action.py", line 613, in get_dataset_partitions
tables = MetadataSQLSQLParser(query).get_tables()
File "/Users/guido/F14/gitlab/great-expectations/.venv/lib/python3.7/site-packages/datahub/utilities/sql_parser.py", line 57, in __init__
self._parser = MetadataSQLParser(sql_query)
NameError: name 'MetadataSQLParser' is not defined
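A NameError raised inside an installed package, as here in datahub/utilities/sql_parser.py, can point to an outdated or mismatched install rather than to the user's own code. Below is a small sketch for recording the installed versions before reporting such an error. The distribution names `great-expectations` and `acryl-datahub` are assumed PyPI names. Python 3.7, which is used in this traceback, needs the `importlib-metadata` backport for the fallback import.

```python
from typing import Dict, Iterable, Optional

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: requires `pip install importlib-metadata`
    from importlib_metadata import PackageNotFoundError, version


def installed_versions(dists: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for dist in dists:
        try:
            found[dist] = version(dist)
        except PackageNotFoundError:
            found[dist] = None
    return found


if __name__ == "__main__":
    # Distribution names are assumptions about how the packages were installed.
    for name, ver in installed_versions(["great-expectations", "acryl-datahub"]).items():
        print(f"{name}: {ver or 'not installed'}")
```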
wooden-football-7175
03/14/2022, 4:25 PM
wooden-football-7175
03/14/2022, 4:30 PM
http://192.168.0.14:9002/dataset/urn:li:dataset:(urn:li:dataPlatform:redshift,database.schema.table,DEV)
platform_instance_map = { "redshift": "database.schema.table" }
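The action code quoted further down in this thread looks up the instance with `self.get_platform_instance(data_asset.active_batch_definition.datasource_name)`. That suggests the map is keyed by the Great Expectations datasource name rather than by platform, and that its values are platform instance names rather than table names. Below is a minimal sketch of such a lookup under that assumption. `lookup_platform_instance` and the datasource name are hypothetical.

```python
from typing import Dict, Optional


def lookup_platform_instance(
    platform_instance_map: Optional[Dict[str, str]], datasource_name: str
) -> Optional[str]:
    """Hypothetical helper: return the platform instance configured for a
    Great Expectations datasource, or None when there is no entry."""
    if not platform_instance_map:
        return None
    return platform_instance_map.get(datasource_name)


# Keyed by platform name, this map only matches a datasource that happens to
# be named "redshift"; any other datasource name gets no instance.
user_map = {"redshift": "database.schema.table"}
print(lookup_platform_instance(user_map, "my_redshift_datasource"))  # None
```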
I did not succeed in viewing the validations.
loud-island-88694
loud-island-88694
big-carpet-38439
03/14/2022, 5:20 PM
big-carpet-38439
03/14/2022, 5:20 PM
wooden-football-7175
03/14/2022, 5:45 PM
wooden-football-7175
03/14/2022, 5:47 PM
wooden-football-7175
03/14/2022, 6:05 PM
wooden-football-7175
03/14/2022, 6:16 PM
ghost table, it was created blank, with lineage over a dummy table. This behaviour does not happen here (I do not know if this is useful information).
wooden-football-7175
03/14/2022, 6:41 PM
def get_platform_from_sqlalchemy_uri(sqlalchemy_uri: str) -> str:
    if sqlalchemy_uri.startswith("bigquery"):
        return "bigquery"
    if sqlalchemy_uri.startswith("clickhouse"):
        return "clickhouse"
    if sqlalchemy_uri.startswith("druid"):
        return "druid"
    if sqlalchemy_uri.startswith("mssql"):
        return "mssql"
    if (
        sqlalchemy_uri.startswith("jdbc:postgres:")
        and sqlalchemy_uri.index("redshift.amazonaws") > 0
    ) or sqlalchemy_uri.startswith("redshift"):
        return "redshift"
    if sqlalchemy_uri.startswith("snowflake"):
        return "snowflake"
    if sqlalchemy_uri.startswith("presto"):
        return "presto"
    if sqlalchemy_uri.startswith("postgresql"):
        # local change: originally returned "postgres" (see below)
        return "redshift"
    if sqlalchemy_uri.startswith("pinot"):
        return "pinot"
    if sqlalchemy_uri.startswith("oracle"):
        return "oracle"
    if sqlalchemy_uri.startswith("mysql"):
        return "mysql"
    if sqlalchemy_uri.startswith("mongodb"):
        return "mongodb"
    if sqlalchemy_uri.startswith("hive"):
        return "hive"
    if sqlalchemy_uri.startswith("awsathena"):
        return "athena"
    return "external"
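The chain of `startswith` checks above can also be written as an ordered prefix table. This sketch is a hypothetical rewrite, not DataHub's code. It maps `postgresql` to `postgres`, the value reported below before the local change. It also uses `in` instead of `str.index`, which raises ValueError when the substring is missing, so a non-Redshift `jdbc:postgres:` URI falls through to `external` instead of crashing.

```python
from typing import List, Tuple

# (prefix, platform) pairs, checked in order after the JDBC Redshift special case.
_PREFIX_TO_PLATFORM: List[Tuple[str, str]] = [
    ("bigquery", "bigquery"),
    ("clickhouse", "clickhouse"),
    ("druid", "druid"),
    ("mssql", "mssql"),
    ("redshift", "redshift"),
    ("snowflake", "snowflake"),
    ("presto", "presto"),
    ("postgresql", "postgres"),
    ("pinot", "pinot"),
    ("oracle", "oracle"),
    ("mysql", "mysql"),
    ("mongodb", "mongodb"),
    ("hive", "hive"),
    ("awsathena", "athena"),
]


def platform_from_uri(sqlalchemy_uri: str) -> str:
    """Hypothetical table-driven version of get_platform_from_sqlalchemy_uri."""
    # A JDBC Postgres URI that points at a Redshift host counts as Redshift.
    if sqlalchemy_uri.startswith("jdbc:postgres:") and "redshift.amazonaws" in sqlalchemy_uri:
        return "redshift"
    for prefix, platform in _PREFIX_TO_PLATFORM:
        if sqlalchemy_uri.startswith(prefix):
            return platform
    return "external"
```

With this mapping, a URI beginning `postgresql+psycopg2://` is classified as `postgres` even when the host is a Redshift cluster. If the connection used the `redshift+psycopg2://` scheme instead (an assumption about this setup; that dialect comes from the separate sqlalchemy-redshift package), the `redshift` prefix would match without patching the library.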
This function, which returns the platform, parses the sqlalchemy_uri.
I added a custom logger inside the function def get_dataset_partitions(self, batch_identifier, data_asset) and printed the URI before it calls the next function (line 627 in action.py):
dataset_urn = make_dataset_urn_from_sqlalchemy_uri(
    sqlalchemy_uri,
    None,
    table,
    self.env,
    self.get_platform_instance(
        data_asset.active_batch_definition.datasource_name
    ),
)
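For the kind of debug logging described above, below is a sketch that logs the URI with the password masked, so the output can be pasted into a thread safely. `redact_uri`, `log_sqlalchemy_uri`, and the logger name are hypothetical, and only the standard library is used.

```python
import logging
from urllib.parse import urlsplit, urlunsplit

logger = logging.getLogger("datahub_uri_debug")  # hypothetical logger name


def redact_uri(uri: str) -> str:
    """Replace the password in a URI such as
    postgresql+psycopg2://user:pw@host:5439/db with ***."""
    parts = urlsplit(uri)
    if parts.password is None:
        return uri  # nothing to hide
    host = parts.hostname or ""  # note: urlsplit lower-cases the hostname
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username or ''}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def log_sqlalchemy_uri(sqlalchemy_uri: str) -> None:
    logger.info("sqlalchemy_uri: %s", redact_uri(sqlalchemy_uri))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_sqlalchemy_uri("postgresql+psycopg2://user:secret@example-host:5439/dev")
```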
The URI started with postgresql+psycopg2://, so the platform was returned as `postgres` and not `redshift`.
I tried changing the return value to `redshift`, and the information was emitted OK.
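The platform string ends up inside the dataset URN. Results emitted under `postgres` would therefore be attached to a different dataset than the Redshift one in the URL posted earlier, which may explain why the validations did not appear there. Below is a sketch using a hypothetical `dataset_urn` helper that only reproduces the URN format visible in that URL.

```python
def dataset_urn(platform: str, name: str, env: str) -> str:
    """Hypothetical helper reproducing the dataset URN format seen in the
    DataHub URL earlier in this thread."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


expected = dataset_urn("redshift", "database.schema.table", "DEV")
emitted = dataset_urn("postgres", "database.schema.table", "DEV")
print(expected)  # urn:li:dataset:(urn:li:dataPlatform:redshift,database.schema.table,DEV)
print(expected == emitted)  # False: a different platform means a different dataset
```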
So the question is: is this a problem with the library I use to query Redshift?
wooden-football-7175
03/14/2022, 6:45 PM
big-carpet-38439
03/14/2022, 7:07 PM
wooden-football-7175
03/14/2022, 8:18 PM
hundreds-photographer-13496
03/16/2022, 10:33 AM
ERROR: name 'MetadataSQLParser' is not defined
and the new config param `parse_table_names_from_sql` (False by default) has been added quite recently (PR) and is not released yet. It should be available in the next release.
wooden-football-7175
03/16/2022, 12:25 PM
parse_table_names_from_sql prop to the checkpoint config, and it executes correctly with that. As for the ERROR: name 'MetadataSQLParser' is not defined error, I guess it was caused by not having the latest version of GE and its dependencies.
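Once a DataHub release includes the option, the checkpoint entry might look like the sketch below, written as a Python dict for consistency with the rest of the thread. The action name `datahub_action` comes from the error log above. The `module_name` and `class_name` values follow DataHub's Great Expectations integration docs, and `server_url` is a placeholder.

```python
# Hypothetical checkpoint action entry; server_url is a placeholder value.
datahub_action = {
    "name": "datahub_action",
    "action": {
        "module_name": "datahub.integrations.great_expectations.action",
        "class_name": "DataHubValidationAction",
        "server_url": "http://localhost:8080",
        # Only accepted by releases whose action class declares this option
        # (False by default, per the reply above).
        "parse_table_names_from_sql": True,
    },
}

checkpoint_action_list = [datahub_action]
```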
😃
Excellent PR 😃 🚀
Glad to help
big-carpet-38439
03/16/2022, 3:10 PM