# troubleshoot
r
Hi Team, I am trying to add Great Expectations assertions to a Snowflake dataset. The Snowflake dataset has its URN in upper case, since that is how it is defined in Snowflake (I am using
convert_urns_to_lowercase: false
in the recipe). Great Expectations is converting the URN components to lower case. Is there a way to have DataHubValidationAction set the URNs to upper case?
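For context, a minimal sketch of where that flag sits in a snowflake ingestion recipe (the surrounding keys are illustrative; only convert_urns_to_lowercase comes from this thread):

```yaml
source:
  type: snowflake
  config:
    # keep URN components in the same case as defined in Snowflake
    convert_urns_to_lowercase: false
    # ...connection settings elided...
```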
h
Hi @ripe-apple-36185, this is not possible at the moment. Can you paste the URNs you are getting with Snowflake and GE? Also, is there any particular reason why you are using
convert_urns_to_lowercase: false
? Do you use databases/schemas/tables with exactly the same literals but different case, so that their URNs might collide if converted to lowercase?
r
Hi @hundreds-photographer-13496 I am using
convert_urns_to_lowercase: false
to preserve how it is in Snowflake and to be able to mesh metadata provided from different sources. Do you know if there is a way to emit the output from
DataHubValidationAction
to a file? It would help me understand what is being sent. The URNs I have in DH are:
urn:li:dataset:(urn:li:dataPlatform:snowflake,RAW.ANALYTICS.STG_CUSTOMERS,PROD)
When I change to lower case and have URNs like
urn:li:dataset:(urn:li:dataPlatform:snowflake,raw.analytics.stg_customers,PROD)
, I see the GE results in DataHub. I tried changing
requires_name_normalize = True
in
snowdialect.py
, but that seems to only change the field names.
This seems more an issue with SQLAlchemy than GE (or for GE to provide that option, similar to the DH Snowflake plugin)
h
Do you know if there is a way to emit the output from
DataHubValidationAction
to a file?
It's not possible at the moment. However, if you are using a Python script to run the checkpoint, I can help with how to enable debug logs containing the emitted output. From your observations, it looks like
urn:li:dataset:(urn:li:dataPlatform:snowflake,raw.analytics.stg_customers,PROD)
is the urn constructed by
DatahubValidationAction
. I am curious what the urn was when you changed
requires_name_normalize = False
in
snowdialect.py
, since in
DataHubValidationAction
, when generating the urn, only the database name is explicitly converted to lowercase, whereas the schema and table name are read from the GE batch spec as-is.
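A minimal sketch of that urn construction, assuming the behavior described above (the function name is hypothetical and not the actual DataHub source):

```python
def build_snowflake_dataset_urn(database: str, schema: str, table: str,
                                env: str = "PROD") -> str:
    # Only the database name is explicitly lowercased; schema and table
    # are taken as-is from the GE batch spec (which, via snowflake-sqlalchemy,
    # usually already hands them over in lowercase).
    name = f"{database.lower()}.{schema}.{table}"
    return f"urn:li:dataset:(urn:li:dataPlatform:snowflake,{name},{env})"

# Even with an uppercase database name, the urn comes out lowercased:
print(build_snowflake_dataset_urn("RAW", "analytics", "stg_customers"))
# urn:li:dataset:(urn:li:dataPlatform:snowflake,raw.analytics.stg_customers,PROD)
```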
r
@hundreds-photographer-13496 without being able to see the output in a file I can't really tell. The checkpoint executes properly, and there aren't many debugging options as far as I can see.
h
@ripe-apple-36185 if you run the checkpoint using the Python script below, you'll be able to see debug logs of what is emitted to DataHub.
import logging
import great_expectations as ge

# Enable DEBUG-level logging for the "datahub" logger so the payloads
# emitted by DataHubValidationAction are printed to the console
datahub_logger = logging.getLogger("datahub")
datahub_logger.setLevel(logging.DEBUG)
datahub_logger.addHandler(logging.StreamHandler())

# Run the checkpoint from the current Great Expectations project context
context = ge.get_context()
context.run_checkpoint(checkpoint_name="<name of checkpoint>")
I have created this PR to be able to display debug logs just by setting the environment variable DATAHUB_DEBUG=True, so in the future we should be able to do this without writing code 🙂
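Once that change is available, enabling the same logs should only need the environment variable, set before the checkpoint runs (a sketch; the exact behavior depends on the PR being merged):

```python
import os

# Hypothetical usage per the PR above: set DATAHUB_DEBUG before invoking
# the checkpoint (e.g. before context.run_checkpoint(...) or the GE CLI)
os.environ["DATAHUB_DEBUG"] = "True"
```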
r
Thanks @hundreds-photographer-13496, I will try this and share the results.
Changing it does not make a difference; this is the URN used:
urn:li:dataset:(urn:li:dataPlatform:snowflake,raw.analytics.stg_customers,PROD)
h
Then you are right. This seems to be a behavioral conflict between SQLAlchemy and Snowflake, rather than an issue with GE or DH: https://github.com/snowflakedb/snowflake-sqlalchemy#object-name-case-handling There doesn't seem to be much we can do about it. If it works for you, you can fall back to using lowercase URNs for Snowflake. The display name for datasets/schemas/databases is in exactly the same case as in Snowflake, so that could be a good option for you.
Let me know if there are any sources that emit Snowflake URNs in non-lowercase form and that is the concern.