# troubleshoot
c
While sending Great Expectations results to DataHub, I am getting an error like "no dataset found".
h
Yes, as the logs say, only datasources using the sqlalchemy execution engine are supported with the GE DatahubValidationAction, as documented here - https://datahubproject.io/docs/metadata-ingestion/integration_docs/great-expectations/#limitations. You are probably using some other execution engine - pandas or spark? There is an open feature request for supporting these - https://feature-requests.datahubproject.io/p/great-expectations-support-different-execution-engines. Please consider upvoting it if you are interested in this feature.
c
Though this feature has been upvoted, it might take some time to reach production. Is there any workaround pipeline I can use in the meantime?
h
It is possible to emit data quality checks using a python script, as shown in this example - https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/data_quality_mcpw_rest.py Does this work for you?
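The linked example emits assertion results against a dataset URN. As a minimal sketch of what that identifier looks like, here is a standalone version of the URN builder (the datahub package itself provides `make_dataset_urn` in `datahub.emitter.mce_builder`; this copy just shows the shape without the dependency):

```python
# Illustrative sketch: the DataHub dataset URN that scripts like
# data_quality_mcpw_rest.py emit assertion results against.
# The datahub package provides an equivalent helper
# (datahub.emitter.mce_builder.make_dataset_urn); this standalone
# version only demonstrates the URN format.
def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Return a DataHub dataset URN, e.g. for a csv file on s3."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"

print(make_dataset_urn("s3", "my-bucket/pets.csv"))
```

The platform and path here are made-up examples; substitute whatever platform your csv files are ingested under.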
c
@hundreds-photographer-13496 Does it use Great Expectations?
h
No.
c
Then what does it use?
What about getting the expectation results and ingesting them into DataHub later?
Does that work @hundreds-photographer-13496?
h
That would require active python development - it does not work out of the box for datasources that connect to csv via the spark/pandas execution engine.
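If you do want to capture results now and ingest them later, one interim step is to flatten the validation-result JSON that a checkpoint run produces into simple records you can store. A rough sketch, assuming the standard GE validation result shape (a `success` flag plus a `results` list of `expectation_config` entries) - the record fields chosen here are illustrative, not a DataHub format:

```python
# Hedged sketch: flatten a Great Expectations validation result dict
# into simple records that could be stored now and emitted to DataHub
# later, once assertion ingestion for your execution engine exists.
# The input dict shape follows the standard GE validation result JSON;
# the output record fields are purely illustrative.
def flatten_validation_result(result: dict) -> list:
    records = []
    for r in result.get("results", []):
        cfg = r.get("expectation_config", {})
        records.append({
            "expectation_type": cfg.get("expectation_type"),
            "column": cfg.get("kwargs", {}).get("column"),
            "success": r.get("success"),
        })
    return records

# Made-up example payload in the GE validation-result shape.
example = {
    "success": False,
    "results": [
        {"success": True,
         "expectation_config": {
             "expectation_type": "expect_column_values_to_not_be_null",
             "kwargs": {"column": "pet_id"}}},
        {"success": False,
         "expectation_config": {
             "expectation_type": "expect_table_row_count_to_be_between",
             "kwargs": {"min_value": 1}}},
    ],
}
for rec in flatten_validation_result(example):
    print(rec)
```

You would still need to write the later "ingest to DataHub" step yourself, along the lines of the example script above.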
c
ok
Then what do you suggest for me to get data observability on my csv dataset @hundreds-photographer-13496?
h
You can enable profiling in your source (say, the s3 data lake source, if you are ingesting csv files from s3 or the local file system) and get dataset statistics as shown here: https://demo.datahubproject.io/dataset/urn:li:dataset:(urn:li:dataPlatform:snowflake,lon[…]ompanions.adoption.pets,PROD)/Stats?is_lineage_mode=false Integrating assertions/data quality checks for csv datasets is not supported, so it will not work unless you implement it yourself. If you are familiar with python and the great-expectations APIs and are willing to write python code to support ingesting GE assertions for your csv datasources, you can refer to this guide https://datahubproject.io/docs/metadata-ingestion/developing/ and discuss in the #contribute channel for any help required. (link to DatahubValidationAction class).
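For intuition, the kind of per-column statistics that profiling page surfaces (row count, null count, distinct count) can be computed from a csv with just the standard library. A purely illustrative sketch, not DataHub's profiler:

```python
import csv
import io

# Rough sketch of the per-column statistics DataHub profiling surfaces
# (row count, null count, distinct count), computed directly from csv
# text with the standard library. Illustrative only - the real profiler
# runs inside the ingestion source.
def profile_csv(text: str) -> dict:
    rows = list(csv.DictReader(io.StringIO(text)))
    stats = {"row_count": len(rows), "columns": {}}
    for col in (rows[0].keys() if rows else []):
        values = [r[col] for r in rows]
        stats["columns"][col] = {
            "null_count": sum(1 for v in values if v == ""),
            "distinct_count": len(set(values)),
        }
    return stats

# Made-up sample data: one missing name, one duplicate name.
sample = "pet_id,name\n1,Rex\n2,\n3,Rex\n"
print(profile_csv(sample))
```

Enabling profiling in the ingestion recipe gives you this out of the box; the sketch is only to show what the numbers mean.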