breezy-portugal-43538
03/24/2022, 9:27 AM
pip install 'acryl-datahub[great-expectations]'
Running the checkpoint YAML file fails with a missing-module error:
FileNotFoundError: No module named "datahub.integrations.great_expectations.action" could be found in the repository. Please make sure that the file, corresponding to this package and module, exists and that dynamic loading of code modules, templates, and assets is supported in your execution environment. This error is unrecoverable.
In my IDE I see that the integrations module is not present at import time. Is this a bug that occurs on Ubuntu?
Could you help resolve the issue?
I am posting pictures below from Windows and Ubuntu. If any more information is required, please let me know.
hundreds-photographer-13496
03/24/2022, 9:33 AM
import datahub
datahub.nice_version_name()
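A slightly fuller check, as a sketch only: it reports the installed distribution version and whether the integrations module can be located, using just the standard library (Python 3.8+). The distribution name `acryl-datahub` and the module path come from this thread; the helper names `installed_version` and `module_available` are hypothetical.

```python
import importlib.util
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_available(module_name):
    """Return True if the module can be located.

    Note: for a dotted name, parent packages are imported as a side effect.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised when a parent package (e.g. `datahub`) is missing or fails to import.
        return False


if __name__ == "__main__":
    print("acryl-datahub:", installed_version("acryl-datahub"))
    print(
        "integrations module found:",
        module_available("datahub.integrations.great_expectations.action"),
    )
```

Running this inside the same environment (or Docker image) that executes the checkpoint shows whether that environment, rather than the IDE's, has a new enough package.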
integrations was added in v0.8.28
breezy-portugal-43538
03/24/2022, 9:36 AM
hundreds-photographer-13496
03/24/2022, 9:45 AM
breezy-portugal-43538
03/24/2022, 10:01 AM
hundreds-photographer-13496
03/24/2022, 10:37 AM
from datahub.integrations.great_expectations.action import DataHubValidationAction
breezy-portugal-43538
03/24/2022, 12:28 PM
Collecting acryl-datahub[great-expectations]
Downloading acryl_datahub-0.8.31.1-py3-none-any.whl (693 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 693.9/693.9 KB 536.7 kB/s eta 0:00:00
Downloading acryl_datahub-0.8.31-py3-none-any.whl (680 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 680.5/680.5 KB 629.0 kB/s eta 0:00:00
Downloading acryl_datahub-0.8.30.0-py3-none-any.whl (680 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 680.5/680.5 KB 570.6 kB/s eta 0:00:00
Downloading acryl_datahub-0.8.29.2-py3-none-any.whl (678 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 678.1/678.1 KB 480.7 kB/s eta 0:00:00
Downloading acryl_datahub-0.8.29-py3-none-any.whl (667 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 667.7/667.7 KB 440.9 kB/s eta 0:00:00
Downloading acryl_datahub-0.8.28.1-py3-none-any.whl (665 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 665.7/665.7 KB 512.8 kB/s eta 0:00:00
Downloading acryl_datahub-0.8.28.0-py3-none-any.whl (664 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 664.9/664.9 KB 605.3 kB/s eta 0:00:00
Downloading acryl_datahub-0.8.27.2-py3-none-any.whl (652 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 652.1/652.1 KB 601.9 kB/s eta 0:00:00
WARNING: acryl-datahub 0.8.27.2 does not provide the extra 'great-expectations'
After specifying the datahub version in requirements.txt as acryl-datahub[great-expectations]>=0.8.31.0
the error was much more self-explanatory:
Collecting acryl-datahub[great-expectations]>=0.8.31.0
Downloading acryl_datahub-0.8.31.1-py3-none-any.whl (693 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 693.9/693.9 KB 438.7 kB/s eta 0:00:00
Downloading acryl_datahub-0.8.31-py3-none-any.whl (680 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 680.5/680.5 KB 583.6 kB/s eta 0:00:00
INFO: pip is looking at multiple versions of sqlalchemy to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of great-expectations to determine which version is compatible with other requirements. This could take a while.
[...]
ERROR: Cannot install SQLAlchemy>=1.4.32, acryl-datahub[great-expectations]==0.8.31, acryl-datahub[great-expectations]==0.8.31.1 and acryl-datahub[great-expectations]==0.8.31.2 because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit <https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts>
The conflict is caused by:
The user requested SQLAlchemy>=1.4.32
acryl-datahub[great-expectations] 0.8.31.2 depends on sqlalchemy==1.3.24; extra == "great-expectations"
The user requested SQLAlchemy>=1.4.32
acryl-datahub[great-expectations] 0.8.31.1 depends on sqlalchemy==1.3.24; extra == "great-expectations"
The user requested SQLAlchemy>=1.4.32
acryl-datahub[great-expectations] 0.8.31 depends on sqlalchemy==1.3.24; extra == "great-expectations"
Long story short: there is a package conflict during installation.
After changing the requirement from SQLAlchemy>=1.4.32 to SQLAlchemy==1.3.24, the pip installation issue was gone.
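To see which pins an extra brings in before editing requirements.txt, the installed package's metadata can be inspected. This is a minimal sketch under assumptions: the helper names are hypothetical, the requirement strings are matched as plain text rather than parsed, and the extra name may appear normalized differently in some metadata.

```python
import re
from importlib import metadata


def requirements_for_extra(requires, extra):
    """Filter requirement strings down to those gated on the given extra."""
    # Markers may quote the extra name with single or double quotes.
    pattern = re.compile(r"""extra\s*==\s*['"]{}['"]""".format(re.escape(extra)))
    return [req for req in requires if pattern.search(req)]


def installed_extra_requirements(dist_name, extra):
    """Return the extra's requirement strings for an installed distribution."""
    try:
        requires = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []
    return requirements_for_extra(requires, extra)


if __name__ == "__main__":
    for req in installed_extra_requirements("acryl-datahub", "great-expectations"):
        print(req)
```

For the versions in this thread, the pip output suggests the list would include an exact sqlalchemy==1.3.24 pin, which is why a separate SQLAlchemy>=1.4.32 requirement cannot be satisfied alongside it.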
Although the previous error disappeared, a new one occurred during the checkpoint run:
File "/usr/local/lib/python3.8/site-packages/great_expectations/util.py", line 360, in load_class
raise PluginClassNotFoundError(module_name=module_name, class_name=class_name)
great_expectations.exceptions.exceptions.PluginClassNotFoundError: The module: `datahub.integrations.great_expectations.action` does not contain the class: `DatahubValidationAction`.
- Please verify that the class named `DatahubValidationAction` exists.
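One thing this kind of error can hide is a letter-case mismatch, since Python attribute lookup is case-sensitive. The sketch below is not Great Expectations code: `load_class` here is a hypothetical helper and `fake_action` is a stand-in module, used only to show how an exact-name lookup fails and how a case-insensitive comparison can point at the intended name.

```python
import types


def load_class(module, class_name):
    """Look up a class by exact name; on failure, suggest a name differing only in case."""
    try:
        return getattr(module, class_name)
    except AttributeError:
        near = [n for n in dir(module) if n.lower() == class_name.lower()]
        hint = f" Did you mean {near[0]!r}?" if near else ""
        raise AttributeError(
            f"{module.__name__} has no class {class_name!r}.{hint}"
        ) from None


# Stand-in for the real action module, so the sketch runs without DataHub installed.
fake_action = types.ModuleType("fake_action")


class DataHubValidationAction:
    pass


fake_action.DataHubValidationAction = DataHubValidationAction

if __name__ == "__main__":
    try:
        load_class(fake_action, "DatahubValidationAction")
    except AttributeError as err:
        print(err)
```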
After getting into my Docker image, I was able to run the import line you provided successfully, so at this point I am unsure why this is happening (please see screenshot for reference).
hundreds-photographer-13496
03/24/2022, 1:09 PM
The class name is Data*H*ubValidationAction and not DatahubValidationAction. The checkpoint action's class name needs this fix.
breezy-portugal-43538
03/24/2022, 2:15 PM
Calculating Metrics: 100%|██████████| 86/86 [00:00<00:00, 323.28it/s]
WARNING: DataHubValidationAction does not recognize this GE data asset type - <class 'great_expectations.validator.validator.Validator'>. This is either using v2-api or execution engine other than sqlalchemy.
Calculating Metrics: 100%|██████████| 174/174 [00:00<00:00, 334.97it/s]
WARNING: DataHubValidationAction does not recognize this GE data asset type - <class 'great_expectations.validator.validator.Validator'>. This is either using v2-api or execution engine other than sqlalchemy.
ERROR: S3 error: 404 (NoSuchBucket): The specified bucket does not exist
ERROR: S3 error: 404 (NoSuchBucket): The specified bucket does not exist
breezy-portugal-43538
03/25/2022, 8:25 AMhundreds-photographer-13496
03/25/2022, 8:34 AM
DataHubValidationAction does not do anything with S3. Are you sure these logs are not from the GE side? You can confirm by temporarily removing DataHubValidationAction and checking whether these logs still appear.
ERROR: S3 error: 404 (NoSuchBucket): The specified bucket does not exist
Regarding the warning log below:
WARNING: DataHubValidationAction does not recognize this GE data asset type - <class 'great_expectations.validator.validator.Validator'>. This is either using v2-api or execution engine other than sqlalchemy.
It's self-explanatory. Validation metadata doesn't get reported to DataHub in this case.
breezy-portugal-43538
03/25/2022, 2:47 PM