# troubleshoot
b
Hello, I am trying to run the DataHub integration with Great Expectations and I get a really strange error. I followed the tutorial and installed the latest version of:
pip install 'acryl-datahub[great-expectations]'
When I run the checkpoint YAML file, I get an error about a missing module:
FileNotFoundError: No module named "datahub.integrations.great_expectations.action" could be found in the repository. Please make sure that the file, corresponding to this package and module, exists and that dynamic loading of code modules, templates, and assets is supported in your execution environment. This error is unrecoverable.
When I run it in my IDE, I can see that the integrations module is not present during the import. Is this a bug that occurs on Ubuntu? Could you help me resolve the issue? I am posting pictures from Windows and Ubuntu below. If you need any more information, please let me know.
h
I don't think it's about Windows vs. Ubuntu. It's about the DataHub version. Can you compare the DataHub versions?
```python
import datahub

print(datahub.nice_version_name())
```
The integrations module was added in v0.8.28.
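A quick way to check both points from the interpreter that actually runs the checkpoint is sketched below. It is a stdlib-only sketch that assumes Python 3.8+ for importlib.metadata. The 0.8.28 threshold comes from the message above. The helper names (release_tuple, module_available, datahub_version) are hypothetical.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Threshold taken from this thread: the integrations package appeared in 0.8.28.
MIN_INTEGRATIONS_VERSION = (0, 8, 28)


def release_tuple(version_string: str) -> Tuple[int, ...]:
    """Naively turn '0.8.31.2' into (0, 8, 31, 2); assumes purely numeric parts."""
    return tuple(int(part) for part in version_string.split("."))


def module_available(module_name: str) -> bool:
    """True if the module can be located.

    Parent packages get imported, but the leaf module itself is not executed.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `datahub`) is missing entirely.
        return False


def datahub_version() -> Optional[str]:
    """Installed acryl-datahub version, or None if it is not installed."""
    try:
        return version("acryl-datahub")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    installed = datahub_version()
    print("acryl-datahub version:", installed)
    if installed is not None:
        print("new enough for integrations:",
              release_tuple(installed) >= MIN_INTEGRATIONS_VERSION)
    print("action module found:",
          module_available("datahub.integrations.great_expectations.action"))
```

If the version is at least 0.8.28 but the module is still not found, the script is probably running in a different environment from the one you inspected.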
b
Ubuntu: 0.8.31.2, Windows: 0.8.31.1
h
Interesting! Do you get the same results if you run this in your IDEs? IDEs sometimes use a different Python environment, so I want to confirm.
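One way to confirm is to run a short stdlib-only script from the terminal and again from inside each IDE, then compare the output. If executable or prefix differ, the runs use different environments, and those environments may hold different acryl-datahub versions. The function name describe_interpreter is hypothetical.

```python
import site
import sys


def describe_interpreter() -> dict:
    """Collect the facts that distinguish one Python environment from another."""
    return {
        "executable": sys.executable,            # path of the running interpreter
        "prefix": sys.prefix,                    # root of the active environment
        "version": sys.version.split()[0],       # e.g. "3.8.10"
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # Some legacy virtualenvs lack site.getsitepackages, so guard the call.
        "site_packages": site.getsitepackages() if hasattr(site, "getsitepackages") else [],
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```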
b
The results from the IDEs match each other 🙂
h
What error do you get when you run:
```python
from datahub.integrations.great_expectations.action import DataHubValidationAction
```
b
I made a minor mistake and was imprecise when I described the problem. What I am actually trying to do is run a Docker image that handles Great Expectations for DataHub. While building the Docker image, I noticed a package inconsistency. It was reported by a warning that I had ignored:
```
Collecting acryl-datahub[great-expectations]
  Downloading acryl_datahub-0.8.31.1-py3-none-any.whl (693 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 693.9/693.9 KB 536.7 kB/s eta 0:00:00
  Downloading acryl_datahub-0.8.31-py3-none-any.whl (680 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 680.5/680.5 KB 629.0 kB/s eta 0:00:00
  Downloading acryl_datahub-0.8.30.0-py3-none-any.whl (680 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 680.5/680.5 KB 570.6 kB/s eta 0:00:00
  Downloading acryl_datahub-0.8.29.2-py3-none-any.whl (678 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 678.1/678.1 KB 480.7 kB/s eta 0:00:00
  Downloading acryl_datahub-0.8.29-py3-none-any.whl (667 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 667.7/667.7 KB 440.9 kB/s eta 0:00:00
  Downloading acryl_datahub-0.8.28.1-py3-none-any.whl (665 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 665.7/665.7 KB 512.8 kB/s eta 0:00:00
  Downloading acryl_datahub-0.8.28.0-py3-none-any.whl (664 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 664.9/664.9 KB 605.3 kB/s eta 0:00:00
  Downloading acryl_datahub-0.8.27.2-py3-none-any.whl (652 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 652.1/652.1 KB 601.9 kB/s eta 0:00:00
WARNING: acryl-datahub 0.8.27.2 does not provide the extra 'great-expectations'
```
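The last line is the key one. pip appears to have backtracked all the way to 0.8.27.2, a release that does not declare the great-expectations extra. As a result, that extra's dependencies were skipped, and per the earlier message, that release also predates the integrations module (added in 0.8.28). After the image is built, you can confirm which extras the installed distribution declares with the stdlib-only sketch below. It assumes Python 3.8+, and declared_extras is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, metadata
from typing import List, Optional


def declared_extras(distribution: str) -> Optional[List[str]]:
    """Extras declared by an installed distribution, or None if it is not installed."""
    try:
        meta = metadata(distribution)
    except PackageNotFoundError:
        return None
    # Provides-Extra is a multi-use core-metadata field: one entry per extra.
    return meta.get_all("Provides-Extra") or []


if __name__ == "__main__":
    print(declared_extras("acryl-datahub"))
```

If the printed list has no great-expectations entry, the installed release predates that extra.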
After specifying the DataHub version in requirements.txt as acryl-datahub[great-expectations]>=0.8.31.0, the resulting error was much more self-explanatory:
```
Collecting acryl-datahub[great-expectations]>=0.8.31.0
  Downloading acryl_datahub-0.8.31.1-py3-none-any.whl (693 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 693.9/693.9 KB 438.7 kB/s eta 0:00:00
  Downloading acryl_datahub-0.8.31-py3-none-any.whl (680 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 680.5/680.5 KB 583.6 kB/s eta 0:00:00
INFO: pip is looking at multiple versions of sqlalchemy to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of great-expectations to determine which version is compatible with other requirements. This could take a while.
[...]
ERROR: Cannot install SQLAlchemy>=1.4.32, acryl-datahub[great-expectations]==0.8.31, acryl-datahub[great-expectations]==0.8.31.1 and acryl-datahub[great-expectations]==0.8.31.2 because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

The conflict is caused by:
    The user requested SQLAlchemy>=1.4.32
    acryl-datahub[great-expectations] 0.8.31.2 depends on sqlalchemy==1.3.24; extra == "great-expectations"
    The user requested SQLAlchemy>=1.4.32
    acryl-datahub[great-expectations] 0.8.31.1 depends on sqlalchemy==1.3.24; extra == "great-expectations"
    The user requested SQLAlchemy>=1.4.32
    acryl-datahub[great-expectations] 0.8.31 depends on sqlalchemy==1.3.24; extra == "great-expectations"
```
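The conflict comes down to two requirements on the same package that no single version can satisfy. An exact pin is met only by that one version, and 1.3.24 is below the 1.4.32 lower bound. The sketch below spells out that comparison. Its naive parser handles only purely numeric versions, whereas pip itself compares versions according to PEP 440.

```python
from typing import Tuple


def release(version: str) -> Tuple[int, ...]:
    """Naive parse of a purely numeric release such as '1.3.24'."""
    return tuple(int(part) for part in version.split("."))


# The two requirements pip reported for the same package:
user_lower_bound = release("1.4.32")  # SQLAlchemy>=1.4.32 from requirements.txt
datahub_pin = release("1.3.24")       # sqlalchemy==1.3.24 from the great-expectations extra

# An exact pin allows only one version, so the two requirements are compatible
# only if that pinned version also meets the lower bound.
compatible = datahub_pin >= user_lower_bound
print("compatible:", compatible)  # prints "compatible: False"
```

Loosening the user-side requirement to the pinned version, as described next, removes the conflict.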
Long story short: the packages conflict during installation. After I changed the requirement from SQLAlchemy>=1.4.32 to SQLAlchemy==1.3.24, the pip installation issue was gone. The previous error disappeared, but a new one occurred during the checkpoint run:
```
File "/usr/local/lib/python3.8/site-packages/great_expectations/util.py", line 360, in load_class
    raise PluginClassNotFoundError(module_name=module_name, class_name=class_name)
great_expectations.exceptions.exceptions.PluginClassNotFoundError: The module: `datahub.integrations.great_expectations.action` does not contain the class: `DatahubValidationAction`.
        - Please verify that the class named `DatahubValidationAction` exists.
```
Inside my Docker image, I was able to run the import line you provided without errors. At this point I am unsure why this is happening (please see the screenshot for reference).
h
I know this one! The class is named DataHubValidationAction (with a capital H), not DatahubValidationAction. The checkpoint action's class name needs this fix.
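The direct import worked because it used the correct spelling. The checkpoint config did not. Python attribute lookup is case-sensitive, so a one-letter case difference is enough to make the class "missing" even though the module imports fine. The sketch below roughly mimics the import-then-getattr lookup that a plugin loader performs and adds a hint for case-only mismatches. It is not Great Expectations' actual load_class, and load_class_or_explain is a hypothetical name.

```python
import importlib
from typing import Optional


def load_class_or_explain(module_name: str, class_name: str) -> type:
    """Load class_name from module_name, hinting at case-only mismatches."""
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name, None)
    if cls is not None:
        return cls
    # getattr is case-sensitive; look for a name that differs only in case.
    by_lowercase = {name.lower(): name for name in dir(module)}
    suggestion: Optional[str] = by_lowercase.get(class_name.lower())
    hint = f" Did you mean {suggestion!r}?" if suggestion else ""
    raise AttributeError(f"Module {module_name!r} has no class {class_name!r}.{hint}")
```

Calling it with module_name "datahub.integrations.great_expectations.action" and class_name "DatahubValidationAction" should suggest the correctly cased DataHubValidationAction, assuming that module is importable in your environment.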
b
Yes, you were right, my mistake 🤦 After I adjusted the name, that error disappeared, but a new one occurred during the actual checkpoint run. For some reason it is looking for an S3 bucket, and I am unsure why...
```
Calculating Metrics: 100%|██████████| 86/86 [00:00<00:00, 323.28it/s]
WARNING: DataHubValidationAction does not recognize this GE data asset type - <class 'great_expectations.validator.validator.Validator'>.                         This is either using v2-api or execution engine other than sqlalchemy.
Calculating Metrics: 100%|██████████| 174/174 [00:00<00:00, 334.97it/s]
WARNING: DataHubValidationAction does not recognize this GE data asset type - <class 'great_expectations.validator.validator.Validator'>.                         This is either using v2-api or execution engine other than sqlalchemy.
ERROR: S3 error: 404 (NoSuchBucket): The specified bucket does not exist
ERROR: S3 error: 404 (NoSuchBucket): The specified bucket does not exist
```
@hundreds-photographer-13496 I am unsure about the root cause of this error. Why does DataHub try to retrieve data from S3? As far as I understand, the S3 reference might come from the defined source, but in my case no S3 bucket was specified in any config file. Could you help me resolve this error? 🙂
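To double-check the claim that no config file mentions S3, you can scan every YAML file in the project for S3-related terms. The stdlib-only sketch below is one way to do this. The search terms and the helper name find_s3_references are assumptions, so adjust them to your setup. Matches in environment variables or non-YAML files are not covered.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Hypothetical search terms; widen or narrow them as needed.
S3_HINT = re.compile(r"s3|bucket|boto", re.IGNORECASE)


def find_s3_references(root: str) -> List[Tuple[str, int, str]]:
    """Return (file, line number, stripped line) for YAML lines mentioning S3."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in {".yml", ".yaml"} or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if S3_HINT.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

For example, find_s3_references("great_expectations") scans a project directory, assuming it uses Great Expectations' default directory name.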
h
Strange. DataHubValidationAction does not do anything with S3. Are you sure these logs are not from the GE side? You can confirm by temporarily removing DataHubValidationAction and checking whether these logs still appear.
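If you prefer to toggle the action programmatically rather than edit the YAML by hand, the sketch below filters it out of a checkpoint config held as a Python dict, such as one loaded from the YAML. It assumes the action_list layout (name plus an action mapping with module_name and class_name) used in DataHub's Great Expectations integration docs. The checkpoint name, action names, and server_url value are hypothetical.

```python
import copy
from typing import Any, Dict

DATAHUB_ACTION_MODULE = "datahub.integrations.great_expectations.action"


def without_datahub_action(checkpoint_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the checkpoint config with DataHub actions removed."""
    config = copy.deepcopy(checkpoint_config)  # leave the caller's dict untouched
    config["action_list"] = [
        entry
        for entry in config.get("action_list", [])
        if entry.get("action", {}).get("module_name") != DATAHUB_ACTION_MODULE
    ]
    return config


example = {
    "name": "my_checkpoint",  # hypothetical
    "action_list": [
        {"name": "store_validation_result",
         "action": {"class_name": "StoreValidationResultAction"}},
        {
            "name": "datahub_action",  # hypothetical
            "action": {
                "module_name": DATAHUB_ACTION_MODULE,
                "class_name": "DataHubValidationAction",
                "server_url": "http://localhost:8080",  # hypothetical
            },
        },
    ],
}

trimmed = without_datahub_action(example)
print([entry["name"] for entry in trimmed["action_list"]])  # ['store_validation_result']
```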
```
ERROR: S3 error: 404 (NoSuchBucket): The specified bucket does not exist
```
Regarding the warning log below:
```
WARNING: DataHubValidationAction does not recognize this GE data asset type - <class 'great_expectations.validator.validator.Validator'>.                         This is either using v2-api or execution engine other than sqlalchemy.
```
It's self-explanatory: validation metadata doesn't get reported to DataHub in this case.
b
Hello, thank you for the reply and all the help 🙂 I removed the DataHub action from my YAML file, and that actually solved the problem: the S3 error no longer appeared. I am unsure why this checkpoint would try to connect to S3... I will try to debug it further on my own, but if you have any thoughts, please let me know.