# troubleshoot
r
After updating to the latest DataHub, `0.8.17.2`, I suddenly started seeing this error in Airflow DAGs. The only change in the config was adding a simple transformer that adds dataset owners.
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 1869, in _instantiate_datasource_from_config
    ] = self._build_datasource_from_config(name=name, config=config)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 1938, in _build_datasource_from_config
    config_defaults={"module_name": module_name},
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/great_expectations/data_context/util.py", line 121, in instantiate_class_from_config
    class_instance = class_(**config_with_defaults)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 64, in sqlalchemy_datasource_init
    underlying_datasource_init(self, *args, **kwargs, engine=conn)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/great_expectations/datasource/sqlalchemy_datasource.py", line 217, in __init__
    name, "ModuleNotFoundError: No module named 'sqlalchemy'"
great_expectations.exceptions.exceptions.DatasourceInitializationError: Cannot initialize datasource my_sqlalchemy_datasource-a18b60ef-52a5-481c-a73f-769ff10a8ffe, error: ModuleNotFoundError: No module named 'sqlalchemy'

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1336, in _execute_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 117, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 128, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/usr/local/airflow/dags/datacatalog/utils/ingestion.py", line 65, in ingest_metadata_from_snowflake
    pipeline.run()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/ingestion/run/pipeline.py", line 141, in run
    for wu in self.source.get_workunits():
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/ingestion/source/sql/snowflake.py", line 235, in get_workunits
    for wu in super().get_workunits():
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/ingestion/source/sql/sql_common.py", line 361, in get_workunits
    yield from self.loop_profiler(profile_requests, profiler)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/ingestion/source/sql/sql_common.py", line 634, in loop_profiler
    profile_requests, self.config.profiling.max_workers
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 526, in generate_profiles
    yield async_profile.result()
  File "/usr/lib64/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/usr/lib64/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib64/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 537, in generate_profile_from_request
    **request.batch_kwargs,
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 547, in generate_profile
    with self._ge_context() as ge_context, PerfTimer() as timer:
  File "/usr/lib64/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 497, in _ge_context
    **dict(datasourceConfigSchema.dump(datasource_config)),
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/great_expectations/core/usage_statistics/usage_statistics.py", line 286, in usage_statistics_wrapped_method
    result = func(*args, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 1822, in add_datasource
    initialize=initialize,
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 1846, in _instantiate_datasource_from_config_and_update_project_config
    raise e
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 1840, in _instantiate_datasource_from_config_and_update_project_config
    name=name, config=config
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 1872, in _instantiate_datasource_from_config
    datasource_name=name, message=str(e)
I was able to run this from the CLI though, just not from Airflow
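Since the same ingestion works from the CLI but not from Airflow, one way to compare the two environments is to print the interpreter and the file each relevant module would be loaded from, once in a shell and once inside an Airflow task. This is only a sketch: the helper name `module_origins` is made up for illustration, and the module list is an assumption based on the traceback above.

```python
import importlib.util
import sys


def module_origins(module_names):
    """Map each top-level module name to the file it would be imported from.

    A value of None means the module cannot be found on this interpreter's path.
    """
    origins = {}
    for name in module_names:
        spec = importlib.util.find_spec(name)
        origins[name] = spec.origin if spec is not None else None
    return origins


if __name__ == "__main__":
    # Run this both from the CLI and from a PythonOperator, then diff the output.
    print("interpreter:", sys.executable)
    for name, origin in module_origins(["sqlalchemy", "great_expectations", "datahub"]).items():
        print(f"{name}: {origin}")
```

If the two outputs differ, the Airflow workers are resolving packages from a different location than the CLI.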
m
@red-pizza-28006 can you check what version of great expectations you have in your airflow instance?
r
So we don't install Great Expectations explicitly; it just comes in as a dependency of DataHub.
m
We pinned great expectations to 0.13.43 since the latest one 0.13.44 exhibits this error
I suspect your airflow env is pulling in 44
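A minimal sketch for checking which versions the Airflow environment actually resolved, assuming it is run inside that environment (for example from a throwaway task). The function name `installed_versions` is hypothetical, and on Python 3.7 it assumes the `importlib_metadata` backport is available.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Python 3.7, as in the tracebacks above; assumes the backport is installed.
    import importlib_metadata as metadata


def installed_versions(distributions):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for dist in distributions:
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None
    return versions


if __name__ == "__main__":
    # Distribution names as published on PyPI.
    for dist, version in installed_versions(
        ["acryl-datahub", "great-expectations", "typing-extensions", "SQLAlchemy"]
    ).items():
        print(f"{dist}: {version}")
```

If `great-expectations` reports 0.13.44 rather than the pinned 0.13.43, the environment is not picking up the pin.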
r
Hmm, I can confirm that I have 0.13.43 on my local Airflow, but since we use MWAA in prod, I'm not sure whether downgrading the DataHub version will downgrade its dependencies as well.
m
And does your local airflow work correctly now?
r
Okay, when I am at `0.8.17.2`, I get this error:
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/ingestion/api/registry.py", line 84, in _ensure_not_lazy
    plugin_class = import_path(path)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/ingestion/api/registry.py", line 32, in import_path
    item = importlib.import_module(module_name)
  File "/usr/lib64/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/ingestion/source/sql/snowflake.py", line 20, in <module>
    from datahub.ingestion.source.sql.sql_common import (
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/ingestion/source/sql/sql_common.py", line 141, in <module>
    class SQLAlchemyConfig(ConfigModel):
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/ingestion/source/sql/sql_common.py", line 156, in SQLAlchemyConfig
    from datahub.ingestion.source.ge_data_profiler import GEProfilingConfig
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 26, in <module>
    from typing_extensions import Concatenate, ParamSpec
ImportError: cannot import name 'Concatenate' from 'typing_extensions' (/usr/local/lib/python3.7/site-packages/typing_extensions.py)
But when I downgrade to `0.8.16.11`, I get the error I reported earlier.
So regardless of which version I use, I get one of the errors above. I think some dependency changed for the earlier versions as well, where we didn't pin to an older version of Great Expectations?
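To narrow down the `Concatenate` error, a small check like the one below reports which `typing_extensions` file is being imported and whether it exposes the names that `ge_data_profiler.py` needs. This is a sketch only; `check_exports` is a hypothetical helper, not part of DataHub.

```python
import importlib


def check_exports(module_name, names):
    """Import a module and return (its file path, the requested names it lacks)."""
    module = importlib.import_module(module_name)
    missing = [name for name in names if not hasattr(module, name)]
    return getattr(module, "__file__", None), missing


if __name__ == "__main__":
    try:
        path, missing = check_exports("typing_extensions", ["Concatenate", "ParamSpec"])
    except ImportError:
        print("typing_extensions is not installed in this environment")
    else:
        print("loaded from:", path)
        print("missing names:", missing or "none")
```

A non-empty list of missing names would suggest that the copy being imported predates those additions, which matches the `ImportError` in the traceback.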
m
Thanks for letting us know, @red-pizza-28006. We will reply to this thread soon.
r
thank you 😊
m
@red-pizza-28006: this has been fixed in `0.8.17.3` … quite a few interesting breakages involving great-expectations.
r
Perfect, let me upgrade and take a look.
FYI, this worked @mammoth-bear-12532, thank you for this!