jolly-tent-99362
10/27/2022, 4:50 AM
complete_json = {
    "source": {
        "type": "bigquery",
        "config": {
            "project_id": "",
            "credential": cred_json,
            "include_views": "true",
            "include_tables": "true",
            "include_table_lineage": "true",
            "upstream_lineage_in_report": "true",
            "schema_pattern": {
                "ignoreCase": "true",
                "allow": ["^webengage_mum$"]
            },
            "table_pattern": {
                "ignoreCase": "true",
                "deny": ["^.*\.temp_.*"]
            },
            "profile_pattern": {
                "allow": ["^.*\.application.*"]
            },
            "stateful_ingestion": {
                "enabled": "true",
                "remove_stale_metadata": "true",
                "state_provider": {
                    "type": "datahub",
                    "config": {
                        "datahub_api": {
                            "server": datahub_gms_url,
                            "token": datahub_gms_token
                        }
                    }
                }
            },
            "profiling": {
                "enabled": "true",
                "bigquery_temp_table_schema": ".datahub",
                "turn_off_expensive_profiling_metrics": "true",
                "query_combiner_enabled": "false",
                "max_number_of_fields_to_profile": 1000,
                "profile_table_level_only": "true",
                "include_field_null_count": "true",
                "include_field_min_value": "true",
                "include_field_max_value": "true",
                "include_field_mean_value": "true",
                "include_field_median_value": "true",
                "include_field_stddev_value": "true",
                "include_field_quantiles": "true",
                "include_field_distinct_value_frequencies": "true",
                "include_field_histogram": "true",
                "include_field_sample_values": "true"
            }
        },
    },
    "pipeline_name": "biquery_profiling_tables",
    "sink": {
        "type": "datahub-kafka",
        "config": {
            "connection": {
                "bootstrap": bootstrap_url,
                "schema_registry_url": schema_registry_url,
            },
        },
    },
}
The job runs for some time and then fails with the following error:
[2022-10-26, 05:26:34 UTC] {ge_data_profiler.py:918} ERROR - Encountered exception while profiling <dataset>.<tableName>
Traceback (most recent call last):
  File "/opt/python3.8/lib/python3.8/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 892, in _generate_single_profile
    batch = self._get_ge_dataset(
  File "/opt/python3.8/lib/python3.8/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 951, in _get_ge_dataset
    batch = ge_context.data_context.get_batch(
  File "/opt/python3.8/lib/python3.8/site-packages/great_expectations/data_context/data_context/base_data_context.py", line 1642, in get_batch
    return self._get_batch_v2(
  File "/opt/python3.8/lib/python3.8/site-packages/great_expectations/data_context/data_context/base_data_context.py", line 1336, in _get_batch_v2
    datasource = self.get_datasource(batch_kwargs.get("datasource"))
  File "/opt/python3.8/lib/python3.8/site-packages/great_expectations/data_context/data_context/base_data_context.py", line 2062, in get_datasource
    raise ValueError(
ValueError: Unable to load datasource `my_sqlalchemy_datasource-548b19eb-6db0-4fa2-8673-0e62306a3c7d` -- no configuration found or invalid configuration.
[2022-10-26, 05:26:35 UTC] {ge_data_profiler.py:773} INFO - Profiling 1 table(s) finished in 2.387 seconds
Can someone help please?
jolly-tent-99362
10/27/2022, 5:01 AM
astonishing-answer-96712
10/27/2022, 6:27 PM
jolly-tent-99362
11/01/2022, 2:59 AM
jolly-tent-99362
11/01/2022, 11:01 AM
astonishing-answer-96712
11/01/2022, 3:58 PM
gray-shoe-75895
11/01/2022, 9:18 PM
Can you run pip freeze in your airflow environment and let me know what it outputs?
jolly-tent-99362
11/02/2022, 4:35 AM
jolly-tent-99362
11/02/2022, 4:36 AM
gray-shoe-75895
11/02/2022, 10:00 PM
... pip install process somewhere which will contain the version numbers. I haven’t used Google Cloud Composer myself, so I don’t know exactly what it’s called, but I know that other people have been able to provide them before.
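
If a shell for pip freeze is not available, a rough alternative is to print the versions from Python itself, for example from a task in the same Airflow environment. This is only a sketch: it assumes the libraries in the traceback are installed under the distribution names acryl-datahub, great-expectations and SQLAlchemy, and the helper name installed_versions is made up for illustration.

```python
from importlib import metadata

# Distribution names are assumptions based on the traceback above;
# adjust them if your environment installs the packages differently.
PACKAGES = ["acryl-datahub", "great-expectations", "SQLAlchemy"]


def installed_versions(names):
    """Return a dict mapping each distribution name to its version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed under this distribution name.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        if version is None:
            print(f"{name}: not installed")
        else:
            print(f"{name}=={version}")
```

The output uses the same name==version shape as pip freeze, so it can be pasted into the thread as-is. It only covers the listed packages, not the full environment.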