# troubleshoot
s
HELP ! Hi all, Trying to use Great expectation for Data Validation. The checkpoint runs, but the Validations are not getting displayed in Datahub. Added this in checkpoint configuration :
- name: datahub_action
  action:
    module_name: datahub.integrations.great_expectations.action
    class_name: DataHubValidationAction
    server_url: http://localhost:8080  # datahub server url
Getting this message when the checkpoint runs:
@hundreds-photographer-13496 ^
h
@swift-breakfast-25077 Can you update to the latest version of the datahub GE action?
pip install 'acryl-datahub[great-expectations]'==0.8.36
With that, we can get detailed debug logs after setting DATAHUB_DEBUG env var to True as mentioned in this doc - https://datahubproject.io/docs/metadata-ingestion/integration_docs/great-expectations/#debugging
On a separate note, do you have the respective datasets ingested in DataHub already ?
s
Yes of course i have, I will try to update the datahub GE action
@hundreds-photographer-13496 where can i add the DATAHUB_DEBUG var? when i add it in the checkpoint i get this error
h
Have you tried using
export DATAHUB_DEBUG=True
in the same terminal where you run the great expectations checkpoint?
s
yes, but where can i view the logs?
h
aah, I was expecting the logs to show up alongside the GE logs. Looks like something is off.
Meanwhile we can debug another way. Do you have access to the datahub-gms logs at the time the validation runs in GE?
Do you see any logs like INGEST PROPOSAL proposal: {aspectName=assertionRunEvent, entityUrn=urn:li:assertion:xxxxx, entityType=assertion
s
@hundreds-photographer-13496 yes i see
h
Can you run the datahub CLI command against the assertion urn you see and share the results?
datahub get --urn "urn:li:assertion:xxxxx"
s
i got this :
h
hmm, if you use the dataset urn instead of the assertion urn in the above command, do you get non-empty output?
oh, unless you are running datahub on your local machine, you would need to run
datahub init
to point the CLI at your datahub server
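For reference, `datahub init` prompts for the server host and an access token and writes them to `~/.datahubenv`; against a local quickstart the resulting file looks roughly like this (a sketch — the exact layout may differ between CLI versions):

# ~/.datahubenv, written by `datahub init`
gms:
  server: http://localhost:8080
  token: ''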
s
do i overwrite ?
h
yes
s
my datahub access token ??
h
how are you running datahub? using datahub docker quickstart?
s
Yes
h
hmm, then just press enter, no need to set its value. Strange that datahub init is not working.
Now, does datahub get --urn "dataset urn" return a result? Replace "dataset urn" with your dataset's urn.
s
h
ah, strange.
btw, I figured out how to show the DataHub action logs when running a GE checkpoint. Can you see if it works for you? This is a python script.
import great_expectations as ge
import logging

# Attach a stream handler to the DataHub action's logger so its DEBUG
# output is printed while the checkpoint runs
logger = logging.getLogger("datahub.integrations.great_expectations.action")
handler = logging.StreamHandler()
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

context = ge.get_context()
context.run_checkpoint(checkpoint_name="your checkpoint name")
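The script only re-routes the integration's named logger; the same pattern can be tried in isolation without datahub installed (a self-contained sketch — the log message here is made up, only the logger name comes from the thread):

```python
import io
import logging

# Same pattern as the script above: fetch the library's logger by name
# and attach a handler we control. Capture into a buffer instead of
# stderr so the effect is easy to inspect.
buf = io.StringIO()
logger = logging.getLogger("datahub.integrations.great_expectations.action")
logger.addHandler(logging.StreamHandler(buf))
logger.setLevel(logging.DEBUG)

logger.debug("emitting validation results to DataHub")
print(buf.getvalue().strip())
```

Any DEBUG record the action emits through that logger now lands on the attached handler, which is why the real script makes the integration's logs visible alongside GE's own output.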
s
i will try it
After trying it, i got this:
{
  "run_id": {
    "run_name": "20220606-130715-my-run-name-template",
    "run_time": "2022-06-06T13:07:15.017662+00:00"
  },
  "run_results": {
    "ValidationResultIdentifier::code_wilayas/20220606-130715-my-run-name-template/20220606T130715.017662Z/bbd03d0b1e64d9da02f096a8efc90f6d": {
      "validation_result": {
        "results": [
          {
            "exception_info": {
              "raised_exception": false,
              "exception_traceback": null,
              "exception_message": null
            },
            "result": {
              "element_count": 48,
              "unexpected_count": 0,
              "unexpected_percent": 0,
              "partial_unexpected_list": [],
              "missing_count": 0,
              "missing_percent": 0,
              "unexpected_percent_total": 0,
              "unexpected_percent_nonmissing": 0,
              "partial_unexpected_index_list": null,
              "partial_unexpected_counts": []
            },
            "meta": {},
            "expectation_config": {
              "meta": {},
              "kwargs": {
                "column": "Wilaya_NK",
                "max_value": 48,
                "min_value": 1,
                "batch_id": "bbd03d0b1e64d9da02f096a8efc90f6d"
              },
              "expectation_type": "expect_column_values_to_be_between"
            },
            "success": true
          }
        ],
        "meta": {
          "great_expectations_version": "0.15.2",
          "expectation_suite_name": "code_wilayas",
          "run_id": {
            "run_name": "20220606-130715-my-run-name-template",
            "run_time": "2022-06-06T13:07:15.017662+00:00"
          },
          "batch_spec": {
            "data_asset_name": "public.D_Wilaya",
            "table_name": "D_Wilaya",
            "batch_identifiers": {},
            "schema_name": "public",
            "type": "table"
          },
          "batch_markers": {
            "ge_load_time": "20220606T120715.048208Z"
          },
          "active_batch_definition": {
            "datasource_name": "DWH",
            "data_connector_name": "default_inferred_data_connector_name",
            "data_asset_name": "public.D_Wilaya",
            "batch_identifiers": {}
          },
          "validation_time": "20220606T120715.267537Z"
        },
        "statistics": {
          "evaluated_expectations": 1,
          "successful_expectations": 1,
          "unsuccessful_expectations": 0,
          "success_percent": 100
        },
        "evaluation_parameters": {},
        "success": true
      },
      "actions_results": {
        "store_validation_result": {
          "class": "StoreValidationResultAction"
        },
        "store_evaluation_params": {
          "class": "StoreEvaluationParametersAction"
        },
        "update_data_docs": {
          "local_site": "file://C:\\Users\\user\\Desktop\\PFE\\Great\\great_expectations\\uncommitted/data_docs/local_site/validations\\code_wilayas\\20220606-130715-my-run-name-template\\20220606T130715.017662Z\\bbd03d0b1e64d9da02f096a8efc90f6d.html",
          "class": "UpdateDataDocsAction"
        },
        "datahub_action": {
          "datahub_notification_result": "DataHub notification succeeded",
          "class": "DataHubValidationAction"
        }
      }
    }
  },
  "checkpoint_config": {
    "slack_webhook": null,
    "action_list": [
      {
        "name": "store_validation_result",
        "action": {
          "class_name": "StoreValidationResultAction"
        }
      },
      {
        "name": "store_evaluation_params",
        "action": {
          "class_name": "StoreEvaluationParametersAction"
        }
      },
      {
        "name": "update_data_docs",
        "action": {
          "class_name": "UpdateDataDocsAction",
          "site_names": []
        }
      },
      {
        "name": "datahub_action",
        "action": {
          "module_name": "datahub.integrations.great_expectations.action",
          "class_name": "DataHubValidationAction",
          "server_url": "http://localhost:8080"
        }
      }
    ],
    "expectation_suite_ge_cloud_id": null,
    "module_name": "great_expectations.checkpoint",
    "runtime_configuration": {},
    "run_name_template": "%Y%m%d-%H%M%S-my-run-name-template",
    "notify_on": null,
    "class_name": "Checkpoint",
    "profilers": [],
    "name": "my_checkpoint",
    "evaluation_parameters": {},
    "site_names": null,
    "validations": [
      {
        "batch_request": {
          "datasource_name": "DWH",
          "data_connector_name": "default_inferred_data_connector_name",
          "data_asset_name": "public.D_Wilaya",
          "data_connector_query": {
            "index": -1
          }
        },
        "expectation_suite_name": "code_wilayas"
      }
    ],
    "batch_request": {},
    "expectation_suite_name": null,
    "config_version": 1,
    "template_name": null,
    "ge_cloud_id": null,
    "notify_with": null
  },
  "success": true
}
h
Looks like the dataset urns have different database names: DWH in the dataset ingested into DataHub vs DW in the dataset urn from the great expectations integration. What is the actual name of the postgres database?
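To illustrate why the database name matters (a hypothetical sketch — `make_postgres_urn` is not a real datahub helper, it just mimics the urn shape the integration emits for postgres tables):

```python
# DataHub postgres dataset urns embed the database name, so "DW" and
# "DWH" produce two different urns, and validations emitted under one
# never show up on the dataset ingested under the other.
def make_postgres_urn(database, schema, table, env="PROD"):
    return (f"urn:li:dataset:(urn:li:dataPlatform:postgres,"
            f"{database}.{schema}.{table},{env})")

print(make_postgres_urn("DW", "public", "D_Wilaya"))
print(make_postgres_urn("DWH", "public", "D_Wilaya"))
```

Since the urns differ in the database segment, the assertionRunEvent attaches to a dataset entity that was never ingested, which matches the empty `datahub get` results earlier in the thread.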
s
the actual name is DW, do i change it to DWH?
when i change it to DWH it works 😀 thank youuuu so much for your help
r
Hey there! 👋 Make sure your message includes the following information if relevant, so we can help more effectively! 1. Which DataHub version are you using? (e.g. 0.12.0) 2. Please post any relevant error logs on the thread!