# troubleshoot
s
HELP ! Hi all, Trying to use Great expectation for Data Validation. The checkpoint runs, but the Validations are not getting displayed in Datahub. Added this in checkpoint configuration :
- name: datahub_action
  action:
    module_name: datahub.integrations.great_expectations.action
    class_name: DataHubValidationAction
    server_url: http://localhost:8080  # datahub server url
Getting this message when the checkpoint runs:
@hundreds-photographer-13496 ^
h
@swift-breakfast-25077 Can you update to the latest version of the datahub GE action?
pip install 'acryl-datahub[great-expectations]'==0.8.36
With that, we can get detailed debug logs after setting DATAHUB_DEBUG env var to True as mentioned in this doc - https://datahubproject.io/docs/metadata-ingestion/integration_docs/great-expectations/#debugging
On a separate note, do you have the respective datasets ingested in DataHub already ?
s
Yes of course i have, I will try to update the datahub GE action
@hundreds-photographer-13496 where can i add the DATAHUB_DEBUG var? when i add it in the checkpoint i get this error
h
Have you tried using
export DATAHUB_DEBUG=True
in the same terminal where you run the great expectations checkpoint?
s
yes, but where can i view the logs?
h
aah, I was expecting the logs to show up alongside the GE logs. Looks like something is off.
Meanwhile we can debug another way. Do you have access to the datahub-gms logs at the time the validation runs in GE?
Do you see any logs like INGEST PROPOSAL proposal: {aspectName=assertionRunEvent, entityUrn=urn:li:assertion:xxxxx, entityType=assertion
s
@hundreds-photographer-13496 yes i see
h
Can you run the datahub CLI command against the assertion urn you see and share the results?
datahub get --urn "urn:li:assertion:xxxxx"
s
i got this :
h
hmm, if you use the dataset urn instead of the assertion urn in the above command, do you get non-empty output?
oh, unless you are running datahub on your local machine, you would need to run
datahub init
to point the CLI at your datahub server
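For reference, `datahub init` prompts for the server host and an access token and writes them to `~/.datahubenv`; against a local quickstart the resulting file looks roughly like this (a sketch — the exact layout may differ between CLI versions):

# ~/.datahubenv, written by `datahub init`
gms:
  server: http://localhost:8080
  token: ''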
s
do i overwrite ?
h
yes
s
my datahub access token ??
h
how are you running datahub? using datahub docker quickstart?
s
Yes
h
hmm, then just press enter, no need to set its value. Strange that datahub init is not working.
Now, does datahub get --urn "dataset urn" return a result? Replace "dataset urn" with your dataset's urn.
s
h
ah, strange.
btw, I figured out how to show the DataHub action logs when running a GE checkpoint. Can you see if it works for you? This is a python script.
import great_expectations as ge
import logging

# Attach a stream handler to the DataHub action's logger so its DEBUG
# output is printed while the checkpoint runs
logger = logging.getLogger("datahub.integrations.great_expectations.action")
handler = logging.StreamHandler()
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

context = ge.get_context()
context.run_checkpoint(checkpoint_name="your checkpoint name")
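The script only re-routes the integration's named logger; the same pattern can be tried in isolation without datahub installed (a self-contained sketch — the log message here is made up, only the logger name comes from the thread):

```python
import io
import logging

# Same pattern as the script above: fetch the library's logger by name
# and attach a handler we control. Capture into a buffer instead of
# stderr so the effect is easy to inspect.
buf = io.StringIO()
logger = logging.getLogger("datahub.integrations.great_expectations.action")
logger.addHandler(logging.StreamHandler(buf))
logger.setLevel(logging.DEBUG)

logger.debug("emitting validation results to DataHub")
print(buf.getvalue().strip())
```

Any DEBUG record the action emits through that logger now lands on the attached handler, which is why the real script makes the integration's logs visible alongside GE's own output.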
s
i will try it
After trying it, i got this:
{
  "run_id": {
    "run_name": "20220606-130715-my-run-name-template",
    "run_time": "2022-06-06T13:07:15.017662+00:00"
  },
  "run_results": {
    "ValidationResultIdentifier::code_wilayas/20220606-130715-my-run-name-template/20220606T130715.017662Z/bbd03d0b1e64d9da02f096a8efc90f6d": {
      "validation_result": {
        "results": [
          {
            "exception_info": {
              "raised_exception": false,
              "exception_traceback": null,
              "exception_message": null
            },
            "result": {
              "element_count": 48,
              "unexpected_count": 0,
              "unexpected_percent": 0,
              "partial_unexpected_list": [],
              "missing_count": 0,
              "missing_percent": 0,
              "unexpected_percent_total": 0,
              "unexpected_percent_nonmissing": 0,
              "partial_unexpected_index_list": null,
              "partial_unexpected_counts": []
            },
            "meta": {},
            "expectation_config": {
              "meta": {},
              "kwargs": {
                "column": "Wilaya_NK",
                "max_value": 48,
                "min_value": 1,
                "batch_id": "bbd03d0b1e64d9da02f096a8efc90f6d"
              },
              "expectation_type": "expect_column_values_to_be_between"
            },
            "success": true
          }
        ],
        "meta": {
          "great_expectations_version": "0.15.2",
          "expectation_suite_name": "code_wilayas",
          "run_id": {
            "run_name": "20220606-130715-my-run-name-template",
            "run_time": "2022-06-06T13:07:15.017662+00:00"
          },
          "batch_spec": {
            "data_asset_name": "public.D_Wilaya",
            "table_name": "D_Wilaya",
            "batch_identifiers": {},
            "schema_name": "public",
            "type": "table"
          },
          "batch_markers": {
            "ge_load_time": "20220606T120715.048208Z"
          },
          "active_batch_definition": {
            "datasource_name": "DWH",
            "data_connector_name": "default_inferred_data_connector_name",
            "data_asset_name": "public.D_Wilaya",
            "batch_identifiers": {}
          },
          "validation_time": "20220606T120715.267537Z"
        },
        "statistics": {
          "evaluated_expectations": 1,
          "successful_expectations": 1,
          "unsuccessful_expectations": 0,
          "success_percent": 100
        },
        "evaluation_parameters": {},
        "success": true
      },
      "actions_results": {
        "store_validation_result": {
          "class": "StoreValidationResultAction"
        },
        "store_evaluation_params": {
          "class": "StoreEvaluationParametersAction"
        },
        "update_data_docs": {
          "local_site": "file://C:\\Users\\user\\Desktop\\PFE\\Great\\great_expectations\\uncommitted/data_docs/local_site/validations\\code_wilayas\\20220606-130715-my-run-name-template\\20220606T130715.017662Z\\bbd03d0b1e64d9da02f096a8efc90f6d.html",
          "class": "UpdateDataDocsAction"
        },
        "datahub_action": {
          "datahub_notification_result": "DataHub notification succeeded",
          "class": "DataHubValidationAction"
        }
      }
    }
  },
  "checkpoint_config": {
    "slack_webhook": null,
    "action_list": [
      {
        "name": "store_validation_result",
        "action": {
          "class_name": "StoreValidationResultAction"
        }
      },
      {
        "name": "store_evaluation_params",
        "action": {
          "class_name": "StoreEvaluationParametersAction"
        }
      },
      {
        "name": "update_data_docs",
        "action": {
          "class_name": "UpdateDataDocsAction",
          "site_names": []
        }
      },
      {
        "name": "datahub_action",
        "action": {
          "module_name": "datahub.integrations.great_expectations.action",
          "class_name": "DataHubValidationAction",
          "server_url": "http://localhost:8080"
        }
      }
    ],
    "expectation_suite_ge_cloud_id": null,
    "module_name": "great_expectations.checkpoint",
    "runtime_configuration": {},
    "run_name_template": "%Y%m%d-%H%M%S-my-run-name-template",
    "notify_on": null,
    "class_name": "Checkpoint",
    "profilers": [],
    "name": "my_checkpoint",
    "evaluation_parameters": {},
    "site_names": null,
    "validations": [
      {
        "batch_request": {
          "datasource_name": "DWH",
          "data_connector_name": "default_inferred_data_connector_name",
          "data_asset_name": "public.D_Wilaya",
          "data_connector_query": {
            "index": -1
          }
        },
        "expectation_suite_name": "code_wilayas"
      }
    ],
    "batch_request": {},
    "expectation_suite_name": null,
    "config_version": 1,
    "template_name": null,
    "ge_cloud_id": null,
    "notify_with": null
  },
  "success": true
}
h
Looks like the dataset urns have different database names: DWH in the dataset ingested into DataHub vs DW in the dataset urn from the great expectations integration. What is the actual name of the postgres database?
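To illustrate why the database name matters (a hypothetical sketch — `make_postgres_urn` is not a real datahub helper, it just mimics the urn shape the integration emits for postgres tables):

```python
# DataHub postgres dataset urns embed the database name, so "DW" and
# "DWH" produce two different urns, and validations emitted under one
# never show up on the dataset ingested under the other.
def make_postgres_urn(database, schema, table, env="PROD"):
    return (f"urn:li:dataset:(urn:li:dataPlatform:postgres,"
            f"{database}.{schema}.{table},{env})")

print(make_postgres_urn("DW", "public", "D_Wilaya"))
print(make_postgres_urn("DWH", "public", "D_Wilaya"))
```

Since the urns differ in the database segment, the assertionRunEvent attaches to a dataset entity that was never ingested, which matches the empty `datahub get` results earlier in the thread.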
s
the actual name is DW, do i change it to DWH?
when i change it to DWH it works 😀 thank youuuu so much for your help
r
Hey there! 👋 Make sure your message includes the following information if relevant, so we can help more effectively! 1. Which DataHub version are you using? (e.g. 0.12.0) 2. Please post any relevant error logs on the thread!