Hello guys. I want to activate 'validation' button...
# troubleshoot
a
Hello guys. I want to activate the 'validation' button, which is disabled at first when datahub quickstart runs. I was following the guides for great_expectations and have finished installing acryl-datahub[great-expectations]. However, I can't find the yaml file in the great_expectations/checkpoint directory. What can I do to solve this? I attached a figure that captures my situation.
m
As the validation is specific to each dataset, you have to create 3 files: datasource, suite and checkpoint.
• Datasource file: the file in which you specify all the information GE needs to connect to the table you want to run the validation tests on.
• Suite file: the file in which you list all the tests to run on the table from the source specified in the previous file.
• Checkpoint file: the file you have to execute to be able to see the tests in DataHub. In this file you have to specify a few options that make the connection to DataHub possible:
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
      site_names: []
  - name: datahub_action
    action:
      module_name: datahub.integrations.great_expectations.action
      class_name: DatahubValidationAction
      server_url: http://datahub-gms:8080
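For anyone who prefers to assemble the checkpoint programmatically, here is a minimal sketch of the same action list as plain Python data, plus a check that the DataHub action is actually present. The helper names build_action_list and has_datahub_action are hypothetical (they are not part of Great Expectations or DataHub); only the action names, class names, module name and server URL come from the config above.

```python
from typing import Any, Dict, List


def build_action_list(server_url: str) -> List[Dict[str, Any]]:
    """Return the checkpoint action_list from the config above as plain data.

    Hypothetical helper: it only mirrors the YAML structure and does not
    call Great Expectations or DataHub.
    """
    return [
        {
            "name": "store_validation_result",
            "action": {"class_name": "StoreValidationResultAction"},
        },
        {
            "name": "store_evaluation_params",
            "action": {"class_name": "StoreEvaluationParametersAction"},
        },
        {
            "name": "update_data_docs",
            "action": {"class_name": "UpdateDataDocsAction", "site_names": []},
        },
        {
            "name": "datahub_action",
            "action": {
                "module_name": "datahub.integrations.great_expectations.action",
                "class_name": "DatahubValidationAction",
                "server_url": server_url,
            },
        },
    ]


def has_datahub_action(action_list: List[Dict[str, Any]]) -> bool:
    """Return True if any entry uses the DatahubValidationAction class."""
    return any(
        entry.get("action", {}).get("class_name") == "DatahubValidationAction"
        for entry in action_list
    )


if __name__ == "__main__":
    actions = build_action_list("http://datahub-gms:8080")
    print(has_datahub_action(actions))
```

If has_datahub_action returns False for your own checkpoint's action list, the validation results would presumably never be sent to DataHub, which is one thing worth ruling out.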
a
@microscopic-mechanic-13766 You mean, I should make three files in the checkpoint directory, right?
m
No, each file is stored in a different directory. To "enable" the validation tab you have to create all 3 files mentioned above, not just the checkpoint (the checkpoint file just takes its info from the other 2 files).
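To make "a different directory" concrete, here is a rough sketch that reports which of the three pieces exist in a project. It assumes the default layout that great_expectations init generates (a great_expectations/ folder holding great_expectations.yml, expectations/ and checkpoints/), a suite name without dots, and a checkpoint saved as .yml or .yaml. The function name check_ge_layout is hypothetical, and the layout may differ between Great Expectations versions.

```python
from pathlib import Path
from typing import Dict


def check_ge_layout(project_root: str, suite_name: str, checkpoint_name: str) -> Dict[str, bool]:
    """Report which of the three required pieces exist on disk.

    Assumed default layout (may vary by Great Expectations version):
      great_expectations/great_expectations.yml      -> datasource config lives here
      great_expectations/expectations/<suite>.json   -> expectation suite
      great_expectations/checkpoints/<name>.yml      -> checkpoint
    """
    root = Path(project_root) / "great_expectations"
    checkpoint_dir = root / "checkpoints"
    return {
        "project_config": (root / "great_expectations.yml").is_file(),
        "suite": (root / "expectations" / f"{suite_name}.json").is_file(),
        # Accept both extensions, since either may be in use.
        "checkpoint": any(
            (checkpoint_dir / f"{checkpoint_name}{ext}").is_file()
            for ext in (".yml", ".yaml")
        ),
    }


if __name__ == "__main__":
    # "my_suite" and "my_checkpoint" are placeholder names.
    for piece, present in check_ge_layout(".", "my_suite", "my_checkpoint").items():
        print(f"{piece}: {'found' if present else 'missing'}")
```

Note that this only checks that great_expectations.yml exists; it does not verify that a datasource is actually defined inside it.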
a
@microscopic-mechanic-13766 After receiving your answer, I figured out that the 'great_expectations init' command creates the essentials automatically. As you instructed, I made my own 'suite.json' and 'checkpoint.yaml' files and added the datasource information to 'great_expectations.yml'. But the problem is that the suite json file causes an 'unknown field' error. The format of the dictionary is identical to that of test_suite.json, which is located in the datahub github. I'm now trying to solve this. Have you ever encountered this kind of error when you tried to do this? For your information, I attached two figures. The first one is my suite.json file and the second is the error msg. 'id' is a column name in the private database I connected.
m
As far as I know, great_expectations init just creates the directory structure that Great Expectations needs to work correctly. Creating the suite of expectations can sometimes be a nightmare. As I guess you are not very familiar with it (which is also my case), I would recommend following these steps:
• Run: great_expectations suite new --no-jupyter
• Select option 3 - Automatically, using a profiler
• Indicate the file name
• Run: jupyter notebook /great_expectations/uncommitted/edit_<file_name> --allow-root --ip 0.0.0.0
I would also recommend creating the files via Jupyter notebook, as it gives more information and is easier. (The last command lets you access Jupyter in your browser and use Jupyter's UI to create the mentioned files.)