Hey folks I'm very interested to see what the integration wi DataHub #getting-started

icy-holiday-55016

06/08/2021, 8:41 AM

Hey folks, I'm very interested to see what the integration with DQ systems such as Great Expectations looks like. Are you still expecting to be able to release it before the end of June?

loud-island-88694

06/08/2021, 3:01 PM

Hello Steven, it is likely to be in the first week of July. What are the main use cases you are targeting? I.e., when do you want GE to be triggered, and how do you want the test results to be captured back in DataHub?

icy-holiday-55016

06/08/2021, 3:38 PM

Hi, requirements are still a little vague, though I do have one use case. This may be more of a GE question than a DataHub question; just let me know if that's the case. The use case is that we have multiple datasets, but running quality rules on them in isolation isn't suitable for us. We'd like to be able to run quality rules across multiple datasets. An example would be:
• we have a dataset of things with 2000 records
• we have another dataset with extra information that we want to join onto the first one for reporting purposes, but that only has 1800 records

Could a rule be generated to work across multiple datasets? In the above example, it would identify a gap of 200 records. From a DataHub perspective, this might mean a data quality suite is not necessarily tied to a single dataset, but potentially multiple.
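
To make the example concrete, here is a minimal sketch of that kind of cross-dataset rule in plain Python. It does not use Great Expectations or DataHub; the function name `missing_keys` and the integer record IDs are hypothetical, chosen only to mirror the 2000 vs. 1800 record counts above.

```python
def missing_keys(primary_ids, secondary_ids):
    """Return IDs present in the primary dataset but absent from the secondary one."""
    return sorted(set(primary_ids) - set(secondary_ids))


# Hypothetical data: 2000 "things", of which only the first 1800 have extra information.
things_ids = range(1, 2001)
extra_info_ids = range(1, 1801)

# The rule spans both datasets: any ID without a match is a quality gap.
gap = missing_keys(things_ids, extra_info_ids)
print(len(gap))
```

A rule like this takes two datasets as input, which is why a quality suite built from such rules could not be attached to a single dataset.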

loud-island-88694

06/09/2021, 9:43 PM

Sorry for the delay in responding, @icy-holiday-55016. This might be an eventual requirement for Great Expectations, but I know that multi-dataset validations aren't supported currently. I can foresee us adding support for expressing these rules and reporting the results back. The exact implementation mechanism using a DQ framework is still TBD.

icy-holiday-55016

06/10/2021, 7:57 AM

No problem, thanks for the info
