Powered by Linen
design-data-quality
big-carpet-38439
07/08/2021, 1:41 PM
Heyo! We are currently starting this initiative by working on modeling and ingesting rich data profiles from BigQuery 🙂 This is currently being driven by @gray-shoe-75895
mammoth-bear-12532
07/13/2021, 3:38 AM
<!channel> we are starting to poll the community on how they would expect to integrate their Great Expectations (suite/deployment) with DataHub. Please let us know who would like to participate and give us input into the design of this important integration between the two projects!
mammoth-bear-12532
07/23/2021, 7:30 AM
@miniature-ram-76637 hope you are attending this townhall!
handsome-belgium-11927
09/17/2021, 11:04 AM
Hello, everyone! Any ideas on how to show dataset freshness (or other data quality information) in the current UI? I guess if we add this to properties, it will start creating a new version of the dataset after each ingestion, which is not what we want. The Profiling tab is well designed for this, but the fields there are not editable. Maybe it is possible to add custom fields to profiling to show the values we want? Maybe I should have posted this in the UI section 🤔
mammoth-sugar-1353
09/20/2021, 8:07 AM
Hey all, I was wondering what the intersection is between GE's and DQ's analysers? We've started using DQ, and I think that building the UI around the `Completeness`, `Maximum`, etc. terms would be really useful. It would give a simple way for quality measures to be compared between datasets. For it to work, they would need to line up with GE's equivalents. Full list from DQ here >>>
fancy-fireman-15263
11/17/2021, 12:14 PM
Have we got something on the roadmap related to visualising GCP Composer (Airflow) metrics? Big fan of the calendar view:
salmon-rose-54694
02/10/2022, 12:10 PM
Is there a guide on how to integrate Deequ into DataHub?
gifted-bird-57147
05/17/2022, 5:45 PM
Hi, I'm playing around with GE and DataHub, and I noticed that even though my latest tests are successful, the overall 'checkmark' is still a red cross and not a green check. Is this intended behavior? I would expect the green checkmark to appear if the latest test suite is successful.
broad-article-1339
10/18/2022, 4:46 PM
Hi everyone, I have a question about how dbt tests are surfaced in the Validation tab. A dbt test can succeed, fail, or warn. Does DataHub show the `warn` type?
little-lunch-35136
01/11/2023, 6:23 PM
Hello, everyone, I'm looking for a sample Assertion YAML file for CLI ingest; we would like to do a similar thing as lineage with this sample file in the repo. Thanks, -Ning
little-lunch-35136
01/26/2023, 6:05 AM
Hi, all, not sure whether this has happened to anyone: we are running Airflow and GX on Snowflake tables, following the GX docs for the Snowflake connection string:
    snowflake://<USER_NAME>:<PASSWORD>@<ACCOUNT_NAME>/<DATABASE_NAME>/<SCHEMA_NAME>?warehouse=<WAREHOUSE_NAME>&role=<ROLE_NAME>&application=great_expectations_oss
    CONNECTION_STRING = f"snowflake://{sfUser}:{sfPswd}@{sfAccount}/DEV_ODS_DB/CBS_ODS?warehouse={wh}&role={role}&application=great_expectations_oss"
The database name is DEV_ODS_DB and the schema is CBS_ODS. Everything runs and the GX DataHub action succeeds, but NO assertion is attached to the table. Investigating further, the URN that GX sends to DataHub is
    urn:li:dataPlatform:snowflake,dev_ods_db/cbs_ods.cbs_ods.building_info,PROD)
instead of what is shown in DataHub:
    urn:li:dataPlatform:snowflake,dev_ods_db.cbs_ods.building_info,PROD
So it seems the DataHub action is mistaking the two path parts DEV_ODS_DB/CBS_ODS for the database name. Is this a bug or some config I missed? Thanks. Posted in #ingestion
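To illustrate the mismatch: DataHub expects the Snowflake dataset name in the URN as a dotted, lower-cased `database.schema.table`, while the raw connection-string path carries `database/schema` with a slash. A minimal sketch (plain Python, a hypothetical helper, not the actual GX action code) of deriving the expected name from such a connection string:

```python
from urllib.parse import urlparse

def dataset_name_from_snowflake_url(conn_str: str, table: str) -> str:
    """Derive the dotted dataset name DataHub expects from a Snowflake
    SQLAlchemy-style URL whose path is /<database>/<schema>."""
    path = urlparse(conn_str).path              # e.g. "/DEV_ODS_DB/CBS_ODS"
    segments = [s for s in path.split("/") if s]
    if len(segments) != 2:
        raise ValueError("expected /<database>/<schema> in the URL path")
    database, schema = segments
    # Join with dots and lower-case, instead of passing the raw
    # "<database>/<schema>" path segment through as the database name.
    return ".".join([database, schema, table]).lower()

conn = ("snowflake://user:pwd@account/DEV_ODS_DB/CBS_ODS"
        "?warehouse=wh&role=r&application=great_expectations_oss")
name = dataset_name_from_snowflake_url(conn, "BUILDING_INFO")
urn = f"urn:li:dataset:(urn:li:dataPlatform:snowflake,{name},PROD)"
```

If the action instead builds the name from the unparsed path, the slash survives into the URN and the assertion lands on a dataset URN that does not exist in DataHub, which matches the symptom described above.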
gorgeous-tent-62316
02/07/2023, 5:03 PM
Hi all, we are looking at data quality, that is: is there any way to enforce that a dataset has at least one owner? Or that a certain type of dataset has a schema? Thanks
best-umbrella-88325
02/20/2023, 3:59 PM
Hello there community! Slight question here. I've been trying to create assertions on various datasets as per this example: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/data_quality_mcpw_rest.py. However, when I run the same file against two datasets, the assertions get removed from the first dataset and are only visible on the second one. It looks like the results get overwritten. Is there any workaround for this? I wish to apply the same set of assertions to multiple datasets. Thanks in advance.
best-umbrella-88325
02/20/2023, 3:59 PM
Hello there community! Slight question here. I've been trying to create assertions on various datasets as per this example: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/data_quality_mcpw_rest.py. However, when I run the same file against two datasets, the assertions get removed from the first dataset and are only visible on the second one. It looks like the results get overwritten. Is there any workaround for this? I wish to apply the same set of assertions to multiple datasets. Thanks in advance.
loud-island-88694
02/20/2023, 4:29 PM
@hundreds-photographer-13496 ^
hundreds-photographer-13496
02/21/2023, 7:47 AM
Hi @best-umbrella-88325, are you trying to use the same assertion urn (identifier) for both datasets? If you use different assertion urns (see `def assertionUrn`), your problem will be solved.
best-umbrella-88325
02/21/2023, 7:48 AM
I'm using different assertion urns. I'm creating them like this:
def assertionUrn(info: AssertionInfo, validationName) -> str:
    return "urn:li:assertion:assertionInfo" + validationName.replace(" ", "_")
hundreds-photographer-13496
02/21/2023, 7:49 AM
More context: an assertion entity is equivalent to a "data quality check on an entity instance". I believe in your case you are trying to use the same assertion urn for two entities. This does not work, and it is also not required, unless the assertion is a cross-dataset assertion, i.e. its final status (pass/fail) depends on the state of both datasets at a time. Cross-dataset assertions are not supported as of now. For simple data quality checks that execute on one dataset at a time, one needs to use a separate assertion urn for each. Take a look at the Assertion Identity section (https://datahubproject.io/docs/generated/metamodel/entities/assertion/#identity) for more details.
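Following that guidance, one way to keep assertion URNs distinct per dataset is to fold the target dataset's URN into the identifier. A minimal sketch (a hypothetical helper, not the repo example's actual code) under that assumption:

```python
import hashlib

def assertion_urn(dataset_urn: str, validation_name: str) -> str:
    """Build an assertion URN that is unique per (dataset, check) pair.

    A stable digest of the dataset URN in the identifier keeps the same
    check on different datasets from colliding and overwriting each other.
    """
    digest = hashlib.md5(dataset_urn.encode("utf-8")).hexdigest()[:8]
    check = validation_name.replace(" ", "_")
    return f"urn:li:assertion:{check}-{digest}"

a = assertion_urn(
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.sch.t1,PROD)",
    "not null check",
)
b = assertion_urn(
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.sch.t2,PROD)",
    "not null check",
)
# a != b: the same check on two datasets yields two distinct assertion urns
```

Any stable per-dataset component works here; the digest is just one compact choice.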
best-umbrella-88325
02/21/2023, 8:04 AM
Thanks for the help @hundreds-photographer-13496! There was a bug in our assertion urn generation code, which was causing the assertion urns to be duplicated. This is fixed now. 🙂
hundreds-photographer-13496
02/21/2023, 12:22 PM
Glad you figured it out! 🙂