# design-data-quality
  • mammoth-bear-12532

    04/23/2021, 7:50 AM
    set the channel description: Design Channel to discuss all things Data Quality
  • bored-finland-65840

    06/29/2021, 2:16 PM
    @bored-finland-65840 has left the channel
  • miniature-ram-76637

    07/08/2021, 12:21 PM
    heya, this channel feels remarkably empty given the subject matter. has any movement been made on this? willing to get involved with whoever is taking this on
  • big-carpet-38439

    07/08/2021, 1:41 PM
    Heyo! We are currently starting this initiative by working on modeling and ingesting rich data profiles from BigQuery 🙂 This is currently being driven by @gray-shoe-75895
  • mammoth-bear-12532

    07/13/2021, 3:38 AM
    <!channel> we are starting to poll the community on how they would expect to integrate their Great Expectations (suite/deployment) with DataHub. Please let us know who all would like to participate and give us input into the design of this important integration between the two projects!
  • mammoth-bear-12532

    07/23/2021, 7:30 AM
    @miniature-ram-76637 hope you are attending this townhall!
  • handsome-belgium-11927

    09/17/2021, 11:04 AM
    Hello, everyone! Any ideas on how to show dataset data actuality (or some other data-quality information) in the current UI? I guess if we add this to properties it will start creating a new version of the dataset after each ingestion, and that is not what we want for sure. The Profiling tab is well designed for this, but the fields there are not editable. Maybe it is possible to add custom fields into profiling to show the values that we want? Maybe I should have posted this in the UI section 🤔
  • mammoth-sugar-1353

    09/20/2021, 8:07 AM
    Hey all, I was wondering what the intersection was between GE and DQ's analysers? We've started using DQ, and I think that building the UI around the Completeness, Maximum, etc. terms would be really useful. It would give a simple way for quality measures to be compared between datasets. For it to work, they would need to line up with GE's equivalents. Full list from DQ here >>>
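One way to picture the alignment asked about above: a lookup table between Deequ analyzer names and their closest Great Expectations expectation types, so quality measures from either tool could be shown under one set of UI terms. The pairings below are a sketch of a plausible mapping, not an official correspondence from either project.

```python
# Illustrative only: possible alignment of Deequ analyzer names with the
# closest Great Expectations expectation types, used as shared display terms.
DEEQU_TO_GE = {
    "Completeness": "expect_column_values_to_not_be_null",
    "Maximum": "expect_column_max_to_be_between",
    "Minimum": "expect_column_min_to_be_between",
    "Mean": "expect_column_mean_to_be_between",
    "Size": "expect_table_row_count_to_be_between",
    "Uniqueness": "expect_column_values_to_be_unique",
}

def to_common_term(tool: str, metric: str) -> str:
    """Normalize a tool-specific metric name to a shared display term."""
    if tool == "deequ":
        return metric  # Deequ's names serve as the display terms here
    inverse = {v: k for k, v in DEEQU_TO_GE.items()}
    return inverse.get(metric, metric)

print(to_common_term("ge", "expect_column_values_to_not_be_null"))  # Completeness
```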
  • fancy-fireman-15263

    11/17/2021, 12:14 PM
    Have we got something on the roadmap related to visualising GCP Composer (Airflow) metrics? Big fan of the calendar view:
  • salmon-rose-54694

    02/10/2022, 12:10 PM
    Is there a guide on how to integrate Deequ into DataHub?
  • gifted-bird-57147

    05/17/2022, 5:45 PM
    Hi, I'm playing around with GE and DataHub, and I noticed that even though my latest tests are successful, the overall 'checkmark' is still a red cross and not a green check. Is this intended behavior? I would expect the green checkmark to appear if the latest test suite is successful.
  • broad-article-1339

    10/18/2022, 4:46 PM
    Hi everyone, I have a question about how dbt tests are surfaced in the Validation tab. A dbt test can succeed, fail, or warn. Does DataHub show the warn type?
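As a sketch of what the question is getting at (this is not DataHub's actual behavior): an integration has to decide how dbt's three-valued test status folds into an assertion outcome, and whether warn stays distinct from a hard failure. The mapping below is a hypothetical choice, not the shipped one.

```python
# Hypothetical mapping of dbt test statuses onto assertion outcomes,
# keeping "warn" as a soft pass instead of collapsing it into FAILURE.
DBT_STATUS_TO_RESULT = {
    "pass": "SUCCESS",
    "warn": "SUCCESS",   # surfaced as a pass; could carry a warning flag
    "fail": "FAILURE",
    "error": "FAILURE",
}

def map_status(status: str) -> str:
    # Unknown statuses are treated conservatively as failures.
    return DBT_STATUS_TO_RESULT.get(status.lower(), "FAILURE")

print(map_status("warn"))  # SUCCESS
```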
  • little-lunch-35136

    01/11/2023, 6:23 PM
    Hello, everyone, I'm looking for a sample assertion YAML file for CLI ingest; we would like to do a similar thing as lineage with this sample file in the repo. Thanks, -Ning
  • little-lunch-35136

    01/26/2023, 6:05 AM
    Hi, All, not sure whether this has happened to anyone: we are running Airflow and GX on Snowflake tables, following the GX doc for the Snowflake connection string:
    snowflake://<USER_NAME>:<PASSWORD>@<ACCOUNT_NAME>/<DATABASE_NAME>/<SCHEMA_NAME>?warehouse=<WAREHOUSE_NAME>&role=<ROLE_NAME>&application=great_expectations_oss
    CONNECTION_STRING = f"snowflake://{sfUser}:{sfPswd}@{sfAccount}/DEV_ODS_DB/CBS_ODS?warehouse={wh}&role={role}&application=great_expectations_oss"
    The dbname is DEV_ODS_DB and the schema is CBS_ODS. Everything runs and the GX DataHub action succeeds, but NO assertion is attached to the table. Investigating further, it seems the URN which GX sends to DataHub is
    urn:li:dataPlatform:snowflake,dev_ods_db/cbs_ods.cbs_ods.building_info,PROD)
    instead of this, as it is shown in DataHub:
    urn:li:dataPlatform:snowflake,dev_ods_db.cbs_ods.building_info,PROD
    so it seems the DataHub action is mistaking the two parts DEV_ODS_DB/CBS_ODS for the database name. Is this a bug or some config I missed? Thanks.
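To make the expected shape concrete: DataHub's Snowflake dataset URNs use a lowercase, dot-separated database.schema.table name, so a database/schema pair lifted from the connection string has to be joined with "." before the URN is built, not left with the "/" separator. A minimal sketch (the helper name is hypothetical, the URN layout mirrors the one quoted above):

```python
# Minimal sketch: build a Snowflake dataset URN in the dot-separated form
# that DataHub displays, from separate database/schema/table parts.
def make_snowflake_dataset_urn(database: str, schema: str, table: str,
                               env: str = "PROD") -> str:
    # Snowflake names are lowercased in DataHub URNs.
    name = f"{database}.{schema}.{table}".lower()
    return f"urn:li:dataset:(urn:li:dataPlatform:snowflake,{name},{env})"

urn = make_snowflake_dataset_urn("DEV_ODS_DB", "CBS_ODS", "BUILDING_INFO")
print(urn)
# urn:li:dataset:(urn:li:dataPlatform:snowflake,dev_ods_db.cbs_ods.building_info,PROD)
```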
  • gorgeous-tent-62316

    02/07/2023, 5:03 PM
    Hi All, we are looking into data quality: is there any way to enforce that a dataset has at least one owner? Or that a certain type of dataset has a schema? Thanks
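The two rules asked about above can be sketched as a simple policy check run over dataset metadata records. This is a hypothetical stand-alone check over plain dicts, not a DataHub API; field names like `owners` and `schema_fields` are illustrative.

```python
# Hypothetical policy check: flag datasets with no owner, and datasets of
# certain types that are missing a schema.
def check_dataset(meta: dict) -> list:
    problems = []
    if not meta.get("owners"):
        problems.append("no owner")
    if meta.get("type") in {"table", "view"} and not meta.get("schema_fields"):
        problems.append("missing schema")
    return problems

good = {"owners": ["alice"], "type": "table", "schema_fields": ["id"]}
bad = {"owners": [], "type": "table", "schema_fields": []}
print(check_dataset(good))  # []
print(check_dataset(bad))   # ['no owner', 'missing schema']
```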
  • best-umbrella-88325

    02/20/2023, 3:59 PM
    Hello there community! Slight question here. I've been trying to create assertions on various datasets as per this example https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/data_quality_mcpw_rest.py. However, when I run the same file against two datasets, the assertions get removed from the first dataset and are only visible on the second one. It looks like the results get overwritten. Is there any workaround for this? I wish to apply the same set of assertions to multiple datasets. Thanks in advance.
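One plausible workaround, assuming the overwrite happens because the example reuses a single assertion URN across datasets: derive a distinct assertion id per (dataset, check) pair, so emitting for a second dataset cannot clobber the first. The hashing scheme below is an assumption for illustration, not DataHub's own id scheme.

```python
import hashlib

# Sketch: deterministic, per-(dataset, check) assertion URNs, so the same
# logical check gets a different assertion id on each dataset.
def make_assertion_urn(dataset_urn: str, check_name: str) -> str:
    digest = hashlib.sha256(f"{dataset_urn}|{check_name}".encode()).hexdigest()[:16]
    return f"urn:li:assertion:{digest}"

a = make_assertion_urn("urn:li:dataset:(snowflake,ds1,PROD)", "not_null_id")
b = make_assertion_urn("urn:li:dataset:(snowflake,ds2,PROD)", "not_null_id")
print(a != b)  # True: same check, different datasets, distinct URNs
```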
  • bored-truck-17085

    03/08/2023, 2:24 PM
    Does someone know if this warn sync is on the DataHub roadmap? Currently all dbt validations with warn status are showing as fail.
  • red-florist-94889

    09/26/2023, 2:40 PM
    @dazzling-judge-80093 I have my expectations done using the PySpark GE lib and store the results as JSON/HTML docs. How do I reuse those results to enable the Validation tab in DataHub?
  • limited-motherboard-51317

    01/10/2024, 3:13 PM
    Hi All! I'm currently studying with the team whether it's feasible, and how, to check that the metamodel of a dataset in DataHub is the same as the schema in the related Parquet file. Does anybody have similar experience?