Powered by Linen
design-data-quality
big-carpet-38439
07/08/2021, 1:41 PM
Heyo! We are currently starting this initiative by working on modeling and ingesting rich data profiles from BigQuery 🙂 This is currently being driven by @gray-shoe-75895
mammoth-bear-12532
07/13/2021, 3:38 AM
<!channel> we are starting to poll the community on how they would expect to integrate their Great Expectations (suite/deployment) with DataHub. Please let us know who would like to participate and give us input into the design of this important integration between the two projects!
mammoth-bear-12532
07/23/2021, 7:30 AM
@miniature-ram-76637 hope you are attending this townhall!
handsome-belgium-11927
09/17/2021, 11:04 AM
Hello, everyone! Any ideas on how to show dataset freshness (or other data quality information) in the current UI? I guess if we add this to properties, it will start creating a new version of the dataset after each ingestion, which is not what we want. The Profiling tab is well designed for this, but the fields there are not editable. Maybe it is possible to add custom fields to profiling to show the values we want? Maybe I should have posted this in the UI section 🤔
mammoth-sugar-1353
09/20/2021, 8:07 AM
Hey all, I was wondering what the intersection is between GE's and DQ's analysers? We've started using DQ, and I think that building the UI around the `Completeness`, `Maximum`, etc. terms would be really useful. It would give a simple way for quality measures to be compared between datasets. For it to work, they would need to line up with GE's equivalents. Full list from DQ here >>>
fancy-fireman-15263
11/17/2021, 12:14 PM
Have we got something on the roadmap related to visualising GCP Composer (Airflow) metrics? Big fan of the calendar view:
salmon-rose-54694
02/10/2022, 12:10 PM
Is there a guide on how to integrate Deequ into DataHub?
gifted-bird-57147
05/17/2022, 5:45 PM
Hi, I'm playing around with GE and DataHub, and I noticed that even though my latest tests are successful, the overall 'checkmark' is still a red cross and not a green check. Is this intended behavior? I would expect the green checkmark to appear if the latest test suite is successful.
broad-article-1339
10/18/2022, 4:46 PM
Hi everyone, I have a question about how dbt tests are surfaced in the Validation tab. A dbt test can succeed, fail, or warn. Does DataHub show the `warn` type?
little-lunch-35136
01/11/2023, 6:23 PM
Hello, everyone, I'm looking for a sample Assertion YAML file for CLI ingest; we would like to do a similar thing as lineage with this sample file in the repo. Thanks, -Ning
little-lunch-35136
01/26/2023, 6:05 AM
Hi, all, not sure whether this has happened to anyone: we are running Airflow and GX on Snowflake tables, following the GX docs for the Snowflake connection string:
    snowflake://<USER_NAME>:<PASSWORD>@<ACCOUNT_NAME>/<DATABASE_NAME>/<SCHEMA_NAME>?warehouse=<WAREHOUSE_NAME>&role=<ROLE_NAME>&application=great_expectations_oss
    CONNECTION_STRING = f"snowflake://{sfUser}:{sfPswd}@{sfAccount}/DEV_ODS_DB/CBS_ODS?warehouse={wh}&role={role}&application=great_expectations_oss"
The database name is DEV_ODS_DB and the schema is CBS_ODS. Everything runs and the GX DataHub action succeeds, but NO assertion is attached to the table. Investigating further, the URN that GX sends to DataHub is
    urn:li:dataPlatform:snowflake,dev_ods_db/cbs_ods.cbs_ods.building_info,PROD)
instead of what is shown in DataHub:
    urn:li:dataPlatform:snowflake,dev_ods_db.cbs_ods.building_info,PROD
So it seems the DataHub action is mistaking the two path parts DEV_ODS_DB/CBS_ODS for the database name. Is this a bug or some config I missed? Thanks. Posted in #ingestion
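To illustrate the mismatch: DataHub expects the Snowflake dataset name in the URN as a dotted, lower-cased `database.schema.table`, while the raw connection-string path carries `database/schema` with a slash. A minimal sketch (plain Python, a hypothetical helper, not the actual GX action code) of deriving the expected name from such a connection string:

```python
from urllib.parse import urlparse

def dataset_name_from_snowflake_url(conn_str: str, table: str) -> str:
    """Derive the dotted dataset name DataHub expects from a Snowflake
    SQLAlchemy-style URL whose path is /<database>/<schema>."""
    path = urlparse(conn_str).path              # e.g. "/DEV_ODS_DB/CBS_ODS"
    segments = [s for s in path.split("/") if s]
    if len(segments) != 2:
        raise ValueError("expected /<database>/<schema> in the URL path")
    database, schema = segments
    # Join with dots and lower-case, instead of passing the raw
    # "<database>/<schema>" path segment through as the database name.
    return ".".join([database, schema, table]).lower()

conn = ("snowflake://user:pwd@account/DEV_ODS_DB/CBS_ODS"
        "?warehouse=wh&role=r&application=great_expectations_oss")
name = dataset_name_from_snowflake_url(conn, "BUILDING_INFO")
urn = f"urn:li:dataset:(urn:li:dataPlatform:snowflake,{name},PROD)"
```

If the action instead builds the name from the unparsed path, the slash survives into the URN and the assertion lands on a dataset URN that does not exist in DataHub, which matches the symptom described above.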
gorgeous-tent-62316
02/07/2023, 5:03 PM
Hi all, we are looking at data quality, that is: is there any way to enforce that a dataset has at least one owner? Or that a certain type of dataset has a schema? Thanks
best-umbrella-88325
02/20/2023, 3:59 PM
Hello there community! Slight question here. I've been trying to create assertions on various datasets as per this example: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/data_quality_mcpw_rest.py. However, when I run the same file against two datasets, the assertions get removed from the first dataset and are only visible on the second one. It looks like the results get overwritten. Is there any workaround for this? I wish to apply the same set of assertions to multiple datasets. Thanks in advance.
best-umbrella-88325
02/20/2023, 3:59 PM
Hello there community! Slight question here. I've been trying to create assertions on various datasets as per this example: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/data_quality_mcpw_rest.py. However, when I run the same file against two datasets, the assertions get removed from the first dataset and are only visible on the second one. It looks like the results get overwritten. Is there any workaround for this? I wish to apply the same set of assertions to multiple datasets. Thanks in advance.
loud-island-88694
02/20/2023, 4:29 PM
@hundreds-photographer-13496 ^
hundreds-photographer-13496
02/21/2023, 7:47 AM
Hi @best-umbrella-88325, are you trying to use the same assertion urn (identifier) for both datasets? If you use different assertion urns (see `def assertionUrn`), your problem will be solved.
best-umbrella-88325
02/21/2023, 7:48 AM
I'm using different assertion urns. I'm creating them like this:
def assertionUrn(info: AssertionInfo, validationName) -> str:
    return "urn:li:assertion:assertionInfo" + validationName.replace(" ", "_")
hundreds-photographer-13496
02/21/2023, 7:49 AM
More context: an assertion entity is equivalent to a "data quality check on an entity instance". I believe in your case you are trying to use the same assertion urn for two entities. This does not work, and it is also not required, unless the assertion is a cross-dataset assertion, i.e. its final status (pass/fail) depends on the state of both datasets at a time. Cross-dataset assertions are not supported as of now. For simple data quality checks that execute on one dataset at a time, one needs to use a separate assertion urn for each. Take a look at the Assertion Identity section (https://datahubproject.io/docs/generated/metamodel/entities/assertion/#identity) for more details.
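Following that guidance, one way to keep assertion URNs distinct per dataset is to fold the target dataset's URN into the identifier. A minimal sketch (a hypothetical helper, not the repo example's actual code) under that assumption:

```python
import hashlib

def assertion_urn(dataset_urn: str, validation_name: str) -> str:
    """Build an assertion URN that is unique per (dataset, check) pair.

    A stable digest of the dataset URN in the identifier keeps the same
    check on different datasets from colliding and overwriting each other.
    """
    digest = hashlib.md5(dataset_urn.encode("utf-8")).hexdigest()[:8]
    check = validation_name.replace(" ", "_")
    return f"urn:li:assertion:{check}-{digest}"

a = assertion_urn(
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.sch.t1,PROD)",
    "not null check",
)
b = assertion_urn(
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.sch.t2,PROD)",
    "not null check",
)
# a != b: the same check on two datasets yields two distinct assertion urns
```

Any stable per-dataset component works here; the digest is just one compact choice.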
best-umbrella-88325
02/21/2023, 8:04 AM
Thanks for the help @hundreds-photographer-13496! There was a bug in our assertion urn generation code, which was causing the assertion urns to be duplicated. This is fixed now. 🙂
hundreds-photographer-13496
02/21/2023, 12:22 PM
Glad you figured it out! 🙂