# advice-metadata-modeling
a
Tried searching but no luck. Has anyone created tags for fields denoting which ones have data quality checks on them? We have a set of baseline checks and business-critical ones, and I was looking to see what patterns others have used.
g
DataHub does not yet support automatic tag attachment for tables and fields, but you can parse your DQ check code and add tags using the SDK. The challenging part is keeping these tags in sync with changes to the DQ code.
a
We create and run our checks in Databricks, then output the results to a table that we build Power BI reports from. What I want to do is use that table as input to the API and tag the fields during each DQ run (adding, updating, or removing tags), so that when our customers look at a table they can answer the question "which fields have DQ checks on them?" and then look at our reporting for more details.
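A minimal sketch of the add/update/remove step, assuming each DQ run yields rows of (dataset URN, field path, check tier) and using the hypothetical tag names `dq_baseline` and `dq_business_critical` for the two tiers. It compares the tags a run implies with the tags currently on each field and returns what to add and what to remove, so fields whose checks were dropped lose their tags:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical tag names for the two check tiers; adjust to your own naming.
TIER_TAGS = {
    "baseline": "dq_baseline",
    "business_critical": "dq_business_critical",
}

FieldKey = Tuple[str, str]  # (dataset URN, field path)


def desired_tags(dq_rows: Iterable[Tuple[str, str, str]]) -> Dict[FieldKey, Set[str]]:
    """Map each (dataset_urn, field_path) in the DQ results to the tags it should carry."""
    desired: Dict[FieldKey, Set[str]] = defaultdict(set)
    for dataset_urn, field_path, tier in dq_rows:
        desired[(dataset_urn, field_path)].add(TIER_TAGS[tier])
    return dict(desired)


def diff_tags(
    desired: Dict[FieldKey, Set[str]],
    current: Dict[FieldKey, Set[str]],
) -> Tuple[Dict[FieldKey, Set[str]], Dict[FieldKey, Set[str]]]:
    """Return (to_add, to_remove); only the DQ tier tags are ever removed."""
    managed = set(TIER_TAGS.values())
    to_add: Dict[FieldKey, Set[str]] = {}
    to_remove: Dict[FieldKey, Set[str]] = {}
    for key in desired.keys() | current.keys():
        want = desired.get(key, set())
        have = current.get(key, set()) & managed  # ignore tags this job doesn't own
        if want - have:
            to_add[key] = want - have
        if have - want:
            to_remove[key] = have - want
    return to_add, to_remove
```

The `current` mapping would come from DataHub itself or from a record of the previous run. Restricting removals to the tier tags keeps the job from stripping tags that people applied by hand.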
a
Hello @adorable-salesclerk-90917, here's an example of how we create tags that don't exist and tag datasets:
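```python
import json

from datahub.emitter.mce_builder import make_tag_urn


class DataHubTagger:
    """Hypothetical wrapper class. `client` is assumed to expose
    execute(query=...) returning the parsed GraphQL JSON response."""

    def __init__(self, client):
        self.client = client

    def create_tag(self, name: str) -> str:
        """Return the URN of tag `name`, creating the tag if it doesn't exist."""
        tag_urn = make_tag_urn(name)
        # Check whether the tag exists; json.dumps() quotes and escapes the value.
        query = f"""
        query {{
          tag(urn: {json.dumps(tag_urn)}) {{
            properties {{
              name
            }}
          }}
        }}
        """
        result = self.client.execute(query=query)
        tag = result["data"]["tag"]
        if tag is not None and tag["properties"] is not None:
            return tag_urn  # tag already exists

        # Tag doesn't exist yet: create it. createTag returns the new tag's URN.
        mutation = f"""
        mutation {{
          createTag(input: {{id: {json.dumps(name)}, name: {json.dumps(name)}}})
        }}
        """
        result = self.client.execute(query=mutation)
        tag_urn = result["data"]["createTag"]
        print(f"tag {tag_urn} created.")
        return tag_urn

    def tag_dataset(self, dataset_urn: str, tag_name: str) -> bool:
        """Attach tag `tag_name` to the whole dataset; True on success."""
        tag_urn = make_tag_urn(tag_name)
        mutation = f"""
        mutation {{
          addTag(input: {{tagUrn: {json.dumps(tag_urn)}, resourceUrn: {json.dumps(dataset_urn)}}})
        }}
        """
        result = self.client.execute(query=mutation)
        return bool(result["data"]["addTag"])
```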
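Since the question is about fields rather than whole datasets, the `addTag` mutation can target a single column by also passing `subResourceType: DATASET_FIELD` and the field path as `subResource`; `removeTag` takes the same input for clean-up. A sketch that only builds the mutation string, assuming the same `execute(query=...)` client as above; the function names are hypothetical:

```python
import json


def _tag_input(tag_urn: str, dataset_urn: str, field_path: str) -> str:
    """Render a TagAssociationInput that targets one field of a dataset."""
    return (
        f"{{tagUrn: {json.dumps(tag_urn)}, "
        f"resourceUrn: {json.dumps(dataset_urn)}, "
        "subResourceType: DATASET_FIELD, "
        f"subResource: {json.dumps(field_path)}}}"
    )


def field_tag_mutation(tag_urn: str, dataset_urn: str, field_path: str, remove: bool = False) -> str:
    """Build an addTag (or, with remove=True, removeTag) mutation for one field."""
    op = "removeTag" if remove else "addTag"
    return f"mutation {{ {op}(input: {_tag_input(tag_urn, dataset_urn, field_path)}) }}"
```

Each string would then be passed to `self.client.execute(query=...)`. The tag has to exist first (for example via `create_tag`).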
r
Hey Sean, the Python SDK has many helper functions that support your use case; apologies for not adding this to your post earlier. I'd encourage bookmarking this section, since it has a lot of example code. See the "tag a dataset" area for a code sample: https://datahubproject.io/docs/0.11.0/api/datahub-apis#datahub-api-comparison