# advice-data-governance
g
I've absolutely loved reading your article about data contracts, @mammoth-bear-12532. It's exactly the problem we are facing internally, and I couldn't agree more with it. Smarter contracts plus core views of data that users can repeatedly use and build trust upon are critical for any data organization to succeed. I know a lot of folks in this channel are interested in governance, so I thought I'd re-share the article by Shirshanka: https://blog.datahubproject.io/data-contracts-wrapped-2022-470e0c43365d as well as this: https://drive.google.com/file/d/1UIZSmAPxqADwJEmwDVBJWUhvBAQ9voig/
m
Great to hear it resonated with you @gorgeous-dinner-4055 and thanks for sharing with the community 😀
b
Where can I actually find examples of data contracts? The GitHub example, upserted via the CLI, spews validation errors.
m
What errors are you facing? cc @gray-shoe-75895
b
I'm not sure if this is an issue with the CLI being 12.1.14 while I'm running 12.0? I've tried a very simple contract: just a freshness check and a schema check. I was referencing this example: https://github.com/datahub-project/datahub/blob/59674b545715f568820d1ee9fe19dbe6a5[…]a-ingestion/examples/data_contract/pet_of_the_week.dhub.dc.yaml and these are my errors:

```
datahub datacontract upsert -f ~/work/datahub/audit_log.yaml
[2024-01-23 10:28:43,485] ERROR {datahub.entrypoints:186} - Command failed: 3 validation errors for DataContract
version
  field required (type=value_error.missing)
freshness -> root
  Discriminator 'type' is missing in value (type=value_error.discriminated_union.missing_discriminator; discriminator_key=type)
display_name
  extra fields not permitted (type=value_error.extra)
```

My yaml (very simple):

```yaml
version: 1
display_name: AuditLog
entity: "urn:li:dataset:(urn:li:dataPlatform:s3,s3/path/went/here,PROD)"
freshness:
  time: 0700
  granularity: DAILY
schema:
  properties:
    eeqrre:
      type: string
    eqrrr:
      type: integer
    eeerq:
      type: string
    eerrr:
      type: string
    qer:
      type: Struct
    eer:
      type: String
    c3e:
      type: integer
    cc3:
      type: string
    status:
      type: String
    cc1:
      type: integer
```

The random column names are just names. Are they supposed to be prefixed with field_?
g
Looks like our example yaml in the repo is outdated - fixing that here https://github.com/datahub-project/datahub/pull/9707 Just a heads up though - we're still iterating on the yaml format, and it will likely change a bit before its final iteration
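The three errors in the log above are pydantic-style validation messages: a required field is reported missing, the `freshness` block lacks the `type` key used to decide which variant it is (a discriminated union), and `display_name` is not an allowed field. As a rough illustration only, here is a hypothetical Python checker that produces those three kinds of messages. It is not DataHub's validator; the allowed, required, and discriminated keys below are assumptions taken from the top-level keys of the newer example contract shared later in this thread.

```python
from typing import Any

# HYPOTHETICAL rules, inferred only from the newer example contract in this thread.
# DataHub's real model may accept or require different keys.
ALLOWED_KEYS = {"version", "entity", "freshness", "schema", "data_quality"}
REQUIRED_KEYS = {"version", "entity"}
# Sections that carry a `type` discriminator in the newer example.
DISCRIMINATED_SECTIONS = {"freshness", "schema"}


def check_contract(contract: dict[str, Any]) -> list[str]:
    """Return problems in the spirit of the pydantic messages in the CLI log."""
    present = set(contract)
    problems: list[str] = []
    # 1. Required fields that are absent ("field required").
    for key in sorted(REQUIRED_KEYS - present):
        problems.append(f"{key}: field required")
    # 2. Union-typed sections without the key that selects the variant.
    for key in sorted(DISCRIMINATED_SECTIONS & present):
        section = contract[key]
        if not isinstance(section, dict) or "type" not in section:
            problems.append(f"{key}: discriminator 'type' is missing")
    # 3. Keys the model does not know about ("extra fields not permitted").
    for key in sorted(present - ALLOWED_KEYS):
        problems.append(f"{key}: extra fields not permitted")
    return problems


# The old-style contract from the message above, trimmed to one column.
old_style = {
    "version": 1,
    "display_name": "AuditLog",
    "entity": "urn:li:dataset:(urn:li:dataPlatform:s3,s3/path/went/here,PROD)",
    "freshness": {"time": "0700", "granularity": "DAILY"},
    "schema": {"properties": {"status": {"type": "String"}}},
}
for problem in check_contract(old_style):
    print(problem)
```

This sketch also flags `schema`, because the newer example gives it `type: json-schema`; the real CLI did not report that in the log, so its actual rules evidently differ from this one.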
b
All good, this is just testing and it's only for one team currently. Not very large scale.
Is DataHub 12.0 valid for this format, or does it have to be on the latest 12.1.whatever-it-is?
g
I think it will work with 0.12.0
c
@gray-shoe-75895 is it restricted to any particular kind of platform? I am trying out v0.12.1, and when I push the yaml file I get an “Update succeeded for urn urn:li:dataContract:7e2a151528ec2f616eddc5e44ed9de38” message. But when I try to access the dataset through the UI, I generally get this error: “No enum constant com.linkedin.datahub.graphql.generated.AssertionType.DATA_SCHEMA (code 400)”. What does it mean?
```yaml
version: 1  # datahub yaml format version

# Note: this data contract yaml format is still in development, and will likely
# change in backwards-incompatible ways in the future.
entity: urn:li:dataset:(urn:li:dataPlatform:dbt,datasetName,PROD)
freshness:
  type: cron
  cron: 0 7 * * *  # 7am daily
  timezone: America/Los_Angeles
schema:
  type: json-schema
  json-schema:
    properties:
      external_id:
        type: string
        native_type: VARCHAR(100)
      division_id:
        type: string
        native_type: VARCHAR(100)
      code:
        type: string
        native_type: VARCHAR(100)
      name:
        type: string
        native_type: VARCHAR(100)
data_quality:
  - type: unique
    column: external_id
```
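“No enum constant X.Y” is the message Java's `Enum.valueOf` puts in the `IllegalArgumentException` it throws when a string does not name any constant of the enum. One plausible reading, which is an assumption and not something confirmed in this thread, is that the contract produced an assertion stored with type `DATA_SCHEMA`, and the GraphQL layer's generated `AssertionType` enum in the running server version has no such constant (for example, if CLI and server versions differ). The Python sketch below reproduces that failure mode with a hypothetical stand-in enum; its members are illustrative only, not DataHub's actual list.

```python
from enum import Enum


class AssertionType(Enum):
    """HYPOTHETICAL stand-in for a generated GraphQL enum; members are illustrative."""

    DATASET = "DATASET"
    FRESHNESS = "FRESHNESS"


def parse_assertion_type(name: str) -> AssertionType:
    """Look up a stored type by member name, as Java's Enum.valueOf does."""
    try:
        return AssertionType[name]
    except KeyError:
        # Mirror the wording of Java's IllegalArgumentException message.
        raise ValueError(f"No enum constant AssertionType.{name}") from None


print(parse_assertion_type("FRESHNESS"))  # AssertionType.FRESHNESS
try:
    parse_assertion_type("DATA_SCHEMA")  # a value newer than this enum knows about
except ValueError as err:
    print(err)  # No enum constant AssertionType.DATA_SCHEMA
```

In this reading, the write succeeds because storage accepts the value, and only the read path that maps it onto the older enum fails.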
l
@crooked-carpet-28986 I got exactly the same error in the UI. Did you manage to get past this issue?