# advice-data-governance
r
Hey guys, any ideas on how to quantifiably measure the impact of data discovery on a business? How do you justify that all your activities around implementing a data catalog, improving metadata, etc., are actually helping the analysts, scientists, and engineers? Are you primarily using MAU or something a little more nuanced?
p
good documentation helps velocity more than anything else in data engineering/science imo, it’s hard to quantify though
a
I agree with Nick. There are ways you can track “Time to Discover,” but they are imperfect at best. For that, we rely more on qualitative data. When it comes to quantitative data, we focus on adoption: MAU, cohort retention, and specifically which DataHub features users engage with each month.
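A minimal sketch of how those adoption numbers could be computed from a usage export (the event log, user names, and feature names below are all hypothetical, just to illustrate the MAU and cohort-retention calculations):

```python
# Hypothetical usage export: each record is (user_id, month, feature),
# e.g. pulled from the catalog's usage analytics.
from collections import defaultdict

events = [
    ("alice", "2024-01", "search"), ("bob", "2024-01", "lineage"),
    ("alice", "2024-02", "search"), ("carol", "2024-02", "glossary"),
]

mau = defaultdict(set)            # month -> distinct active users
feature_users = defaultdict(set)  # (month, feature) -> distinct users
for user, month, feature in events:
    mau[month].add(user)
    feature_users[(month, feature)].add(user)

# Cohort retention: of users active in Jan, how many are still active in Feb?
retained = mau["2024-01"] & mau["2024-02"]
retention = len(retained) / len(mau["2024-01"])  # 1 of 2 Jan users returned -> 0.5
```

Feature-level usage (`feature_users`) is what lets you see whether people are actually searching and browsing lineage, rather than just logging in.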
l
(Cross-posting my response to this question from another Slack) In prior roles, we tracked MAU & activity within the data catalog, and also started looking into:
• Volume of requests for simple data pulls via Slack support channels & request portals, i.e. Zendesk/JIRA Service Desk (theoretically, support volume should go down as data becomes more self-serve)
• Metadata coverage (% of assets with documentation, owners, etc.)
• Data quality test coverage (top 25% most-used data sets & anything tagged as business-critical required data quality tests)
But a lot of those can turn into vanity metrics - it’s worth thinking through why you adopted a catalog in the first place. Is it to empower data asset owners to maintain a high quality of data? Adhere to compliance/PII requirements? Drive data democratization via search & discovery? Understand the impact of a breaking change (hello, data contracts!) Tying the metric back to the initial problem makes it much easier to measure impact!