witty-butcher-82399
01/16/2023, 11:17 AM
echoing-airport-49548
01/17/2023, 6:10 PM
witty-butcher-82399
01/18/2023, 8:26 AM
MODEL
DatasetCostStatistics:
  - costItem: array[CostItem]
CostItem:
  - concept: string ; describes the concept item, use this to explain measure units
  - count: double ; number of items for the given concept
  - costFactor: double ; cost of a unit
EXAMPLES
- urn: 'urn:li:dataset:(urn:li:dataPlatform:kafka,xxx,PROD)'
  costItem:
    - concept: Topic size (GBs)
      count: 15
      costFactor: 0.001
- urn: 'urn:li:dataset:(urn:li:dataPlatform:s3,yyy,PROD)'
  costItem:
    - concept: Storage (GBs)
      count: 1500
      costFactor: 0.00001
    - concept: GDPR Deletions
      count: 8500000
      costFactor: 0.001
    - concept: GDPR Extracts
      count: 4500000
      costFactor: 0.001
- urn: 'urn:li:dataset:(urn:li:dataPlatform:ZZZ,zzz,PROD)'
  costItem:
    - concept: Storage (GBs)
      count: 1500
      costFactor: 0.00001
    - concept: Licensing (€)
      count: 10000
      costFactor: 1
The overall idea is that a cost should explain where it comes from. So the cost is split into multiple concepts, and each concept is specified in its original measurement unit plus a cost factor that converts it into money.
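To make the arithmetic concrete, here is a minimal Python sketch of that idea: each concept's cost is `count * costFactor`, and the dataset cost is the sum. The dict layout and the function names `cost_breakdown` and `total_cost` are assumptions for illustration only; the numbers are the s3 example from above.

```python
# Cost items for the s3 example dataset, copied from the EXAMPLES above.
# The plain-dict representation is an assumption, not a DataHub format.
s3_cost_items = [
    {"concept": "Storage (GBs)", "count": 1500, "costFactor": 0.00001},
    {"concept": "GDPR Deletions", "count": 8500000, "costFactor": 0.001},
    {"concept": "GDPR Extracts", "count": 4500000, "costFactor": 0.001},
]


def cost_breakdown(cost_items):
    """Return the cost per concept: count * costFactor."""
    return {item["concept"]: item["count"] * item["costFactor"] for item in cost_items}


def total_cost(cost_items):
    """Return the dataset cost as the sum over all concepts."""
    return sum(cost_breakdown(cost_items).values())


if __name__ == "__main__":
    for concept, cost in cost_breakdown(s3_cost_items).items():
        print(f"{concept}: {cost:.3f}")
    print(f"Total: {total_cost(s3_cost_items):.3f}")
```

Keeping the breakdown next to the total is the point of the model: the total can always be traced back to the concepts that produced it.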
WDYT?
big-carpet-38439
01/18/2023, 6:06 PM
big-carpet-38439
01/18/2023, 6:06 PM
big-carpet-38439
01/18/2023, 6:06 PM
witty-butcher-82399
01/18/2023, 7:19 PM
witty-butcher-82399
01/23/2023, 9:23 AM
big-ocean-9800
05/04/2023, 2:54 AM
witty-butcher-82399
05/04/2023, 7:52 AM
record CostItem {
  concept: string
  amount: double
  measurementUnit: MeasurementUnit
  costPerUnit: double
}
record CostTimeseriesStatistics includes TimeseriesAspectBase {
  costItems: optional array[CostItem]
}
enum MeasurementUnit {
  BYTES
  GIGABYTES
  SECONDS
  DAYS
  EUROS
  ...
}
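As a rough illustration of how these records could be consumed, the sketch below mirrors them as Python dataclasses and computes a total per data point, ordered by time. This is not DataHub code: the `timestamp_millis` field is assumed to stand in for what `TimeseriesAspectBase` contributes, and the `cost()`, `total_cost()` and `cost_over_time()` helpers are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class MeasurementUnit(Enum):
    # Only the symbols listed in the thread; the original enum is elided after EUROS.
    BYTES = "BYTES"
    GIGABYTES = "GIGABYTES"
    SECONDS = "SECONDS"
    DAYS = "DAYS"
    EUROS = "EUROS"


@dataclass
class CostItem:
    concept: str
    amount: float
    measurement_unit: MeasurementUnit
    cost_per_unit: float

    def cost(self) -> float:
        """Convert the amount in its original unit into money."""
        return self.amount * self.cost_per_unit


@dataclass
class CostTimeseriesStatistics:
    timestamp_millis: int  # assumption: supplied by TimeseriesAspectBase
    cost_items: Optional[List[CostItem]] = None

    def total_cost(self) -> float:
        """Sum the cost of all items; an absent array counts as zero."""
        return sum((item.cost() for item in self.cost_items or []), 0.0)


def cost_over_time(points: List[CostTimeseriesStatistics]) -> List[Tuple[int, float]]:
    """Return (timestamp, total cost) pairs in chronological order."""
    ordered = sorted(points, key=lambda p: p.timestamp_millis)
    return [(p.timestamp_millis, p.total_cost()) for p in ordered]
```

Because every item carries its own unit and cost per unit, items measured in different units can still be added together once converted into money.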
This aspect is usually populated by data platform owners: the shared cost of the platform is allocated to the individual datasets.
As a timeseries, it lets us track cost over time.
The cost is an array because there may be multiple items: hot/cold storage, computing resources, network, support, licensing, etc.
For each item, we have the amount in its original measurement unit and a cost factor that converts that amount into money, so we can aggregate all items.
big-ocean-9800
05/04/2023, 6:21 PM
big-ocean-9800
05/04/2023, 6:29 PM
witty-butcher-82399
05/05/2023, 2:35 PM
How have you been liking this modeling so far?
how are you adding this custom timeseries aspect to datahub?
For the moment, we link this aspect to datasets only. It could make sense for other entities too, such as Charts and Dashboards... or Data Platform Instances. But we are not there yet. We haven't forked DataHub to add the custom aspect. https://datahubproject.io/docs/metadata-modeling/extending-the-metadata-model/#to-fork-or-not-to-fork is a good starting point for defining custom aspects without forking. Hope this helps.
Has it been working for all of your use-cases?
Not sure about all of them... but it has worked so far 😅 At the moment we just want to show the cost of a dataset, which is quite simple.
big-ocean-9800
05/05/2023, 7:05 PM
witty-butcher-82399
05/08/2023, 8:58 AM
big-ocean-9800
05/08/2023, 9:00 PM
witty-butcher-82399
05/09/2023, 2:46 PM
big-ocean-9800
05/10/2023, 9:31 PM
witty-butcher-82399
05/11/2023, 12:59 PM
limited-easter-9800
05/12/2023, 2:18 PM
big-ocean-9800
05/22/2023, 6:00 PM