modern-monitor-81461
01/18/2022, 10:33 PM
Distinct Count and Distinct % since it is not possible to compute those using only the manifest metrics. There is a set of metrics for each data file, so a distinct value in file A and a distinct value in file B do not mean that we have 2 distinct values in the table. It could be the same value, in which case the distinct count for the table would be 1. Min, Max, Null Count and Null % are reliable though. Is it a problem if the Iceberg source profiling does not provide a full picture?
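To make the per-file metrics point concrete, here is a minimal sketch in plain Python. It does not use the Iceberg or DataHub APIs: `FileMetrics`, `metrics_for` and `merge` are hypothetical names made up for illustration, and the two "files" are just small lists. It shows that min, max and null counts combine exactly across files, while summing per-file distinct counts only gives an upper bound.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class FileMetrics:
    """Hypothetical per-data-file column metrics (not the Iceberg API)."""
    row_count: int
    null_count: int
    min_value: Optional[int]
    max_value: Optional[int]
    distinct_count: int


def metrics_for(values: List[Optional[int]]) -> FileMetrics:
    """Compute the metrics one data file would carry for a single column."""
    non_null = [v for v in values if v is not None]
    return FileMetrics(
        row_count=len(values),
        null_count=len(values) - len(non_null),
        min_value=min(non_null) if non_null else None,
        max_value=max(non_null) if non_null else None,
        distinct_count=len(set(non_null)),
    )


def merge(files: Iterable[FileMetrics]) -> dict:
    """Combine per-file metrics into table-level metrics."""
    files = list(files)
    rows = sum(f.row_count for f in files)
    nulls = sum(f.null_count for f in files)
    mins = [f.min_value for f in files if f.min_value is not None]
    maxs = [f.max_value for f in files if f.max_value is not None]
    return {
        "min": min(mins) if mins else None,        # exact
        "max": max(maxs) if maxs else None,        # exact
        "null_count": nulls,                       # exact
        "null_pct": nulls / rows if rows else None,  # exact
        # Only an upper bound: the same value may appear in several files.
        "distinct_upper_bound": sum(f.distinct_count for f in files),
    }


file_a = [7, None]   # one distinct non-null value
file_b = [7, 7]      # the same distinct value again
table = merge([metrics_for(file_a), metrics_for(file_b)])
true_distinct = len({v for v in file_a + file_b if v is not None})
```

Here `table["distinct_upper_bound"]` is 2 while `true_distinct` is 1, which is the overcount described above. Getting the real number would require reading the data itself or storing a mergeable sketch per file, neither of which this example attempts.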
My code would greatly benefit from a review since I don't think I leveraged all the tooling from DataHub ingestion. What would you recommend? Should I polish it as much as I can and then ask for a review, or ask sooner in case I need to do a big refactoring? I don't want to waste your time too much, but I don't want to waste mine either! 😉 mammoth-bear-12532
chilly-holiday-80781
01/19/2022, 2:12 AM
little-megabyte-1074