Hi team! Could you please confirm how Datahub obta...
# ingestion
g
Hi team! Could you please confirm how DataHub obtains profiling stats for BigQuery under the hood? Does it query each table in BigQuery to compute its statistics, or does it obtain them directly from logs? cc @acceptable-potato-35922
d
We query the tables directly, but there are some optimisations: • Running approximate queries wherever possible • Profiling only the latest partition for partitioned/sharded tables
Do you happen to know what stats bigquery can provide?
g
Thanks @dazzling-judge-80093! As far as I know, bq can get schema-related information. I'm not sure it can get stats like row and column counts directly without querying the table. That's where profiling would come in, right?
d
Yes, exactly, we collect distinct count, sample values, min/max values from columns etc…
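For readers following the thread, here is a rough sketch of what a profiling query of this kind could look like. It is not DataHub's actual implementation: the helper `build_profile_query`, the table name, the column names, and the partition filter are all hypothetical, chosen only to illustrate approximate distinct counts, min/max values, and limiting the scan to a single partition. Sample values are left out to keep it short.

```python
from typing import Optional, Sequence


def build_profile_query(
    table: str,
    columns: Sequence[str],
    partition_filter: Optional[str] = None,
) -> str:
    """Build one BigQuery SQL statement that profiles the given columns.

    Hypothetical helper for illustration; not part of DataHub.
    """
    # COUNT(*) gives the row count of whatever slice of the table is scanned.
    selects = ["COUNT(*) AS row_count"]
    for col in columns:
        # APPROX_COUNT_DISTINCT trades exactness for a cheaper query.
        selects.append(f"APPROX_COUNT_DISTINCT(`{col}`) AS `{col}_distinct_count`")
        selects.append(f"MIN(`{col}`) AS `{col}_min`")
        selects.append(f"MAX(`{col}`) AS `{col}_max`")
    query = "SELECT\n  " + ",\n  ".join(selects) + f"\nFROM `{table}`"
    if partition_filter:
        # Restrict the scan to one partition, e.g. the most recent one.
        query += f"\nWHERE {partition_filter}"
    return query


# Assumed names: the table, columns, and partition date are made up.
example_query = build_profile_query(
    "my-project.my_dataset.events",
    ["user_id", "amount"],
    partition_filter="_PARTITIONDATE = DATE '2024-01-01'",
)
print(example_query)
```

The filter is passed in as a plain string because the right predicate depends on how the table is partitioned; `_PARTITIONDATE` only applies to ingestion-time partitioned tables.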