Hey! In Bigquery we have a couple of sharded table...
# ingestion
r
Hey! In BigQuery we have a couple of sharded tables, where one new table is created each day, i.e. project.dataset.table_20210901, project.dataset.table_20210902 etc. Does anyone know how to avoid ingesting metadata for all of these tables? We would like to ingest metadata only for the latest shard (with today's date).
b
Not familiar with BigQuery, but there is a table config option in the recipes (example: https://github.com/linkedin/datahub/blob/master/metadata-ingestion/examples/recipes/mssql_to_console.yml) that matches a regex against table names to decide whether or not a table will be processed. You will need to be super specific to get the current day's table though.
r
Hm thanks. But as regex doesn't really have a concept of time I'm not sure how to specify the latest shard.
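One possible workaround, as a rough sketch only: since the regex itself can't track the date, the pattern could be generated from today's date right before each ingestion run and written into the recipe's table config. The helper name `shard_pattern` and the fully qualified `project.dataset.table_` prefix are assumptions for illustration; the prefix would need to match whatever table name form the ingestion actually compares against.

```python
import re
from datetime import date


def shard_pattern(day: date, prefix: str = "project.dataset.table_") -> str:
    """Return an anchored regex that matches only the shard for `day`.

    `prefix` is a hypothetical default taken from the table names in
    this thread; adjust it to the name form the ingestion matches on.
    """
    shard_name = prefix + day.strftime("%Y%m%d")
    # re.escape keeps the dots in the name from acting as wildcards.
    return "^" + re.escape(shard_name) + "$"


if __name__ == "__main__":
    # Regenerate this right before each ingestion run.
    print(shard_pattern(date.today()))
```

The printed pattern would then have to be substituted into the recipe by whatever schedules the ingestion, so this only helps if the recipe is templated or rewritten daily.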
g
one idea would be to create for example a view
project.dataset.table_latest
that always points to the latest shard and then ingest that view 🤔
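A minimal sketch of how that view could be kept pointing at the newest shard, assuming a daily job re-creates it. The helper name `latest_view_ddl` is hypothetical, and the script only builds the SQL statement; actually running it (for example through a scheduled query) is left out.

```python
from datetime import date


def latest_view_ddl(day: date, base: str = "project.dataset.table") -> str:
    """Build DDL that repoints `<base>_latest` at the shard for `day`.

    `base` is taken from the table names mentioned in this thread.
    """
    shard = f"{base}_{day:%Y%m%d}"
    # CREATE OR REPLACE makes the statement safe to run every day.
    return (
        f"CREATE OR REPLACE VIEW `{base}_latest` AS "
        f"SELECT * FROM `{shard}`"
    )


if __name__ == "__main__":
    print(latest_view_ddl(date.today()))
```

The ingestion could then be limited to project.dataset.table_latest with a fixed pattern that never needs to change.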
r
yes that's not a bad idea actually
thanks
w
Hi, this is already something we are actively working on!
To clarify, we are working on being able to ingest only the latest views, not time-partitioned tables.