brainy-oxygen-20792 (05/03/2023, 4:50 PM):
… (--select) on their own schedule, which may be daily or hourly. So scheduling DataHub to pull on a schedule means our assertions (run_results.json) may not be complete. Ideas we're considering are in the thread.

lively-cat-88289
(05/03/2023, 4:50 PM):

brainy-oxygen-20792 (05/03/2023, 5:02 PM):

gray-shoe-75895 (05/03/2023, 5:34 PM):
The dbt source has entities_enabled and node_name_pattern filters, which you can use to make sure that your DataHub ingestion runs only bring in the relevant stuff.

brainy-oxygen-20792
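A minimal sketch of a DataHub dbt ingestion recipe using those two filters. The file paths and the allow pattern are placeholder assumptions, not from the thread; check the option names against your DataHub version's dbt source docs:

```yaml
source:
  type: dbt
  config:
    # Placeholder paths to the dbt artifacts
    manifest_path: "target/manifest.json"
    catalog_path: "target/catalog.json"
    # Emit only the entity types you care about
    entities_enabled:
      test_results: Yes
    # Restrict ingestion to nodes whose names match (hypothetical pattern)
    node_name_pattern:
      allow:
        - "model.my_project.*"
```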
(05/11/2023, 6:13 AM):
• … dbt docs generate, so we can do a stateful ingestion of the model, snapshot, test, etc. definitions here.
• On the execution of a dbt build/test we can ingest only the test_result entities, using the manifest and catalog from the earlier deployment and the run_results from the execution. Edit: since run results are appended to the timeseries index and DataHub picks the most recent test (not the most recent ingestion) per definition, out-of-order runs are not a problem.
I think that solves my problem (and helps anyone else looking for how to orchestrate their dbt ingestion).
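The two-step flow above might be sketched as two recipes — one run at deployment, one after each dbt build/test. This assumes entities_enabled supports an Only value for test_results and that the run-results option in your DataHub version is named as shown; verify both against your release before using:

```yaml
# Recipe 1 — run at deployment, after dbt docs generate:
# stateful ingestion of model/snapshot/test definitions.
source:
  type: dbt
  config:
    manifest_path: "target/manifest.json"   # placeholder path
    catalog_path: "target/catalog.json"
    entities_enabled:
      test_results: No      # definitions only; results come later
    stateful_ingestion:
      enabled: true
---
# Recipe 2 — run after each dbt build/test:
# ingest only the test_result entities.
source:
  type: dbt
  config:
    manifest_path: "target/manifest.json"   # from the earlier deployment
    catalog_path: "target/catalog.json"
    run_results_paths:                      # assumed option name
      - "target/run_results.json"
    entities_enabled:
      test_results: Only
```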