Jannik Buhr
06/05/2024, 3:47 PM
targets R package (https://books.ropensci.org/targets/walkthrough.html#change-code).
Is there a way of querying the driver for differences between its cache and workflow? Something like a dry-run that checks the nodes but doesn't execute them.
Stefan Krawczyk
06/05/2024, 4:08 PM
Stefan Krawczyk
06/05/2024, 4:11 PM
Stefan Krawczyk
06/05/2024, 4:13 PM
Jannik Buhr
06/05/2024, 7:07 PM
Jannik Buhr
06/05/2024, 7:09 PM
Stefan Krawczyk
06/05/2024, 7:10 PM
Jannik Buhr
06/05/2024, 7:11 PM
Stefan Krawczyk
06/05/2024, 7:14 PM
Jannik Buhr
06/05/2024, 7:14 PM
Jannik Buhr
06/05/2024, 7:17 PM
Jannik Buhr
06/05/2024, 7:18 PM
Thierry Jean
06/05/2024, 7:19 PM
Jannik Buhr
06/05/2024, 7:19 PM
Jannik Buhr
06/05/2024, 7:19 PM
Thierry Jean
06/05/2024, 7:21 PM
Stefan Krawczyk
06/05/2024, 7:21 PM
Jannik Buhr
06/05/2024, 7:22 PM
Thierry Jean
06/05/2024, 7:22 PM
Jannik Buhr
06/05/2024, 7:23 PM
Stefan Krawczyk
06/05/2024, 7:23 PM
Jannik Buhr
06/05/2024, 7:26 PM
Jannik Buhr
06/05/2024, 7:37 PM
tar_make(), which only recomputes the targets (= Hamilton nodes) whose inputs, code, or dependencies have changed, or you call tar_visnetwork() (= Hamilton's dr.display_all_functions()) to look at your workflow and also see which nodes are cached. Similar to Hamilton data loaders, there are special targets you can use to declare a file path as an input, so that the node gets invalidated (and will be recomputed) when the file changes.
Jannik Buhr
06/05/2024, 7:41 PM
# _targets.R file
library(targets)
source("R/functions.R")
tar_option_set(packages = c("readr", "dplyr", "ggplot2"))
list(
  tar_target(file, "data.csv", format = "file"),
  tar_target(data, get_data(file)),
  tar_target(model, fit_model(data)),
  tar_target(plot, plot_model(model, data))
)
turns into the following DAG:
Notice how functions like get_data that are used within the targets automatically become part of the DAG, so that changes to them can be tracked.
Jannik Buhr
06/05/2024, 7:44 PM
Stefan Krawczyk
06/05/2024, 8:07 PM
from typing import Dict

from hamilton import driver, graph_types


def hash_hamilton_nodes(dr: driver.Driver) -> Dict[str, str]:
    """Hash the source code of Hamilton functions from nodes in a Driver"""
    graph = graph_types.HamiltonGraph.from_graph(dr.graph)
    return {n.name: n.version for n in graph.nodes}


def what_is_still_valid_in_cache(dr: driver.Driver, disk_cache: ...) -> list[str]:
    """List cached nodes whose current code hash matches a cached version"""
    node_hashes = hash_hamilton_nodes(dr)
    nodes_in_cache = disk_cache.nodes_history
    result = []
    for node, node_versions in nodes_in_cache.items():
        # caveat assumes we produce the same hashes -- would need to double check code paths here
        # .get() skips cached nodes that are no longer in the graph instead of raising KeyError
        current_hash = node_hashes.get(node)
        if current_hash in node_versions:
            result.append(node)
    return result
# then you could minimally style the viz that way
# see <https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/styling_visualization>
ideally we’d figure out the paths impacted, but at least visually you could see something quickly…
Stefan Krawczyk
06/05/2024, 8:52 PM
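A minimal sketch of the "paths impacted" idea from the thread, in plain Python with no Hamilton imports. It assumes you already have the current node hashes (for example from something like hash_hamilton_nodes above), the cached hashes per node (something shaped like disk_cache.nodes_history), and a mapping from each node to the nodes it depends on. The name stale_nodes and the dependencies argument are hypothetical and not part of Hamilton's API. It only shows the dry-run logic: mark nodes whose hash is not in the cache, then mark everything downstream of them.

```python
from collections import deque
from typing import Dict, List, Set


def stale_nodes(
    current_hashes: Dict[str, str],
    cache_history: Dict[str, List[str]],
    dependencies: Dict[str, List[str]],
) -> Set[str]:
    """Return nodes that would need recomputing, without executing anything.

    current_hashes: node name -> hash of its current source code
    cache_history:  node name -> hashes the cache holds results for
    dependencies:   node name -> nodes it depends on (hypothetical input)
    """
    # Invert the dependency mapping so we can walk downstream.
    dependents: Dict[str, List[str]] = {name: [] for name in current_hashes}
    for node, upstream in dependencies.items():
        for dep in upstream:
            dependents.setdefault(dep, []).append(node)

    # A node is directly stale if its current hash was never cached.
    stale = {
        name
        for name, code_hash in current_hashes.items()
        if code_hash not in cache_history.get(name, [])
    }

    # Everything downstream of a stale node is impacted too.
    queue = deque(stale)
    while queue:
        for child in dependents.get(queue.popleft(), []):
            if child not in stale:
                stale.add(child)
                queue.append(child)
    return stale


# Toy graph mirroring the targets example: data -> model -> plot, data -> plot
deps = {"data": [], "model": ["data"], "plot": ["model", "data"]}
hashes = {"data": "d1", "model": "m2", "plot": "p1"}  # model's code changed
history = {"data": ["d1"], "model": ["m1"], "plot": ["p1"]}
print(sorted(stale_nodes(hashes, history, deps)))  # ['model', 'plot']
```

Here only model has a new hash, but plot is reported as well because it sits downstream of model. The resulting set could then drive the visualization styling mentioned above.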