# general
r
Is there an overview of the current capabilities somewhere? I would be interested to be able to use different types of nodes e.g. by using a decorator to make the diagram easier to read for specific use cases.
e
@Roel Bertens and I caught up offline. To close the loop: we don’t have all of them documented, and they’re a bit spread out. But there are currently a few functions that should be well documented:
• Driver.display_all_functions
• Driver.display_downstream_of
• Driver.display_upstream_of
• Driver.visualize_path_between
• Driver.visualize_materialization
@Roel Bertens is scoping out adding schema information to make viz easier
t
• would like to have them all named with the same convention, display_
• infer file type from name/path
• pipe graphviz to avoid generating a DOT file
• create a structured object, HamiltonDisplayConfig, that's well documented and type-annotated and that encompasses all of the public interface (args, kwargs, render_kwargs, graph_kwargs)
• advanced: provide string aliases for long type annotations
• advanced: enable theming (requires structured objects and config)
m
+1 on avoiding the .dot file generation and rendering the .png directly
a
Is there a way to render inputs as nodes (with a different color), instead of having them floating (and duplicated) next to each node?
s
there’s a flag that might help: deduplicate_inputs=True
🔥 2
👍 1
otherwise yeah we’re thinking about how to bring more customization to visualization
r
@Stefan Krawczyk any news on this topic? I want to be able to tag nodes with different types and give each type a different color, to show some more structure in the DAG. Are there similar ideas or examples of this already? We could e.g. use @tag to define a type for the node and use a config to choose the color corresponding to each type. Or we could tag with colors directly, which could be picked up automatically. But then maybe a special decorator for this would be better. Then you could also include a name to display in the legend for that color. E.g. @custom_node_type(color=.., name=..) What do you think?
e
@Roel Bertens interesting — to flesh it out more: What types do you want? Toy example maybe? I like the idea of tagging categories/metadata, or passing in a tag to hamilton to categorize/putting it in the legend, just want to make it a bit more concrete. The other possibility is outputting the .dot file and modifying that if you want to work quickly/build something more custom. Should be pretty easy to x-reference that with the node tags
r
My example: I have three different types of nodes: input data, intermediate data and output data. I want to give them a different look such that it is easier to take all the info from the diagram as a user
input data is source data. Intermediate is pretty clear. And output data (features) is what is ready for the users to use
t
I think this should get you started! The general idea is to define a stylesheet containing Graphviz attributes, then parse the tags from `driver.list_available_variables()` to edit the output of `driver.display_*`, which is a `graphviz.Digraph` object.
full solution: https://gist.github.com/zilto/e034d12cf6d632f0f3ea9c3686830ce6
interactive browser graphviz editor: https://edotor.net/
# main.py
import graphviz
from hamilton import driver

import functions

# customize graphviz render: <https://graphviz.org/docs/nodes/>
# careful with overwriting string attributes; fillcolor should be safe
level_stylesheet = dict(
    intermediate=dict(
        level="intermediate",  # add arbitrary metadata to the DOT file; could collide with graphviz attributes
        fillcolor="royalblue",  # edits the style
    ),
    final=dict(
        level="final",
        fillcolor="aquamarine",
    )
)


dr = driver.Builder().with_modules(functions).build()
g: graphviz.Digraph = dr.display_all_functions()

for v in dr.list_available_variables():
    if level := v.tags.get("level"):
        g.node(v.name, **level_stylesheet[level])
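One caveat with the loop above: `level_stylesheet[level]` raises a `KeyError` for any `level` value that is missing from the stylesheet (for example an `input` level). Here is a minimal sketch of a more forgiving lookup. `style_for` is a hypothetical helper name, not part of Hamilton or graphviz, and it assumes tag values are plain strings:

```python
from typing import Any, Mapping


def style_for(
    tags: Mapping[str, Any],
    stylesheet: Mapping[str, Mapping[str, str]],
    key: str = "level",
) -> dict:
    """Return a copy of the Graphviz attributes matching tags[key].

    Returns an empty dict when the tag is absent, is not a string,
    or has no entry in the stylesheet, so the node keeps its default style.
    """
    value = tags.get(key)
    if not isinstance(value, str):
        return {}
    return dict(stylesheet.get(value, {}))


# Intended use inside the loop (g and dr as defined above):
#     if attrs := style_for(v.tags, level_stylesheet):
#         g.node(v.name, **attrs)
```

Nodes with an unknown or missing level are simply skipped, so adding a new level to the functions before adding it to the stylesheet doesn't break the render.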
s
🤔 setting up a discussion on enabling node level styling — https://github.com/DAGWorks-Inc/hamilton/discussions/624
r
Thanks @Thierry Jean for the example! Am I correct that this doesn’t include the different styling in the legend?
s
the legend is a “subgraph” property on the graphviz object — so I assume it’s accessible to be modified/added to (will look up code in a bit / wait for Thierry).
t
As a temporary solution (hoping to support it better), you can add this section
for v in dr.list_available_variables():
    if level := v.tags.get("level"):
        g.node(v.name, **level_stylesheet[level])
        continue

# the style used for Function nodes
default_node_style = dict(
    shape="rectangle",
    margin="0.15",
    style="rounded,filled",
    fillcolor="#b4d8e4",
    fontname="Helvetica",
)

# `cluster__legend` is the name of the legend subgraph
with g.subgraph(name='cluster__legend') as legend_subgraph:
    for level, style in level_stylesheet.items():
        legend_node = dict(**default_node_style)  # set default style
        legend_node.update(**style)  # update default style with stylesheet
        legend_subgraph.node(level, label=level, **legend_node)
r
Awesome thanks for the quick replies.
t
No problem! It's good to get a sense of what viz features are useful so we can scope how to move forward
s
@Roel Bertens FYI - tags include the python module a function was in, in case that’s helpful.
r
It is becoming a bit hacky but for the short term I've also added this to remove 'function' from the legend because all my nodes have another style.
to_remove = '\t\tfunction [fillcolor="#b4d8e4" fontname=Helvetica margin=0.15 shape=rectangle style="rounded,filled"]\n'
g.body = [l for l in g.body if l != to_remove]
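The exact-string match stops working as soon as any attribute of the default legend entry changes. A slightly less brittle sketch, assuming the legend entry is still declared in `g.body` as a node statement named `function`. `drop_node_statement` is a hypothetical helper, not a graphviz or Hamilton function:

```python
def drop_node_statement(body: list, node_id: str) -> list:
    """Return a copy of a DOT body without the statement declaring `node_id`.

    Matches on the '<node_id> [' prefix after stripping indentation, so the
    attribute list can change without breaking the filter. Edge statements
    and nodes whose names merely start with `node_id` are left alone.
    """
    prefix = f"{node_id} ["
    return [line for line in body if not line.strip().startswith(prefix)]


# Intended use (g is the graphviz.Digraph from above):
#     g.body = drop_node_statement(g.body, "function")
```

Note that this would also drop a real DAG node that happens to be named `function`, so it's still a short-term hack.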
👍 1
t
In the future, we could have custom styling generated before the legend so the legend always matches what's displayed!
r
I'm running into another issue. When I use `display_upstream_of` I can't iterate over the variables using `list_available_variables`, because then I also get the variables that are not shown. Any quick tip to get only the ones in the graph, besides parsing `g.body`?
t
you should be able to use a similar workflow with `driver.what_is_upstream_of(VAR_NAME_1, VAR_NAME_2, ...)`, which returns variable objects like `list_available_variables()`. There is also `what_is_downstream_of()` and `what_is_the_path_between()`
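A rough sketch of that workflow, written so the same loop works for any subset of variables. `apply_level_styles` is a hypothetical helper name; it only assumes each variable has `.name` and `.tags`, and that the graph object has a `.node(name, **attrs)` method like `graphviz.Digraph`:

```python
from typing import Any, Iterable, Mapping


def apply_level_styles(
    graph: Any,
    variables: Iterable[Any],
    stylesheet: Mapping[str, Mapping[str, str]],
) -> list:
    """Restyle only the given variables; return the names that were restyled."""
    styled = []
    for v in variables:
        level = v.tags.get("level")
        # skip variables without a (string) level or with an unknown level
        if isinstance(level, str) and level in stylesheet:
            graph.node(v.name, **stylesheet[level])
            styled.append(v.name)
    return styled


# Intended use ("final_feature" is a placeholder node name):
#     g = dr.display_upstream_of("final_feature")
#     apply_level_styles(g, dr.what_is_upstream_of("final_feature"), level_stylesheet)
```

Passing the same node names to the display call and to the matching `what_is_*` call should keep the restyled nodes limited to the ones that are actually drawn.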
👍 1
🙌 1
s
👍 2