This message was deleted Hamilton Open Source #hamilton-help


# hamilton-help

Slackbot

10/13/2023, 11:28 PM

This message was deleted.

👀 1

Stefan Krawczyk

10/13/2023, 11:37 PM

@JVial thanks for the question. I get the following diagram.

Stefan Krawczyk

10/13/2023, 11:38 PM

So yes, inputs can be outputs. Inputs are marked as “Input: X” with dashed lines. Outputs are marked with a rectangular shape. So you see here, you can have inputs that are also outputs.

Stefan Krawczyk

10/13/2023, 11:39 PM

To understand your question better: are you asking whether there is a better way to visualize things in the graph? Or something else?

Stefan Krawczyk

10/13/2023, 11:43 PM

If you use the materializer functionality, what counts as an output can be visualized more explicitly.

from hamilton import base
from hamilton.io.materialization import to
# instead of execute you can do:
result, _ = dr.materialize(
    to.memory(
        id="example_df",
        dependencies=output_columns,
        combine=base.PandasDataFrameResult(),
    ),
    inputs=initial_columns,
)

# and then to visualize:
dr.visualize_materialization(
    to.memory(
        id="example_df",
        dependencies=output_columns,
        combine=base.PandasDataFrameResult(),
    ),
    inputs=initial_columns,
)

Where the visualization output will be:

JVial

10/14/2023, 8:40 AM

Hello and thanks for the fast reply. Yeah, so my question is: can I visualize that an input is equal to an output (1:1 -> input == output)? If I understand it correctly, if a value in the graph is marked with dashed lines, then it is an input. If a value is marked as a rectangle, it is an output. And if a value is marked as a dashed rectangle, it is an input and an output? Is my understanding correct?

Stefan Krawczyk

10/15/2023, 3:47 AM

“a value is marked with dashed lines, then it is an input”

Yes. It’ll also have “Input:” in the node value.

“a value is marked as a rectangle, it is an output”

Correct.

“if a value is marked as a dashed rectangle, it is an input and an output”

Correct 👍
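The three cases confirmed above can be summarized in a short sketch. This is not Hamilton's rendering code; `node_attrs` and `to_dot` are hypothetical helpers that only encode the convention described in this thread (dashed outline and an “Input:” prefix for inputs, a rectangle for outputs, both for a node that is an input and an output) as Graphviz-style attributes.

```python
from typing import Dict


def node_attrs(name: str, is_input: bool, is_output: bool) -> Dict[str, str]:
    """Return Graphviz-style attributes for one node (illustrative only)."""
    attrs = {"label": f"Input: {name}" if is_input else name}
    if is_input:
        attrs["style"] = "dashed"      # inputs: dashed outline
    if is_output:
        attrs["shape"] = "rectangle"   # outputs: rectangular shape
    return attrs


def to_dot(name: str, attrs: Dict[str, str]) -> str:
    """Render one node as a line of DOT source."""
    rendered = ", ".join(f'{key}="{value}"' for key, value in attrs.items())
    return f'"{name}" [{rendered}];'


# A node that is both an input and an output gets both markers.
both = node_attrs("customer_first_name", is_input=True, is_output=True)
print(to_dot("customer_first_name", both))
```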

JVial

10/15/2023, 8:34 AM

@Stefan Krawczyk Thank you for the explanation. Now I get it :)👏

JVial

10/15/2023, 9:11 AM

Ok sry 🙂 I have one more question. Is it possible to use the same name for an input and an output, but transform it? I mean, I have a pd.Series named customer_first_name and I have an output named customer_first_name. I have a function called customer_first_name which gets the pd.Series customer_first_name as input and manipulates it (e.g. replaces a string).

import pandas as pd
from hamilton import driver, ad_hoc_utils

def customer_first_name(customer_first_name: pd.Series) -> pd.Series:
    return customer_first_name.str.replace('Lana', '', case=False)

temp_module = ad_hoc_utils.create_temporary_module(customer_first_name)
config = {}
dr = driver.Driver(config, temp_module)

output_columns = [
    'customer_first_name'
]

# customer_data_df is my existing DataFrame with a customer_first_name column
input_data = customer_data_df.to_dict('series')
df = dr.execute(output_columns, inputs=input_data)

RecursionError: maximum recursion depth exceeded

Stefan Krawczyk

10/15/2023, 4:26 PM

@JVial yep, that doesn’t work. What you’ve effectively defined there is a node with an edge to itself (i.e. a loop); Hamilton can’t tell the difference between the input and the output there. More broadly, Hamilton was created to make it really easy to debug an output, i.e. to go from an output to its code and understand the order of computation leading to it. So we try to make it hard to “redefine” (i.e. “mutate”) the same thing twice. You’ll need to name either the input or the function differently; it’s common to use _raw as a suffix for inputs.

def customer_first_name(customer_first_name_raw: pd.Series) -> pd.Series:
    return customer_first_name_raw.str.replace('Lana', '', case=False)

...

input_data = customer_data_df.to_dict('series')
input_data = {f"{c}_raw": v for c, v in input_data.items()}  # add _raw suffix
df = dr.execute(output_columns, inputs=input_data)

Note: depending on the transforms you’re doing you might like this issue that should be done soon.
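To make the “edge to itself” point concrete, here is a minimal standard-library sketch. It is not Hamilton's implementation; it only assumes the convention described above, that a function's name is its node and its parameter names are the nodes it depends on. `build_edges` and `self_loops` are hypothetical helpers, and plain strings stand in for pd.Series.

```python
import inspect
from typing import Callable, Dict, List, Set


def build_edges(functions: List[Callable]) -> Dict[str, Set[str]]:
    """Map each node (function name) to the names it depends on (its parameters)."""
    return {fn.__name__: set(inspect.signature(fn).parameters) for fn in functions}


def self_loops(edges: Dict[str, Set[str]]) -> List[str]:
    """Return the nodes that list themselves as a dependency."""
    return sorted(name for name, deps in edges.items() if name in deps)


# Function and parameter share a name: the node depends on itself.
def customer_first_name(customer_first_name: str) -> str:
    return customer_first_name.replace("Lana", "")


looping = build_edges([customer_first_name])


# Same node name, but the input now carries a _raw suffix: no loop.
def customer_first_name(customer_first_name_raw: str) -> str:
    return customer_first_name_raw.replace("Lana", "")


fixed = build_edges([customer_first_name])

# Rename the incoming data to match, as in the dict comprehension above.
raw_inputs = {"customer_first_name": "Lana Smith"}
renamed_inputs = {f"{name}_raw": value for name, value in raw_inputs.items()}
result = customer_first_name(**renamed_inputs)

print(self_loops(looping), self_loops(fixed))
```

With the shared name, the only way to resolve the node is to resolve itself first, which is consistent with the RecursionError reported earlier in the thread.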

👍 1
