Hi, I am using materialize, and add a `to.pickle` ... (Hamilton Open Source, #hamilton-help)


Roy Kid

03/27/2024, 12:44 PM

Hi, I am using materialize and added a `to.pickle`. I found that the materializer has already been added to the graph, but it cannot get `node.version`. The materializer's version is None, so it cannot be sorted. Here is a snapshot:

👀 1
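
To make the symptom concrete, here is a minimal sketch of why a `None` version breaks sorting, assuming the nodes are sorted by their version string. The `FakeNode` class and the node names are hypothetical stand-ins, not Hamilton's `HamiltonNode` or its actual sorting code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeNode:
    """Hypothetical stand-in for a graph node; not Hamilton's HamiltonNode."""
    name: str
    version: Optional[str]


nodes = [
    FakeNode("connect", "a1b2c3"),
    FakeNode("save_pickle", None),  # a materializer node without a code version
]

try:
    sorted(nodes, key=lambda n: n.version)
    sort_failed = False
except TypeError:
    # Python 3 refuses to order None against a str
    sort_failed = True

# One defensive option (an assumption, not the actual fix):
# treat a missing version as an empty string when sorting.
safe_order = sorted(nodes, key=lambda n: n.version or "")
```

Whether a missing version should be replaced by a placeholder or computed from the materializer's class is a design decision for the library; the sketch only shows where the `TypeError` comes from.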

Elijah Ben Izzy

03/27/2024, 3:06 PM

Ok so this is a bug that needs to be fixed. Will look when I’m at my computer. @Thierry Jean any thoughts on an easy way to get past this?

Roy Kid

03/27/2024, 3:08 PM

Genius! Is this because I use it in a weird way? Because I ran https://github.com/DAGWorks-Inc/hamilton/blob/main/examples/experiment_management/run.py many days ago and it worked well.

Elijah Ben Izzy

03/27/2024, 3:09 PM

(At least I think it’s a bug, don’t want to commit before I look more :) )

Roy Kid

03/27/2024, 3:09 PM

Thanks a lot!

Elijah Ben Izzy

03/27/2024, 3:09 PM

We made changes that might have brought this up — the weird thing is that version would error before, but this might be a case we didn’t account for

Thierry Jean

03/27/2024, 3:17 PM

Here's the gist of the bug:
• the ExperimentTracker used to hash nodes directly, but this was replaced in PR #734
• Now, the version hash is provided via `HamiltonNode.version`
• the new versioning API was designed and tested in the context of the CLI, which doesn't have to deal with materializers

The bug raises some design questions (for version, experiment tracking, and caching):
• `HamiltonNode.version` contains information about code version
• a materializer has a code version based on its class (instead of a function)
• does it make sense to version the data of a materializer? after all, if someone calls for a materializer, they probably expect results to be materialized

👍 1

Roy Kid

03/27/2024, 6:31 PM

Can I use ExperimentTracker with dr.execute? I got this error instead:

Materializer dependency [..., n_chains: int, ...] is not a string, a function, or a driver.Variable.

Thierry Jean

03/27/2024, 6:34 PM

@Roy Kid the ExperimentTracker was designed to be used with `.materialize()`, we're currently implementing a fix! Regarding this other error message, you're getting this when calling `.execute()`

Roy Kid

03/27/2024, 6:35 PM

Yes. I read the source code and you are right, I cannot bypass materialize by using `.execute()`

Elijah Ben Izzy

03/27/2024, 7:22 PM

We should have a clean error for this? E.G. detect this error and call out the execution method?

👍 1

Thierry Jean

03/27/2024, 7:24 PM

we could check in `run_before_graph_execution` that the `HamiltonGraph` contains nodes tagged as materializers. I'm wondering how it handles `additional_vars`
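
A rough sketch of the check described above, assuming each node carries a tag dictionary that marks materializers. `FakeNode`, `MATERIALIZER_TAG`, and `check_has_materializers` are hypothetical names used for illustration; they are not Hamilton's API, and the real hook signature may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MATERIALIZER_TAG = "is_materializer"  # hypothetical tag name, not Hamilton's


@dataclass
class FakeNode:
    """Hypothetical stand-in for a node in the graph handed to the hook."""
    name: str
    tags: Dict[str, bool] = field(default_factory=dict)


def check_has_materializers(nodes: List[FakeNode]) -> None:
    """Raise a clear error when no node in the graph is tagged as a materializer."""
    if not any(node.tags.get(MATERIALIZER_TAG, False) for node in nodes):
        raise ValueError(
            "No materializer nodes found in the graph. "
            "The ExperimentTracker is meant to be used with .materialize(), "
            "not .execute()."
        )


# Usage: a graph with a saver node passes; a graph without one raises.
check_has_materializers([
    FakeNode("connect"),
    FakeNode("save_pickle", {MATERIALIZER_TAG: True}),
])
```

A check like this would run once before execution, so the user sees a message naming the execution method instead of a dependency error from deeper in the driver.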

Stefan Krawczyk

03/28/2024, 5:37 AM

@Roy Kid Will have it out soon. Fix here

Stefan Krawczyk

03/28/2024, 6:03 AM

@Roy Kid I can’t publish the new package to pypi because pypi is having issues. If you need to get unblocked you can install from github directly.


pip install git+ssh://git@github.com/dagworks-inc/hamilton.git@main

Stefan Krawczyk

03/28/2024, 6:00 PM

Okay this is up: `1.55.1`

Roy Kid

03/28/2024, 10:06 PM

Thanks! I was just on a plane and now it's working. Looks like everything is working fine!

Roy Kid

03/28/2024, 11:03 PM

Hmm, some weird things are happening again: the first time I run materialize it works fine, but after one successful run it gets stuck somewhere:


(wflow) exp/test [ python test.py                                                                                                                           ] 12:00 AM
^CTraceback (most recent call last):
  File "/proj/snic2021-5-546/users/x_jicli/exp/test/test.py", line 35, in <module>
  File "/home/x_jicli/miniconda3/envs/wflow/lib/python3.10/site-packages/hamilton/driver.py", line 57, in wrapped_fn
    return call_fn(*args, **kwargs)
  File "/home/x_jicli/miniconda3/envs/wflow/lib/python3.10/site-packages/hamilton/driver.py", line 1470, in materialize
    raw_results = self.raw_execute(
  File "/home/x_jicli/miniconda3/envs/wflow/lib/python3.10/site-packages/hamilton/driver.py", line 650, in raw_execute
    results = self.graph_executor.execute(
  File "/home/x_jicli/miniconda3/envs/wflow/lib/python3.10/site-packages/hamilton/driver.py", line 228, in execute
    executors.run_graph_to_completion(execution_state, self.execution_manager)
  File "/home/x_jicli/miniconda3/envs/wflow/lib/python3.10/site-packages/hamilton/execution/executors.py", line 374, in run_graph_to_completion
    while not GraphState.is_terminal(execution_state.get_graph_state()):
  File "/home/x_jicli/miniconda3/envs/wflow/lib/python3.10/site-packages/hamilton/execution/state.py", line 433, in get_graph_state
    def get_graph_state(self) -> GraphState:
KeyboardInterrupt

or:


(wflow) x_jicli/exp [ python test.py                                                                                                                        ] 11:56 PM
^CTraceback (most recent call last):
  File "/proj/snic2021-5-546/users/x_jicli/exp/test.py", line 34, in <module>
    dr.materialize(*materilizers)
  File "/home/x_jicli/miniconda3/envs/wflow/lib/python3.10/site-packages/hamilton/driver.py", line 57, in wrapped_fn
    return call_fn(*args, **kwargs)
  File "/home/x_jicli/miniconda3/envs/wflow/lib/python3.10/site-packages/hamilton/driver.py", line 1470, in materialize
    raw_results = self.raw_execute(
  File "/home/x_jicli/miniconda3/envs/wflow/lib/python3.10/site-packages/hamilton/driver.py", line 650, in raw_execute
    results = self.graph_executor.execute(
  File "/home/x_jicli/miniconda3/envs/wflow/lib/python3.10/site-packages/hamilton/driver.py", line 228, in execute
    executors.run_graph_to_completion(execution_state, self.execution_manager)
  File "/home/x_jicli/miniconda3/envs/wflow/lib/python3.10/site-packages/hamilton/execution/executors.py", line 374, in run_graph_to_completion
    while not GraphState.is_terminal(execution_state.get_graph_state()):
KeyboardInterrupt

Roy Kid

03/28/2024, 11:05 PM

and I tried to make a minimal reproducible demo:


# test.py
from hamilton import driver
from hamilton.plugins import h_experiments
from hamilton.io.materialization import from_


def connect(a: dict) -> str:
    """Print the loaded input and return a marker string."""
    print('connect')
    print(a)
    return 'connect'


if __name__ == "__main__":
    import test  # this file, imported as a module for the driver

    tracker_hook = h_experiments.ExperimentTracker(
        experiment_name='exp',
        base_directory='.',
    )

    dr = (
        driver.Builder()
        .with_modules(test)
        .enable_dynamic_execution(allow_experimental_mode=True)
        # .with_execution_manager(execution_manager)
        .with_adapters(tracker_hook)
        .build()
    )
    # Load ./data.pickle and inject it as the input `a`.
    materilizers = [
        from_.pickle(path='./data.pickle', target='a'),
    ]
    dr.materialize(*materilizers)

I only tested it on an HPC cluster, so I don't know if it is an environment problem or a problem in my code.

👀 1

Stefan Krawczyk

03/28/2024, 11:58 PM

Ok will look at this. Yes this is weird.

Stefan Krawczyk

03/29/2024, 6:41 AM

@Roy Kid so the issue in the above code is that there is no output requested. Thus yeah you’ve found a 🐛 — but it makes sense. e.g. this shows a blank image:


dr.visualize_materialization(*materilizers, output_file_path='./test.png')

But the following executes fine if we request `connect` as an output:


dr.visualize_materialization(*materilizers, additional_vars=["connect"], output_file_path='./test.png')

dr.materialize(*materilizers, additional_vars=["connect"])

Roy Kid

03/29/2024, 8:15 PM

thanks! a little bit difficult to debug this! i will try it again 😋

👍 1
