GitHub
05/01/2024, 11:55 PM<https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
by elijahbenizzy
<https://github.com/DAGWorks-Inc/hamilton/commit/50a68a998975599a9d4d6f725251ff50ab159d66|50a68a99>
- Adds hamilton UI README image to repo
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 12:12 AMPreview
tab and select the appropriate sub-template:
• <?expand=1&template=HAMILTON_CONTRIB_PR_TEMPLATE.md|sf-hamilton-contrib template>
Else, if not, please remove this block of text.
--- PR TEMPLATE INSTRUCTIONS (2) ---
[Short description explaining the high-level reason for the pull request]
Changes
How I tested this
Notes
Checklist
☐ PR has an informative and human-readable title (this will be pulled into the release notes)
☐ Changes are limited to a single goal (no scope creep)
☐ Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
☐ Any change in functionality is tested
☐ New functions are documented (with a description, list of inputs, and expected output)
☐ Placeholder code is flagged / future TODOs are captured in comments
☐ Project documentation has been updated if adding/changing functionality.
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 12:13 AM<https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
by elijahbenizzy
<https://github.com/DAGWorks-Inc/hamilton/commit/b112de3780940f1676cf1f046c940e118b59f44f|b112de37>
- Adding the Hamilton/UI README
<https://github.com/DAGWorks-Inc/hamilton/commit/21322af34b50426f1618c0f05827a9277c844e30|21322af3>
- Adds images + cleans up README for UI
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 5:03 AM<https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
by skrawcz
<https://github.com/DAGWorks-Inc/hamilton/commit/08ab43c05f3dd2e31731bb460757c4b15f6ebfc6|08ab43c0>
- Update ui/README.md
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 5:41 AM<https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
by skrawcz
<https://github.com/DAGWorks-Inc/hamilton/commit/41a4529a9290593ea516065615a9b799f4394cc0|41a4529a>
- Update README.md
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 6:12 AM<https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
by skrawcz
<https://github.com/DAGWorks-Inc/hamilton/commit/ba495e74c77369f8fdfd74467e05c2d03f7207c7|ba495e74>
- Update examples/ibis/feature_engineering/run.py
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 6:30 AMGitHub
05/02/2024, 6:36 AM<https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
by skrawcz
<https://github.com/DAGWorks-Inc/hamilton/commit/ca0c42ff6a967994bc2fa90e75b62b4b61d7d274|ca0c42ff>
- Update examples/ibis/feature_engineering/table_dataflow.py
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 3:56 PMDataLoader
and DataSaver
).
• The nodes can be called directly via .execute()
• Materializers appear in HamiltonGraph
and visualizations even if they aren't executed.
• Validate the DAG, including the materializers before execution.
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 3:58 PM=X
afterwards for X
where str(X)
is short, otherwise =...
Describe alternatives you've considered
Not doing this?
Additional context
Seeing contrib stuff, this could help.
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 3:59 PM%%cell_to_module -m test --display
def foo() -> int:
    pass
In next cell:
from hamilton import driver, node
from hamilton.graph_types import HamiltonNode
import test
n = node.Node.from_fn(foo)
dr = driver.Driver({}, test)
var = dr.list_available_variables()[0]
var.version
Stack Traces
Traceback (most recent call last):
File "/databricks/python/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3378, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<command-2983067552043610>", line 7, in <module>
var.version
File "/usr/lib/python3.9/functools.py", line 969, in __get__
val = self.func(instance)
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-4c459acb-42c0-448b-bd4f-511f5ba39d33/lib/python3.9/site-packages/hamilton/graph_types.py", line 151, in version
return hash_source_code(self.originating_functions[0], strip=True)
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-4c459acb-42c0-448b-bd4f-511f5ba39d33/lib/python3.9/site-packages/hamilton/graph_types.py", line 71, in hash_source_code
source = inspect.getsource(source)
File "/usr/lib/python3.9/inspect.py", line 1024, in getsource
lines, lnum = getsourcelines(object)
File "/usr/lib/python3.9/inspect.py", line 1006, in getsourcelines
lines, lnum = findsource(object)
File "/usr/lib/python3.9/inspect.py", line 835, in findsource
raise OSError('could not get source code')
OSError: could not get source code
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/databricks/python/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 1997, in showtraceback
stb = self.InteractiveTB.structured_traceback(
File "/databricks/python/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1112, in structured_traceback
return FormattedTB.structured_traceback(
File "/databricks/python/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1006, in structured_traceback
return VerboseTB.structured_traceback(
File "/databricks/python/lib/python3.9/site-packages/IPython/core/ultratb.py", line 859, in structured_traceback
formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
File "/databricks/python/lib/python3.9/site-packages/IPython/core/ultratb.py", line 812, in format_exception_as_a_whole
frames.append(self.format_record(r))
File "/databricks/python/lib/python3.9/site-packages/IPython/core/ultratb.py", line 730, in format_record
result += ''.join(_format_traceback_lines(frame_info.lines, Colors, self.has_colors, lvals))
File "/databricks/python/lib/python3.9/site-packages/stack_data/utils.py", line 145, in cached_property_wrapper
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "/databricks/python/lib/python3.9/site-packages/stack_data/core.py", line 698, in lines
pieces = self.included_pieces
File "/databricks/python/lib/python3.9/site-packages/stack_data/utils.py", line 145, in cached_property_wrapper
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "/databricks/python/lib/python3.9/site-packages/stack_data/core.py", line 649, in included_pieces
pos = scope_pieces.index(self.executing_piece)
File "/databricks/python/lib/python3.9/site-packages/stack_data/utils.py", line 145, in cached_property_wrapper
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "/databricks/python/lib/python3.9/site-packages/stack_data/core.py", line 628, in executing_piece
return only(
File "/databricks/python/lib/python3.9/site-packages/executing/executing.py", line 164, in only
raise NotOneValueFound('Expected one value, found 0')
executing.executing.NotOneValueFound: Expected one value, found 0
Note that this is fixed by #770, but we should fix the underlying error.
Here's the repro for that case:
from hamilton.graph_types import hash_source_code
hash_source_code(var.originating_functions[0])
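One possible way to address the underlying error is to guard the `inspect.getsource` call, sketched below. `safe_hash_source_code` is a hypothetical helper written for illustration, not Hamilton's `hash_source_code` and not the fix from #770; the fallback to the qualified name is an assumption about what an acceptable degraded hash would be.

```python
import hashlib
import inspect


def safe_hash_source_code(fn) -> str:
    """Hash a callable's source, falling back when the source is unavailable.

    Hypothetical helper for illustration only; not part of Hamilton.
    """
    try:
        payload = inspect.getsource(fn)
    except (OSError, TypeError):
        # getsource raises OSError when the source cannot be located (as in the
        # trace above) and TypeError for built-ins. Fall back to the qualified
        # name so hashing still succeeds.
        payload = getattr(fn, "__qualname__", repr(fn))
    return hashlib.sha256(payload.encode()).hexdigest()
```

The trade-off is that a fallback hash no longer changes when the function body changes, so a real fix would probably want to surface that the version is degraded.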
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 4:02 PM# functions.py
# functions.py - declare and link your transformations as functions....
import pandas as pd
from hamilton.htypes import Parallelizable, Collect
def motor(motor_list: list[int]) -> Parallelizable[int]:
    for _motor in motor_list:
        yield _motor

def _is_motor_on(motor: int) -> bool:
    return motor % 2 == 0

def motor_status(motor: int) -> dict:
    # logic to check
    return {
        "motor_id": motor,
        "is_on": _is_motor_on(motor)
    }

def aggregate_statuses(motor_status: Collect[dict]) -> list[dict]:
    return list(motor_status)

def on_motor(motor_status: Collect[dict]) -> Parallelizable[int]:
    for motor_dict in motor_status:
        if motor_dict["is_on"]:
            yield motor_dict["motor_id"]

def status_check_1(on_motor: int) -> float:
    # some status check.
    return 2.3 * on_motor

def status_check_2(on_motor: int, status_check_1: float) -> str:
    return f"some result based on {on_motor} and {status_check_1}"

def status_result(on_motor: int, status_check_1: float, status_check_2: str) -> dict:
    return locals()

def on_motor_statuses(status_result: Collect[dict]) -> pd.DataFrame:
    return pd.DataFrame(status_result)
# run.py
# And run them!
import functions
from hamilton import base
from hamilton import driver
from hamilton.execution import executors
dr = (
    driver.Builder()
    .enable_dynamic_execution(allow_experimental_mode=True)
    .with_modules(functions)
    # .with_remote_executor(executors.SynchronousLocalTaskExecutor())
    .with_adapters(base.PandasDataFrameResult())
    .build()
)
# dr = driver.Driver({}, functions)
result = dr.execute(
    ['on_motor_statuses'],
    inputs={'motor_list': [1, 2, 3, 4, 5]}
)
print(result)
dr.display_all_functions(
    "graph.dot", orient="TB", show_legend=False)
Stack Traces
File "hamilton/hamilton/driver.py", line 650, in raw_execute
results = self.graph_executor.execute(
File "hamilton/hamilton/driver.py", line 230, in execute
raw_result = results_cache.read(final_vars)
File "hamilton/hamilton/execution/state.py", line 113, in read
raise KeyError(f"Key {formatted_key} not found in cache") # noqa E713
KeyError: 'Key on_motor_statuses not found in cache'
DAG image
graph dot
Steps to replicate behavior
See code above
Library & System Information
Latest.
Expected behavior
I don't see why this couldn't work.
Additional context
If you make on_motor
depend on aggregate_statuses
and thus have a node in between, things work.
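The workaround described above can be sketched as follows: `on_motor` takes the already-collected `aggregate_statuses` list instead of consuming `Collect[dict]` directly. Only the changed function is shown, on the assumption that the rest of functions.py stays as posted. The `ImportError` fallback is a stand-in added here so the snippet runs without Hamilton installed.

```python
try:
    from hamilton.htypes import Parallelizable
except ImportError:
    # Stand-in so this sketch runs without Hamilton installed (assumption).
    from typing import Iterator as Parallelizable


def on_motor(aggregate_statuses: list[dict]) -> Parallelizable[int]:
    # Depend on the intermediate aggregate_statuses node rather than on
    # Collect[dict], so a regular node sits between the collect and the
    # second parallel block.
    for motor_dict in aggregate_statuses:
        if motor_dict["is_on"]:
            yield motor_dict["motor_id"]
```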
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 4:03 PMGitHub
05/02/2024, 4:06 PMto.png(...)
, which takes in a pyplot/sklearn object. Only thing is this is applicable towards a bunch more filetypes with a single arg change.
Describe the solution you'd like
Options:
1. Have an adapter class allow for multiple types, take in the type when saving/loading
2. Just subclass this
(2) might be the nicest, although subclasses with dataclass are a bit messy.
See this for an example of how this works: https://github.com/DAGWorks-Inc/hamilton/pull/467/files#diff-02e6f9bd1ec33e0a1eef8e2cc1b91d973df6b981ef061793d437fd92faa4916aR51
Additional context
Add any other context or screenshots about the feature request here.
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 4:06 PMGitHub
05/02/2024, 4:08 PM@resolve
decorator needs a when
argument to be passed. The only supported argument is ResolveAt.CONFIG_AVAILABLE
.
Describe the solution you'd like
Make the when
argument a keyword argument with a default of ResolveAt.CONFIG_AVAILABLE
Describe alternatives you've considered
Making a custom decorator @resolve_at_config_available
that wraps this and defaults
Additional context
I find I am scattering when=ResolveAt.CONFIG_AVAILABLE
around plus its import, which feels a bit boilerplatey... sry I'm being lazy here rather than submitting a PR :(
I've also not really looked to see if there's any guidance around where the when
functionality might be expanded, but regardless it does feel a bit like a default would still be reasonable?
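The alternative mentioned above (a wrapper that pre-fills `when`) could look roughly like this. `resolve_at_config_available` is the hypothetical name from the request, not an existing Hamilton decorator. The classes in the `ImportError` branch are simplified stand-ins, added only so the snippet runs without Hamilton installed.

```python
import functools

try:
    from hamilton.function_modifiers import ResolveAt, resolve
except ImportError:
    # Simplified stand-ins (assumption) so the sketch runs without Hamilton.
    import enum

    class ResolveAt(enum.Enum):
        CONFIG_AVAILABLE = "config_available"

    class resolve:
        def __init__(self, *, when, decorate_with):
            self.when = when
            self.decorate_with = decorate_with


# Hypothetical wrapper: same as @resolve, with `when` already filled in.
resolve_at_config_available = functools.partial(
    resolve, when=ResolveAt.CONFIG_AVAILABLE
)
```

With this in place a call site would only pass `decorate_with=...`, which is the same ergonomics the requested keyword default would give.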
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 4:09 PMtarget_
to specify which one gets saved.
@save_to.parquet(
    path=source('save_to_value_1'),
    target_='value_1',
    output_name_='value_1_saved')
@save_to.csv(
    path=source('save_to_value_2'),
    target_='value_2',
    output_name_='value_2_saved')
@parameterize(
    value_1={'foo': value(...)},
    value_2={'foo': value(...)},
)
def value_n(foo: ...) -> pd.DataFrame:
    return ...
Describe the solution you'd like
Change save_to
from a SingleNodeNodeTransformer
to the standard one, with an error when there are multiple nodes outputted and there's ambiguity.
Describe alternatives you've considered
Not doing this. Problem is an API could get verbose.
Additional context
Add any other context or screenshots about the feature request here.
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 4:12 PMGitHub
05/02/2024, 4:12 PMGitHub
05/02/2024, 4:13 PMGitHub
05/02/2024, 4:13 PMjit
for DAGs that people execute over and over again.
Describe the solution you'd like
Two solutions:
1. Prototype the ability to compile a hamilton graph ahead of time with Numba. You could use how we get Hamilton to run on Dask as a starting point (TODO: link to code). See these numba docs for ahead of time compilation.
2. Prototype the ability to use the jit compiler with Numba. That way the first time someone runs execute things are compiled (no speed up), but the second time, things are lightning quick! See these docs.
Things to think about with prototype (1):
1. Since compiling ahead of time requires types -- we might need some better way to specify them? Or perhaps we can have numba infer it?
2. The output of compilation is another set of python module(s) -- this is what we'd then want to use for computation.
3. What is therefore the correct order of operations? Build the function graph, compile it, then somehow build the graph again with the new functions (?), and use that for execution?
4. What are the limitations of this approach in terms of use cases, etc. We could limit to numpy and python primitive code only for instance.
Things to think about with prototype (2):
1. What use cases does this make sense for?
2. What are the limitations of this approach?
Describe alternatives you've considered
Haven't.
Additional context
• https://numba.readthedocs.io/en/stable/user/pycc.html#overview
• https://numba.readthedocs.io/en/stable/reference/types.html#numba-types
• https://numba.readthedocs.io/en/stable/user/jit.html
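For prototype (2), the Numba behaviour being relied on is lazy compilation: the first call to a jitted function compiles it, and later calls with the same argument types reuse the compiled code. The snippet below is a minimal sketch of that behaviour on a standalone function. `sum_of_squares` is a made-up example, and how this would be wired into Hamilton's driver is not shown. The `ImportError` fallback is only there so the snippet runs without Numba installed.

```python
try:
    from numba import njit
except ImportError:
    # Fallback (assumption) so the sketch runs without Numba: no compilation.
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    # Plain numeric loop: the kind of code Numba can compile in nopython mode.
    total = 0
    for i in range(n):
        total += i * i
    return total


first = sum_of_squares(10)   # with Numba installed, compilation happens here
second = sum_of_squares(10)  # same argument type, so the compiled code is reused
```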
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 4:14 PMGitHub
05/02/2024, 4:16 PMcheck_output
decorator runs a test and it fails. That is, if we standardize on tag keys, then decorators could assume them and make use of them.
Describe the solution you'd like
Enable decorators access to a context
or some variable that would allow them to get at this information.
Describe alternatives you've considered
N/A
Additional context
Taken from the discussion with whylabs folks on what would be useful.
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 4:19 PMGitHub
05/02/2024, 6:04 PM<https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
by elijahbenizzy
<https://github.com/DAGWorks-Inc/hamilton/commit/e85dd71696bc10497b40fdbfb922b9a106fb4c09|e85dd716>
- Updates to work with latest django + requirements.txt/dev mode
<https://github.com/DAGWorks-Inc/hamilton/commit/8c1c6452049b5ae2c87f7cfca557c50028d03a92|8c1c6452>
- Adds deployment option for allowed hosts
<https://github.com/DAGWorks-Inc/hamilton/commit/80b60206fb00d82bb579d7d0cc725615c3096f75|80b60206>
- Fixes propelauth integration
<https://github.com/DAGWorks-Inc/hamilton/commit/05f6a12ca18acd86e55c1dd27e5630c68eca8ef0|05f6a12c>
- Fixes Hamiton architecture diagram
<https://github.com/DAGWorks-Inc/hamilton/commit/f8370ec951627fb1bc99a899d4c47ab6f1201317|f8370ec9>
- Removes fauly https from .env.local for local dev
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 7:07 PMGitHub
05/02/2024, 7:56 PM<https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
by elijahbenizzy
<https://github.com/DAGWorks-Inc/hamilton/commit/e87a2cf75596706d26beb8eacaa0eccb9e4792b5|e87a2cf7>
- fixed typo ui/README
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 9:49 PMGitHub
05/02/2024, 10:07 PM<https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
by skrawcz
<https://github.com/DAGWorks-Inc/hamilton/commit/703d8934d8d633f8e976402883d5eb9998a77fb2|703d8934>
- Updates hamilton UI docs with video
DAGWorks-Inc/hamiltonGitHub
05/02/2024, 10:15 PM