GitHub
03/12/2025, 3:03 PM
GitHub
03/12/2025, 6:11 PM
<https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
by skrawcz
<https://github.com/DAGWorks-Inc/hamilton/commit/a14aefa476bb4717cd63d6f2c585803e4194a0b5|a14aefa4>
- fix type hint in parallel-task.rst documentation. Use RST link formatting instead of markdown.
DAGWorks-Inc/hamiltonGitHub
03/16/2025, 12:21 AM
list_available_variables has topological sort documented and observed. No ordering is specified now, so we can make it more specific.
Describe alternatives you've considered
• add a topological sort utility function
• add this to HamiltonGraph
Additional context
See slack
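A deterministic ordering like the one requested could be produced with the standard library; a minimal sketch (illustrative only, with a hypothetical dependency map, not Hamilton's API):

```python
from graphlib import TopologicalSorter

# hypothetical dependency map: node -> the nodes it depends on
graph = {"c": {"b"}, "b": {"a"}, "a": set()}

# static_order yields nodes so that dependencies always precede dependents
order = list(TopologicalSorter(graph).static_order())
# order == ["a", "b", "c"]
```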
DAGWorks-Inc/hamiltonGitHub
03/17/2025, 4:19 AM
prefect
• This branch is based on #1287 and will need to be rebased if/when accepted
## Changes
This PR adds a pair of adapters (in hamilton.plugins.h_logging) named LoggingAdapter and AsyncLoggingAdapter that can be used to log the following state in the execution of a graph (supporting both V1/V2 drivers where applicable):
• Graph start (GraphExecutionHook)
• Task grouping (TaskGroupingHook)
• Task submission (TaskSubmissionHook)
• Task pre-execution (TaskExecutionHook)
• Node pre-execution (NodeExecutionHook)
• Node post-execution (NodeExecutionHook)
• Task post-execution (TaskExecutionHook)
• Task resolution (TaskResolutionHook)
• Graph completion (GraphExecutionHook)
These adapters keep track of their current execution context by using an internal ContextVar. This allows the log to have a context-dependent prefix, by way of an internal custom log adapter, such as
Graph run 'c7236c13-94ca-4e5e-85a6-2f32af054736' - Starting graph execution
or
Task 'expand-stargazer_url.0.block-stargazer_url' - Task completed [OK]
Additionally, a function called get_logger was added that returns the custom log adapter, which the user can use to create context-aware logs from within a node. For example, the following log inside node a
...
from hamilton.plugins.h_logging import get_logger

def a() -> str:
    logger = get_logger("name_or_logger_or_none")
    logger.warning("Encountered a warning")
    return "a"

will generate the following:
Node 'a' - Encountered a warning
See the Notes section for some examples
## How I tested this
I added tests for both LoggingAdapter and AsyncLoggingAdapter. The tests for LoggingAdapter examine the V1 driver with and without the FutureAdapter, and the V2 driver with the synchronous, multi-threading, multi-process, ray, and dask task-based executors. The tests may be a little brittle because some are dependent on undocumented task id naming conventions. Open to suggestions 😄.
## Examples
Here I would like to present some examples. I used a logging configuration based on rich - but any will suffice.
### Synchronous - non-branching
Standard node-based Hamilton graphs using the LoggingAdapter
...
def a() -> str:
    return "a"

def b(a: str) -> str:
    return a + " b"

def c(b: str) -> str:
    return b + " c"
will produce the following ...
INFO Graph run '24fd50b2-760b-4cf2-b1e1-61476c18e9b3' - Starting graph execution
DEBUG Node 'a' - Starting execution without dependencies
INFO Node 'a' - Finished execution [OK]
DEBUG Node 'b' - Starting execution with dependencies 'a'
INFO Node 'b' - Finished execution [OK]
DEBUG Node 'c' - Starting execution with dependencies 'b'
INFO Node 'c' - Finished execution [OK]
INFO Graph run '24fd50b2-760b-4cf2-b1e1-61476c18e9b3' - Finished graph execution [OK]
### Synchronous - branching
Branching graphs with multiple possible paths using the LoggingAdapter (with or without the FutureAdapter)...
def a() -> str:
    return "a"

def b() -> str:
    return "b"

def c() -> str:
    return "c"

def d(a: str, b: str) -> str:
    return a + " " + b + " d"

def e(c: str) -> str:
    return c + " e"

def f(d: str, e: str) -> str:
    return d + " " + e + " f"
will produce the following (with a potentially different order)...
INFO Graph run '31d4bc1d-8020-4e24-91a5-732a03497f1a' - Starting graph execution
DEBUG Node 'c' - Submitting async node without dependencies
DEBUG Node 'a' - Submitting async node without dependencies
DEBUG Node 'b' - Submitting async node without dependencies
DEBUG Node 'd' - Submitting async node with dependencies 'a', 'b'
DEBUG Node 'e' - Submitting async node with dependencies 'c'
DEBUG Node 'f' - Submitting async node with dependencies 'd', 'e'
INFO Node 'c' - Finished execution [OK]
INFO Node 'a' - Finished execution [OK]
INFO Node 'b' - Finished execution [OK]
INFO Node 'e' - Finished execution [OK]
INFO Node 'd' - Finished execution [OK]
INFO Node 'f' - Finished execution [OK]
INFO Graph run '31d4bc1d-8020-4e24-91a5-732a03497f1a' - Finished graph execution [OK]
### Asynchronous - Branching
Async branching graphs are also supported using the AsyncLoggingAdapter, but it must be used with the async driver. This adapter has a slight weakness related to the current state of hooks in the async driver - see the code for more details.
async def a() -> str:
    return "a"

async def b() -> str:
    return "b"

async def c() -> str:
    return "c"

async def d(a: str, b: str) -> str:
    return a + " " + b + " d"

async def e(c: str) -> str:
    return c + " e"

async def f(d: str, e: str) -> str:
    return d + " " + e + " f"
These will produce the following (again with a potentially different order)...
DEBUG Node 'a' - Submitting async node without dependencies
DEBUG Node 'b' - Submitting async node without dependencies
DEBUG Node 'd' - Submitting async node with dependencies 'a', 'b'
DEBUG Node 'c' - Submitting async node without dependencies
DEBUG Node 'e' - Submitting async node with dependencies 'c'
DEBUG Node 'f' - Submitting async node with dependencies 'd', 'e'
INFO Node 'a' - Finished execution [OK]
INFO Node 'b' - Finished execution [OK]
INFO Node 'c' - Finished execution [OK]
INFO Node 'd' - Finished execution [OK]
INFO Node 'e' - Finished execution [OK]
INFO Node 'f' - Finished execution [OK]
INFO Graph run '38deee89-4a91-4253-b913-ce3c1e60b791' - Finished graph execution [OK]
### Task based
Task-based executors (synchronous, threading, processing, ray, dask, ...) are also supported with the LoggingAdapter. Note however that user context logs from inside a node may (threading, ray) or may not (multiprocessing, dask) be supported. Some executors provide log pass-throughs (ray) that produce slightly different results.
from hamilton.htypes import Collect, Parallelizable

def b(a: int) -> int:
    return a

def c(b: int) -> Parallelizable[int]:
    for i in range(b):
        yield i

def d(c: int) -> int:
    return 2 * c

def e(d: Collect[int]) -> int:
    return sum(d)

def f(e: int) -> int:
    return e
These will produce the following (with a potentially different order) when a = 2
...
INFO Graph run 'dc4ec112-ba50-44ae-8b64-11b0ad099f74' - Starting graph execution
INFO Graph run 'dc4ec112-ba50-44ae-8b64-11b0ad099f74' - Using inputs 'a'
INFO Graph run 'dc4ec112-ba50-44ae-8b64-11b0ad099f74' - Dynamic DAG detected; task-based logging is enabled
DEBUG Task 'a' - Initializing new task and submitting to executor
DEBUG Task 'a' - Task completed [OK]
DEBUG Task 'b' - Initializing new task and submitting to executor
DEBUG Task 'b' - Starting execution
DEBUG Task 'b' - Starting execution with dependencies 'a'
DEBUG Task 'b' - Node 'b' - Finished execution [OK]
DEBUG Task 'b' - Finished execution [Ok]
INFO Task 'b' - Task completed [OK]
DEBUG Task 'expand-c' - Initializing new task and submitting to executor
DEBUG Task 'expand-c' - Starting execution of nodes 'c'
DEBUG Task 'expand-c' - Starting execution with dependencies 'b'
DEBUG Task 'expand-c' - Node 'c' - Finished execution [OK]
DEBUG Task 'expand-c' - Finished execution [Ok]
INFO …
DAGWorks-Inc/hamiltonGitHub
03/21/2025, 12:19 AM
xgboost errors in CI related to test_xgboost_booster_json_writer and test_xgboost_booster_json_reader where the following error is encountered:
Check failed: base_score > 0.0f && base_score < 1.0f: base_score must be in (0,1) for logistic loss, got: 0
Currently blocking #1287 and #1294
## Changes
I added a default base_score to the fitted_xgboost_booster fixture.
## How I tested this
N/A
## Notes
Oddly enough, these errors never show up on my machine (Windows), which is usually the reverse!
## Checklist
• PR has an informative and human-readable title (this will be pulled into the release notes)
• Changes are limited to a single goal (no scope creep)
• Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
• Any change in functionality is tested
• New functions are documented (with a description, list of inputs, and expected output)
• Placeholder code is flagged / future TODOs are captured in comments
• Project documentation has been updated if adding/changing functionality.
DAGWorks-Inc/hamiltonGitHub
03/21/2025, 3:59 AM
<https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
by elijahbenizzy
<https://github.com/DAGWorks-Inc/hamilton/commit/25422f485725b95fedea4c4d83a6b701b320befe|25422f48>
- Add base_score to fitted_xgboost_booster 🤞 (#1295)
DAGWorks-Inc/hamiltonGitHub
03/21/2025, 4:12 AM
GitHub
03/21/2025, 4:12 AM
<https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
by elijahbenizzy
<https://github.com/DAGWorks-Inc/hamilton/commit/b4a03e7a1a6f1b96ad7d5757b9ed22f0abb16ab7|b4a03e7a>
- Bumps sf-hamilton-ui version to 0.0.17 (#1296)
DAGWorks-Inc/hamiltonGitHub
03/25/2025, 3:43 PM
from __future__ import annotations and hamilton.function_modifiers.datasaver in conjunction. Since having the annotations included is a default in e.g. ruff (see https://docs.astral.sh/ruff/rules/future-required-type-annotation/), it can lead to friction.
# Current behavior
Using from __future__ import annotations breaks datasaver usage.
The internal test if return_annotation not in (dict, Dict) fails, since the return_annotation becomes a string in this scenario.
## Stack Traces
Traceback (most recent call last):
File "~/playground/hamilton_saver.py", line 6, in <module>
@datasaver() # you need ()
^^^^^^^^^^^
File "~/.venv/lib/python3.11/site-packages/hamilton/function_modifiers/base.py", line 60, in replace__call__
return call_fn(self, fn)
^^^^^^^^^^^^^^^^^
File "~/.venv/lib/python3.11/site-packages/hamilton/function_modifiers/base.py", line 102, in __call__
self.validate(fn)
File "~/.venv/lib/python3.11/site-packages/hamilton/function_modifiers/adapters.py", line 864, in validate
raise InvalidDecoratorException(f"Function: {fn.__qualname__} must return a dict.")
hamilton.function_modifiers.base.InvalidDecoratorException: Function: save_json_data must return a dict.
## Steps to replicate behavior
1. Take datasaver example from https://hamilton.dagworks.io/en/latest/reference/decorators/datasaver/
2. Add from __future__ import annotations
at the top of the file
## Library & System Information
Ubuntu 24.04
Python 3.11.11
hamilton 1.86.1
# Expected behavior
datasaver should work with __future__.annotations?
# Note
Simply extending the check to if return_annotation not in (dict, Dict, 'dict'): fixes the issue.
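The failure mode can be reproduced with plain inspect. Under postponed evaluation of annotations every annotation is stored as a string; an explicit string annotation shows the same effect without needing the future import at the top of the file:

```python
import inspect

# With `from __future__ import annotations` the annotation below would be
# stored as the string 'dict'; the explicit string annotation is equivalent.
def save_json_data() -> "dict":
    return {}

annotation = inspect.signature(save_json_data).return_annotation
# annotation is the str 'dict', not the type dict, so the membership
# check `annotation not in (dict, Dict)` is True and validation raises
```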
DAGWorks-Inc/hamiltonGitHub
03/28/2025, 4:04 PM
function_modifiers.adapters.datasaver.validate.
## Changes
• modified datasaver.validate to check against __future__.annotations compatible type hints
## How I tested this
• added from __future__ import annotations to test_adapters
• ran pytest tests/function_modifiers → test failed
• made above changes
• ran pytest tests/function_modifiers → test succeeded
## Notes
correct_ds_function in test_adapters.py uses an alias for `dict`: dict_ = dict. However, for the string comparison this is not resolved to `dict`; added dict_ to the validation for now.
GitHub
03/29/2025, 6:12 PM<https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
by elijahbenizzy
<https://github.com/DAGWorks-Inc/hamilton/commit/06e1c209ee34a184516885e78db2d5d1a233a282|06e1c209>
- Add Task Submission / Return Hooks (#1287)
DAGWorks-Inc/hamiltonGitHub
03/29/2025, 6:15 PM<https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
by elijahbenizzy
<https://github.com/DAGWorks-Inc/hamilton/commit/cda7a9608b43d1660273f2f249ef49df638b1d23|cda7a960>
- Add Context-Aware Synchronous/Asynchronous Logging Adapters (#1294)
DAGWorks-Inc/hamiltonGitHub
03/29/2025, 6:22 PM
GitHub
03/29/2025, 6:47 PM
<https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
by elijahbenizzy
<https://github.com/DAGWorks-Inc/hamilton/commit/7437f038f8f5a1a670b82e55b59274c34c88cbe5|7437f038>
- Bumps sf-hamilton version from 1.87.0 to 1.88.0 (#1299)
DAGWorks-Inc/hamiltonGitHub
03/29/2025, 6:55 PM
ddtrace until deprecated module can be evaluated by @cswartzvi in #1288
• Fix a Type Hint and Link in Parallel Execution Documentation by @bustosalex1 in #1292
• Fix xgboost errors in CI by @cswartzvi in #1295
• Bumps sf-hamilton-ui version to 0.0.17 by @elijahbenizzy in #1296
• Add Task Submission / Return Hooks by @cswartzvi in #1287
• Add Context-Aware Synchronous/Asynchronous Logging Adapters by @cswartzvi in #1294
• Bumps sf-hamilton version from 1.87.0 to 1.88.0 by @elijahbenizzy in #1299
## New Contributors
• @jonas-meyer made their first contribution in #1284
Full Changelog: sf-hamilton-1.87.0...sf-hamilton-1.88.0
DAGWorks-Inc/hamiltonGitHub
03/29/2025, 6:55 PM
GitHub
03/31/2025, 6:54 PM
Preview tab and select the appropriate sub-template:
• <?expand=1&template=HAMILTON_CONTRIB_PR_TEMPLATE.md|sf-hamilton-contrib template>
Else, if not, please remove this block of text.
--- PR TEMPLATE INSTRUCTIONS (2) ---
[Short description explaining the high-level reason for the pull request]
## Changes
## How I tested this
## Notes
## Checklist
• PR has an informative and human-readable title (this will be pulled into the release notes)
• Changes are limited to a single goal (no scope creep)
• Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
• Any change in functionality is tested
• New functions are documented (with a description, list of inputs, and expected output)
• Placeholder code is flagged / future TODOs are captured in comments
• Project documentation has been updated if adding/changing functionality.
DAGWorks-Inc/hamiltonGitHub
04/03/2025, 6:16 AM
/hamilton3 and then serving the UI from that subpath.
## Notes
## Checklist
• PR has an informative and human-readable title (this will be pulled into the release notes)
• Changes are limited to a single goal (no scope creep)
• Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
• Any change in functionality is tested
• New functions are documented (with a description, list of inputs, and expected output)
• Placeholder code is flagged / future TODOs are captured in comments
• Project documentation has been updated if adding/changing functionality.
DAGWorks-Inc/hamiltonGitHub
04/06/2025, 2:57 AM
get_file_metadata function to correctly handle Windows drive paths where the scheme from parse.urlparse may include the Windows drive letter.
### Testing Fixture Updates:
• `tests/caching`: Because the metadata_store and result_store used the same temporary directory, deletions during clean-up were running into Windows' file share locking. Switched to the tmp_path_factory fixture and decoupled the paths for the metadata_store and result_store.
### Environment Variable Mocking:
• `tests/plugins/test_pandas_extensions.py`: Added mocking for the TZDIR environment variable in the test_pandas_orc_reader test. Note this is due to how Windows interacts with the IANA timezone database.
• `tests/test_telemetry.py`: Added mocking for the HAMILTON_TELEMETRY_ENABLED environment variable in telemetry configuration tests. Previous tests were changing os.environ directly, leading to issues if the user already had `HAMILTON_TELEMETRY_ENABLED` set.
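The isolation pattern can be sketched with the standard library (the PR's tests may use pytest's monkeypatch instead; this shows the same idea with unittest.mock):

```python
import os
from unittest import mock

os.environ.pop("HAMILTON_TELEMETRY_ENABLED", None)  # ensure a known start state

# patch.dict restores the pre-existing environment on exit, so the test
# cannot leak a HAMILTON_TELEMETRY_ENABLED value into other tests
with mock.patch.dict(os.environ, {"HAMILTON_TELEMETRY_ENABLED": "false"}):
    assert os.environ["HAMILTON_TELEMETRY_ENABLED"] == "false"

# outside the context manager the patched value is gone
assert "HAMILTON_TELEMETRY_ENABLED" not in os.environ
```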
### Platform-Specific Test Adjustments:
• `tests/plugins/test_plotly_extensions.py`: Added a platform check to skip the test_plotly_static_writer test on Windows. There are some issues with using the plotly dependency kaleido to generate static images on Windows.
## How I tested this
N/A
## Notes
N/A
## Checklist
• PR has an informative and human-readable title (this will be pulled into the release notes)
• Changes are limited to a single goal (no scope creep)
• Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
• Any change in functionality is tested
• New functions are documented (with a description, list of inputs, and expected output)
• Placeholder code is flagged / future TODOs are captured in comments
• Project documentation has been updated if adding/changing functionality.
DAGWorks-Inc/hamiltonGitHub
04/06/2025, 1:34 PM
unpack_fields which allows for the extraction of fields from a tuple output (a cross between extract_columns and extract_fields).
## Changes
• Added unpack_fields decorator to hamilton/function_modifiers/__init__.py and implemented its logic in hamilton/function_modifiers/expanders.py. This decorator enables the extraction of fields from a tuple output, expanding a single function into multiple nodes.
## How I tested this
• Added multiple test cases in tests/function_modifiers/test_expanders.py to validate the functionality of the unpack_fields decorator, including tests for valid and invalid type annotations and different tuple configurations.
## Notes
• Example usage with a fixed-size tuple:
@unpack_fields("X_train", "X_validation", "X_test")
def dataset_splits(X: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Randomly split data into train, validation, test"""
    X_train, X_validation, X_test = random_split(X)
    return X_train, X_validation, X_test
• Example usage with a subset of a fixed-length tuple's fields:
@unpack_fields("X_train")
def dataset_splits(X: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Randomly split data into train, validation, test"""
    X_train, X_validation, X_test = random_split(X)
    return X_train, X_validation, X_test
• Example usage with an indeterminate-length tuple:
@unpack_fields("X_train", "X_validation", "X_test")
def dataset_splits(X: np.ndarray) -> Tuple[np.ndarray, ...]:
    """Randomly split data into train, validation, test"""
    X_train, X_validation, X_test = random_split(X)
    return X_train, X_validation, X_test
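Conceptually, each named field maps to the tuple element at its position, which also covers the subset case. The following is an illustrative stdlib sketch of that mapping, not Hamilton's implementation (which expands the function into multiple DAG nodes):

```python
import functools

def unpack_fields_sketch(*field_names):
    """Map each named field to the tuple element at its position."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            # field i takes tuple element i; extra elements are ignored,
            # which is what makes the subset case work
            return {name: result[i] for i, name in enumerate(field_names)}
        return wrapper
    return decorator

@unpack_fields_sketch("X_train")  # subset: only the first element is kept
def dataset_splits():
    return ("train", "validation", "test")

# dataset_splits() == {"X_train": "train"}
```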
## Checklist
• PR has an informative and human-readable title (this will be pulled into the release notes)
• Changes are limited to a single goal (no scope creep)
• Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
• Any change in functionality is tested
• New functions are documented (with a description, list of inputs, and expected output)
• Placeholder code is flagged / future TODOs are captured in comments
• Project documentation has been updated if adding/changing functionality.
DAGWorks-Inc/hamiltonGitHub
04/07/2025, 2:44 AM
<https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
by elijahbenizzy
<https://github.com/DAGWorks-Inc/hamilton/commit/25d18812a57377e553abe9f55f95172cf531ea06|25d18812>
- Fix local (Windows) tests (#1302)
DAGWorks-Inc/hamiltonGitHub
04/07/2025, 2:45 AM
<https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
by elijahbenizzy
<https://github.com/DAGWorks-Inc/hamilton/commit/fa85f36afb6fbdc14864ab7682e13b90796138e0|fa85f36a>
- Wires through REACT_APP_HAMILTON_SUB_PATH for docker (#1301)
DAGWorks-Inc/hamiltonGitHub
04/12/2025, 11:15 PM
Awesome thanks. @omsawant-coder how about writing up a documentation section on https://hamilton.dagworks.io/en/latest/how-tos/microservice/ that uses the example linked?
So task would be:
• Provide an explanation of how you could use Hamilton in a FastAPI webservice using Asynchronous python, using the example code we already have.Originally posted by @skrawcz in #1186 DAGWorks-Inc/hamilton
GitHub
05/09/2025, 3:02 AM
extract_fields so that it can infer field types from the type annotation.
Warning
Important: This PR is based on the branch in #1303 and not main. Recommend merging that branch first. Sorry for the confusion!
## Changes
• Isolated the field extraction logic in a helper function called _process_extract_fields; this function determines field types when necessary before calling the preexisting helper _validate_extract_fields
• The extract_fields class now calls _process_extract_fields directly (instead of _validate_extract_fields)
• Documentation on using extract_fields was updated to include unpacked field names, lists of field names, and the previously undocumented TypedDict
## How I tested this
• Added test cases to validate the functionality of the extract_fields decorator with inferred field types
• Updated and consolidated existing annotation checks to handle explicit field types, inferred field types, and TypedDicts
## Notes
To use this feature you must specify a generic dictionary with valid type parameters - therefore it will only work for homogeneous dictionaries. For example, the following would extract the standard X_train, X_test, y_train, and y_test as np.ndarray by using unpacked field names:
@extract_fields('X_train', 'X_test', 'y_train', 'y_test')  # unpacked field names
def train_test_split_func(...) -> Dict[str, np.ndarray]:
    ...
    return {"X_train": ..., "X_test": ..., "y_train": ..., "y_test": ...}
You can also pass a list of field names to the first argument:
@extract_fields(['X_train', 'X_test', 'y_train', 'y_test'])  # list of field names
def train_test_split_func(...) -> Dict[str, np.ndarray]:
    ...
    return {"X_train": ..., "X_test": ..., "y_train": ..., "y_test": ...}
This also preserves backward compatibility with non-generic dictionaries:
@extract_fields(dict(  # fields specified as a dictionary
    X_train=np.ndarray,
    X_validation=np.ndarray,
    X_test=np.ndarray,
))
def train_test_split_func(...) -> Dict:
    ...
    return {"X_train": ..., "X_validation": ..., "X_test": ...}
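The inference step can be sketched with typing.get_args. This is illustrative only (`infer_field_types` is a hypothetical helper, not the PR's exact code): a Dict[str, T] annotation yields the pair (str, T), and T becomes the type of every requested field.

```python
import typing

def infer_field_types(fields, return_annotation):
    """Infer one common value type from a Dict[str, T] annotation."""
    args = typing.get_args(return_annotation)  # (str, T) for Dict[str, T]
    if len(args) != 2 or args[0] is not str:
        raise TypeError("expected a homogeneous Dict[str, T] annotation")
    return {name: args[1] for name in fields}

inferred = infer_field_types(["X_train", "X_test"], typing.Dict[str, float])
# inferred == {"X_train": float, "X_test": float}
```

This also makes the homogeneity restriction visible: a non-generic `Dict` has no type arguments to infer from, which is why the explicit-dictionary form remains available.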
## Checklist
• PR has an informative and human-readable title (this will be pulled into the release notes)
• Changes are limited to a single goal (no scope creep)
• Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
• Any change in functionality is tested
• New functions are documented (with a description, list of inputs, and expected output)
• Placeholder code is flagged / future TODOs are captured in comments
• Project documentation has been updated if adding/changing functionality.
DAGWorks-Inc/hamiltonGitHub
05/09/2025, 3:54 AM
dlt in dlt-hub/dlt#2430 moved the loader_file_format parameter from pipeline.normalize to pipeline.extract. This caused CI tests for the dlt plugin to fail (most notably in #1305).
## Changes
Updated tests/plugins/test_dlt_extensions.py, moving loader_file_format from pipeline.normalize to pipeline.extract.
## How I tested this
Covered by existing tests/plugins/test_dlt_extensions.py.
## Notes
N/A
## Checklist
• PR has an informative and human-readable title (this will be pulled into the release notes)
• Changes are limited to a single goal (no scope creep)
• Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
• Any change in functionality is tested
• New functions are documented (with a description, list of inputs, and expected output)
• Placeholder code is flagged / future TODOs are captured in comments
• Project documentation has been updated if adding/changing functionality.
DAGWorks-Inc/hamiltonGitHub
05/09/2025, 12:12 PM
custom_style_function
I am able to adjust the style with the node and `node_class` values, and I can adjust the style of the function node_class, but I do not get the input or output classes to change.
DAGWorks-Inc/hamiltonGitHub
05/12/2025, 4:47 AM
<https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
by skrawcz
<https://github.com/DAGWorks-Inc/hamilton/commit/fe481ad1a2279a6467b6c9abf49096fe362ff685|fe481ad1>
- Update dlt plugin with changes to loader_file_format
DAGWorks-Inc/hamiltonGitHub
05/14/2025, 7:04 PM
/github subscribe DAGWorks-Inc/hamilton
GitHub
05/14/2025, 7:04 PM
/github subscribe DAGWorks-Inc/hamilton
Stefan Krawczyk
05/15/2025, 8:32 PM