GitHub
03/03/2025, 5:39 PM
…`rich` (for the curious). Currently, for task-based DAGs, `TaskExecutionHook` will only fire before and after a task is executed. The hooks have no knowledge of the overall task landscape, including:
1. Number (and index) of tasks in the current group
2. Overall groups in the graph
3. Details about the expander task parameterization
4. Type of current task (expander, collector, etc.)
5. Spawning task ID (if available)
Note: Item 1 was originally discussed on Slack: https://hamilton-opensource.slack.com/archives/C03MANME6G5/p1728403433108319
Describe the solution you'd like
After speaking with @elijahbenizzy, an initial implementation for item 1 was suggested that modifies the `TaskImplementation` object to store the current task index and the total number of tasks. This information would then be wired through various methods in the `ExecutionState` class and eventually be passed to the lifecycle hooks `run_after_task_execution` and `run_before_task_execution` on `TaskExecutionHook`.
While implementing the above in a test branch (https://github.com/cswartzvi/hamilton/tree/update_task_execution_hook) I found that it was still difficult to create a multi-level progress bar without some of the information in items 2-5. To that end I also added:
• Added `spawning_task_id` and `purpose` to the methods and hooks associated with `TaskExecutionHook`
• Created a new hook `post_task_group` that runs after the tasks are grouped
• Created a new hook `post_task_expand` that runs after the expander task is parameterized
With these additional changes (also in the branch above) I was able to create my coveted multi-level progress bar:
```python
from typing import Any, List

import rich.console
import rich.progress

# NOTE: import paths are a best guess; the task-grouping/expansion hooks
# come from the linked branch, not (yet) from main.
from hamilton.execution.grouping import NodeGroupPurpose, TaskSpec
from hamilton.lifecycle import GraphExecutionHook, TaskExecutionHook
from hamilton.lifecycle.api import TaskGroupingHook


class TaskProgressHook(TaskExecutionHook, TaskGroupingHook, GraphExecutionHook):
    def __init__(self) -> None:
        self._console = rich.console.Console()
        self._progress = rich.progress.Progress(console=self._console)

    def run_before_graph_execution(self, **kwargs: Any):
        pass

    def run_after_graph_execution(self, **kwargs: Any):
        self._progress.stop()  # in case progress thread is lagging

    def run_after_task_grouping(self, *, tasks: List[TaskSpec], **kwargs):
        self._progress.add_task("Running Task Groups:", total=len(tasks))
        self._progress.start()

    def run_after_task_expansion(self, *, parameters: dict[str, Any], **kwargs):
        self._progress.add_task("Running Parallelizable:", total=len(parameters))

    def run_before_task_execution(self, *, purpose: NodeGroupPurpose, **kwargs):
        if purpose == NodeGroupPurpose.GATHER:
            self._progress.advance(self._progress.task_ids[0])
            self._progress.stop_task(self._progress.task_ids[-1])

    def run_after_task_execution(self, *, purpose: NodeGroupPurpose, **kwargs):
        if purpose == NodeGroupPurpose.EXECUTE_BLOCK:
            self._progress.advance(self._progress.task_ids[-1])
        else:
            self._progress.advance(self._progress.task_ids[0])
```
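For context, a minimal sketch of how such an adapter would be attached to a driver (the dataflow module and output node are hypothetical; `enable_dynamic_execution` is required for `Parallelizable`/`Collect` DAGs):

```python
from hamilton import driver

import my_dataflow  # hypothetical module containing the Parallelizable/Collect nodes

dr = (
    driver.Builder()
    .with_modules(my_dataflow)
    .enable_dynamic_execution(allow_experimental_mode=True)  # turn on task-based execution
    .with_adapters(TaskProgressHook())
    .build()
)
dr.execute(["final_node"])  # "final_node" is a placeholder output
```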
[Image: Multi-Level-Progress demo]
Maybe I reached a little too far with this for my own selfish goals 😄. Either way, please let me know if you would be interested in a PR for any, or all, of the changes to the task lifecycle adapters (heck, I would also be willing to add rich plugins if you like that as well). Thanks!
Additional context
Currently, the built-in lifecycle adapter `ProgressBar` has an indeterminate length for task-based DAGs.
DAGWorks-Inc/hamilton
GitHub
03/03/2025, 6:13 PM
<https://github.com/DAGWorks-Inc/hamilton/tree/main|main> by skrawcz
<https://github.com/DAGWorks-Inc/hamilton/commit/fbded2f6e07e6207327f65b235f5e8c56ae5b25d|fbded2f6> - Pin `ddtrace` until deprecated module can be evaluated (#1288)
DAGWorks-Inc/hamilton
GitHub
03/03/2025, 10:08 PM
…the `IOManager` to decouple I/O from computations
• It incorrectly states that asset descriptions have to be defined via metadata, which is not correct (they can be defined in native function docstrings or via the `@asset(description=...)` argument).
• issues with the second example:
• it incorrectly states that the Dagster job can't be executed in a local Python process
• it incorrectly states that I/O and computations are coupled (duplicate)
• the comparison between loading environment variables at runtime and providing configuration-time references like `dagster.EnvVar` does not make much sense. Dagster's configuration purposely enables deferring the setting of the exact configuration parameters (since Dagster runs can be executed remotely, e.g. in a Kubernetes pod, and the env var might not be available outside of the remote system). But nothing is preventing the user from setting the value with `os.getenv` directly if needed.
Minor (in the main comparison table):
• important and unique Dagster features such as Declarative Materialization and Pipes are not mentioned
• the data versioning comparison is a bit strange: it's not very clear how Hamilton automatically identifies code versions (e.g. how does it distinguish between refactoring-like changes and changes in the actual business logic?). Dagster's data versioning system enforces explicit code version management to avoid unwanted expensive materializations of the entire asset graph (see: declarative automation).
• important Dagster integrations such as dagster-dbt are not mentioned
Current behavior
The Dagster example is not using relevant Dagster features and provides inaccurate information.
Expected behavior
The comparison between Hamilton and Dagster should use analogous features in both frameworks to be fair. In particular, it should use the `IOManager`, as it's one of the main selling points of Dagster:
```python
import dagster as dg
import pandas as pd

@dg.asset
def topstory_ids() -> pd.DataFrame: ...

@dg.asset
def topstories(topstory_ids: pd.DataFrame): ...
```
Note that some of the popular IOManagers for Pandas and Polars also support loading a subset of the dataframe columns: `@asset(metadata={"columns": ["title"]})`.
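As a hedged illustration of that column-subset convention (asset names here are made up; support depends on the specific IOManager):

```python
import dagster as dg
import pandas as pd

@dg.asset(metadata={"columns": ["title"]})  # some pandas/polars IOManagers read only these columns
def titles(topstories: pd.DataFrame) -> pd.DataFrame:
    # the upstream asset is loaded with just the "title" column
    return topstories
```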
It should also provide accurate information on other topics mentioned above.
Additional context
Technically, this is not a bug, but I couldn't find a better label for this issue.
I am willing to help with improving these docs if my help is considered welcome!
DAGWorks-Inc/hamilton
GitHub
03/12/2025, 6:11 PM
<https://github.com/DAGWorks-Inc/hamilton/tree/main|main> by skrawcz
<https://github.com/DAGWorks-Inc/hamilton/commit/a14aefa476bb4717cd63d6f2c585803e4194a0b5|a14aefa4> - Fix type hint in parallel-task.rst documentation. Use RST link formatting instead of markdown.
DAGWorks-Inc/hamilton
GitHub
03/16/2025, 12:21 AM
`list_available_variables` has topological sort documented and observed. No ordering is specified now, so we can make it more specific.
Describe alternatives you've considered
• add a topological sort utility function (see the sketch below)
• add this to `HamiltonGraph`
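For reference, a generic topological-sort utility of the kind proposed could be sketched as follows (this operates on a plain name-to-dependencies mapping, not Hamilton's actual graph types):

```python
from collections import deque
from typing import Dict, List, Set

def topological_sort(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Kahn's algorithm: every node appears after all of its dependencies."""
    in_degree = {name: len(deps) for name, deps in dependencies.items()}
    dependents: Dict[str, List[str]] = {name: [] for name in dependencies}
    for name, deps in dependencies.items():
        for dep in deps:
            dependents[dep].append(name)
    queue = deque(name for name, degree in in_degree.items() if degree == 0)
    ordered: List[str] = []
    while queue:
        node = queue.popleft()
        ordered.append(node)
        for child in dependents[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)
    if len(ordered) != len(dependencies):
        raise ValueError("Dependency graph contains a cycle")
    return ordered

# e.g. topological_sort({"a": set(), "b": {"a"}, "c": {"a", "b"}}) -> ['a', 'b', 'c']
```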
Additional context
See slack
DAGWorks-Inc/hamilton
GitHub
03/17/2025, 4:19 AM
…`prefect`
• This branch is based on #1287 and will need to be rebased if/when accepted
## Changes
This PR adds a pair of adapters (in `hamilton.plugins.h_logging`) named `LoggingAdapter` and `AsyncLoggingAdapter` that can be used to log the following state in the execution of a graph (supporting both V1/V2 drivers where applicable):
• Graph start (`GraphExecutionHook`)
• Task grouping (`TaskGroupingHook`)
• Task submission (`TaskSubmissionHook`)
• Task pre-execution (`TaskExecutionHook`)
• Node pre-execution (`NodeExecutionHook`)
• Node post-execution (`NodeExecutionHook`)
• Task post-execution (`TaskExecutionHook`)
• Task resolution (`TaskResolutionHook`)
• Graph completion (`GraphExecutionHook`)
These adapters keep track of their current execution context by using an internal `ContextVar`. This allows the log to have a context-dependent prefix, by way of an internal custom log adapter, such as
Graph run 'c7236c13-94ca-4e5e-85a6-2f32af054736' - Starting graph execution
or
Task 'expand-stargazer_url.0.block-stargazer_url' - Task completed [OK]
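Mechanically, the context-dependent prefix can be achieved roughly like this (an illustrative sketch, not the PR's exact code; names are hypothetical):

```python
import logging
from contextvars import ContextVar

# The current execution context, e.g. "Node 'a'" or "Task 'expand-c'".
_log_context: ContextVar[str] = ContextVar("hamilton_log_context", default="")

class _ContextLogAdapter(logging.LoggerAdapter):
    def process(self, msg, kwargs):
        prefix = _log_context.get()
        return (f"{prefix} - {msg}" if prefix else msg), kwargs
```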
Additionally, a function called `get_logger` was added that returns the custom log adapter, which the user can use to create context-aware logs from within a node. For example, the following log inside node `a`...
```python
from hamilton.plugins.h_logging import get_logger

def a() -> str:
    logger = get_logger("name_or_logger_or_none")
    logger.warning("Encountered a warning")
    return "a"
```
will generate the following:
Node 'a' - Encountered a warning
See the Examples section below for some examples.
## How I tested this
I added tests for both `LoggingAdapter` and `AsyncLoggingAdapter`. The tests for `LoggingAdapter` examine the V1 driver with and without the `FutureAdapter`, and the V2 driver with the synchronous, multi-threading, multi-process, ray, and dask task-based executors. The tests may be a little brittle because some are dependent on undocumented task id naming conventions. Open to suggestions 😄.
## Examples
Here I would like to present some examples. I used a logging configuration based on rich, but any will suffice.
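For instance, a minimal rich-based logging configuration along those lines (illustrative only; any handler will work):

```python
import logging

from rich.logging import RichHandler

logging.basicConfig(
    level=logging.DEBUG,
    format="%(message)s",
    handlers=[RichHandler()],  # renders the leveled output shown below
)
```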
### Synchronous - non-branching
Standard node-based Hamilton graphs using the `LoggingAdapter`...
```python
def a() -> str:
    return "a"

def b(a: str) -> str:
    return a + " b"

def c(b: str) -> str:
    return b + " c"
```
will produce the following ...
```
INFO Graph run '24fd50b2-760b-4cf2-b1e1-61476c18e9b3' - Starting graph execution
DEBUG Node 'a' - Starting execution without dependencies
INFO Node 'a' - Finished execution [OK]
DEBUG Node 'b' - Starting execution with dependencies 'a'
INFO Node 'b' - Finished execution [OK]
DEBUG Node 'c' - Starting execution with dependencies 'b'
INFO Node 'c' - Finished execution [OK]
INFO Graph run '24fd50b2-760b-4cf2-b1e1-61476c18e9b3' - Finished graph execution [OK]
```
### Synchronous - branching
Branching graphs with multiple possible paths using the `LoggingAdapter` (with or without the `FutureAdapter`)...
```python
def a() -> str:
    return "a"

def b() -> str:
    return "b"

def c() -> str:
    return "c"

def d(a: str, b: str) -> str:
    return a + " " + b + " d"

def e(c: str) -> str:
    return c + " e"

def f(d: str, e: str) -> str:
    return d + " " + e + " f"
```
will produce the following (with a potentially different order)...
```
INFO Graph run '31d4bc1d-8020-4e24-91a5-732a03497f1a' - Starting graph execution
DEBUG Node 'c' - Submitting async node without dependencies
DEBUG Node 'a' - Submitting async node without dependencies
DEBUG Node 'b' - Submitting async node without dependencies
DEBUG Node 'd' - Submitting async node with dependencies 'a', 'b'
DEBUG Node 'e' - Submitting async node with dependencies 'c'
DEBUG Node 'f' - Submitting async node with dependencies 'd', 'e'
INFO Node 'c' - Finished execution [OK]
INFO Node 'a' - Finished execution [OK]
INFO Node 'b' - Finished execution [OK]
INFO Node 'e' - Finished execution [OK]
INFO Node 'd' - Finished execution [OK]
INFO Node 'f' - Finished execution [OK]
INFO Graph run '31d4bc1d-8020-4e24-91a5-732a03497f1a' - Finished graph execution [OK]
```
### Asynchronous - branching
Async branching graphs are also supported using the `AsyncLoggingAdapter`, but it must be used with the async driver. There is a little weakness in this adapter owing to the current state of hooks in the async driver; see the code for more details.
```python
async def a() -> str:
    return "a"

async def b() -> str:
    return "b"

async def c() -> str:
    return "c"

async def d(a: str, b: str) -> str:
    return a + " " + b + " d"

async def e(c: str) -> str:
    return c + " e"

async def f(d: str, e: str) -> str:
    return d + " " + e + " f"
```
These will produce the following (again with a potentially different order)...
```
DEBUG Node 'a' - Submitting async node without dependencies
DEBUG Node 'b' - Submitting async node without dependencies
DEBUG Node 'd' - Submitting async node with dependencies 'a', 'b'
DEBUG Node 'c' - Submitting async node without dependencies
DEBUG Node 'e' - Submitting async node with dependencies 'c'
DEBUG Node 'f' - Submitting async node with dependencies 'd', 'e'
INFO Node 'a' - Finished execution [OK]
INFO Node 'b' - Finished execution [OK]
INFO Node 'c' - Finished execution [OK]
INFO Node 'd' - Finished execution [OK]
INFO Node 'e' - Finished execution [OK]
INFO Node 'f' - Finished execution [OK]
INFO Graph run '38deee89-4a91-4253-b913-ce3c1e60b791' - Finished graph execution [OK]
```
### Task based
Task-based executors (synchronous, threading, processing, ray, dask, ...) are also supported with the `LoggingAdapter`. Note, however, that user context logs from inside a node may (threading, ray) or may not (multiprocessing, dask) be supported. Some executors provide log pass-throughs (ray) that produce slightly different results.
```python
from hamilton.htypes import Collect, Parallelizable

def b(a: int) -> int:
    return a

def c(b: int) -> Parallelizable[int]:
    for i in range(b):
        yield i

def d(c: int) -> int:
    return 2 * c

def e(d: Collect[int]) -> int:
    return sum(d)

def f(e: int) -> int:
    return e
```
These will produce the following (with a potentially different order) when `a = 2`...
```
INFO Graph run 'dc4ec112-ba50-44ae-8b64-11b0ad099f74' - Starting graph execution
INFO Graph run 'dc4ec112-ba50-44ae-8b64-11b0ad099f74' - Using inputs 'a'
INFO Graph run 'dc4ec112-ba50-44ae-8b64-11b0ad099f74' - Dynamic DAG detected; task-based logging is enabled
DEBUG Task 'a' - Initializing new task and submitting to executor
DEBUG Task 'a' - Task completed [OK]
DEBUG Task 'b' - Initializing new task and submitting to executor
DEBUG Task 'b' - Starting execution
DEBUG Task 'b' - Starting execution with dependencies 'a'
DEBUG Task 'b' - Node 'b' - Finished execution [OK]
DEBUG Task 'b' - Finished execution [OK]
INFO Task 'b' - Task completed [OK]
DEBUG Task 'expand-c' - Initializing new task and submitting to executor
DEBUG Task 'expand-c' - Starting execution of nodes 'c'
DEBUG Task 'expand-c' - Starting execution with dependencies 'b'
DEBUG Task 'expand-c' - Node 'c' - Finished execution [OK]
DEBUG Task 'expand-c' - Finished execution [OK]
INFO …
```
DAGWorks-Inc/hamilton
GitHub
03/21/2025, 12:19 AM
…`xgboost` errors in CI related to `test_xgboost_booster_json_writer` and `test_xgboost_booster_json_reader`, where the following error is encountered:
```
Check failed: base_score > 0.0f && base_score < 1.0f: base_score must be in (0,1) for logistic loss, got: 0
```
Currently blocking #1287 and #1294
## Changes
I added a default `base_score` to the `fitted_xgboost_booster` fixture.
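A hedged sketch of what such a fixture change might look like (the actual fixture lives in the plugin tests; data and parameters here are illustrative):

```python
import numpy as np
import pytest
import xgboost

@pytest.fixture
def fitted_xgboost_booster() -> xgboost.Booster:
    # Explicitly setting base_score keeps it inside (0, 1) for logistic loss.
    params = {"objective": "binary:logistic", "base_score": 0.5}
    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    y = np.array([0, 0, 1, 1])
    dtrain = xgboost.DMatrix(X, label=y)
    return xgboost.train(params, dtrain, num_boost_round=2)
```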
## How I tested this
N/A
## Notes
Oddly enough, these errors never show up on my machine (Windows), which is usually the reverse!
## Checklist
• PR has an informative and human-readable title (this will be pulled into the release notes)
• Changes are limited to a single goal (no scope creep)
• Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
• Any change in functionality is tested
• New functions are documented (with a description, list of inputs, and expected output)
• Placeholder code is flagged / future TODOs are captured in comments
• Project documentation has been updated if adding/changing functionality.
DAGWorks-Inc/hamilton
GitHub
03/21/2025, 3:59 AM
<https://github.com/DAGWorks-Inc/hamilton/tree/main|main> by elijahbenizzy
<https://github.com/DAGWorks-Inc/hamilton/commit/25422f485725b95fedea4c4d83a6b701b320befe|25422f48> - Add `base_score` to `fitted_xgboost_booster` 🤞 (#1295)
DAGWorks-Inc/hamilton
GitHub
03/21/2025, 4:12 AM
<https://github.com/DAGWorks-Inc/hamilton/tree/main|main> by elijahbenizzy
<https://github.com/DAGWorks-Inc/hamilton/commit/b4a03e7a1a6f1b96ad7d5757b9ed22f0abb16ab7|b4a03e7a> - Bumps sf-hamilton-ui version to 0.0.17 (#1296)
DAGWorks-Inc/hamilton
GitHub
03/25/2025, 3:43 PM
There is an issue when using `from __future__ import annotations` and `hamilton.function_modifiers.datasaver` in conjunction. Since having the annotations import included is a default in e.g. ruff (see https://docs.astral.sh/ruff/rules/future-required-type-annotation/), it can lead to friction.
# Current behavior
Using `from __future__ import annotations` breaks `datasaver` usage.
The internal test `if return_annotation not in (dict, Dict)` fails, since the `return_annotation` becomes a string in this scenario.
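This stringification is standard PEP 563 behavior, as a quick check shows (illustrative snippet, not from the issue):

```python
from __future__ import annotations

import inspect

def save_json_data() -> dict:
    ...

# With postponed evaluation, the annotation is the string "dict",
# so identity/membership checks against the dict type fail.
print(inspect.signature(save_json_data).return_annotation)  # the string "dict"
print(inspect.signature(save_json_data).return_annotation is dict)  # False
```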
## Stack Traces
```
Traceback (most recent call last):
  File "~/playground/hamilton_saver.py", line 6, in <module>
    @datasaver() # you need ()
     ^^^^^^^^^^^
  File "~/.venv/lib/python3.11/site-packages/hamilton/function_modifiers/base.py", line 60, in replace__call__
    return call_fn(self, fn)
           ^^^^^^^^^^^^^^^^^
  File "~/.venv/lib/python3.11/site-packages/hamilton/function_modifiers/base.py", line 102, in __call__
    self.validate(fn)
  File "~/.venv/lib/python3.11/site-packages/hamilton/function_modifiers/adapters.py", line 864, in validate
    raise InvalidDecoratorException(f"Function: {fn.__qualname__} must return a dict.")
hamilton.function_modifiers.base.InvalidDecoratorException: Function: save_json_data must return a dict.
```
## Steps to replicate behavior
1. Take the datasaver example from https://hamilton.dagworks.io/en/latest/reference/decorators/datasaver/
2. Add `from __future__ import annotations` at the top of the file
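A minimal repro along the lines of the docs example (the file contents here are an assumption based on the linked page):

```python
from __future__ import annotations  # this line triggers the failure

import json

from hamilton.function_modifiers import datasaver

@datasaver()  # raises InvalidDecoratorException at import time
def save_json_data(data: dict, path: str) -> dict:
    with open(path, "w") as f:
        json.dump(data, f)
    return {"path": path}
```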
## Library & System Information
Ubuntu 24.04
Python 3.11.11
hamilton 1.86.1
# Expected behavior
`datasaver` should work with `__future__.annotations`?
# Note
Simply extending the check to `if return_annotation not in (dict, Dict, 'dict'):` fixes the issue.
DAGWorks-Inc/hamilton
GitHub
03/28/2025, 4:04 PM
…`function_modifiers.adapters.datasaver.validate`.
## Changes
• modified `datasaver.validate` to check against `__future__.annotations`-compatible type hints
## How I tested this
• added `from __future__ import annotations` to `test_adapters`
• ran `pytest tests/function_modifiers` → test failed
• made above changes
• ran `pytest tests/function_modifiers` → test succeeded
## Notes
`correct_ds_function` in `test_adapters.py` uses an alias for `dict`: `dict_ = dict`. However, for the string comparison this is not resolved to `dict`; added `dict_` to the validation for now.
GitHub
03/29/2025, 6:12 PM
<https://github.com/DAGWorks-Inc/hamilton/tree/main|main> by elijahbenizzy
<https://github.com/DAGWorks-Inc/hamilton/commit/06e1c209ee34a184516885e78db2d5d1a233a282|06e1c209> - Add Task Submission / Return Hooks (#1287)
DAGWorks-Inc/hamilton
GitHub
03/29/2025, 6:15 PM
<https://github.com/DAGWorks-Inc/hamilton/tree/main|main> by elijahbenizzy
<https://github.com/DAGWorks-Inc/hamilton/commit/cda7a9608b43d1660273f2f249ef49df638b1d23|cda7a960> - Add Context-Aware Synchronous/Asynchronous Logging Adapters (#1294)
DAGWorks-Inc/hamilton
GitHub
03/29/2025, 6:47 PM
<https://github.com/DAGWorks-Inc/hamilton/tree/main|main> by elijahbenizzy
<https://github.com/DAGWorks-Inc/hamilton/commit/7437f038f8f5a1a670b82e55b59274c34c88cbe5|7437f038> - Bumps sf-hamilton version from 1.87.0 to 1.88.0 (#1299)
DAGWorks-Inc/hamilton
GitHub
03/29/2025, 6:55 PM
• Pin `ddtrace` until deprecated module can be evaluated by @cswartzvi in #1288
• Fix a Type Hint and Link in Parallel Execution Documentation by @bustosalex1 in #1292
• Fix `xgboost` errors in CI by @cswartzvi in #1295
• Bumps sf-hamilton-ui version to 0.0.17 by @elijahbenizzy in #1296
• Add Task Submission / Return Hooks by @cswartzvi in #1287
• Add Context-Aware Synchronous/Asynchronous Logging Adapters by @cswartzvi in #1294
• Bumps sf-hamilton version from 1.87.0 to 1.88.0 by @elijahbenizzy in #1299
## New Contributors
• @jonas-meyer made their first contribution in #1284
Full Changelog: sf-hamilton-1.87.0...sf-hamilton-1.88.0
DAGWorks-Inc/hamilton
GitHub
03/31/2025, 6:54 PM
[Unfilled PR template: submission instructions, empty Changes/How I tested this/Notes sections, and the standard checklist]
DAGWorks-Inc/hamilton
GitHub
04/03/2025, 6:16 AM
…`/hamilton3` and then serving the UI from that subpath.
## Notes
## Checklist
• PR has an informative and human-readable title (this will be pulled into the release notes)
• Changes are limited to a single goal (no scope creep)
• Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
• Any change in functionality is tested
• New functions are documented (with a description, list of inputs, and expected output)
• Placeholder code is flagged / future TODOs are captured in comments
• Project documentation has been updated if adding/changing functionality.
DAGWorks-Inc/hamilton
GitHub
04/06/2025, 2:57 AM
…`get_file_metadata` function to correctly handle Windows drive paths, where the scheme from `parse.urlparse` may include the Windows drive letter.
### Testing Fixture Updates:
• `tests/caching`: Because the `metadata_store` and `result_store` used the same temporary directory, deletions during clean-up were running into Windows' file-share locking. Switched to the `tmp_path_factory` fixture and decoupled the paths for the `metadata_store` and `result_store`.
### Environment Variable Mocking:
• `tests/plugins/test_pandas_extensions.py`: Added mocking for the `TZDIR` environment variable in the `test_pandas_orc_reader` test. Note this is due to how Windows interacts with the IANA timezone database.
• `tests/test_telemetry.py`: Added mocking for the `HAMILTON_TELEMETRY_ENABLED` environment variable in telemetry configuration tests. Previous tests were changing `os.environ` directly, leading to issues if the user already had `HAMILTON_TELEMETRY_ENABLED` set.
### Platform-Specific Test Adjustments:
• `tests/plugins/test_plotly_extensions.py`: Added a platform check to skip the `test_plotly_static_writer` test on Windows. There are some issues with using the `plotly` dependency `kaleido` to generate static images on Windows. (See the sketch after this list.)
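For illustration, the env-var mocking and the platform skip might look roughly like this in pytest (a sketch only; the real tests differ):

```python
import sys

import pytest

@pytest.mark.skipif(sys.platform == "win32", reason="kaleido static-image export is flaky on Windows")
def test_plotly_static_writer():
    ...

def test_telemetry_disabled(monkeypatch):
    # monkeypatch restores the variable afterwards, even if the user had it set
    monkeypatch.setenv("HAMILTON_TELEMETRY_ENABLED", "false")
    ...
```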
## How I tested this
N/A
## Notes
N/A
## Checklist
• PR has an informative and human-readable title (this will be pulled into the release notes)
• Changes are limited to a single goal (no scope creep)
• Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
• Any change in functionality is tested
• New functions are documented (with a description, list of inputs, and expected output)
• Placeholder code is flagged / future TODOs are captured in comments
• Project documentation has been updated if adding/changing functionality.
DAGWorks-Inc/hamilton
GitHub
04/06/2025, 1:34 PM
…`unpack_fields`, which allows for the extraction of fields from a tuple output (a cross between `extract_columns` and `extract_fields`).
## Changes
• Added the `unpack_fields` decorator to `hamilton/function_modifiers/__init__.py` and implemented its logic in `hamilton/function_modifiers/expanders.py`. This decorator enables the extraction of fields from a tuple output, expanding a single function into multiple nodes.
## How I tested this
• Added multiple test cases in `tests/function_modifiers/test_expanders.py` to validate the functionality of the `unpack_fields` decorator, including tests for valid and invalid type annotations and different tuple configurations.
## Notes
• Example usage with a fixed-size tuple:
```python
from typing import Tuple

import numpy as np

from hamilton.function_modifiers import unpack_fields

@unpack_fields("X_train", "X_validation", "X_test")
def dataset_splits(X: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Randomly split data into train, validation, test"""
    X_train, X_validation, X_test = random_split(X)  # random_split: user-defined helper
    return X_train, X_validation, X_test
```
• Example usage with a subset of a fixed-length tuple:
```python
@unpack_fields("X_train")
def dataset_splits(X: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Randomly split data into train, validation, test"""
    X_train, X_validation, X_test = random_split(X)
    return X_train, X_validation, X_test
```
• Example usage with an indeterminate-length tuple:
```python
@unpack_fields("X_train", "X_validation", "X_test")
def dataset_splits(X: np.ndarray) -> Tuple[np.ndarray, ...]:
    """Randomly split data into train, validation, test"""
    X_train, X_validation, X_test = random_split(X)
    return X_train, X_validation, X_test
```
## Checklist
• PR has an informative and human-readable title (this will be pulled into the release notes)
• Changes are limited to a single goal (no scope creep)
• Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
• Any change in functionality is tested
• New functions are documented (with a description, list of inputs, and expected output)
• Placeholder code is flagged / future TODOs are captured in comments
• Project documentation has been updated if adding/changing functionality.
DAGWorks-Inc/hamilton
GitHub
04/07/2025, 2:44 AM
<https://github.com/DAGWorks-Inc/hamilton/tree/main|main> by elijahbenizzy
<https://github.com/DAGWorks-Inc/hamilton/commit/25d18812a57377e553abe9f55f95172cf531ea06|25d18812> - Fix local (Windows) tests (#1302)
DAGWorks-Inc/hamilton
GitHub
04/07/2025, 2:45 AM
<https://github.com/DAGWorks-Inc/hamilton/tree/main|main> by elijahbenizzy
<https://github.com/DAGWorks-Inc/hamilton/commit/fa85f36afb6fbdc14864ab7682e13b90796138e0|fa85f36a> - Wires through REACT_APP_HAMILTON_SUB_PATH for docker (#1301)
DAGWorks-Inc/hamilton
GitHub
04/12/2025, 11:15 PM
Awesome, thanks. @omsawant-coder how about writing up a documentation section on https://hamilton.dagworks.io/en/latest/how-tos/microservice/ that uses the example linked?
So the task would be:
• Provide an explanation of how you could use Hamilton in a FastAPI webservice using asynchronous Python, using the example code we already have. (A rough sketch follows.)
Originally posted by @skrawcz in #1186
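A rough sketch of what that documentation example could show (module, node, and endpoint names are hypothetical; based on Hamilton's async driver, whose exact builder API may vary by version):

```python
import fastapi

from hamilton import async_driver
import my_async_nodes  # hypothetical module of async Hamilton functions

app = fastapi.FastAPI()
dr: async_driver.AsyncDriver

@app.on_event("startup")
async def setup_driver():
    global dr
    dr = await async_driver.Builder().with_modules(my_async_nodes).build()

@app.get("/report")
async def report(request_id: str) -> dict:
    # Hamilton resolves the async DAG concurrently within the event loop
    return await dr.execute(["report_node"], inputs={"request_id": request_id})
```
DAGWorks-Inc/hamilton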
GitHub
05/09/2025, 3:02 AM
…`extract_fields` so that it can infer field types from the type annotation.
Important: This PR is based on the branch in #1303 and not `main`. Recommend merging that branch first. Sorry for the confusion!
## Changes
• Isolated the field extraction logic in a helper function called `_process_extract_fields`; this function determines field types when necessary before calling the preexisting helper `_validate_extract_fields`
• The `extract_fields` class now calls `_process_extract_fields` directly (instead of `_validate_extract_fields`)
• Documentation on using `extract_fields` was updated to include unpacked field names, lists of field names, and the previously undocumented `TypedDict`
## How I tested this
• Added test cases to validate the functionality of the `extract_fields` decorator with inferred field types
• Updated and consolidated existing annotation checks to handle explicit field types, inferred field types, and TypedDicts
## Notes
To use this feature you must specify a generic dictionary with valid type parameters; therefore it will only work for homogeneous dictionaries. For example, the following would extract the standard `X_train`, `X_test`, `y_train`, and `y_test` as `np.ndarray` by using unpacked field names:
```python
@extract_fields('X_train', 'X_test', 'y_train', 'y_test')  # unpacked field names
def train_test_split_func(...) -> Dict[str, np.ndarray]:
    ...
    return {"X_train": ..., "X_test": ..., "y_train": ..., "y_test": ...}
```
You can also pass a list of field names as the first argument:
```python
@extract_fields(['X_train', 'X_test', 'y_train', 'y_test'])  # list of field names
def train_test_split_func(...) -> Dict[str, np.ndarray]:
    ...
    return {"X_train": ..., "X_test": ..., "y_train": ..., "y_test": ...}
```
This also preserves backward compatibility with non-generic dictionaries:
```python
@extract_fields(dict(  # fields specified as a dictionary
    X_train=np.ndarray,
    X_validation=np.ndarray,
    X_test=np.ndarray,
))
def train_test_split_func(...) -> Dict:
    ...
    return {"X_train": ..., "X_validation": ..., "X_test": ...}
```
## Checklist
• PR has an informative and human-readable title (this will be pulled into the release notes)
• Changes are limited to a single goal (no scope creep)
• Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
• Any change in functionality is tested
• New functions are documented (with a description, list of inputs, and expected output)
• Placeholder code is flagged / future TODOs are captured in comments
• Project documentation has been updated if adding/changing functionality.
DAGWorks-Inc/hamilton
GitHub
05/09/2025, 3:54 AM
…`dlt` in dlt-hub/dlt#2430 moved the `loader_file_format` parameter from `pipeline.normalize` to `pipeline.extract`. This caused CI tests for the `dlt` plugin to fail (most notably in #1305).
## Changes
Updated `tests/plugins/test_dlt_extensions.py`, moving `loader_file_format` from `pipeline.normalize` to `pipeline.extract`.
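In sketch form, the relocation looks like this (pipeline and source names are hypothetical; the parameter move is per dlt-hub/dlt#2430):

```python
import dlt

pipeline = dlt.pipeline(pipeline_name="example", destination="duckdb")
data = [{"id": 1}, {"id": 2}]  # stand-in for the plugin's test source

# before dlt-hub/dlt#2430:
# pipeline.extract(data)
# pipeline.normalize(loader_file_format="parquet")

# after:
pipeline.extract(data, loader_file_format="parquet")
pipeline.normalize()
```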
## How I tested this
Covered by existing tests in `tests/plugins/test_dlt_extensions.py`.
## Notes
N/A
## Checklist
• PR has an informative and human-readable title (this will be pulled into the release notes)
• Changes are limited to a single goal (no scope creep)
• Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
• Any change in functionality is tested
• New functions are documented (with a description, list of inputs, and expected output)
• Placeholder code is flagged / future TODOs are captured in comments
• Project documentation has been updated if adding/changing functionality.
DAGWorks-Inc/hamilton