# github-changelog
  • g

    GitHub

    03/03/2025, 5:39 PM
    #1196 Expanding Lifecycle Adapters for Dynamic DAGs / Parallel Execution Issue created by cswartzvi Is your feature request related to a problem? Please describe. I hit a bit of a snag while creating some custom multi-level progress bar lifecycle adapters for task-based parallel DAGs (with
    rich
    for the curious). Currently, for task-based DAGs,
    TaskExecutionHook
    will only fire before and after a task is executed. The hooks have no knowledge of the overall task landscape, including: 1. Number (and index) of tasks in the current group 2. Overall groups in the graph 3. Details about the expander task parameterization 4. Type of current task (expander, collector, etc.) 5. Spawning task ID (if available) Note: Item 1 was originally discussed on Slack: https://hamilton-opensource.slack.com/archives/C03MANME6G5/p1728403433108319 Describe the solution you'd like After speaking with @elijahbenizzy, an initial implementation for item 1 was suggested that modifies the
    TaskImplementation
    object to store the current task index and the total number of tasks. This information would then be wired through various methods in the
    ExecutionState
    class and be eventually passed to the lifecycle hooks
    run_after_task_execution
    and
    run_before_task_execution
    on
    TaskExecutionHook
. While implementing the above in a test branch (https://github.com/cswartzvi/hamilton/tree/update_task_execution_hook) I found that it was still difficult to create a multi-level progress bar without some of the information in items 2-5. To that end I also added: •
    spawning_task_id
    and
    purpose
    to the methods and hooks associated with
    TaskExecutionHook
    • Created a new hook
    post_task_group
    that runs after the tasks are grouped • Created a new hook
    post_task_expand
that runs after the expander task is parameterized. With these additional changes (also in the branch above) I was able to create my coveted multi-level progress bar:

class TaskProgressHook(TaskExecutionHook, TaskGroupingHook, GraphExecutionHook):
    def __init__(self) -> None:
        self._console = rich.console.Console()
        self._progress = rich.progress.Progress(console=self._console)

    def run_before_graph_execution(self, **kwargs: Any):
        pass

    def run_after_graph_execution(self, **kwargs: Any):
        self._progress.stop()  # in case progress thread is lagging

    def run_after_task_grouping(self, *, tasks: List[TaskSpec], **kwargs):
        self._progress.add_task("Running Task Groups:", total=len(tasks))
        self._progress.start()

    def run_after_task_expansion(self, *, parameters: dict[str, Any], **kwargs):
        self._progress.add_task("Running Parallelizable:", total=len(parameters))

    def run_before_task_execution(self, *, purpose: NodeGroupPurpose, **kwargs):
        if purpose == NodeGroupPurpose.GATHER:
            self._progress.advance(self._progress.task_ids[0])
            self._progress.stop_task(self._progress.task_ids[-1])

    def run_after_task_execution(self, *, purpose: NodeGroupPurpose, **kwargs):
        if purpose == NodeGroupPurpose.EXECUTE_BLOCK:
            self._progress.advance(self._progress.task_ids[-1])
        else:
            self._progress.advance(self._progress.task_ids[0])

[screenshot: Multi-Level-Progress]

Maybe I reached a little too far with this for my own selfish goals 😄, either way please let me know if you would be interested in a PR for any, or all, of the changes to the task lifecycle adapters (heck, I would also be willing to add rich plugins if you like that as well). Thanks! Additional context Currently, the built-in lifecycle adapter
    ProgressBar
    has an indeterminate length for task-based DAGs. DAGWorks-Inc/hamilton
  • g

    GitHub

    03/03/2025, 6:13 PM
    1 new commit pushed to
    <https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
    by skrawcz
    <https://github.com/DAGWorks-Inc/hamilton/commit/fbded2f6e07e6207327f65b235f5e8c56ae5b25d|fbded2f6>
    - Pin
    ddtrace
    until deprecated module can be evaluated (#1288) DAGWorks-Inc/hamilton
  • g

    GitHub

    03/03/2025, 10:08 PM
#1289 Dagster comparison is not correct Issue created by danielgafni Hi! I was curious about Hamilton because I was looking for a lightweight DAG library. Coming from Dagster, I got naturally interested in the Hamilton vs Dagster comparison and found this page in the docs: https://hamilton.dagworks.io/en/latest/code-comparisons/dagster/ I noticed it does not provide accurate information about Dagster and the code examples are not using some of Dagster's main features. More concretely: • issues with the first example: • It does not utilize the
    IOManager
    to decouple I/O from computations • It incorrectly states that asset descriptions have to be defined via metadata, which is not correct (they can be defined in native function docstrings or via the
@asset(description=...)
    argument). • issues with the second example: • it incorrectly states that the Dagster job can't be executed in a local Python process • it incorrectly states that I/O and computations are coupled (duplicate) • the comparison between loading environment variables at runtime and providing configuration time references like
    dagster.EnvVar
    does not make much sense. Dagster's configuration purposely enables deferring the setting of the exact configuration parameters (since Dagster runs can be executed remotely, e.g. in a Kubernetes pod, and the env var might not be available outside of the remote system). But nothing is preventing the user from setting the value with
    os.getenv
directly if needed. Minor (in the main comparison table): • important and unique Dagster features such as Declarative Materialization and Pipes are not mentioned • data versioning comparison is a bit strange: it's not very clear how Hamilton automatically identifies code versions (e.g. how it distinguishes between refactoring-like changes and changes in the actual business logic). Dagster's data versioning system enforces explicit code version management to avoid unwanted expensive materializations of the entire asset graph (see: declarative automation). • important Dagster integrations such as dagster-dbt are not mentioned Current behavior The Dagster example is not using relevant Dagster features and provides inaccurate information. Expected behavior The comparison between Hamilton and Dagster should use analogous features in both frameworks to be fair. In particular, it should use the
    IOManager
as it's one of the main selling points of Dagster:

import dagster as dg
import pandas as pd

@dg.asset
def topstory_ids() -> pd.DataFrame: ...

@dg.asset
def topstories(topstory_ids: pd.DataFrame): ...

Note that some of the popular IOManagers for Pandas and Polars also support loading a subset of the dataframe columns:
    @asset(metadata={"columns": ["title"]})
    . It should also provide accurate information on other topics mentioned above. Additional context Technically, this is not a bug, but I couldn't find a better label for this issue. I am willing to help with improving these docs if my help is considered welcome! DAGWorks-Inc/hamilton
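For context, a minimal sketch of the kind of IOManager decoupling the issue refers to (illustrative only: CsvIOManager is a hypothetical example, not necessarily what the docs should show):

import dagster as dg
import pandas as pd

class CsvIOManager(dg.IOManager):
    # persists each asset as a CSV and loads it back for downstream assets,
    # keeping file I/O out of the asset bodies themselves
    def handle_output(self, context, obj: pd.DataFrame) -> None:
        obj.to_csv(f"{context.asset_key.path[-1]}.csv", index=False)

    def load_input(self, context) -> pd.DataFrame:
        return pd.read_csv(f"{context.asset_key.path[-1]}.csv")

@dg.io_manager
def csv_io_manager():
    return CsvIOManager()

defs = dg.Definitions(
    assets=[topstory_ids, topstories],  # the assets from the example above
    resources={"io_manager": csv_io_manager},
)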
  • g

    GitHub

    03/08/2025, 2:41 PM
#1290 Question about Dynamic Task Branching in Hamilton Issue created by ReCodeLife I'm new to Hamilton and have a question about implementing dynamic task branching based on a function's return value. I'm trying to create a data flow where, after running a function get_number(), the next task to execute depends on its return value. For example, if it returns 1, I want to execute func1(), and if it returns 2, I want to execute func2(). Here's a simplified version of my code:

def get_number():
    # In reality, this function might have complex logic
    return 1  # or sometimes 2

def func1(number):
    # Logic to handle when get_number() returns 1
    pass

def func2(number):
    # Logic to handle when get_number() returns 2
    pass

Since Hamilton is based on static DAGs, I'm struggling to find a way to implement this dynamic branching. I've checked the documentation but haven't found a clear solution. Does anyone know if Hamilton supports this kind of dynamic task branching? If not, what would be the recommended approach or pattern to achieve this functionality? Thank you for your help and guidance! DAGWorks-Inc/hamilton
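One common workaround, sketched here for illustration (not from the issue thread): keep the DAG static and resolve the branch at runtime inside a single downstream node. The node name and string results below are hypothetical stand-ins for func1/func2.

def get_number() -> int:
    # stand-in for the complex logic described in the issue
    return 1  # or sometimes 2

def handled_value(get_number: int) -> str:
    # hypothetical dispatch node: the DAG stays static, and the branch is
    # resolved at runtime inside this one node
    if get_number == 1:
        return "ran the func1-style logic"
    return "ran the func2-style logic"

When the branch is known from configuration rather than from a node's output, Hamilton's @config.when decorator can instead select between alternative implementations at DAG-build time.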
  • g

    GitHub

    03/10/2025, 2:29 AM
#1291 findspark not found in an example notebook Issue created by yungchidanielcho # Current behavior Go to the example notebook and run it; findspark cannot be imported. ## Steps to replicate behavior 1. https://colab.research.google.com/github/dagworks-inc/hamilton/blob/main/examples/spark/pyspark_feature_catalog/example_usage.ipynb#scrollTo=wpnAHHzM3DLp DAGWorks-Inc/hamilton
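A likely workaround (an assumption on my part, since the notebook may intend a different setup) is simply installing the missing package in a Colab cell before the import:

%pip install findspark

import findspark
findspark.init()  # locates the Spark installation and puts pyspark on sys.path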
  • g

    GitHub

    03/12/2025, 3:03 PM
#1292 Fix a Type Hint and Link in Parallel Execution Documentation Pull request opened by bustosalex1 Just addressing 2 small issues I noticed in the Parallel Execution documentation. Thanks for working on this great project! ## Changes • Fixed what I presume is a typo in the docs for Parallel Execution, as well as a link that was formatted for markdown instead of RST. ## How I tested this • Built docs locally and verified updates. ## Checklist • [x] PR has an informative and human-readable title (this will be pulled into the release notes) • Changes are limited to a single goal (no scope creep) • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered. • Any change in functionality is tested • New functions are documented (with a description, list of inputs, and expected output) • Placeholder code is flagged / future TODOs are captured in comments • Project documentation has been updated if adding/changing functionality. DAGWorks-Inc/hamilton
  • g

    GitHub

    03/12/2025, 6:11 PM
    1 new commit pushed to
    <https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
    by skrawcz
    <https://github.com/DAGWorks-Inc/hamilton/commit/a14aefa476bb4717cd63d6f2c585803e4194a0b5|a14aefa4>
    - fix type hint in parallel-task.rst documentation. Use RST link formatting instead of markdown. DAGWorks-Inc/hamilton
  • g

    GitHub

    03/16/2025, 12:21 AM
    #1293 Topologically sorted order for `list_available_variables` Issue created by elijahbenizzy Is your feature request related to a problem? Please describe. People want this -- it's a nice guarantee to have. Describe the solution you'd like
    list_available_variables
has topological sort documented and observed. No ordering is specified now, so we can make it more specific. Describe alternatives you've considered • add a topological sort utility function • add this to
    HamiltonGraph
    Additional context See slack DAGWorks-Inc/hamilton
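As a sketch of the "topological sort utility function" alternative, assuming a mapping from each variable name to its upstream dependencies is available (hypothetical helper, not the eventual API):

from graphlib import TopologicalSorter

def topologically_sorted(names, upstream):
    # upstream: mapping of variable name -> iterable of its upstream variable names
    order = {n: i for i, n in enumerate(TopologicalSorter(upstream).static_order())}
    # unknown names sort last, keeping a stable place for externally supplied inputs
    return sorted(names, key=lambda n: order.get(n, len(order)))

print(topologically_sorted(["c", "a", "b"], {"b": ["a"], "c": ["b"]}))  # ['a', 'b', 'c']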
  • g

    GitHub

    03/17/2025, 4:19 AM
#1294 Add Context-Aware Synchronous/Asynchronous Logging Adapters Pull request opened by cswartzvi This PR is a follow-on to #1287 utilizing the new task submission and resolution hooks to create context-aware logging adapters. Disclaimers: • I fully realize this may be too much for base Hamilton. I have been using a version of this at work for a few months and it performs well. Please do not feel bad if this is rejected - I can easily continue to use it as a separate extension • The details for context-aware logging, and the structure of the logs, were largely inspired by
    prefect
    • This branch is based on #1287 and will need to be rebased if/when accepted ## Changes This PR adds a pair of adapters (in
hamilton.plugins.h_logging
    ) named
    LoggingAdapter
    and
    AsyncLoggingAdapter
that can be used to log the following state in the execution of a graph (supporting both V1/V2 drivers where applicable): • Graph start (
    GraphExecutionHook
    ) • Task grouping (
    TaskGroupingHook
    ) • Task submission (
    TaskSubmissionHook
    ) • Task pre-execution (
    TaskExecutionHook
) • Node pre-execution (
    NodeExecutionHook
    ) • Node post-execution (
    NodeExecutionHook
    ) • Task post-execution (
    TaskExecutionHook
    ) • Task resolution (
    TaskResolutionHook
    ) • Graph completion (
    GraphExecutionHook
    ) These adapters keep track of their current execution context by using an internal
    ContextVar
    . This allows the log to have a context dependent prefix, by way of an internal custom log adapter, such as
Graph run 'c7236c13-94ca-4e5e-85a6-2f32af054736' - Starting graph execution
or
Task 'expand-stargazer_url.0.block-stargazer_url' - Task completed [OK]
    Additionally, a function called
    get_logger
    was added that returns the custom log adapter which the user can use to create context-aware logs from within a node. For example the following log inside node
    a
...

from hamilton.plugins.h_logging import get_logger

def a() -> str:
    logger = get_logger("name_or_logger_or_none")
    logger.warning("Encountered a warning")
    return "a"

will generate the following:
    Node 'a' - Encountered a warning
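To make the mechanism concrete, here is a minimal sketch of a ContextVar-driven prefix (illustrative only; _current_context and ContextPrefixAdapter are hypothetical names, not the PR's internals):

import logging
from contextvars import ContextVar

_current_context: ContextVar[str] = ContextVar("hamilton_log_context", default="")

class ContextPrefixAdapter(logging.LoggerAdapter):
    # prepends the current execution context (graph run, task, or node) to every message
    def process(self, msg, kwargs):
        prefix = _current_context.get()
        return (f"{prefix} - {msg}" if prefix else msg), kwargs

logging.basicConfig(level=logging.INFO)
logger = ContextPrefixAdapter(logging.getLogger("hamilton"), {})
token = _current_context.set("Node 'a'")  # a node-execution hook would set this
logger.warning("Encountered a warning")   # -> "Node 'a' - Encountered a warning"
_current_context.reset(token)             # and reset it when the node finishes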
    See the Notes section for some examples ## How I tested this I added tests for both
    LoggingAdapter
    and
    AsyncLoggingAdapter
    . The tests for
    LoggingAdapter
    examine the V1 driver with and without the
    FutureAdapter
    and the V2 driver with the synchronous, multi-threading, multi-process, ray, and dask task-based executors. The tests may be a little brittle because some are dependent on undocumented task id naming conventions. Open to suggestions 😄. ## Examples Here I would like to present some examples. I used a logging configuration based on rich - but any will suffice. ### Synchronous - non-branching Standard node-based Hamilton graphs using the
    LoggingAdapter
...

def a() -> str:
    return "a"

def b(a: str) -> str:
    return a + " b"

def c(b: str) -> str:
    return b + " c"

will produce the following ...
    INFO     Graph run '24fd50b2-760b-4cf2-b1e1-61476c18e9b3' - Starting graph execution
    DEBUG    Node 'a' - Starting execution without dependencies
    INFO     Node 'a' - Finished execution [OK]
    DEBUG    Node 'b' - Starting execution with dependencies 'a'
    INFO     Node 'b' - Finished execution [OK]
    DEBUG    Node 'c' - Starting execution with dependencies 'b'
    INFO     Node 'c' - Finished execution [OK]
    INFO     Graph run '24fd50b2-760b-4cf2-b1e1-61476c18e9b3' - Finished graph execution [OK]
    ### Synchronous - branching Branching graphs with multiple possible paths using the
    LoggingAdapter
    (with or without the
    FutureAdapter
)...

def a() -> str:
    return "a"

def b() -> str:
    return "b"

def c() -> str:
    return "c"

def d(a: str, b: str) -> str:
    return a + " " + b + " d"

def e(c: str) -> str:
    return c + " e"

def f(d: str, e: str) -> str:
    return d + " " + e + " f"

will produce the following (with a potentially different order)...
    INFO     Graph run '31d4bc1d-8020-4e24-91a5-732a03497f1a' - Starting graph execution
    DEBUG    Node 'c' - Submitting async node without dependencies
    DEBUG    Node 'a' - Submitting async node without dependencies
    DEBUG    Node 'b' - Submitting async node without dependencies
    DEBUG    Node 'd' - Submitting async node with dependencies 'a', 'b'
    DEBUG    Node 'e' - Submitting async node with dependencies 'c'
    DEBUG    Node 'f' - Submitting async node with dependencies 'd', 'e'
    INFO     Node 'c' - Finished execution [OK]
    INFO     Node 'a' - Finished execution [OK]
    INFO     Node 'b' - Finished execution [OK]
    INFO     Node 'e' - Finished execution [OK]
    INFO     Node 'd' - Finished execution [OK]
    INFO     Node 'f' - Finished execution [OK]
    INFO     Graph run '31d4bc1d-8020-4e24-91a5-732a03497f1a' - Finished graph execution [OK]
    ### Asynchronous - Branching Async branching graphs are also supported using the
    AsyncLoggingAdapter
but must be used with the async driver. This adapter has a minor limitation stemming from the current state of hooks in the async driver - see the code for more details.

async def a() -> str:
    return "a"

async def b() -> str:
    return "b"

async def c() -> str:
    return "c"

async def d(a: str, b: str) -> str:
    return a + " " + b + " d"

async def e(c: str) -> str:
    return c + " e"

async def f(d: str, e: str) -> str:
    return d + " " + e + " f"

These will produce the following (again with a potentially different order)...
    DEBUG    Node 'a' - Submitting async node without dependencies
    DEBUG    Node 'b' - Submitting async node without dependencies
    DEBUG    Node 'd' - Submitting async node with dependencies 'a', 'b'
    DEBUG    Node 'c' - Submitting async node without dependencies
    DEBUG    Node 'e' - Submitting async node with dependencies 'c'
    DEBUG    Node 'f' - Submitting async node with dependencies 'd', 'e'
    INFO     Node 'a' - Finished execution [OK]
    INFO     Node 'b' - Finished execution [OK]
    INFO     Node 'c' - Finished execution [OK]
    INFO     Node 'd' - Finished execution [OK]
    INFO     Node 'e' - Finished execution [OK]
    INFO     Node 'f' - Finished execution [OK]
    INFO     Graph run '38deee89-4a91-4253-b913-ce3c1e60b791' - Finished graph execution [OK]
### Task-based Task-based executors (synchronous, threading, processing, ray, dask, ...) are also supported with the
    LoggingAdapter
. Note however that user context logs from inside a node may (threading, ray) or may not (multiprocessing, dask) be supported. Some executors provide log pass-throughs (ray) that produce slightly different results.

def b(a: int) -> int:
    return a

def c(b: int) -> Parallelizable[int]:
    for i in range(b):
        yield i

def d(c: int) -> int:
    return 2 * c

def e(d: Collect[int]) -> int:
    return sum(d)

def f(e: int) -> int:
    return e

These will produce the following (with a potentially different order) when
    a = 2
...

INFO     Graph run 'dc4ec112-ba50-44ae-8b64-11b0ad099f74' - Starting graph execution
INFO     Graph run 'dc4ec112-ba50-44ae-8b64-11b0ad099f74' - Using inputs 'a'
INFO     Graph run 'dc4ec112-ba50-44ae-8b64-11b0ad099f74' - Dynamic DAG detected; task-based logging is enabled
DEBUG    Task 'a' - Initializing new task and submitting to executor
DEBUG    Task 'a' - Task completed [OK]
DEBUG    Task 'b' - Initializing new task and submitting to executor
DEBUG    Task 'b' - Starting execution
DEBUG    Task 'b' - Starting execution with dependencies 'a'
DEBUG    Task 'b' - Node 'b' - Finished execution [OK]
DEBUG    Task 'b' - Finished execution [Ok]
INFO     Task 'b' - Task completed [OK]
DEBUG    Task 'expand-c' - Initializing new task and submitting to executor
DEBUG    Task 'expand-c' - Starting execution of nodes 'c'
DEBUG    Task 'expand-c' - Starting execution with dependencies 'b'
DEBUG    Task 'expand-c' - Node 'c' - Finished execution [OK]
DEBUG    Task 'expand-c' - Finished execution [Ok]
INFO     … DAGWorks-Inc/hamilton
  • g

    GitHub

    03/21/2025, 12:19 AM
    #1295 Fix `xgboost` errors in CI Pull request opened by cswartzvi This is an attempt to fix
    xgboost
    errors in CI related to
    test_xgboost_booster_json_writer
    and
    test_xgboost_booster_json_reader
    where the following error is encountered:
    Check failed: base_score > 0.0f && base_score < 1.0f: base_score must be in (0,1) for logistic loss, got: 0
    Currently blocking #1287 and #1294 ## Changes I added a default
    base_score
    to the
    fitted_xgboost_booster
fixture. ## How I tested this N/A ## Notes Oddly enough, these errors never show up on my machine (Windows), which is usually the reverse! ## Checklist • PR has an informative and human-readable title (this will be pulled into the release notes) • Changes are limited to a single goal (no scope creep) • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered. • Any change in functionality is tested • New functions are documented (with a description, list of inputs, and expected output) • Placeholder code is flagged / future TODOs are captured in comments • Project documentation has been updated if adding/changing functionality. DAGWorks-Inc/hamilton
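For reference, a hedged sketch of what pinning base_score in such a fixture might look like (a simplified stand-in, not the repository's actual fitted_xgboost_booster fixture):

import numpy as np
import xgboost as xgb

X = np.random.rand(20, 3)
y = np.random.randint(0, 2, size=20)
dtrain = xgb.DMatrix(X, label=y)

# pinning base_score keeps logistic loss inside its required (0, 1) interval
booster = xgb.train(
    {"objective": "binary:logistic", "base_score": 0.5},
    dtrain,
    num_boost_round=2,
)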
  • g

    GitHub

    03/21/2025, 3:59 AM
    1 new commit pushed to
    <https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
    by elijahbenizzy
    <https://github.com/DAGWorks-Inc/hamilton/commit/25422f485725b95fedea4c4d83a6b701b320befe|25422f48>
    - Add
    base_score
    to
    fitted_xgboost_booster
    🤞 (#1295) DAGWorks-Inc/hamilton
  • g

    GitHub

    03/21/2025, 4:12 AM
    #1296 Bumps sf-hamilton-ui version to 0.0.17 Pull request opened by elijahbenizzy DAGWorks-Inc/hamilton
  • g

    GitHub

    03/21/2025, 4:12 AM
    1 new commit pushed to
    <https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
    by elijahbenizzy
    <https://github.com/DAGWorks-Inc/hamilton/commit/b4a03e7a1a6f1b96ad7d5757b9ed22f0abb16ab7|b4a03e7a>
    - Bumps sf-hamilton-ui version to 0.0.17 (#1296) DAGWorks-Inc/hamilton
  • g

    GitHub

    03/25/2025, 3:43 PM
    #1297 hamilton.function_modifiers.datasaver does not work with __future__.annotations Issue created by kreczko I am not sure if this is expected, I could not find a related issue (apologies if it exists). At the moment it is not possible to use
    from __future__ import annotations
    and
    hamilton.function_modifiers.datasaver
    in conjunction. Since having the annotations included is a default in e.g.
    ruff
    (see https://docs.astral.sh/ruff/rules/future-required-type-annotation/), it can lead to friction. # Current behavior Using
    from __future__ import annotations
    breaks datasaver usage. The internal test
    if return_annotation not in (dict, Dict)
    fails, since the
    return_annotation
    becomes a string in this scenario ## Stack Traces
    Traceback (most recent call last):
      File "~/playground/hamilton_saver.py", line 6, in <module>
        @datasaver()  # you need ()
         ^^^^^^^^^^^
      File "~/.venv/lib/python3.11/site-packages/hamilton/function_modifiers/base.py", line 60, in replace__call__
        return call_fn(self, fn)
               ^^^^^^^^^^^^^^^^^
      File "~/.venv/lib/python3.11/site-packages/hamilton/function_modifiers/base.py", line 102, in __call__
        self.validate(fn)
      File "~/.venv/lib/python3.11/site-packages/hamilton/function_modifiers/adapters.py", line 864, in validate
        raise InvalidDecoratorException(f"Function: {fn.__qualname__} must return a dict.")
    hamilton.function_modifiers.base.InvalidDecoratorException: Function: save_json_data must return a dict.
    ## Steps to replicate behavior 1. Take datasaver example from https://hamilton.dagworks.io/en/latest/reference/decorators/datasaver/ 2. Add
    from __future__ import annotations
    at the top of the file ## Library & System Information Ubuntu 24.04 Python 3.11.11 hamilton 1.86.1 # Expected behavior datasaver should work with
    __future__.annotations
    ? # Note simply extending the check to
    if return_annotation not in (dict, Dict, 'dict'):
    fixes the issue. DAGWorks-Inc/hamilton
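A minimal repro of the underlying mechanism (my own illustration, not code from the issue): with postponed evaluation the raw return annotation is the string 'dict', and typing.get_type_hints can resolve it back to the class, which suggests an alternative to comparing against string literals.

from __future__ import annotations

import typing

def save_json_data(data: dict) -> dict:
    return {"path": "out.json"}

print(save_json_data.__annotations__["return"])         # 'dict' (a string)
print(typing.get_type_hints(save_json_data)["return"])  # <class 'dict'>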
  • g

    GitHub

    03/28/2025, 4:04 PM
#1298 Fix: issue #1297: datasaver does not work with __future__.annotations Pull request opened by kreczko Fixes issue #1297 by changing the return_annotation retrieval in
    function_modifiers.adapters.datasaver.validate
    . ## Changes • modified
    datasaver.validate
    to check against
    __future__.annotations
    compatible type hints ## How I tested this • added
    from __future__ import annotations
    to test_adapters • ran
    pytest tests/function_modifiers
    → test failed • made above changes • ran
    pytest tests/function_modifiers
    → test succeeded ## Notes
    correct_ds_function
    in
    test_adapters.py
    uses an alias for `dict`:
    dict_ = dict
    . However, for the string comparison this is not resolved to `dict`; added
    dict_
    to the validation for now.
    I do not understand why this is done in the tests and should, in theory, not impact "normal" use. Feedback on this is welcome. ## Checklist • PR has an informative and human-readable title (this will be pulled into the release notes) • Changes are limited to a single goal (no scope creep) • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered. • Any change in functionality is tested • New functions are documented (with a description, list of inputs, and expected output) • Placeholder code is flagged / future TODOs are captured in comments • Project documentation has been updated if adding/changing functionality. DAGWorks-Inc/hamilton
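For illustration, the general shape of such a check might look like the following (an assumption about the approach, not the PR's actual diff; it also does not cover the dict_ alias case mentioned above):

import inspect
import typing

def _returns_dict(fn) -> bool:
    # under `from __future__ import annotations` the annotation arrives as a string,
    # so accept both the real types and their common string spellings
    ann = inspect.signature(fn).return_annotation
    if isinstance(ann, str):
        return ann in ("dict", "Dict", "typing.Dict")
    return ann in (dict, typing.Dict)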
  • g

    GitHub

    03/29/2025, 6:12 PM
    1 new commit pushed to
    <https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
    by elijahbenizzy
    <https://github.com/DAGWorks-Inc/hamilton/commit/06e1c209ee34a184516885e78db2d5d1a233a282|06e1c209>
    - Add Task Submission / Return Hooks (#1287) DAGWorks-Inc/hamilton
  • g

    GitHub

    03/29/2025, 6:15 PM
    1 new commit pushed to
    <https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
    by elijahbenizzy
    <https://github.com/DAGWorks-Inc/hamilton/commit/cda7a9608b43d1660273f2f249ef49df638b1d23|cda7a960>
    - Add Context-Aware Synchronous/Asynchronous Logging Adapters (#1294) DAGWorks-Inc/hamilton
  • g

    GitHub

    03/29/2025, 6:22 PM
    #1299 Bumps sf-hamilton version from 1.87.0 to 1.88.0 Pull request opened by elijahbenizzy Version bump DAGWorks-Inc/hamilton
  • g

    GitHub

    03/29/2025, 6:47 PM
    1 new commit pushed to
    <https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
    by elijahbenizzy
    <https://github.com/DAGWorks-Inc/hamilton/commit/7437f038f8f5a1a670b82e55b59274c34c88cbe5|7437f038>
    - Bumps sf-hamilton version from 1.87.0 to 1.88.0 (#1299) DAGWorks-Inc/hamilton
  • g

    GitHub

    03/29/2025, 6:55 PM
    Release - sf-hamilton-1.88.0 New release published by elijahbenizzy ## What's Changed • Add ability to run hamilton-ui on domain subpath by @jonas-meyer in #1284 • Pin
    ddtrace
    until deprecated module can be evaluated by @cswartzvi in #1288 • Fix a Type Hint and Link in Parallel Execution Documentation by @bustosalex1 in #1292 • Fix
    xgboost
    errors in CI by @cswartzvi in #1295 • Bumps sf-hamilton-ui version to 0.0.17 by @elijahbenizzy in #1296 • Add Task Submission / Return Hooks by @cswartzvi in #1287 • Add Context-Aware Synchronous/Asynchronous Logging Adapters by @cswartzvi in #1294 • Bumps sf-hamilton version from 1.87.0 to 1.88.0 by @elijahbenizzy in #1299 ## New Contributors • @jonas-meyer made their first contribution in #1284 Full Changelog: sf-hamilton-1.87.0...sf-hamilton-1.88.0 DAGWorks-Inc/hamilton
  • g

    GitHub

    03/29/2025, 6:55 PM
    Deployment to github-pages by elijahbenizzy DAGWorks-Inc/hamilton
  • g

    GitHub

    03/31/2025, 6:54 PM
#1300 Add Swapnil Dewalkar to contributors list in README Pull request opened by swapdewalkar --- PR TEMPLATE INSTRUCTIONS (1) --- Looking to submit a Hamilton Dataflow to the sf-hamilton-contrib module? If so go to the
    Preview
    tab and select the appropriate sub-template: • <?expand=1&template=HAMILTON_CONTRIB_PR_TEMPLATE.md|sf-hamilton-contrib template> Else, if not, please remove this block of text. --- PR TEMPLATE INSTRUCTIONS (2) --- [Short description explaining the high-level reason for the pull request] ## Changes ## How I tested this ## Notes ## Checklist • PR has an informative and human-readable title (this will be pulled into the release notes) • Changes are limited to a single goal (no scope creep) • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered. • Any change in functionality is tested • New functions are documented (with a description, list of inputs, and expected output) • Placeholder code is flagged / future TODOs are captured in comments • Project documentation has been updated if adding/changing functionality. DAGWorks-Inc/hamilton
  • g

    GitHub

    04/03/2025, 6:16 AM
#1301 Wires through REACT_APP_HAMILTON_SUB_PATH for docker Pull request opened by skrawcz We missed exposing this in the base Dockerfile for the frontend, so it was not picked up when the image was built. ## Changes • wires through the REACT_APP_HAMILTON_SUB_PATH var ## How I tested this • locally by building and changing the value to
    /hamilton3
    and then serving the UI from that subpath. ## Notes ## Checklist • PR has an informative and human-readable title (this will be pulled into the release notes) • Changes are limited to a single goal (no scope creep) • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered. • Any change in functionality is tested • New functions are documented (with a description, list of inputs, and expected output) • Placeholder code is flagged / future TODOs are captured in comments • Project documentation has been updated if adding/changing functionality. DAGWorks-Inc/hamilton
  • g

    GitHub

    04/06/2025, 2:57 AM
#1302 Fix local (Windows) tests Pull request opened by cswartzvi I have been encountering errors and/or failures when running the test suite locally on a Windows machine. This pull request includes several changes aimed at allowing the test suite to pass. All changes, except the first one, are confined to the offending tests. ## Changes ### File Handling Improvements: • `hamilton/io/utils.py`: Enhanced the
    get_file_metadata
    function to correctly handle Windows drive paths where the
    scheme
    from
    parse.urlparse
    may include the Windows drive letter. ### Testing Fixture Updates: • `tests/caching`: Because the
    metadata_store
    and
    result_store
used the same temporary directory, deletions during clean-up were running into Windows file-share locking. Switched to the
    tmp_path_factory
    fixture and decoupled the paths for the
    metadata_store
    and
    result_store
    ### Environment Variable Mocking: • `tests/plugins/test_pandas_extensions.py`: Added mocking for the
    TZDIR
    environment variable in the
    test_pandas_orc_reader
    test. Note this is due to how Windows interacts with the IANA timezone database. • `tests/test_telemetry.py`: Added mocking for the
    HAMILTON_TELEMETRY_ENABLED
    environment variable in telemetry configuration tests. Previous tests were changing
    os.environ
directly, leading to issues if the user already had `HAMILTON_TELEMETRY_ENABLED` set.
    test_plotly_static_writer
test on Windows. There are some issues with using the
    plotly
    dependency
    kaleido
    to generate static images on Windows. ## How I tested this N/A ## Notes N/A ## Checklist • PR has an informative and human-readable title (this will be pulled into the release notes) • Changes are limited to a single goal (no scope creep) • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered. • Any change in functionality is tested • New functions are documented (with a description, list of inputs, and expected output) • Placeholder code is flagged / future TODOs are captured in comments • Project documentation has been updated if adding/changing functionality. DAGWorks-Inc/hamilton
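As an illustration of the environment-variable mocking approach, a sketch using pytest's monkeypatch fixture (the PR's actual tests may look different):

import os

def test_telemetry_env_var_is_isolated(monkeypatch):
    # assumption for illustration: the code under test reads the variable at call time
    monkeypatch.setenv("HAMILTON_TELEMETRY_ENABLED", "false")
    assert os.environ["HAMILTON_TELEMETRY_ENABLED"] == "false"
    # monkeypatch restores any pre-existing value afterwards, so a user's own
    # HAMILTON_TELEMETRY_ENABLED setting is never clobbered by the test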
  • g

    GitHub

    04/06/2025, 1:34 PM
#1303 Add new function modifier `unpack_fields` Pull request opened by cswartzvi Added a new modifier called
    unpack_fields
    which allows for the extraction of fields from a tuple output (a cross between
    extract_columns
    and
    extract_fields
    ). ## Changes • Added
    unpack_fields
    decorator to
    hamilton/function_modifiers/__init__.py
    and implemented its logic in
    hamilton/function_modifiers/expanders.py
    . This decorator enables the extraction of fields from a tuple output, expanding a single function into multiple nodes. ## How I tested this • Added multiple test cases in
    tests/function_modifiers/test_expanders.py
    to validate the functionality of the
    unpack_fields
decorator, including tests for valid and invalid type annotations and different tuple configurations. ## Notes • Example usage with a fixed-size tuple:

@unpack_fields("X_train", "X_validation", "X_test")
def dataset_splits(X: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Randomly split data into train, validation, test"""
    X_train, X_validation, X_test = random_split(X)
    return X_train, X_validation, X_test

• Example usage with a subset of fixed-length tuples:

@unpack_fields("X_train")
def dataset_splits(X: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Randomly split data into train, validation, test"""
    X_train, X_validation, X_test = random_split(X)
    return X_train, X_validation, X_test

• Example usage with an indeterminate-length tuple:

@unpack_fields("X_train", "X_validation", "X_test")
def dataset_splits(X: np.ndarray) -> Tuple[np.ndarray, ...]:
    """Randomly split data into train, validation, test"""
    X_train, X_validation, X_test = random_split(X)
    return X_train, X_validation, X_test

## Checklist • PR has an informative and human-readable title (this will be pulled into the release notes) • Changes are limited to a single goal (no scope creep) • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered. • Any change in functionality is tested • New functions are documented (with a description, list of inputs, and expected output) • Placeholder code is flagged / future TODOs are captured in comments • Project documentation has been updated if adding/changing functionality. DAGWorks-Inc/hamilton
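A usage sketch of how the unpacked nodes could then be requested from a driver (illustrative; my_module and random_split are hypothetical placeholders):

import numpy as np
from hamilton import driver

import my_module  # module containing the @unpack_fields-decorated dataset_splits above

dr = driver.Builder().with_modules(my_module).build()
results = dr.execute(["X_train", "X_test"], inputs={"X": np.random.rand(10, 3)})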
  • g

    GitHub

    04/07/2025, 2:44 AM
    1 new commit pushed to
    <https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
    by elijahbenizzy
    <https://github.com/DAGWorks-Inc/hamilton/commit/25d18812a57377e553abe9f55f95172cf531ea06|25d18812>
    - Fix local (Windows) tests (#1302) DAGWorks-Inc/hamilton
  • g

    GitHub

    04/07/2025, 2:45 AM
    1 new commit pushed to
    <https://github.com/DAGWorks-Inc/hamilton/tree/main|main>
    by elijahbenizzy
    <https://github.com/DAGWorks-Inc/hamilton/commit/fa85f36afb6fbdc14864ab7682e13b90796138e0|fa85f36a>
    - Wires through REACT_APP_HAMILTON_SUB_PATH for docker (#1301) DAGWorks-Inc/hamilton
  • g

    GitHub

    04/12/2025, 11:15 PM
    #1304 Documentation: add explainer for FastAPI Issue created by skrawcz
    Awesome thanks. @omsawant-coder how about writing up a documentation section on https://hamilton.dagworks.io/en/latest/how-tos/microservice/ that uses the example linked?
So the task would be:
• Provide an explanation of how you could use Hamilton in a FastAPI webservice using asynchronous Python, using the example code we already have.
    Originally posted by @skrawcz in #1186 DAGWorks-Inc/hamilton
  • g

    GitHub

    05/09/2025, 3:02 AM
    #1305 Update extract fields Pull request opened by cswartzvi Updated the existing function modifier
    extract_fields
    so that it can infer field types from the type annotation. Important: This PR is based on the branch in #1303 and not
    main
    . Recommend merging that branch first. Sorry for the confusion! ## Changes • Isolated the field extraction logic in a helper function called
    _process_extract_fields
; this function determines field types when necessary before calling the preexisting helper
    _validate_extract_fields
    • The
    extract_fields
    class now calls
    _process_extract_fields
    directly (instead of
    _validate_extract_fields
    ) • Documentation on using
    extract_fields
    was updated to include unpacked field names, list of field names, and the previously undocumented
    TypedDict
    ## How I tested this • Added test cases to validate the functionality of the
    extract_fields
    decorator with inferred field types • Updated and consolidated existing annotation checks to handle explicit field types, inferred field types, and
    TypedDicts
## Notes To use this feature you must specify a generic dictionary with valid type parameters - therefore it will only work for homogeneous dictionaries. For example, the following would extract the standard
    X_train
    ,
    X_test
    ,
    y_train
    , and
    y_test
    as
    np.ndarray
by using unpacked field names:

@extract_fields('X_train', 'X_test', 'y_train', 'y_test')  # unpacked field names
def train_test_split_func(...) -> Dict[str, np.ndarray]:
    ...
    return {"X_train": ..., "X_test": ..., "y_train": ..., "y_test": ...}

You can also pass a list of field names to the first argument:

@extract_fields(['X_train', 'X_test', 'y_train', 'y_test'])  # list of field names
def train_test_split_func(...) -> Dict[str, np.ndarray]:
    ...
    return {"X_train": ..., "X_test": ..., "y_train": ..., "y_test": ...}

This also preserves backward compatibility with non-generic dictionaries:

@extract_fields(dict(  # fields specified as a dictionary
    X_train=np.ndarray,
    X_test=np.ndarray,
    y_train=np.ndarray,
    y_test=np.ndarray,
))
def train_test_split_func(...) -> Dict:
    ...
    return {"X_train": ..., "X_test": ..., "y_train": ..., "y_test": ...}

## Checklist • PR has an informative and human-readable title (this will be pulled into the release notes) • Changes are limited to a single goal (no scope creep) • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered. • Any change in functionality is tested • New functions are documented (with a description, list of inputs, and expected output) • Placeholder code is flagged / future TODOs are captured in comments • Project documentation has been updated if adding/changing functionality. DAGWorks-Inc/hamilton
  • g

    GitHub

    05/09/2025, 3:54 AM
    #1306 Fix `dlt` plugin with changes to `loader_file_format` Pull request opened by cswartzvi A recent change to
    dlt
    in dlt-hub/dlt#2430 moved the
    loader_file_format
    parameter from
    pipeline.normalize
    to
    pipeline.extract
    . This caused CI tests for the
    dlt
    plugin to fail (most notably in #1305). ## Changes Updated
    tests/plugins/test_dlt_extensions.py
    , moving
    loader_file_format
    from
    pipeline.normalize
    to
    pipeline.extract
    . ## How I tested this Covered by existing
    tests/plugins/test_dlt_extensions.py
    . ## Notes N/A ## Checklist • PR has an informative and human-readable title (this will be pulled into the release notes) • Changes are limited to a single goal (no scope creep) • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered. • Any change in functionality is tested • New functions are documented (with a description, list of inputs, and expected output) • Placeholder code is flagged / future TODOs are captured in comments • Project documentation has been updated if adding/changing functionality. DAGWorks-Inc/hamilton
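For orientation, a sketch of the parameter move described above (hedged: exact signatures depend on the dlt version, and the pipeline settings are placeholders):

import dlt

data = [{"id": 1}, {"id": 2}]
pipeline = dlt.pipeline(pipeline_name="demo", destination="duckdb", dataset_name="demo_data")

# previously the format was supplied at normalization time:
#   pipeline.extract(data, table_name="items")
#   pipeline.normalize(loader_file_format="parquet")

# after dlt-hub/dlt#2430 it is supplied at extraction time:
pipeline.extract(data, table_name="items", loader_file_format="parquet")
pipeline.normalize()
pipeline.load()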