# improvement-ideas
  • Elijah Ben Izzy

    10/11/2022, 4:56 PM
    Hey folks — would love some thoughts on this: https://github.com/stitchfix/hamilton/issues/208. Not 100% sure it's in Hamilton's wheelhouse (depends on implementation), but it's certainly an interesting idea.
  • Stefan Krawczyk

    12/27/2022, 6:41 AM
    Hi everyone — I have a change up to add telemetry to Hamilton so we can better understand how everyone is using all the different features. If you'd like to take a look, please do so here: https://github.com/stitchfix/hamilton/pull/255. You should be able to leave comments on the PR; if not, please leave them in the corresponding GitHub issue #248. My plan is to put it into a release candidate this week so people can try it out.
  • Stefan Krawczyk

    01/02/2023, 5:00 PM
    @Gregory Jeffrey I was playing around with Polars and found that we don’t have the best support for it yet -- https://github.com/stitchfix/hamilton/pull/263 IIRC you were using Polars. Any thoughts/improvement ideas?
  • Stefan Krawczyk

    01/25/2023, 6:46 PM
    Here's a sneak peek at what's going to be released soon: the ability to easily visualize your DAG in a notebook. Questions/comments appreciated 🙂
    👍 1
    🔥 4
  • Elijah Ben Izzy

    02/16/2023, 10:13 PM
    @channel — want community input on this! Exciting API addition — we plan to more naturally support SQL: https://github.com/stitchfix/hamilton/discussions/315. You shouldn't have to manage connections/queries on your own. We want to know whether this would be useful for you, and which API you prefer (we have a few options). Please vote/comment in the GitHub discussion!
    ❤️ 2
  • Pieter Wijkstra

    03/16/2024, 9:29 PM
    Have you ever thought about an export of the visualization to D2 (d2lang.com)? It integrates with Quarto (quarto.org), which I'm considering for documenting my codebase...
  • Gilad Rubin

    07/22/2024, 10:13 PM
    Have you heard of the Intake data catalog (Take 2)? I think it might be a good idea to connect it to the materializers. It contains a lot of sources to read from and different "readers": https://intake.readthedocs.io/en/latest/walkthrough2.html#simple-example
  • Stefan Krawczyk

    07/22/2024, 10:18 PM
    haven’t. DLT is one we’ve been thinking about.
  • Iliya R

    08/10/2024, 8:33 PM
    Is there something like a workplan for the project, a list of upcoming features, a feature voting platform, or anything of the sort?
  • Iliya R

    08/22/2024, 6:53 PM
    Since I didn't see a dedicated GitHub issue on this matter (the closest being #301), I'd like to discuss the topic of nested parallelization. Can you tell me what the main obstacle to implementing this is? Is it related to associating the parallel and collect steps? Task distribution? ...?
  • Iliya R

    08/26/2024, 12:41 PM
    The `ANN` family of checks: https://docs.astral.sh/ruff/rules/#flake8-annotations-ann
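    If useful, enabling that rule family in Ruff is a small config change (a minimal `pyproject.toml` fragment; the ignore list shown is just an example):
    ```toml
    [tool.ruff.lint]
    # Enable the flake8-annotations (ANN) rule family.
    select = ["ANN"]
    # Example: skip the rules requiring annotations on *args / **kwargs.
    ignore = ["ANN002", "ANN003"]
    ```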
  • Jernej Frank

    10/08/2024, 12:29 PM
    Standardising pipelines: being able to create an abstract DAG factory. You develop a DAG schema, and it guarantees that certain nodes are in that DAG that can be used outside of Hamilton as interface points. The nodes in the abstract schema could enforce correct types and meta-vertices. It could be enough to check whether a certain node is upstream of another node, and allow the actual DAG implementation to insert nodes in between. It would mimic the popular factory pattern in OOP. The schema nodes could be implemented like abstract methods that get overridden if a same-named function is imported into the driver. Inspired by this question.
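    The schema-check half of this idea can be sketched in plain Python (hypothetical names throughout; `SCHEMA`, `validate_against_schema`, and the node functions are illustrative, not a Hamilton API):
    ```python
    import inspect
    from types import SimpleNamespace

    # Hypothetical sketch: a "DAG schema" maps required node names to the
    # output types they must produce. A concrete set of Hamilton-style
    # functions is checked against it before a driver is built.
    SCHEMA = {"raw_data": list, "feature_matrix": list, "prediction": float}

    def validate_against_schema(impl, schema):
        """Return a list of schema violations; empty means the impl conforms."""
        functions = dict(inspect.getmembers(impl, inspect.isfunction))
        errors = []
        for node, expected in schema.items():
            fn = functions.get(node)
            if fn is None:
                errors.append(f"missing required node: {node}")
            elif inspect.signature(fn).return_annotation is not expected:
                errors.append(f"node {node!r} has an unexpected return type")
        return errors

    # A concrete "implementation" of the schema above.
    def raw_data() -> list:
        return [1, 2, 3]

    def feature_matrix(raw_data: list) -> list:
        return [[x, x * x] for x in raw_data]

    def prediction(feature_matrix: list) -> float:
        return float(len(feature_matrix))

    impl = SimpleNamespace(raw_data=raw_data,
                           feature_matrix=feature_matrix,
                           prediction=prediction)
    print(validate_against_schema(impl, SCHEMA))  # []
    ```
    A real version would presumably also walk the graph to verify the upstream/downstream constraints mentioned above.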
  • Charles Swartz

    10/08/2024, 4:03 PM
    Expanding `TaskExecutionHook`. First, some background: I have recently been creating custom `rich`-based lifecycle adapters, mainly building on the existing `PrintLn` and `ProgressBar`. I hit a bit of a snag when using task-based parallel DAGs. Currently, for task-based DAGs, `ProgressBar` uses a bar with an unknown length because, by my assessment, the number of nodes is determined by the `execution_path` in `GraphExecutionHook.run_before_graph_execution`, and this will generally not match the number of tasks. I found `TaskExecutionHook`, which lets me log task information, but I do not see a way to determine the number of tasks in this hook. To give this some concrete context: my idea was to create a two-level progress bar for task-based DAGs, with a static bar that tracks overall tasks and an ephemeral one that tracks the groups within each task. Would you be open to altering `TaskExecutionHook` so that this information is available, perhaps where the tasks are initially grouped in `TaskBasedGraphExecutor.execute`? If so, let me know and I will open an issue and/or PR. Thanks!
    👀 1
    👍 1
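    For what it's worth, the shape of the requested change might look something like this (a sketch only; `run_after_task_grouping` is the hypothetical new callback being asked for, and the class and method names are illustrative rather than Hamilton's actual API):
    ```python
    class TwoLevelProgressHook:
        """Sketch of a hook that sizes a static overall bar once the
        executor has grouped tasks, then ticks it as tasks complete."""

        def __init__(self):
            self.total_tasks = 0   # known only after grouping
            self.completed = 0

        def run_after_task_grouping(self, task_ids):
            # Hypothetical extension point: called once, right after the
            # executor groups the work into tasks, with the full task list.
            self.total_tasks = len(task_ids)

        def run_after_task_execution(self, task_id, **kwargs):
            # Mirrors the kind of per-task callback TaskExecutionHook exposes.
            self.completed += 1
            return f"[{self.completed}/{self.total_tasks}] finished {task_id}"

    hook = TwoLevelProgressHook()
    hook.run_after_task_grouping(["expand", "block-0", "block-1", "collect"])
    print(hook.run_after_task_execution("expand"))  # [1/4] finished expand
    ```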
  • Justin Donaldson

    10/09/2024, 9:39 PM
    Hi folks, I wanted to start an interest-check thread for streaming and real-time inference, and also maybe throw in a little more notebook magic as well. I'll post some experiments here as I go.
    👀 3
  • Piotr Bieszczad

    04/01/2025, 6:19 AM
    Not an idea, but I believe it belongs in this channel (about the VSCode extension):
    > There are many features that we'd be interested in implementing. Let us know on Slack your favorite ones!
    I would really like to see:
    • Go To Definition: jump to where the node is defined
    • Go To References: jump to where the node is a dependency
    Especially because when using `parameterize` / `extract` / `step(...).named([...])`, a lot of strings are created, and not being able to navigate using them makes debugging difficult.
  • Elijah Ben Izzy

    04/01/2025, 6:02 PM
    Hey! To be clear this is in reference to the VSCode extension?
    👍 2
  • Charles Swartz

    04/04/2025, 2:59 AM
    @Piotr Bieszczad For the sake of completeness, I would also like to add "Rename a node across locations" to your list. Note that I have been looking into contributing some of these features; nothing to show yet.
  • Charles Swartz

    04/04/2025, 3:07 AM
    I had a couple of ideas revolving around function modifiers (I would be happy to work on both of these if there is interest):
    • A new modifier called `unpack_fields` (a cross between `extract_columns` and `extract_fields`). It would expect the decorated function to return a tuple and unpack field names corresponding to elements in that tuple. For example, the following would create two fields, `text_field="Hello"` and `int_field=42`:
    ```python
    @unpack_fields("text_field", "int_field")
    def A() -> Tuple[str, int]:
        return "Hello", 42
    ```
    • Update the existing modifier `extract_fields` so that it accepts a list of field names (in addition to the backward-compatible dict) and then determines the field types from the type annotation. This would only work for homogeneous dictionaries, but it would reduce some redundant keystrokes. For example, the following would extract the standard `X_train`, `X_test`, `y_train`, and `y_test` as `np.ndarray`:
    ```python
    @extract_fields(["X_train", "X_test", "y_train", "y_test"])
    def train_test_split_func(...) -> Dict[str, np.ndarray]:
        ...
        return {"X_train": ..., "X_test": ..., "y_train": ..., "y_test": ...}
    ```
    Note that there may be a way to make this accept variadic field names as well, but it might be tricky to preserve complete backward compatibility. For example:
    ```python
    @extract_fields("X_train", "X_test", "y_train", "y_test")
    def train_test_split_func(...) -> Dict[str, np.ndarray]:
        ...
        return {"X_train": ..., "X_test": ..., "y_train": ..., "y_test": ...}
    ```
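    To make the intent of `unpack_fields` concrete, here is a plain-Python sketch of the semantics (illustration only, not a proposed Hamilton implementation; `resolve` is a made-up helper standing in for the framework's node expansion):
    ```python
    from typing import Tuple

    def unpack_fields(*fields):
        """Record the field names that positionally match the returned tuple."""
        def decorator(fn):
            fn._unpack_fields = fields
            return fn
        return decorator

    def resolve(fn):
        """Stand-in for node expansion: map each field name to its tuple element."""
        return dict(zip(fn._unpack_fields, fn()))

    @unpack_fields("text_field", "int_field")
    def A() -> Tuple[str, int]:
        return "Hello", 42

    print(resolve(A))  # {'text_field': 'Hello', 'int_field': 42}
    ```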
  • Mattias Fornander US

    04/24/2025, 6:54 PM
    I see you are visualizing Pydantic classes in the Hamilton UI. Nice work! Any plans to add support for attrs (https://www.attrs.org)? Would it just be a matter of adding an `attrs_stats.py` next to `pydantic_stats.py`?
  • Iliya R

    04/26/2025, 5:35 AM
    I just learned of context7 (automated RAG creation from github repos), figured it might be good to add hamilton and burr to it... So I did.
    🙌 2