Thierry Jean
08/25/2022, 12:40 AM
# definition
from typing import List

from hamilton import driver
from sklearn.base import BaseEstimator, TransformerMixin

class HamiltonDriver(BaseEstimator, TransformerMixin):
    def __init__(self, outputs: List[str] = None, config: dict = None, modules: list = None):
        self.outputs = [] if outputs is None else outputs
        self.config = {} if config is None else config
        self.modules = [] if modules is None else modules

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        hdriver = driver.Driver(self.config, *self.modules)
        results = hdriver.execute(final_vars=self.outputs, inputs=X)
        return results
# usage
import pandas as pd

import location_transform
import time_transform

df = pd.read_csv("./filepath")
transformer = HamiltonDriver(
    outputs=["day_of_week", "location_distance"],
    modules=[time_transform, location_transform],
)
transformer.transform(df.to_dict(orient="series"))
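For anyone wiring this into a pipeline: the wrapper follows sklearn's stateless-transformer contract (fit is a no-op returning self; transform does all the work). A minimal self-contained sketch of that contract, with the Hamilton call stubbed out so it runs without any dependencies:

```python
from typing import List, Optional

class StatelessTransformer:
    """Same contract as the HamiltonDriver wrapper above: fit() learns
    nothing and returns self; transform() computes the requested outputs."""

    def __init__(self, outputs: Optional[List[str]] = None):
        self.outputs = [] if outputs is None else outputs

    def fit(self, X, y=None):
        return self  # stateless: nothing to fit

    def transform(self, X, y=None):
        # stand-in for hdriver.execute(final_vars=self.outputs, inputs=X)
        return {name: X[name] for name in self.outputs if name in X}

t = StatelessTransformer(outputs=["day_of_week"]).fit(None)
print(t.transform({"day_of_week": 3, "location_distance": 1.2}))
# {'day_of_week': 3}
```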
Ben
09/02/2022, 12:30 AM
def myfunc(col1: pd.Series(freq="M", index=pd.PeriodIndex)): ...
Presumably I can do this with validators (which I haven't tried using yet) but it feels more like a job for types. (?)
2. Errors that occur in the execute() stage are a black box, because the traceback just points to execute() and not the actual problem in your code (e.g. incompatible indexes).
Elijah Ben Izzy
10/11/2022, 4:56 PM
James Marvin
11/02/2022, 9:31 AM
@tag(type='metadata')
def create_some_metadata(input: pd.Series) -> pd.Series:
    return helpers._get_some_metadata(input)

@tag(type='metadata')
def create_some_more_metadata(current_time: datetime) -> pd.Series:
    return pd.Series(current_time)

def get_metadata_table(create_some_metadata: pd.Series, create_some_more_metadata: pd.Series) -> pd.DataFrame:
    return pd.DataFrame([create_some_metadata, create_some_more_metadata])
It could be useful - especially where we are creating a new function accepting a high number of nodes of the same type as input - to have some feature enabling us to refer to nodes by type, as opposed to by name.
Hopefully this example shows in principle what I mean:
@tag(type='metadata')
def create_some_metadata(input: pd.Series) -> pd.Series:
    return helpers._get_some_metadata(input)

@tag(type='metadata')
def create_some_more_metadata(current_time: datetime) -> pd.Series:
    return pd.Series(current_time)

@nodes_by_tag(type='metadata')
def get_metadata_table(**kwargs) -> pd.DataFrame:
    return pd.DataFrame(**kwargs)
In this example:
• All nodes have been assigned the same 'type' using the @tag feature
• Some mechanism (in this case, a new decorator @nodes_by_tag) lets us refer to all nodes of a given type when defining a new node
• The new node can act on the assumption that all nodes of a given type have been provided as inputs, without having to refer to each node by name
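Not a Hamilton feature today; just to make the proposal concrete, here's a dependency-free sketch of the mechanism, using a module-level registry in place of Hamilton's graph (all names hypothetical):

```python
# Hypothetical registry standing in for Hamilton's graph metadata.
_TAGS = {}

def tag(**tags):
    """Toy stand-in for Hamilton's @tag: record tags per function."""
    def register(fn):
        _TAGS[fn.__name__] = tags
        return fn
    return register

def nodes_by_tag(**wanted):
    """Hypothetical @nodes_by_tag: pass every matching node's result as a kwarg."""
    def decorate(fn):
        def wrapper(results):
            matching = {
                name: results[name]
                for name, t in _TAGS.items()
                if name in results and all(t.get(k) == v for k, v in wanted.items())
            }
            return fn(**matching)
        return wrapper
    return decorate

@tag(type="metadata")
def meta_a():
    return "a"

@tag(type="metadata")
def meta_b():
    return "b"

@nodes_by_tag(type="metadata")
def metadata_table(**kwargs):
    return dict(sorted(kwargs.items()))

# "other" is not tagged, so it is filtered out before metadata_table runs.
print(metadata_table({"meta_a": meta_a(), "meta_b": meta_b(), "other": 0}))
# {'meta_a': 'a', 'meta_b': 'b'}
```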
What do you think?
Ian Hoffman
12/12/2022, 12:27 AM
hello_world example and ran into an issue. Specifically, if I change the output_columns in my_script.py to ["spend_mean"], Hamilton crashes with the following:
WARNING:hamilton.base:It appears no Pandas index type was detected. This will likely break when trying to create a DataFrame. E.g. are you requesting all scalar values? Use a different result builder or return at least one Pandas object with an index. Ignore this warning if you're using DASK for now.
ERROR:hamilton.driver:-------------------------------------------------------------------
Oh no an error! Need help with Hamilton?
Join our slack and ask for help! <https://join.slack.com/t/hamilton-opensource/shared_invite/zt-1bjs72asx-wcUTgH7q7QX1igiQ5bbdcg>
-------------------------------------------------------------------
Traceback (most recent call last):
File "my_script.py", line 29, in <module>
df = dr.execute(output_columns)
File "/Users/ian.hoffman/src/hamilton/examples/hello_world/.venv/lib/python3.8/site-packages/hamilton/driver.py", line 142, in execute
raise e
File "/Users/ian.hoffman/src/hamilton/examples/hello_world/.venv/lib/python3.8/site-packages/hamilton/driver.py", line 139, in execute
return self.adapter.build_result(**outputs)
File "/Users/ian.hoffman/src/hamilton/examples/hello_world/.venv/lib/python3.8/site-packages/hamilton/base.py", line 171, in build_result
raise ValueError(f"Cannot build result. Cannot handle type {value}.")
ValueError: Cannot build result. Cannot handle type 28.333333333333332.
This is happening because the default Driver uses the SimplePythonDataFrameGraphAdapter if no adapter is explicitly specified, and the SimplePythonDataFrameGraphAdapter can't handle scalar values in its build_result method. So I don't know that this qualifies as a bug, which is why I'm not filing a bug report. It seems it is expected, and yet it's unintuitive.
It seems that, if I can run an entire DAG, I should be able to inspect intermediate values in that DAG without Hamilton throwing an error.
I was wondering whether it would make sense to modify SimplePythonDataFrameGraphAdapter, or to introduce a new default adapter, which maintains the behavior of SimplePythonDataFrameGraphAdapter for DataFrames and Series but simply treats anything else as a "normal" value and lets it through. This would give the expected DX while still maintaining the nice data validation benefits that SimplePythonDataFrameGraphAdapter provides today (e.g. in check_pandas_index_types_match).
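To illustrate the proposed behavior: a dependency-free sketch of a permissive build_result, with plain lists standing in for indexed pandas Series (the function name and broadcast logic are hypothetical, not Hamilton's actual adapter code):

```python
def build_result_permissive(**outputs):
    """Hypothetical build_result: if every requested output is a scalar,
    pass the plain dict through instead of raising; otherwise broadcast
    scalars to the length of the list-like outputs (standing in for a
    pandas index) and build a column dict."""
    list_like = {k: v for k, v in outputs.items() if isinstance(v, (list, tuple))}
    if not list_like:
        # e.g. {'spend_mean': 28.33} instead of "Cannot handle type 28.33..."
        return outputs
    n = len(next(iter(list_like.values())))
    return {k: (list(v) if k in list_like else [v] * n) for k, v in outputs.items()}

print(build_result_permissive(spend_mean=28.33))
# {'spend_mean': 28.33}
print(build_result_permissive(spend=[10, 20], spend_mean=15.0))
# {'spend': [10, 20], 'spend_mean': [15.0, 15.0]}
```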
A minor thing for sure, since most people aren't introspecting intermediate nodes, but thought I'd bring it up since it bit me.
CC @Elijah Ben Izzy as we were talking about this.
Stefan Krawczyk
12/27/2022, 6:41 AM
Stefan Krawczyk
01/02/2023, 5:00 PM
Stefan Krawczyk
01/23/2023, 9:52 PM
@here for once), I would love some thoughts on this API update, i.e. the ability to pass functions into execute() rather than just strings?
e.g.
import data_loaders, transforms, model_pipeline
...
dr = driver.Driver(config, data_loaders, transforms, model_pipeline)
kaggle_submission_df: pd.DataFrame = dr.execute([model_pipeline.kaggle_submission_df]) # <-- in addition to strings, you can pass in the function and we'll take the name from it.
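Mechanically this could be as simple as normalizing final_vars up front; a rough sketch of the idea (hypothetical helper, not the actual driver code):

```python
def normalize_final_vars(final_vars):
    """Hypothetical sketch of the proposal: execute() accepts both
    strings and functions, taking __name__ from anything that isn't a str."""
    return [v if isinstance(v, str) else v.__name__ for v in final_vars]

def kaggle_submission_df():
    """Stand-in for the transform function defined in model_pipeline."""

print(normalize_final_vars([kaggle_submission_df, "spend_mean"]))
# ['kaggle_submission_df', 'spend_mean']
```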
Thoughts? React with your emoji of choice: …
Stefan Krawczyk
01/25/2023, 6:46 PM
Elijah Ben Izzy
02/16/2023, 10:13 PM
Elijah Ben Izzy
04/06/2023, 2:57 AM
@load_from and @save_to.
1. @load_from does data loading and injects the result into a parameter in a function. It has many different flavors; this one can load from a CSV into a pandas dataframe:
# load the parameter training_data_raw from the path `./training_data.csv`
@load_from.csv(path=value("./training_data.csv"))
def training_data(training_data_raw: pd.DataFrame) -> pd.DataFrame:
    return some_small_modifications(training_data_raw)
2. @save_to does the opposite; it saves the result of a function:
# save the result of final_output to the value of the node/input `output_result_json`
@save_to.json(path=source("output_result_json"), artifact_name_="final_save_node")
def final_output(...) -> Dict[str, Any]:
    return {...: ...}
The idea is that:
• if you want final_output, you should be able to just call it with final_output as the input to the driver.
• if you want to save the node, you need to call it with final_save_node (the artifact name) as a var to the driver.
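To make the two-name behavior concrete, here's a dependency-free sketch of that idea (hypothetical names, a toy dict standing in for Hamilton's node registry; no file is actually written):

```python
# Toy registry standing in for Hamilton's graph of nodes.
NODES = {}

def save_to_json(path, artifact_name_):
    """Hypothetical sketch of @save_to.json: register the compute node
    under the function's own name, and a second saving node under the
    artifact name."""
    def decorate(fn):
        NODES[fn.__name__] = fn                  # the "final_output" node
        def save_node(**kwargs):
            result = fn(**kwargs)
            # a real saver would write `result` as JSON to `path` here
            return {"path": path, "saved": True}
        NODES[artifact_name_] = save_node        # the "final_save_node" node
        return fn
    return decorate

@save_to_json(path="out.json", artifact_name_="final_save_node")
def final_output():
    return {"a": 1}

print(NODES["final_output"]())     # {'a': 1}
print(NODES["final_save_node"]())  # {'path': 'out.json', 'saved': True}
```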
Some questions for you:
1. What "adapters" would you like to see? I've got json, csv, and a smattering of others. They can work for load, save, or both…
2. What name would you like for the artifact_name_ parameter? I don't like what we have now…
Jan Hurst
04/21/2023, 5:04 AM
dot.render(output_file_path...) is erroring out for some reason in my environment.
Would it be possible to make output_file_path a keyword argument defaulting to None, and render_kwargs default to {"view": False}? Or is that just going to screw up everyone else?
David Wesolowski
04/23/2023, 5:57 AM
function_modifiers/expanders.py fails at line: 384
@inject(params=source('my_func__params'))
def my_func(params: int) -> int:
    """Whoops"""
    return 1

temp_module = ad_hoc_utils.create_temporary_module(
    my_func, module_name="my_module"
)
dr = driver.Driver({'my_func__params': 2}, temp_module)
df = dr.execute(final_vars=['my_func'])
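For readers unfamiliar with @inject, here is a rough dependency-free sketch of what the decorator is meant to do (toy stand-ins for source() and the driver's value resolution; not Hamilton's actual implementation):

```python
def source(name):
    """Toy stand-in for Hamilton's source(): tag a dependency by name."""
    return ("source", name)

def inject(**bindings):
    """Toy sketch of @inject: bind each parameter to a named upstream
    value before the function is called."""
    def decorate(fn):
        def wrapper(available, **kwargs):
            for param, (_, upstream) in bindings.items():
                kwargs.setdefault(param, available[upstream])
            return fn(**kwargs)
        return wrapper
    return decorate

@inject(params=source("my_func__params"))
def my_func(params: int) -> int:
    return params

# the value for my_func__params comes from config/inputs, as in the repro above
print(my_func({"my_func__params": 2}))  # 2
```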
Jarrod Hamilton
08/27/2023, 10:52 PM
Jan Hurst
08/30/2023, 8:59 AM
@save_to functionality.
I have something that looks a little like this:
@save_to.feature(feature_name="my_feature_1")
def my_feature_1(foo: pd.Series, bar: pd.Series) -> pd.Series:
    return foo + bar
My saving logic is some existing infrastructure, but essentially it just uses the name to figure out the right place to save to... but really I want the function name to be accessible in my DataSaver. I couldn't see that this is available at present; any ideas on how I could achieve this?
Arthur Andres
01/04/2024, 6:30 PM
Pieter Wijkstra
03/16/2024, 9:29 PM