Hamilton Open Source #contribute

Elijah Ben Izzy

07/02/2022, 6:56 PM

Hey folks! Cool new release upcoming that should make it easier for everyone to contribute. We're adding the ability to build ad-hoc DAGs from functions (rather than modules). This makes it super easy to draft up a unit test without handling resources.

Elijah Ben Izzy

07/02/2022, 6:56 PM

Here's the PR -- planning to merge this weekend! https://github.com/stitchfix/hamilton/pull/145

👍 1
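To make the "ad-hoc DAGs from functions" idea concrete, here is a minimal, stdlib-only sketch. It assumes Hamilton's convention that a function's parameter names refer to other functions or to inputs. `make_temp_module` and `compute` are hypothetical helpers written for illustration; they are not Hamilton's API, and the released feature may work differently.

```python
import inspect
import types
from typing import Any, Callable, Dict


def make_temp_module(*functions: Callable, name: str = "ad_hoc") -> types.ModuleType:
    """Bundle plain functions into a throwaway module object (hypothetical helper)."""
    module = types.ModuleType(name)
    for fn in functions:
        setattr(module, fn.__name__, fn)
    return module


def compute(module: types.ModuleType, target: str, inputs: Dict[str, Any]) -> Any:
    """Resolve `target` by matching each parameter name to an input or a same-named function."""
    if target in inputs:
        return inputs[target]
    fn = getattr(module, target)
    kwargs = {p: compute(module, p, inputs) for p in inspect.signature(fn).parameters}
    return fn(**kwargs)


def doubled(x: int) -> int:
    return x * 2


def plus_one(doubled: int) -> int:
    return doubled + 1


tmp = make_temp_module(doubled, plus_one)
print(compute(tmp, "plus_one", {"x": 3}))  # 7
```

Because the functions are defined inline, a unit test can build its small graph without a separate module file. The resolver recomputes shared dependencies instead of caching them, which keeps the sketch short.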

Elijah Ben Izzy

07/13/2022, 9:56 PM

Hey folks! Released the RC version for data quality, planning to release tomorrow. Wanted to give y'all a chance to test it out!

pip install sf-hamilton==1.9.0rc0

Elijah Ben Izzy

08/16/2022, 3:40 AM

Hey folks! New RC version -- would love testers! Some features include:
• AsyncDriver for running Hamilton in a web service
• New default validators (allowing `None` for the output)
• Support for the `Union` type
• A refactor of the `parametrize*` family -- this includes a new `@parameterize` decorator that can parameterize across dependency sources/values
• Misc. bug fixes

Reach out if you have any questions! Installing is easy -- just run `pip install sf-hamilton==1.10.0rc0`
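Here is a rough, stdlib-only sketch of what "parameterizing across dependency sources/values" means. One function is expanded into several named nodes, and each parameter is bound either to a literal value or to another node's output. `Value`, `Source`, and `expand` are hypothetical stand-ins written for this sketch, and Hamilton's own decorator and helpers may behave differently. The node names `revenue_doubled` and `cost_halved` are made up.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Union


@dataclass(frozen=True)
class Value:
    """Hypothetical stand-in: bind a parameter to a literal value."""
    literal: Any


@dataclass(frozen=True)
class Source:
    """Hypothetical stand-in: bind a parameter to another node's output."""
    node: str


Binding = Union[Value, Source]


def expand(fn: Callable, spec: Dict[str, Dict[str, Binding]]) -> Dict[str, Callable[[Dict[str, Any]], Any]]:
    """Turn one function into several named nodes, one per entry in `spec`."""
    nodes = {}
    for node_name, bindings in spec.items():
        # Default argument pins this iteration's bindings inside the closure.
        def node(upstream: Dict[str, Any], _bindings: Dict[str, Binding] = bindings) -> Any:
            kwargs = {
                param: b.literal if isinstance(b, Value) else upstream[b.node]
                for param, b in _bindings.items()
            }
            return fn(**kwargs)

        nodes[node_name] = node
    return nodes


def scaled(base: float, factor: float) -> float:
    return base * factor


nodes = expand(
    scaled,
    {
        "revenue_doubled": {"base": Source("revenue"), "factor": Value(2.0)},
        "cost_halved": {"base": Source("cost"), "factor": Value(0.5)},
    },
)
upstream = {"revenue": 10.0, "cost": 8.0}
print({name: node(upstream) for name, node in nodes.items()})
# {'revenue_doubled': 20.0, 'cost_halved': 4.0}
```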

Konstantin Tyapochkin

03/18/2024, 8:11 PM

@Stefan Krawczyk @Elijah Ben Izzy Hi guys! Is it ok that I created a draft PR about integrations with AWS so I can continue working on it? If not, I can just remove it and recreate it after it is ready. The PR: https://github.com/DAGWorks-Inc/hamilton/pull/768

👀 2

🔥 2

👍 1

Tom Barber

03/21/2024, 2:26 PM

So, on this Polars LazyFrame stuff, @Stefan Krawczyk: I've basically added another Polars plugin to the source that uses LazyFrame instead of DataFrame. Two questions: a) Is that the correct path for what's missing in Hamilton? It seems to work here: I can now pass LazyFrames around, then run `collect()` at the end to materialize them. b) If I can get the other readers and writers squared away, do you want it as a PR into the codebase?

Fran Boon

04/01/2024, 4:13 PM

As per my intro, we need to compile our models for several different platforms. We want to be able to set `ray.remote` options to target the right nodes (currently using Custom Resources, e.g. `resources={"A2": 1}`). It seems that this could be achieved pretty easily by (ab)using Hamilton's tags feature:

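```python
import json
from hamilton.function_modifiers import tag

@tag(**{"ray.resources": json.dumps({"A2": 1})})  # tag values are strings, hence json.dumps
def my_hamilton_node_fn_which_needs_an_A2(some_input: int) -> int:  # hypothetical signature
    ...
```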

`RayGraphAdapter.execute_node()` would be modified to:

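```python
# Strip the "ray." prefix from matching node tags and decode their JSON values.
ray_options = {key[4:]: json.loads(value) for key, value in node.tags.items() if key.startswith("ray.")}
# ray.remote(fn, **opts) is rejected by Ray, so the options are applied via .options().
return ray.remote(raify(node.callable)).options(**ray_options).remote(**kwargs)
```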

Any concerns with taking this approach? Any better options?
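The tag-to-options step of this proposal can be isolated and tested without Ray. This is a minimal sketch that assumes the `ray.` prefix convention and JSON-encoded tag values from the message above. `ray_options_from_tags` is a hypothetical helper name, and the `ray.num_cpus` tag is only an example of another Ray option.

```python
import json
from typing import Any, Dict

RAY_PREFIX = "ray."


def ray_options_from_tags(tags: Dict[str, str]) -> Dict[str, Any]:
    """Collect `ray.*` node tags as Ray options; all other tags are ignored.

    Values are JSON strings because Hamilton tag values are strings.
    """
    return {
        key[len(RAY_PREFIX):]: json.loads(value)
        for key, value in tags.items()
        if key.startswith(RAY_PREFIX)
    }


tags = {"ray.resources": json.dumps({"A2": 1}), "ray.num_cpus": "2", "owner": "ml-team"}
print(ray_options_from_tags(tags))  # {'resources': {'A2': 1}, 'num_cpus': 2}
```

Keeping this as a pure function would keep the adapter change small: `execute_node` would only need to hand the result to Ray, and tags without the `ray.` prefix never reach Ray.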

Stefan Krawczyk

04/02/2024, 5:23 PM

@Konstantin Tyapochkin some of the code we sketched for reference:

```python
from hamilton import driver

# data_loading, feature_engineering, model_training and model_evaluation are the project's own modules
dr = (
    driver.Builder()
    .with_modules(data_loading, feature_engineering, model_training, model_evaluation)
    .with_adapter(...)
    .build()
)

# one SageMaker job on a small machine
data_set = dr.execute(["data_set_v1"], inputs={...})

# one SageMaker job on a large machine with a GPU
# (indexing assumes execute() returns a dict keyed by node name)
model = dr.execute(["model_v1"], inputs={...}, overrides={"data_set_v1": data_set["data_set_v1"]})

# one SageMaker job on a small machine
evaluation = dr.execute(["evaluation_v1"], inputs={...}, overrides={"model_v1": model["model_v1"]})

# some ideas on config structure?
config = {
    "tasks": [{"name": "data_set_v1", "sagemaker": ["machine.small"], "artifacts": ["data_set_v1"]},
              {"name": "model_v1", "sagemaker": ["machine.gpu"], "artifacts": ["model1"]},
              {"name": "model_v2", "sagemaker": ["machine.gpu"], "artifacts": ["model2"]},
              {"name": "evaluation_v1", "sagemaker": ["machine.small"]}]
}

# hypothetical builders that would compile the DAG + config into pipeline code
sagemaker_pipeline_code = SageMakerPipelineBuilder(dr, config).compile()
airflow_pipeline_code = AirflowPipelineBuilder(dr, config).compile()
```

👍 1

👀 1
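The staged plan above relies on overrides short-circuiting upstream computation, so that each job runs only its own part of the graph. Below is a toy, stdlib-only sketch of that behavior. It assumes Hamilton-style wiring by parameter name. `execute` is a hypothetical stand-in for `dr.execute`, and the node bodies and inputs are made up. Each call plays the role of one job on its own machine.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional, Tuple


def execute(
    funcs: Dict[str, Callable],
    targets: List[str],
    inputs: Dict[str, Any],
    overrides: Optional[Dict[str, Any]] = None,
) -> Tuple[Dict[str, Any], List[str]]:
    """Compute `targets`; overridden nodes are used as given and never recomputed.

    Returns the requested results plus the names of the functions that actually ran.
    """
    known: Dict[str, Any] = {**inputs, **(overrides or {})}
    ran: List[str] = []

    def resolve(name: str) -> Any:
        if name not in known:
            fn = funcs[name]
            kwargs = {p: resolve(p) for p in inspect.signature(fn).parameters}
            known[name] = fn(**kwargs)
            ran.append(name)
        return known[name]

    return {t: resolve(t) for t in targets}, ran


def data_set_v1(raw: List[int]) -> List[int]:
    return [x * 2 for x in raw]


def model_v1(data_set_v1: List[int]) -> float:
    return sum(data_set_v1) / len(data_set_v1)


def evaluation_v1(model_v1: float, threshold: float) -> bool:
    return model_v1 >= threshold


funcs = {f.__name__: f for f in (data_set_v1, model_v1, evaluation_v1)}
inputs = {"raw": [1, 2, 3], "threshold": 3.0}

# Job 1 (small machine), job 2 (GPU machine), job 3 (small machine).
data_set, _ = execute(funcs, ["data_set_v1"], inputs)
model, ran_2 = execute(funcs, ["model_v1"], inputs, overrides=data_set)
evaluation, ran_3 = execute(funcs, ["evaluation_v1"], inputs, overrides=model)
print(ran_2, ran_3, evaluation)  # ['model_v1'] ['evaluation_v1'] {'evaluation_v1': True}
```

In this sketch, any upstream node that is not overridden would be recomputed. When the graph is split into jobs, each job's overrides therefore need to cover every upstream artifact it depends on.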

Jay

05/15/2024, 4:29 PM

Hi, is there a way to get all the graphs that are loaded into the driver?

Stefan Krawczyk

05/21/2024, 7:05 PM

@Thierry Jean @Gilad Rubin we can chat here about the experimentation and hyperparameter stuff

👍 1

Jernej Frank

07/24/2024, 2:37 PM

Hello, I needed to make a small change to the backend Docker for our orchestration system: https://github.com/DAGWorks-Inc/hamilton/pull/1065

Let me know if I should change anything to get it merged. Thanks!

Iliya R

08/07/2024, 4:33 PM

I've just created a (draft) PR for adding a `pyproject.toml`. I'd appreciate any feedback, especially with regard to testing this.

❤️ 2

Iliya R

08/07/2024, 8:08 PM

Are you guys particularly attached to flake8, or can we switch to ruff?

Iliya R

08/08/2024, 6:53 AM

Re import sorting: do we want `hamilton_sdk` to be its own section, first-party (i.e., grouped with `hamilton`), or third-party (grouped with `pytest`, etc.)? There's some inconsistency in the files in that regard.

Iliya R

08/08/2024, 11:12 PM

I created another PR to enable ruff. The sdk unit tests are failing, but I'm not sure why.

👀 2

Iliya R

08/20/2024, 7:11 PM

I've been looking at `parameterize_extract_columns` and saw that it requires `ParameterizedExtract` objects. Suggestion: have it accept some sort of named/ordered fields (e.g., a list of dicts or a list of tuples), then wrap them internally in `ParameterizedExtract`. It saves an import and is a tiny bit more elegant imho. wdyt?
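Here is a minimal sketch of the suggested convenience. It assumes the wrapper only needs an outputs list and an input mapping. `ParameterizedExtractStandIn` is a hypothetical stand-in for Hamilton's `ParameterizedExtract`; the real class's fields, and the dependency types it expects, may differ. `normalize_specs` is a made-up helper name, and in the stand-in the mapping values are plain strings.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Sequence, Tuple, Union


@dataclass
class ParameterizedExtractStandIn:
    """Hypothetical stand-in for Hamilton's ParameterizedExtract (not the real class)."""

    outputs: Tuple[str, ...]       # names of the columns/nodes to extract
    input_mapping: Dict[str, Any]  # parameter name -> upstream dependency (plain strings here)


# Accepted forms: an existing object, a dict, or an (outputs, input_mapping) tuple.
Spec = Union[ParameterizedExtractStandIn, Dict[str, Any], Tuple[Sequence[str], Dict[str, Any]]]


def normalize_specs(specs: Iterable[Spec]) -> List[ParameterizedExtractStandIn]:
    """Wrap dicts and tuples in the stand-in class; pass existing objects through unchanged."""
    normalized = []
    for spec in specs:
        if isinstance(spec, ParameterizedExtractStandIn):
            normalized.append(spec)
        elif isinstance(spec, dict):
            normalized.append(
                ParameterizedExtractStandIn(tuple(spec["outputs"]), dict(spec["input_mapping"]))
            )
        else:
            outputs, input_mapping = spec
            normalized.append(ParameterizedExtractStandIn(tuple(outputs), dict(input_mapping)))
    return normalized


specs = normalize_specs([
    {"outputs": ["a", "b"], "input_mapping": {"df": "raw_df"}},
    (["c", "d"], {"df": "other_df"}),
])
print([s.outputs for s in specs])  # [('a', 'b'), ('c', 'd')]
```

Passing existing objects through unchanged would keep the current call style working while also allowing the lighter-weight forms.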

Jernej Frank

08/23/2024, 11:49 PM

I installed the new pre-commit hooks and keep running into a weird ruff error. I'm not familiar with ruff, any ideas?

```
ruff.....................................................................Failed
- hook id: ruff
- exit code: 2

error: TOML parse error at line 176, column 1
    |
176 | [tool.ruff.format]
    | ^^^^^^^^^^^^^^^^^^
wanted exactly 1 element, more than 1 element

black....................................................................Passed
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
fix requirements.txt.................................(no files to check)Skipped
check python ast.........................................................Passed
```

Fran Boon

08/26/2024, 9:08 AM

Custom CA cert support for [Async]HamiltonTracker: https://github.com/DAGWorks-Inc/hamilton/pull/1105

I wonder if we should share a single session object across all the functions in both of these; reusing the session object generally improves performance. I also wonder if we can switch to modern-style type hints by using `from __future__ import annotations`.
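Some context on the type-hint suggestion: with `from __future__ import annotations` (PEP 563), annotations are stored as strings instead of being evaluated when the function is defined. This lets you write builtin generics such as `dict[str, float]` and `X | None` unions on Python versions that could not evaluate them at runtime. The sketch below is illustrative, and `summarize` is a made-up function.

```python
from __future__ import annotations  # must come before any other statement in the module

import inspect


def summarize(scores: dict[str, float] | None = None) -> list[str]:
    """Return the score names in sorted order, using modern-style hints."""
    return sorted(scores or {})


# The annotation is kept as its source string and is not evaluated at definition time.
print(inspect.signature(summarize).parameters["scores"].annotation)  # dict[str, float] | None
print(summarize({"recall": 0.7, "precision": 0.9}))  # ['precision', 'recall']
```

One caveat before switching: any code that reads annotations at runtime will see these strings and has to resolve them (for example with `typing.get_type_hints`). Those code paths are worth checking first.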

Fran Boon

09/01/2024, 3:01 PM

The AsyncDriver currently only works with the AsyncGraphAdapter. I would like it to work with a RayGraphAdapter. I am aware that Ray tasks cannot themselves be async (if they want to benefit from async, they need to start an event loop inside the task). However, I can see some benefit (not yet measured, so I may be wrong!) in having all the coordination be async: HamiltonTracker, MLflow, Ray task submission. Is this something you are already considering? I am happy to take a look if not. Would your guidance be to extend the RayGraphAdapter to auto-detect when it is running in an event loop, or to subclass it as an AsyncRayGraphAdapter?
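If the auto-detect route were chosen, the check itself is small: `asyncio.get_running_loop()` raises `RuntimeError` when no event loop is running in the current thread. The sketch below shows only that check. `in_running_event_loop` is a hypothetical helper, and the way a Ray adapter would branch on it is not shown.

```python
import asyncio


def in_running_event_loop() -> bool:
    """Return True when called from code running inside an asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


async def main() -> bool:
    return in_running_event_loop()


print(in_running_event_loop())  # False
print(asyncio.run(main()))      # True
```

An explicit `AsyncRayGraphAdapter` subclass would avoid this runtime branching and leave the synchronous adapter's behavior untouched, at the cost of a second class to maintain.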

Jernj Frank

09/12/2024, 7:10 PM

Added the ability to override nodes from later imported modules: https://github.com/DAGWorks-Inc/hamilton/pull/1134

The only part I am unsure about is how to update the docs.
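One way to picture "overriding nodes from later imported modules" is to collect functions module by module and let a later definition replace an earlier one with the same name, but only when overrides are explicitly allowed. This is a stdlib-only sketch. `make_module` and `collect_nodes` are hypothetical helpers and do not reproduce the code in the PR.

```python
import types
from typing import Callable, Dict


def make_module(name: str, **functions: Callable) -> types.ModuleType:
    """Build an in-memory module holding the given functions (for the demo only)."""
    module = types.ModuleType(name)
    for fn_name, fn in functions.items():
        setattr(module, fn_name, fn)
    return module


def collect_nodes(*modules: types.ModuleType, allow_overrides: bool = False) -> Dict[str, Callable]:
    """Gather public callables by name; later modules win only if overrides are allowed."""
    nodes: Dict[str, Callable] = {}
    for module in modules:
        for name, obj in vars(module).items():
            if name.startswith("_") or not callable(obj):
                continue  # skips __name__, __doc__, etc.
            if name in nodes and not allow_overrides:
                raise ValueError(f"Node {name!r} is defined in more than one module")
            nodes[name] = obj
    return nodes


base = make_module("base", features=lambda: "base features", model=lambda: "base model")
patch = make_module("patch", features=lambda: "patched features")
nodes = collect_nodes(base, patch, allow_overrides=True)
print(nodes["features"](), "|", nodes["model"]())  # patched features | base model
```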

Iliya R

09/19/2024, 7:08 AM

Quick question: why do we have `"sqlalchemy==1.4.49; python_version == '3.7.*'",` in pyproject.toml if the minimum supported Python version is 3.8?

Iliya R

09/19/2024, 7:43 AM

Second question: regarding Python 3.13 support, what are your plans for adding it (to CI and docs)? According to the Python website, the released RC2 "is expected to become the final 3.13.0 release", so any tests can already be done with that version.

> Call to action
> We strongly encourage maintainers of Python projects to prepare their projects for 3.13 compatibilities during this phase

Viktor

10/03/2024, 11:15 AM

I found this collection of Python data engineering (DE) resources, and it doesn't list Hamilton yet. This may be a good spot to be featured for free; the repo has 10 forks and 73 stars. https://github.com/vajol/python-data-engineering-resources/blob/main/resources/orchestration-tools.md

👀 1

🙌 1

Iliya R

10/20/2024, 1:39 PM

Hi guys, can you please fix this very minor issue (this is a warning raised by pytest):

```
hamilton\function_modifiers\macros.py:1522: SyntaxWarning: invalid escape sequence '\*'
```

What needs to be done is to add an `r` to the beginning of the docstring (i.e., change `"""` to `r"""`) and change `\*\*` on the aforementioned line to `**`.

👀 1
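For reference, this warning comes from Python's string-literal parsing. `\*` is not a recognized escape sequence, so it is reported when the module is compiled: as a `SyntaxWarning` since Python 3.12, and as a `DeprecationWarning` before that. A raw string literal leaves backslashes alone. The sketch below reproduces this with `compile()`; `PLAIN_SRC`, `RAW_SRC`, and `escape_warnings` are made-up names.

```python
import warnings

# A docstring containing "\*", like the one in the reported warning.
PLAIN_SRC = 'def f():\n    """Accepts \\*\\*kwargs."""\n'
# The same docstring written as a raw string literal.
RAW_SRC = 'def f():\n    r"""Accepts \\*\\*kwargs."""\n'


def escape_warnings(source: str) -> list:
    """Compile `source` and return any invalid-escape warnings it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, regardless of category
        compile(source, "<docstring-demo>", "exec")
    return [w for w in caught if "invalid escape sequence" in str(w.message)]


print(len(escape_warnings(PLAIN_SRC)) > 0)  # True
print(escape_warnings(RAW_SRC))             # []
```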

Jernj Frank

11/18/2024, 1:11 AM

Hey, quick question: is there a guide somewhere on how to add Google Colab and GitHub badges to example notebooks?