https://github.com/stitchfix/hamilton
# general
  • c

    Carl Trachte

    07/04/2024, 11:39 PM
    Thanks for the kind words, Elijah. Feel free to do whatever you want with the blog. I'm working on the stuff Stefan pointed out.
    awthanks 2
  • c

    Carl Trachte

    07/06/2024, 1:22 AM
    The follow-up blog post on tinkering with the graphviz output - it may not be of relevance to you, but you do get mentioned a fair bit and I want to make sure I don't misrepresent anything. Thanks. https://pyright.blogspot.com/2024/07/graphviz-editing-dag-hamilton-graph-dot.html
    e
    • 2
    • 2
  • c

    Carl Trachte

    07/07/2024, 5:33 PM
    I took the big PNG Hamilton logo Stefan gave me and ran it through an online PNG to SVG converter. It appears to be legit and renders pretty nicely. Thanks.
    hamiltonlogolarge.svg
    🙌 3
    s
    • 2
    • 2
  • v

    Vadim Ogranovich

    07/29/2024, 7:08 PM
    I was looking, to no avail, for a way to set final_vars in the Builder, something like driver.Builder().with_final_vars(...)... I understand I can achieve the same effect via dr.execute(final_vars=final_vars); however, providing final_vars at the build stage could greatly reduce the build time, couldn't it? What am I missing?
    s
    • 2
    • 5
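    A minimal sketch of the pattern being discussed, assuming the standard Builder/execute API (my_dataflow is a placeholder module; with_final_vars is the hypothetical method asked about and is not shown):
    Copy code
    # Sketch: the Builder assembles the graph; final_vars are supplied at execute time,
    # so only the subgraph needed for those outputs is computed.
    from hamilton import driver

    import my_dataflow  # placeholder module containing Hamilton functions

    dr = (
        driver.Builder()
        .with_modules(my_dataflow)
        .build()
    )

    result = dr.execute(final_vars=["some_output"], inputs={"raw_input": 42})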
  • c

    Carl Trachte

    08/10/2024, 1:58 AM
    Another blog post. Not super germane, but it does link to the project, present output, and talk about the project a little. Thanks. https://pyright.blogspot.com/2024/08/embedding-svg-in-graphviz-generated-svg.html
    🙌 2
    s
    • 2
    • 2
  • v

    Volker Lorrmann

    08/20/2024, 3:31 PM
    Hi guys, I am a huge fan of Hamilton (many thanks to @Stefan Krawczyk and @Elijah Ben Izzy), but I've been missing an easy way to schedule my Hamilton dataflows, and furthermore I'd like to be able to parameterize as much of my production deployment as possible using yaml files. To address these "shortcomings" I have created my own Python library/framework called FlowerPower. https://github.com/legout/flowerpower
    FlowerPower is a simple workflow framework based on the fantastic Python libraries Hamilton and APScheduler (Advanced Python Scheduler). Hamilton is used as the core engine to create Directed Acyclic Graphs (DAGs) from your pipeline functions and execute them in a controlled manner. It is highly recommended to read the Hamilton documentation and check out their examples to understand the core concepts of FlowerPower. APScheduler is used to schedule the pipeline execution. You can schedule the pipeline to run at a specific time, at a specific interval, or on a cron expression. Furthermore, APScheduler can be used to run the pipeline in a distributed environment. In this case you need to set up a data store (e.g. postgres, mongodb, mysql, sqlite) to store the job information and an event broker (e.g. redis, mqtt) to communicate between the scheduler and the workers. At least a data store is required to persist the scheduled pipeline jobs after a worker restart, even if you run on a single machine. Regards, Volker
    👍 1
    ❤️ 3
    🚀 1
    t
    s
    +2
    • 5
    • 10
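    As a rough illustration of the combination described above, a sketch of scheduling a Hamilton run with APScheduler 3.x directly (this is not FlowerPower's API; my_pipeline and the cron settings are placeholders):
    Copy code
    # Sketch: run a Hamilton dataflow on a cron schedule with APScheduler.
    from apscheduler.schedulers.blocking import BlockingScheduler
    from hamilton import driver

    import my_pipeline  # placeholder module with Hamilton functions


    def run_pipeline():
        dr = driver.Builder().with_modules(my_pipeline).build()
        dr.execute(final_vars=["final_table"], inputs={})


    scheduler = BlockingScheduler()
    scheduler.add_job(run_pipeline, "cron", hour=2, minute=0)  # every day at 02:00
    scheduler.start()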
  • c

    Carl Trachte

    08/26/2024, 7:42 PM
    Logo stuff - a colleague (our DBA) on the logo: "The star reminds me of a Spanish Shawl nudibranch." Not sure if that was intentional. The logo is one of the more attractive ones I've seen.
    😆 3
    💡 2
  • v

    Volker Lorrmann

    08/29/2024, 12:10 PM
    @Carl Trachte Created by DallE3 using this prompt: Dataflow becomes a Flower. Flowerpower!
  • c

    Carl Trachte

    08/31/2024, 6:41 AM
    More logo. A blog post about Blogger and svg - it did not go particularly well, but the logo is pretty as always. https://pyright.blogspot.com/2024/08/scaleable-vector-graphics-svg.html
  • c

    Carl Trachte

    08/31/2024, 1:49 PM
    One last logo post (for now). https://pyright.blogspot.com/2024/08/scalable-vector-graphics-followup.html
    🙌 2
    s
    • 2
    • 2
  • c

    Carl Trachte

    09/27/2024, 8:21 PM
    Another svg / logo / graph blog post (minimal). https://pyright.blogspot.com/2024/09/dag-hamilton-graph-presented-as-svg-in.html
    🙌 1
    e
    • 2
    • 3
  • c

    Cooper Snyder

    09/30/2024, 12:31 PM
    hey, just getting started looking into Hamilton. I think there might be some examples out there, but I was wondering if anyone had examples or a blog on testing approaches/organization in a repo that uses Hamilton to structure all of the pipelines. I was imagining a really powerful setup would be to have, say, the tox testing environment spin up a Hamilton server and have all of the unit tests registered in there, so anyone could clone the repo and inspect the pipeline flows. I know this is a bit of a wide question and I can see many different ways of doing it, but I was wondering what everyone's approaches are to that? Thank you
    e
    s
    • 3
    • 4
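    One common approach, sketched below (not an official recommendation): since Hamilton nodes are plain functions, they can be unit-tested directly, and the assembled driver can be exercised in a separate test. my_pipeline and the node names are placeholders:
    Copy code
    # Sketch: testing Hamilton code at two levels with plain pytest-style tests.
    import pandas as pd

    import my_pipeline  # placeholder module of Hamilton functions
    from hamilton import driver


    def test_average_node():
        # nodes are ordinary functions, so call them directly
        assert my_pipeline.average(values=pd.Series([1.0, 2.0, 3.0])) == 2.0


    def test_dataflow_end_to_end():
        # build the driver exactly as production would and run a small input through it
        dr = driver.Builder().with_modules(my_pipeline).build()
        out = dr.execute(final_vars=["average"], inputs={"values": pd.Series([2.0, 4.0])})
        assert out["average"] == 3.0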
  • c

    Cooper Snyder

    10/07/2024, 9:48 PM
    hey, thanks for the help above, but I have another basic question. I want to be able to expose a "pure" function version of a DAG flow so a user/developer can leverage the full flow in a macro orchestrator, but also have the core 'algorithm' ready to go if they want to lift and shift it out to another runtime, or maybe even into another, bigger DAG. It feels like an antipattern to have a Hamilton DAG for each part of the ETL, but I'm drawn to a pattern like the following to help long-term memory as the number of DAGs grows in a code base
    Copy code
    from pydantic import BaseModel


    class OrchestratableTask(BaseModel):

        def setup(self, *args, **kwargs):
            # environment, application, runtime specific setup.
            ...

        def extract(self, *args, **kwargs):
            # external state and external data from target system
            ...

        def run_pure_transform(self, *args, **kwargs):
            # pure, deterministic (enough) function based on inputs
            ...

        def load(self, *args, **kwargs):
            # load results to external database
            ...

        def run_transform_w_io_side_effects(self, *args, **kwargs):
            extracted_data = self.extract()
            transformed_data = self.run_pure_transform(extracted_data)
            self.load(transformed_data)  # load the transformed results


    if __name__ == "__main__":
        # add arg parser
        args, kwargs = (), {}  # placeholders until an arg parser populates these
        task = OrchestratableTask()
        task.setup(*args, **kwargs)
        task.run_transform_w_io_side_effects(*args, **kwargs)
    where I'd have something like a command/strategy pattern, with args and kwargs controlling the behavior of the function flow (I know it'd go into those config.when decorators), and have whatever business logic right there in the transform flow. But I'm running into the code smells of mixing object-oriented with functional, of doing a Hamilton DAG for each step and then another Hamilton DAG for those DAGs (I don't think this works well...), and I'm feeling a bit of analysis paralysis; has anyone run into this idea or anything like it? Any criticism of that design? I feel like, from reading the docs, idiomatically you'd just make it one Hamilton DAG with the dataloaders, datasavers, and config.when decorators, but I REALLY wanted to try to make it obvious to developers that those are the main 4 abstractions required for a singular OrchestratableTask, and let someone pip install a package that houses all of the subclass tasks and be able to run the pure function however they like in a discovery environment like a notebook. Is this overcomplicating it with the Task class? Thank you!
    👀 1
    e
    s
    • 3
    • 3
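    For comparison with the class above, a minimal sketch of the "one Hamilton DAG with config.when" shape mentioned at the end of the message (all function and config names are made up):
    Copy code
    # Sketch: extract / pure transform expressed as Hamilton functions, with
    # config.when toggling environment-specific extraction. Names are illustrative.
    import pandas as pd

    from hamilton.function_modifiers import config


    @config.when(env="prod")
    def extracted_data__prod(source_uri: str) -> pd.DataFrame:
        # external state / data from the target system
        return pd.read_parquet(source_uri)


    @config.when(env="dev")
    def extracted_data__dev() -> pd.DataFrame:
        # small fixture for local runs and notebooks
        return pd.DataFrame({"x": [1, 2, 3]})


    def transformed_data(extracted_data: pd.DataFrame) -> pd.DataFrame:
        # the pure, deterministic "algorithm" -- importable and testable on its own
        return extracted_data.assign(y=lambda df: df["x"] * 2)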
  • d

    David Medinets

    10/11/2024, 3:43 AM
    Hello. Is the Hamilton documentation in a PDF file I can download for offline reading?
    s
    • 2
    • 36
  • j

    Jonas Meyer-Ohle

    10/16/2024, 3:20 PM
    Hi there, thanks for creating Hamilton, it's been a blast using it. Before I create a bug report, I have a question about using Hamilton's pandera dataframe validators with polars. I saw the following LinkedIn post. It mentions that using the check_output decorator is supported for pandera + polars. However, I'm getting the following error when running the minimal example found here: https://github.com/jonas-meyer/hamilton_polars_pandera
    Actual error: No registered subclass of BaseDefaultValidator is available for arg: schema and type <class 'polars.dataframe.frame.DataFrame'>. This either means (a) this arg-type contribution isn't supported or (b) this has not been added yet (but should be). In the case of (b), we welcome contributions. Get started at github.com/dagworks-inc/hamilton.
    I stepped through the following file a bit: https://github.com/DAGWorks-Inc/hamilton/blob/main/hamilton/data_quality/pandera_validators.py#L9 And it seems like the polars plugin isn't part of the supported extensions, I'm assuming this is the issue? Thanks!
    👀 1
    t
    • 2
    • 6
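    For reference, the rough shape of code that triggers the error above (a sketch assuming pandera's polars support; whether Hamilton accepts the schema is exactly the plugin-registration question being asked):
    Copy code
    # Sketch: @check_output(schema=...) on a function returning a polars DataFrame.
    import pandera.polars as pa
    import polars as pl

    from hamilton.function_modifiers import check_output

    schema = pa.DataFrameSchema({"a": pa.Column(pl.Int64)})


    @check_output(schema=schema)
    def my_frame() -> pl.DataFrame:
        return pl.DataFrame({"a": [1, 2, 3]})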
  • j

    Justin Donaldson

    10/21/2024, 5:51 PM
    I read through this pandera integration post. Anybody have a trip report on Polars? I'm debating whether to rely on it for a data pipeline project that deals with serialized embeddings.
    t
    m
    e
    • 4
    • 29
  • v

    Volker Lorrmann

    10/28/2024, 8:40 AM
    Hi guys, I have updated and refactored Flowerpower. Give it a try. I am happy for any feedback. FlowerPower is a simple workflow framework based on two fantastic Python libraries: • Hamilton: Creates DAGs from your pipeline functions • APScheduler: Handles pipeline scheduling https://github.com/legout/flowerpower
    🔥 2
    🙌 2
    t
    • 2
    • 4
  • a

    Andres MM

    10/29/2024, 10:08 AM
    Quick question. I'm hitting this error: "Hamilton does not consider these types to be equivalent. If you believe they are equivalent, please reach out to the developers. Note that, if you have types that are equivalent for your purposes, you can create a graph adapter that checks the types against each other in a more lenient manner." What I want would be something like:
    Copy code
    import typing as t
    import pandas as pd

    def bar_union(x: pd.Series) -> t.Union[int, pd.Series]:
        return x

    def foo_bar(bar_union: int) -> int:
        return bar_union + 1
    t
    j
    • 3
    • 12
  • v

    Viktor

    11/28/2024, 6:14 PM
    Has anyone yet explored ways to integrate Serverless / Cloud Functions (Azure, AWS, DO, etc.) into Hamilton DAGs?
    • Some compute requires more powerful resources than the Python environment Hamilton is running in.
    • Sometimes it's the other way around – e.g. web hooks or events are better implemented as Serverless Functions.
    Are there any examples of how this may be set up?
    t
    • 2
    • 2
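    One possible shape, sketched below (not an established pattern from the docs): wrap the remote call in an ordinary node, e.g. invoking an AWS Lambda with boto3. The function name and payload fields are made up:
    Copy code
    # Sketch: a Hamilton node that delegates heavy compute to a serverless function.
    # "my-heavy-transform" and the payload/response shapes are hypothetical.
    import json

    import boto3


    def heavy_result(prepared_payload: dict) -> dict:
        client = boto3.client("lambda")
        response = client.invoke(
            FunctionName="my-heavy-transform",
            Payload=json.dumps(prepared_payload).encode("utf-8"),
        )
        # the Lambda is assumed to return a JSON body
        return json.loads(response["Payload"].read())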
  • j

    Justin Donaldson

    12/04/2024, 10:58 PM
    Hey folks, happy Wednesday. I had a question about scenarios for ML training. There's a nice example for iris on the website: https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/model_examples/scikit-learn However, I'm interested in models that have more complex pipelines (e.g. text transformation). It's relatively easy to set up a training pipeline for it, and then it's possible to override the pipeline stages with some data for an inference pipeline, but that just winds up feeling super fragile.
    s
    e
    • 3
    • 28
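    One pattern that may be relevant here, sketched under assumptions (pipeline is a placeholder module and all node names are invented): build the dataflow once and use execute-time overrides to feed the fitted artifacts into an inference run instead of maintaining a second pipeline:
    Copy code
    # Sketch: reuse one dataflow for training and inference via `overrides`.
    from hamilton import driver

    import pipeline  # placeholder module; node names below are illustrative

    dr = driver.Builder().with_modules(pipeline).build()

    # Training run: fit the text transformer and the model.
    train_out = dr.execute(
        final_vars=["fitted_vectorizer", "fitted_model"],
        inputs={"raw_text": ["spam spam", "ham"], "labels": [1, 0]},
    )

    # Inference run: override the fitted artifacts instead of recomputing them.
    preds = dr.execute(
        final_vars=["predictions"],
        inputs={"raw_text": ["new document"]},
        overrides={
            "fitted_vectorizer": train_out["fitted_vectorizer"],
            "fitted_model": train_out["fitted_model"],
        },
    )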
  • v

    Volker Lorrmann

    12/10/2024, 9:51 AM
    Hi guys, are there any examples using @subdag or even @parameterized_subdag? I'm considering adding a feature to flowerpower to chain multiple pipelines. I think subdags are the way to go, right?
    s
    • 2
    • 2
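    In case it helps, a minimal @subdag sketch (data_prep is a placeholder module whose final node is prepared_data; this only shows the general shape, not a vetted pipeline-chaining recipe):
    Copy code
    # Sketch: reusing the functions in `data_prep` twice under different namespaces.
    import pandas as pd

    import data_prep  # placeholder module of Hamilton functions
    from hamilton.function_modifiers import source, subdag


    @subdag(data_prep, inputs={"raw_data": source("raw_data_a")})
    def prepared_a(prepared_data: pd.DataFrame) -> pd.DataFrame:
        return prepared_data


    @subdag(data_prep, inputs={"raw_data": source("raw_data_b")})
    def prepared_b(prepared_data: pd.DataFrame) -> pd.DataFrame:
        return prepared_data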
  • p

    Paul

    01/28/2025, 11:13 AM
    Hi All, does anyone know how I could import a set of modules but have them underneath a namespace? (The aim is to allow engineers to explore/autocomplete a bit more easily from within Jupyter notebooks.)
    Copy code
    from hamilton import driver

    import ourlib.modules as mod

    my_modules = [mod.load_csv, mod.compute1, mod.compute2, mod.save_csv]

    dr = (
        driver.Builder()
        .with_modules(*my_modules)
        .with_config(my_config)
        .build()
    )
    # where the file system is structured as:
    # ourlib/modules/{load_csv.py, compute1.py, compute2.py, save_csv.py}
    s
    • 2
    • 3
  • k

    Keshav Ravi

    01/28/2025, 11:14 AM
    Hi Guys, I was configuring WrenAI with Ollama using Docker on Windows. Everything seems to be fine, but at the final step of generating the questions I'm getting the error below. Can someone help? I have a sufficient amount of memory.
    2025-01-28 164207 -------------------------------------------------------------------
    2025-01-28 164207 E0128 111207.745 32 wren-ai-service:60] An error occurred during question recommendation generation: litellm.APIError: APIError: OpenAIException - Error code: 500 - {'error': {'message': 'model requires more system memory (40.8 GiB) than is available (3.5 GiB)', 'type': 'api_error', 'param': None, 'code': None}}
    2025-01-28 164207 INFO: 172.18.0.5:37970 - "GET /v1/question-recommendations/efe80dc2-8009-42a9-959d-b5319a524d16 HTTP/1.1" 200 OK
    Regards
    s
    • 2
    • 2
  • t

    Trần Hoàng Nguyên

    02/17/2025, 6:40 AM
    Hi guys, I wonder if there are any guides or standards on doing retries
    s
    • 2
    • 3
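    One lightweight approach, sketched here (not a built-in Hamilton feature): decorate a flaky node with a generic retry library such as tenacity. The URL and node name are made up:
    Copy code
    # Sketch: retrying a flaky node with tenacity.
    import requests
    from tenacity import retry, stop_after_attempt, wait_exponential


    @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, max=10))
    def upstream_payload() -> dict:
        # a plain Hamilton node; tenacity re-runs it on exceptions, up to 3 attempts
        response = requests.get("https://example.com/api/data", timeout=10)
        response.raise_for_status()
        return response.json()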
  • e

    Elijah Ben Izzy

    02/27/2025, 3:56 PM
    This might be a good q for the wren AI team 🙂
    t
    • 2
    • 1
  • s

    Slackbot

    02/27/2025, 9:51 PM
    This message was deleted.
    t
    a
    • 3
    • 2
  • e

    Evan Lutins

    03/12/2025, 2:43 PM
    Hey guys - is there any way to hide ValidationResult nodes from the DAG when calling visualize_execution()? I tried passing in the bypass_validation=True argument but it didn't work. Here is the code used to generate my DAG.
    Copy code
    dr.visualize_execution(
        final_vars=outputs,
        inputs=inputs,
        bypass_validation=True
    )
    The returned DAG contains a {node-name}_raw and {node-name}_validator for each node decorated with a @check_output. Ideally I would just like a single {node_name} represented in the DAG.
    e
    • 2
    • 3
  • v

    Victor Bouzas

    03/14/2025, 10:07 AM
    Hey guys, quick question. Is there a simple way to get the topologically sorted list of nodes from a Driver/Execution?
    t
    e
    s
    • 4
    • 8
  • v

    Volker Lorrmann

    03/26/2025, 10:06 AM
    Hi guys, the Hamilton Ray adapter is using Ray Workflows. The Ray docs mention that Ray Workflows is deprecated and will be removed. https://docs.ray.io/en/latest/workflows/index.html Are there any plans to update the Hamilton Ray adapter? Thanks!
    s
    • 2
    • 13
  • b

    Bob Gregory

    04/09/2025, 5:47 PM
    I'm looking at using Hamilton to replace a home-grown, quasi-declarative way of setting up feature transforms for models. We often want to test the effect of different lags or aggregations on model performance, so an experiment might have 15, 30, 45, 60 ... 90 minute lags defined for a particular feature. What's the right way to represent that as a Hamilton function? parameterize_value? The "reusing_functions" example in your repo focuses on subdags instead.
    e
    • 2
    • 4
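    For what it's worth, a sketch of the @parameterize shape for the lag example above (column and node names are invented, and a 1-minute-frequency time index is assumed):
    Copy code
    # Sketch: one node stamped out per lag, using @parameterize with literal values.
    import pandas as pd

    from hamilton.function_modifiers import parameterize, value


    @parameterize(
        **{f"feature_lag_{m}m": {"lag_minutes": value(m)} for m in (15, 30, 45, 60, 75, 90)}
    )
    def feature_lag(feature: pd.Series, lag_minutes: int) -> pd.Series:
        # shift a time-indexed series by N periods (1 period == 1 minute here)
        return feature.shift(lag_minutes)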