https://github.com/stitchfix/hamilton
# general
  • c

    Carl Trachte

    07/04/2024, 11:39 PM
    Thanks for the kind words, Elijah. Feel free to do whatever you want with the blog. I'm working on the stuff Stefan pointed out.
    awthanks 2
  • c

    Carl Trachte

    07/06/2024, 1:22 AM
    The follow-up blog post on tinkering with the graphviz output - it may not be of relevance to you, but you do get mentioned a fair bit and I want to make sure I don't misrepresent anything. Thanks. https://pyright.blogspot.com/2024/07/graphviz-editing-dag-hamilton-graph-dot.html
    e
    • 2
    • 2
  • c

    Carl Trachte

    07/07/2024, 5:33 PM
    I took the big PNG Hamilton logo Stefan gave me and ran it through an online PNG to SVG converter. It appears to be legit and renders pretty nicely. Thanks.
    hamiltonlogolarge.svg
    🙌 3
    s
    • 2
    • 2
  • v

    Vadim Ogranovich

    07/29/2024, 7:08 PM
    I was looking, to no avail, for a way to set final_vars in the Builder, something like driver.Builder().with_final_vars(...)... I understand I can achieve the same effect via dr.execute(final_vars=final_vars); however, providing final_vars at the build stage could greatly reduce the build time, couldn't it? What am I missing?
    s
    • 2
    • 5
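    A minimal sketch of the pattern being discussed, assuming the standard Builder/execute API (my_dataflow is a placeholder module; with_final_vars is the hypothetical method asked about and is not shown):
    Copy code
    # Sketch: the Builder assembles the graph; final_vars are supplied at execute time,
    # so only the subgraph needed for those outputs is computed.
    from hamilton import driver

    import my_dataflow  # placeholder module containing Hamilton functions

    dr = (
        driver.Builder()
        .with_modules(my_dataflow)
        .build()
    )

    result = dr.execute(final_vars=["some_output"], inputs={"raw_input": 42})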
  • c

    Carl Trachte

    08/10/2024, 1:58 AM
    Another blog post. Not super germane, but it does link to the project, present output, and talk about the project a little. Thanks. https://pyright.blogspot.com/2024/08/embedding-svg-in-graphviz-generated-svg.html
    🙌 2
    s
    • 2
    • 2
  • v

    Volker Lorrmann

    08/20/2024, 3:31 PM
    Hi guys, I am a huge fan of Hamilton (many thanks to @Stefan Krawczyk and @Elijah Ben Izzy), but I've been missing an easy way to schedule my Hamilton dataflows, and furthermore I'd like to be able to parameterize as much of my production deployment as possible using yaml files. To address these "shortcomings" I have created my own Python library/framework called FlowerPower. https://github.com/legout/flowerpower
    FlowerPower is a simple workflow framework based on the fantastic Python libraries Hamilton and APScheduler (Advanced Python Scheduler). Hamilton is used as the core engine to create Directed Acyclic Graphs (DAGs) from your pipeline functions and execute them in a controlled manner. It is highly recommended to read the Hamilton documentation and check out their examples to understand the core concepts of FlowerPower. APScheduler is used to schedule the pipeline execution. You can schedule the pipeline to run at a specific time, at a specific interval, or on a cron expression. Furthermore, APScheduler can be used to run the pipeline in a distributed environment. In this case you need to set up a data store (e.g. postgres, mongodb, mysql, sqlite) to store the job information and an event broker (e.g. redis, mqtt) to communicate between the scheduler and the workers. At least a data store is required to persist the scheduled pipeline jobs after a worker restart, even if you run on a single machine. Regards, Volker
    👍 1
    ❤️ 3
    🚀 1
    t
    s
    +2
    • 5
    • 10
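    As a rough illustration of the combination described above, a sketch of scheduling a Hamilton run with APScheduler 3.x directly (this is not FlowerPower's API; my_pipeline and the cron settings are placeholders):
    Copy code
    # Sketch: run a Hamilton dataflow on a cron schedule with APScheduler.
    from apscheduler.schedulers.blocking import BlockingScheduler
    from hamilton import driver

    import my_pipeline  # placeholder module with Hamilton functions


    def run_pipeline():
        dr = driver.Builder().with_modules(my_pipeline).build()
        dr.execute(final_vars=["final_table"], inputs={})


    scheduler = BlockingScheduler()
    scheduler.add_job(run_pipeline, "cron", hour=2, minute=0)  # every day at 02:00
    scheduler.start()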
  • c

    Carl Trachte

    08/26/2024, 7:42 PM
    Logo stuff - a colleague (our DBA) on the logo: "The star reminds me of a Spanish Shawl nudibranch." Not sure if that was intentional. The logo is one of the more attractive ones I've seen.
    😆 3
    💡 2
  • v

    Volker Lorrmann

    08/29/2024, 12:10 PM
    @Carl Trachte Created by DallE3 using this prompt: Dataflow becomes a Flower. Flowerpower!
  • c

    Carl Trachte

    08/31/2024, 6:41 AM
    More logo. A blog post about Blogger and svg - it did not go particularly well, but the logo is pretty as always. https://pyright.blogspot.com/2024/08/scaleable-vector-graphics-svg.html
  • c

    Carl Trachte

    08/31/2024, 1:49 PM
    One last logo post (for now). https://pyright.blogspot.com/2024/08/scalable-vector-graphics-followup.html
    🙌 2
    s
    • 2
    • 2
  • c

    Carl Trachte

    09/27/2024, 8:21 PM
    Another svg / logo / graph blog post (minimal). https://pyright.blogspot.com/2024/09/dag-hamilton-graph-presented-as-svg-in.html
    🙌 1
    e
    • 2
    • 3
  • c

    Cooper Snyder

    09/30/2024, 12:31 PM
    hey, just getting started looking into Hamilton. I think there might be some examples out there, but I was wondering if anyone had examples or a blog on testing approaches/organization in a repo that uses Hamilton to structure all of the pipelines. I was imagining a really powerful setup would be to have, say, the tox testing environment spin up a Hamilton server and have all of the unit tests registered in there, so anyone could clone the repo and inspect the pipeline flows. I know this is a bit of a wide question and I can see many different ways of doing it, but I was wondering what everyone's approaches are to that? Thank you
    e
    s
    • 3
    • 4
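    One common approach, sketched below (not an official recommendation): since Hamilton nodes are plain functions, they can be unit-tested directly, and the assembled driver can be exercised in a separate test. my_pipeline and the node names are placeholders:
    Copy code
    # Sketch: testing Hamilton code at two levels with plain pytest-style tests.
    import pandas as pd

    import my_pipeline  # placeholder module of Hamilton functions
    from hamilton import driver


    def test_average_node():
        # nodes are ordinary functions, so call them directly
        assert my_pipeline.average(values=pd.Series([1.0, 2.0, 3.0])) == 2.0


    def test_dataflow_end_to_end():
        # build the driver exactly as production would and run a small input through it
        dr = driver.Builder().with_modules(my_pipeline).build()
        out = dr.execute(final_vars=["average"], inputs={"values": pd.Series([2.0, 4.0])})
        assert out["average"] == 3.0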
  • c

    Cooper Snyder

    10/07/2024, 9:48 PM
    hey, thanks for the help above, but I have another basic question. I want to be able to expose a "pure" function version of a DAG flow so a user/developer can leverage the full flow in a macro orchestrator, but also have the core 'algorithm' ready to go if they want to lift and shift it out to another runtime, or maybe even into another, bigger DAG. It feels like an antipattern to have a Hamilton DAG for each part of the ETL, but I'm drawn to a pattern like the following to help long-term memory as the number of DAGs grows in a code base
    Copy code
    from pydantic import BaseModel


    class OrchestratableTask(BaseModel):

        def setup(self, *args, **kwargs):
            # environment, application, runtime specific setup.
            ...

        def extract(self, *args, **kwargs):
            # external state and external data from target system
            ...

        def run_pure_transform(self, *args, **kwargs):
            # pure, deterministic (enough) function based on inputs
            ...

        def load(self, *args, **kwargs):
            # load results to external database
            ...

        def run_transform_w_io_side_effects(self, *args, **kwargs):
            extracted_data = self.extract()
            transformed_data = self.run_pure_transform(extracted_data)
            self.load(transformed_data)  # load the transformed results


    if __name__ == "__main__":
        # add arg parser
        args, kwargs = (), {}  # placeholders until an arg parser populates these
        task = OrchestratableTask()
        task.setup(*args, **kwargs)
        task.run_transform_w_io_side_effects(*args, **kwargs)
    where I'd have something like a command/strategy pattern, with args and kwargs controlling the behavior of the function flow (I know it'd go into those config.when decorators), and have whatever business logic right there in the transform flow. But I'm running into the code smells of mixing object-oriented with functional, of doing a Hamilton DAG for each step and then another Hamilton DAG for those DAGs (I don't think this works well...), and I'm feeling a bit of analysis paralysis; has anyone run into this idea or anything like it? Any criticism of that design? I feel like, from reading the docs, idiomatically you'd just make it one Hamilton DAG with the dataloaders, datasavers, and config.when decorators, but I REALLY wanted to try to make it obvious to developers that those are the main 4 abstractions required for a singular OrchestratableTask, and let someone pip install a package that houses all of the subclass tasks and be able to run the pure function however they like in a discovery environment like a notebook. Is this overcomplicating it with the Task class? Thank you!
    👀 1
    e
    s
    • 3
    • 3
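    For comparison with the class above, a minimal sketch of the "one Hamilton DAG with config.when" shape mentioned at the end of the message (all function and config names are made up):
    Copy code
    # Sketch: extract / pure transform expressed as Hamilton functions, with
    # config.when toggling environment-specific extraction. Names are illustrative.
    import pandas as pd

    from hamilton.function_modifiers import config


    @config.when(env="prod")
    def extracted_data__prod(source_uri: str) -> pd.DataFrame:
        # external state / data from the target system
        return pd.read_parquet(source_uri)


    @config.when(env="dev")
    def extracted_data__dev() -> pd.DataFrame:
        # small fixture for local runs and notebooks
        return pd.DataFrame({"x": [1, 2, 3]})


    def transformed_data(extracted_data: pd.DataFrame) -> pd.DataFrame:
        # the pure, deterministic "algorithm" -- importable and testable on its own
        return extracted_data.assign(y=lambda df: df["x"] * 2)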
  • d

    David Medinets

    10/11/2024, 3:43 AM
    Hello. Is the Hamilton documentation in a PDF file I can download for offline reading?
    s
    • 2
    • 36
  • j

    Jonas Meyer-Ohle

    10/16/2024, 3:20 PM
    Hi there, thanks for creating Hamilton, it's been a blast using it. Before I create a bug report, I have a question about using Hamilton's pandera dataframe validators with polars. I saw the following LinkedIn post. It mentions that using the check_output decorator is supported for pandera + polars. However, I'm getting the following error when running the minimal example found here: https://github.com/jonas-meyer/hamilton_polars_pandera
    Actual error: No registered subclass of BaseDefaultValidator is available for arg: schema and type <class 'polars.dataframe.frame.DataFrame'>. This either means (a) this arg-type contribution isn't supported or (b) this has not been added yet (but should be). In the case of (b), we welcome contributions. Get started at github.com/dagworks-inc/hamilton.
    I stepped through the following file a bit: https://github.com/DAGWorks-Inc/hamilton/blob/main/hamilton/data_quality/pandera_validators.py#L9 And it seems like the polars plugin isn't part of the supported extensions, I'm assuming this is the issue? Thanks!
    👀 1
    t
    • 2
    • 6
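    For reference, the rough shape of code that triggers the error above (a sketch assuming pandera's polars support; whether Hamilton accepts the schema is exactly the plugin-registration question being asked):
    Copy code
    # Sketch: @check_output(schema=...) on a function returning a polars DataFrame.
    import pandera.polars as pa
    import polars as pl

    from hamilton.function_modifiers import check_output

    schema = pa.DataFrameSchema({"a": pa.Column(pl.Int64)})


    @check_output(schema=schema)
    def my_frame() -> pl.DataFrame:
        return pl.DataFrame({"a": [1, 2, 3]})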
  • j

    Justin Donaldson

    10/21/2024, 5:51 PM
    I read through this pandera integration post. Anybody have a trip report on Polars? I'm debating whether to rely on it for a data pipeline project that deals with serialized embeddings.
    t
    m
    e
    • 4
    • 29
  • v

    Volker Lorrmann

    10/28/2024, 8:40 AM
    Hi guys, I have updated and refactored Flowerpower. Give it a try. I am happy for any feedback. FlowerPower is a simple workflow framework based on two fantastic Python libraries: • Hamilton: Creates DAGs from your pipeline functions • APScheduler: Handles pipeline scheduling https://github.com/legout/flowerpower
    🔥 2
    🙌 2
    t
    • 2
    • 4
  • a

    Andres MM

    10/29/2024, 10:08 AM
    Quick question. I'm hitting this error: "Hamilton does not consider these types to be equivalent. If you believe they are equivalent, please reach out to the developers. Note that, if you have types that are equivalent for your purposes, you can create a graph adapter that checks the types against each other in a more lenient manner." What I want would be something like:
    Copy code
    import typing as t
    import pandas as pd

    def bar_union(x: pd.Series) -> t.Union[int, pd.Series]:
        return x

    def foo_bar(bar_union: int) -> int:
        return bar_union + 1
    t
    j
    • 3
    • 12
  • v

    Viktor

    11/28/2024, 6:14 PM
    Has anyone yet explored ways to integrate Serverless / Cloud Functions (Azure, AWS, DO, etc.) into Hamilton DAGs?
    • Some compute requires more powerful resources than the Python environment Hamilton is running in.
    • Sometimes it's the other way around – e.g. web hooks or events are better implemented as Serverless Functions.
    Are there any examples of how this may be set up?
    t
    • 2
    • 2
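    One possible shape, sketched below (not an established pattern from the docs): wrap the remote call in an ordinary node, e.g. invoking an AWS Lambda with boto3. The function name and payload fields are made up:
    Copy code
    # Sketch: a Hamilton node that delegates heavy compute to a serverless function.
    # "my-heavy-transform" and the payload/response shapes are hypothetical.
    import json

    import boto3


    def heavy_result(prepared_payload: dict) -> dict:
        client = boto3.client("lambda")
        response = client.invoke(
            FunctionName="my-heavy-transform",
            Payload=json.dumps(prepared_payload).encode("utf-8"),
        )
        # the Lambda is assumed to return a JSON body
        return json.loads(response["Payload"].read())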
  • j

    Justin Donaldson

    12/04/2024, 10:58 PM
    Hey folks, happy Wednesday. I had a question about scenarios for ML training. There's a nice example for iris on the website: https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/model_examples/scikit-learn However, I'm interested in models that have more complex pipelines (e.g. text transformation). It's relatively easy to set up a training pipeline for it, and then it's possible to override the pipeline stages with some data for an inference pipeline, but that just winds up feeling super fragile.
    s
    e
    • 3
    • 28
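    One pattern that may be relevant here, sketched under assumptions (pipeline is a placeholder module and all node names are invented): build the dataflow once and use execute-time overrides to feed the fitted artifacts into an inference run instead of maintaining a second pipeline:
    Copy code
    # Sketch: reuse one dataflow for training and inference via `overrides`.
    from hamilton import driver

    import pipeline  # placeholder module; node names below are illustrative

    dr = driver.Builder().with_modules(pipeline).build()

    # Training run: fit the text transformer and the model.
    train_out = dr.execute(
        final_vars=["fitted_vectorizer", "fitted_model"],
        inputs={"raw_text": ["spam spam", "ham"], "labels": [1, 0]},
    )

    # Inference run: override the fitted artifacts instead of recomputing them.
    preds = dr.execute(
        final_vars=["predictions"],
        inputs={"raw_text": ["new document"]},
        overrides={
            "fitted_vectorizer": train_out["fitted_vectorizer"],
            "fitted_model": train_out["fitted_model"],
        },
    )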
  • v

    Volker Lorrmann

    12/10/2024, 9:51 AM
    Hi guys, are there any examples using @subdag or even @parameterized_subdag? I'm considering adding a feature to flowerpower to chain multiple pipelines. I think subdags are the way to go, right?
    s
    • 2
    • 2
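    In case it helps, a minimal @subdag sketch (data_prep is a placeholder module whose final node is prepared_data; this only shows the general shape, not a vetted pipeline-chaining recipe):
    Copy code
    # Sketch: reusing the functions in `data_prep` twice under different namespaces.
    import pandas as pd

    import data_prep  # placeholder module of Hamilton functions
    from hamilton.function_modifiers import source, subdag


    @subdag(data_prep, inputs={"raw_data": source("raw_data_a")})
    def prepared_a(prepared_data: pd.DataFrame) -> pd.DataFrame:
        return prepared_data


    @subdag(data_prep, inputs={"raw_data": source("raw_data_b")})
    def prepared_b(prepared_data: pd.DataFrame) -> pd.DataFrame:
        return prepared_data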
  • p

    Paul

    01/28/2025, 11:13 AM
    Hi All, does anyone know how I could import a set of modules but have them underneath a namespace? (The aim is to allow engineers to explore/autocomplete a bit more easily from within Jupyter notebooks.)
    Copy code
    from hamilton import driver

    import ourlib.modules as mod

    my_modules = [mod.load_csv, mod.compute1, mod.compute2, mod.save_csv]

    dr = (
        driver.Builder()
        .with_modules(*my_modules)
        .with_config(my_config)
        .build()
    )
    # where the file system is structured as:
    # ourlib/modules/{load_csv.py, compute1.py, compute2.py, save_csv.py}
    s
    • 2
    • 3
  • k

    Keshav Ravi

    01/28/2025, 11:14 AM
    Hi Guys, I was configuring WrenAI with Ollama using Docker on Windows. Everything seems to be fine, but at the final step of generating the questions I'm getting the error below. Can someone help? I have a sufficient amount of memory.
    2025-01-28 164207 -------------------------------------------------------------------
    2025-01-28 164207 E0128 111207.745 32 wren-ai-service:60] An error occurred during question recommendation generation: litellm.APIError: APIError: OpenAIException - Error code: 500 - {'error': {'message': 'model requires more system memory (40.8 GiB) than is available (3.5 GiB)', 'type': 'api_error', 'param': None, 'code': None}}
    2025-01-28 164207 INFO: 172.18.0.5:37970 - "GET /v1/question-recommendations/efe80dc2-8009-42a9-959d-b5319a524d16 HTTP/1.1" 200 OK
    Regards
    s
    • 2
    • 2
  • t

    Trần Hoàng Nguyên

    02/17/2025, 6:40 AM
    Hi guys, I wonder if there are any guides or standards on doing retries
    s
    • 2
    • 3
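    One lightweight approach, sketched here (not a built-in Hamilton feature): decorate a flaky node with a generic retry library such as tenacity. The URL and node name are made up:
    Copy code
    # Sketch: retrying a flaky node with tenacity.
    import requests
    from tenacity import retry, stop_after_attempt, wait_exponential


    @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, max=10))
    def upstream_payload() -> dict:
        # a plain Hamilton node; tenacity re-runs it on exceptions, up to 3 attempts
        response = requests.get("https://example.com/api/data", timeout=10)
        response.raise_for_status()
        return response.json()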
  • e

    Elijah Ben Izzy

    02/27/2025, 3:56 PM
    This might be a good q for the wren AI team 🙂
    t
    • 2
    • 1
  • s

    Slackbot

    02/27/2025, 9:51 PM
    This message was deleted.
    t
    a
    • 3
    • 2
  • e

    Evan Lutins

    03/12/2025, 2:43 PM
    Hey guys - is there any way to hide ValidationResult nodes from the DAG when calling visualize_execution()? I tried passing in the bypass_validation=True argument but it didn't work. Here is the code used to generate my DAG.
    Copy code
    dr.visualize_execution(
        final_vars=outputs,
        inputs=inputs,
        bypass_validation=True
    )
    The returned DAG contains a {node-name}_raw and {node-name}_validator for each node decorated with a @check_output. Ideally I would just like a single {node_name} represented in the DAG.
    e
    • 2
    • 3
  • v

    Victor Bouzas

    03/14/2025, 10:07 AM
    Hey guys, quick question. Is there a simple way to get the topologically sorted list of nodes from a Driver/Execution?
    t
    e
    s
    • 4
    • 8
  • v

    Volker Lorrmann

    03/26/2025, 10:06 AM
    Hi guys, the Hamilton Ray adapter is using Ray Workflows. The Ray docs mention that Ray Workflows is deprecated and will be removed. https://docs.ray.io/en/latest/workflows/index.html Are there any plans to update the Hamilton Ray adapter? Thanks!
    s
    • 2
    • 13
  • b

    Bob Gregory

    04/09/2025, 5:47 PM
    I'm looking at using Hamilton to replace a home-grown, quasi-declarative way of setting up feature transforms for models. We often want to test the effect of different lags or aggregations on model performance, so an experiment might have 15, 30, 45, 60 ... 90 minute lags defined for a particular feature. What's the right way to represent that as a Hamilton function? parameterize_value? The "reusing_functions" example in your repo focuses on subdags instead.
    e
    • 2
    • 4
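    For what it's worth, a sketch of the @parameterize shape for the lag example above (column and node names are invented, and a 1-minute-frequency time index is assumed):
    Copy code
    # Sketch: one node stamped out per lag, using @parameterize with literal values.
    import pandas as pd

    from hamilton.function_modifiers import parameterize, value


    @parameterize(
        **{f"feature_lag_{m}m": {"lag_minutes": value(m)} for m in (15, 30, 45, 60, 75, 90)}
    )
    def feature_lag(feature: pd.Series, lag_minutes: int) -> pd.Series:
        # shift a time-indexed series by N periods (1 period == 1 minute here)
        return feature.shift(lag_minutes)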