# general
v
Hi guys, I am a huge fan of Hamilton (many thanks to @Stefan Krawczyk and @Elijah Ben Izzy), but I've missed an easy way to schedule my Hamilton dataflows, and furthermore I'd like to be able to parameterize as much of my production deployment as possible using yaml files. To address these "shortcomings" I have created my own Python library/framework called FlowerPower. https://github.com/legout/flowerpower

FlowerPower is a simple workflow framework based on the fantastic Python libraries Hamilton and APScheduler (Advanced Python Scheduler). Hamilton is used as the core engine to create Directed Acyclic Graphs (DAGs) from your pipeline functions and execute them in a controlled manner. It is highly recommended to read the Hamilton documentation and check out their examples to understand the core concepts of FlowerPower.

APScheduler is used to schedule the pipeline execution. You can schedule the pipeline to run at a specific time, at a specific interval, or according to a cron expression. Furthermore, APScheduler can be used to run the pipeline in a distributed environment. In that case you need to set up a data store (e.g. Postgres, MongoDB, MySQL, SQLite) to store the job information and an event broker (e.g. Redis, MQTT) for communication between the scheduler and the workers. At least a data store is required to persist the scheduled pipeline jobs across a worker restart, even if you run on a single machine.

Regards, Volker
👍 1
❤️ 3
🚀 1
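For readers unfamiliar with Hamilton's core idea, here is a minimal sketch of what a "pipeline function" module looks like. The function and variable names are invented for illustration; in real use, Hamilton's Driver resolves the dependency graph from the parameter names, whereas here the chain is wired by hand just to show the data flow.

```python
# Sketch of a Hamilton-style pipeline module (names are illustrative,
# not taken from the FlowerPower repo). In Hamilton, each function is a
# DAG node, and a parameter name matching another function's name
# creates an edge between them.

def raw_numbers() -> list:
    """Source node: produces the input data."""
    return [1, 2, 3, 4]

def doubled(raw_numbers: list) -> list:
    """Depends on raw_numbers via its parameter name."""
    return [n * 2 for n in raw_numbers]

def total(doubled: list) -> int:
    """Final node: aggregates the doubled values."""
    return sum(doubled)

# Hamilton's Driver would execute this chain automatically when asked
# for "total"; we call the functions by hand to show the wiring.
result = total(doubled(raw_numbers()))
print(result)  # 20
```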
t
It's awesome to see people building on top of Hamilton, and thanks for contributing back to the OSS community! You might be interested in Rocketry for lightweight scheduling (no updates since 2022) https://github.com/Miksus/rocketry
v
Thanks for mentioning Rocketry. I've used it in the past, but it did not meet all my requirements for FlowerPower; in particular, Rocketry is missing a task/job queue. Using APScheduler with Postgres or Redis as the data store and event broker allows me to run my workflows in a distributed environment (e.g. several separate workers).
👍 1
s
@Volker Lorrmann cool — will take a closer look sometime this week. Otherwise, in terms of parameterization of Hamilton DAGs, you might like today's Hamilton meetup, since @Gilad Rubin has been thinking about that as well.
🙏 1
v
@Stefan Krawczyk I think the README.md and the hello world example (in `examples/hello_world`) should give you a good overview. How can I join the meetup?
👍 1
i
@Volker Lorrmann how does your library compare to airflow?
v
@Iliya R FlowerPower only addresses a small subset of Airflow's functionality and purpose, and is of course far less mature. FlowerPower is specifically (and only) designed to run Hamilton dataflows. It adds a scheduler/worker to Hamilton and lets you parameterize both the dataflow and its execution using yaml files. And by using a data store and event broker (e.g. Postgres, Redis, MQTT, MongoDB, ...), you can deploy your scheduled dataflows on any remote machine where you can start a FlowerPower worker.
gratitude thank you 1
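To make the yaml-parameterization idea concrete, here is a hedged sketch. The mapping below mirrors what a pipeline config file might contain once parsed (the keys and filenames are invented for illustration and are not FlowerPower's actual schema); the parsed parameters are then injected into a Hamilton-style function.

```python
# Hypothetical parsed contents of a pipeline yaml file, e.g. loaded
# with yaml.safe_load(). The schema here is invented for illustration,
# not FlowerPower's real configuration format.
pipeline_config = {
    "my_pipeline": {
        "params": {"threshold": 10},   # overrides the function default
        "final_vars": ["filtered"],    # which DAG outputs to compute
    }
}

def numbers() -> list:
    """Source node with some sample data."""
    return [4, 8, 15, 16, 23, 42]

def filtered(numbers: list, threshold: int = 0) -> list:
    """Keeps only values above the configured threshold."""
    return [n for n in numbers if n > threshold]

# Inject the configured parameters instead of hard-coding them.
cfg = pipeline_config["my_pipeline"]
out = filtered(numbers(), **cfg["params"])
print(out)  # [15, 16, 23, 42]
```

The point of the design is that changing a production parameter then means editing a yaml file, not touching the pipeline code.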
e
This is really cool! Nice work!
🙏 1
v
This is a summary of FlowerPower's main features from Claude AI 😉

The main features of the FlowerPower framework are:

1. Pipeline Workflows: FlowerPower uses the Hamilton library to create Directed Acyclic Graphs (DAGs) from pipeline functions, allowing you to define and execute complex workflow pipelines.
2. Scheduling and Execution: FlowerPower integrates the APScheduler library to allow scheduling of pipeline executions at specific times, at intervals, or based on cron expressions. It supports running pipelines in a distributed environment with a data store and event broker.
3. Parameterization: FlowerPower allows you to parameterize your pipeline functions, either by setting default values or by defining parameters in a configuration file.
4. Tracking and Monitoring: FlowerPower can integrate with the Hamilton UI to provide tracking and monitoring of pipeline executions, including visualization of the DAG.
5. Flexible Configuration: FlowerPower uses configuration files (`conf/pipelines.yml`, `conf/scheduler.yml`, `conf/tracker.yml`) to set up pipelines, scheduling, and tracking, allowing for easy customization.
6. Distributed Execution: FlowerPower supports running pipelines in a distributed environment by using a data store (e.g., PostgreSQL, MongoDB, SQLite) to persist job information and an event broker (e.g., Redis, MQTT) for communication between the scheduler and workers.
7. Easy Setup and Usage: FlowerPower provides command-line tools and Python APIs for initializing new projects, adding new pipelines, running and scheduling pipelines, and starting workers.

Overall, FlowerPower aims to provide a simple and flexible workflow framework that combines the power of Hamilton and APScheduler to enable the creation and execution of complex data pipelines, with support for scheduling, distribution, and monitoring.
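To illustrate the scheduling half of the story without pulling in APScheduler itself, here is a stdlib-only sketch of interval-style scheduling using Python's `sched` module. This is a stand-in, not FlowerPower's or APScheduler's API; APScheduler additionally provides cron/interval triggers, persistent data stores, and event brokers that `sched` does not.

```python
import sched
import time

# Stand-in for a pipeline run: in FlowerPower this would be a Hamilton
# dataflow execution. We just record each invocation.
runs = []

def run_pipeline():
    runs.append("executed")

# The stdlib sched module runs callbacks at scheduled times within one
# process, which is enough to illustrate an interval trigger.
s = sched.scheduler(time.monotonic, time.sleep)
for i in range(3):
    # Three runs spaced 10 ms apart, standing in for a real interval.
    s.enter(0.01 * i, 1, run_pipeline)
s.run()  # blocks until all scheduled runs have fired
print(len(runs))  # 3
```

A real deployment would hand this responsibility to APScheduler so that jobs survive worker restarts (via the data store) and can fan out to multiple workers (via the event broker).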