Stefan Krawczyk
11/22/2023, 9:15 PM
`pip install sf-hamilton==1.38.1`
What’s changed since the last release? No new features, but a few examples and fixes.
• We have an example up that shows how to use code from the `hub`.
• Adds a plotly materialization example.
• We’ve fixed a few minor issues, e.g. type annotations and code that interfaces with https://hub.dagworks.io/.
Thanksgiving
Since it’s Thanksgiving week here in the 🇺🇸 we’d just like to thank you all for being part of the community. I’m personally always excited to learn about all the different use cases and ways that people are applying Hamilton, as well as your thoughts and ideas for extending it.
Special shout-outs and thanks to all those who have contributed code this year (sorry if I miss someone): @Bryan Galindo, @Jordan Smith, @Flavia Santos, @Michal Siedlaczek, @cryptic, @Thierry Jean, @Swapnil Dewalkar, @Ben Hack, @Elijah Ben Izzy + a few others whose github handles I haven’t mapped to slack.
As well as a special thanks to the people who have found 🐛 this year, and have helped us validate releases, e.g. @Jan Hurst, @Amos, @miek, @Seth Stokes, + a few more I’m sure I’m missing.
Thanks again, and we’re excited to continue to build more into Hamilton with you all.
Full Changelog: sf-hamilton-1.38.0...sf-hamilton-1.38.1
Elijah Ben Izzy
11/27/2023, 9:34 PM
`pip install sf-hamilton==1.39.0`
What’s changed?
• Bug fix/efficiency improvement for `Parallelizable` (thanks @Seth Stokes for identifying)
• Improved `Variable` object in driver so you can do stuff with your nodes! (thanks @miek for asking for this!)
• Better integration with `@parameterize` + some other decorators and `Parallelizable`
• Bug fix with `Annotated` as input type (thanks @Sam Brockie for finding!)
• Better guard rails around `ad_hoc_utils.create_temporary_module` (thanks @Wentao Lu for identifying!)
🎵 Release notes
📖 Full changelog
Releases like this, with lots of usability improvements, would be impossible without such a great community. So, to echo the post above, and in the spirit of Thanksgiving, thanks to all of you for contributing, debugging, and sharing ideas!
Up next we’re fully launching hub.dagworks.io! Stay tuned.
Stefan Krawczyk
11/28/2023, 7:05 PM
Justin Donaldson
12/01/2023, 12:21 AM
Justin Donaldson
12/01/2023, 12:22 AM
Justin Donaldson
12/01/2023, 12:22 AM
Stefan Krawczyk
12/18/2023, 8:25 PM
`1.40.1`.
What’s changed for you:
• Includes a larger error block when a function encounters an error, so it’s harder to miss in the stack trace. E.g. you’ll see something like this now in your stack traces:
********************************************************************************
> spend_std_dev [my_functions.spend_std_dev()] encountered an error <
> Node inputs:
{'spend': 0 10
1 10
2 20
3 40
4 40
5 50
dtype: int64}
********************************************************************************
Thanks to @Gourav Kumar and a few others for inspiring the change.
• Fix for the Pandas `NaN` validator - it can now handle empty series. Thanks @Michal Siedlaczek for the fix.
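To make the shape of the new error block concrete, here is a minimal sketch in plain Python. It is not Hamilton's implementation: `format_error_banner` and `run_node` are hypothetical helpers that only mimic the banner layout shown above.

```python
def format_error_banner(node_name: str, origin: str, inputs: dict, width: int = 80) -> str:
    """Build a banner like the one above (hypothetical helper, not Hamilton's code)."""
    border = "*" * width
    return "\n".join(
        [
            border,
            f"> {node_name} [{origin}()] encountered an error <",
            "> Node inputs:",
            repr(inputs),
            border,
        ]
    )


def run_node(fn, **inputs):
    """Call fn with its inputs; on failure print the banner, then re-raise."""
    try:
        return fn(**inputs)
    except Exception:
        print(format_error_banner(fn.__name__, f"{fn.__module__}.{fn.__name__}", inputs))
        raise
```

Re-raising after printing keeps the original stack trace intact, so the banner only adds context and never hides the underlying error.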
What’s new on the hub?
• There are three new dataflows up that, used together, helped create https://image-telephone.streamlit.app/ (see streamlit code here).
◦ Caption images
◦ Convert and save images to s3
◦ Generate images
Full change log.
Stefan Krawczyk
12/19/2023, 6:05 PM
Stefan Krawczyk
12/20/2023, 6:11 PM
Stefan Krawczyk
12/21/2023, 12:41 AM
`1.41.0`!
What’s changed for you:
• we fixed a bug with Pandera + Dask dataframe & series support with check_output. Thanks @Hvuj for flagging!
• we exposed a more ergonomic way to set Ray configuration for the Ray task based executor.
Full change log here
Elijah Ben Izzy
12/27/2023, 5:55 PM
`sf-hamilton==1.42.0` 🎆
What’s new:
• Specify which columns a dataframe contains with `@schema.output` (thanks to @Roel Bertens for the addition!)
• Customize every aspect of execution with lifecycle adapters
More in 🧵 — we’re really excited about the possibilities these unlock, so reach out if you have any ideas!
Full change log here
Stefan Krawczyk
01/02/2024, 3:00 PM
`1.43.0`.
3. 🎙️ We’re excited to organize some meet-ups (largely virtual) this year.
📕 Blog Post
This post is two parts — a high level overview, and a more technical deep dive on how to build ML Pipelines with Hamilton. Part I is meant to be read in one sitting, Part II is not 😉 .
✉️ `1.43.0`:
• The caching adapter got a pickler serializer
• We added a new materializer to write a stream of bytes to file.
🎙️ Meet ups:
• Stay tuned for more details, but we’ll be organizing a meet-up of this community, for this community, in February. Details to follow — we’ll likely poll a few of you for days & times before settling on a date. We’ll also be soliciting topics/speakers! So if you want to share about how you’re using Hamilton ping me. Else expect at least a roadmap discussion that we would love your input on.
Thanks for being part of this community. We’re excited for the roadmap and year ahead. 🚀
Elijah Ben Izzy
01/02/2024, 8:15 PM
`@schema.output` decorator. Thanks @Roel Bertens for flagging!
pip install sf-hamilton==1.43.1
Release notes: https://github.com/DAGWorks-Inc/hamilton/releases/tag/sf-hamilton-1.43.1
Stefan Krawczyk
01/12/2024, 2:30 PM
`1.44.0`
What’s new:
• The ability to
◦ (a) pass in lists of strings for tags
◦ (b) pass in a “query” to filter what is returned from dr.list_available_variables()
So you can now do this:
from hamilton.function_modifiers import tag

@tag(business_value=["CA", "US"])
def combo_node():
    ...

@tag(business_value=["US"])
def only_us_node():
    ...

@tag(business_value=["CA"], some_other_tag="BAR")
def only_ca_node():
    ...
and then filter what’s returned based on them:
# case A: returns only_ca_node() and combo_node()
dr.list_available_variables(tag_filter=dict(business_value=["CA"]))
# case B: returns only_us_node() and combo_node()
dr.list_available_variables(tag_filter=dict(business_value="US"))
# case C: returns all 3 nodes
dr.list_available_variables(tag_filter=dict(business_value=["CA", "US"]))
# case D: returns all 3 nodes
dr.list_available_variables(tag_filter=dict(business_value=None))
# case E: returns 0 nodes
dr.list_available_variables(tag_filter=dict(business_value="UK"))
# case F: returns 0 nodes
dr.list_available_variables(tag_filter=dict(business_value="US", some_other_tag="FOO"))
# case G: returns 1 node - only_ca_node()
dr.list_available_variables(tag_filter=dict(business_value="CA", some_other_tag="BAR"))
Thanks to @miek for the feature request.
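For anyone wanting to reason about the filter semantics, here is a rough sketch of matching rules that would reproduce cases A to G. It is an assumption inferred from the listed results, not Hamilton's actual code: `matches`, `select` and the `NODES` table are hypothetical, and a single tag name (`business_value`) is used throughout.

```python
from typing import Dict, List, Optional, Union

TagValue = Union[str, List[str]]


def matches(node_tags: Dict[str, TagValue], tag_filter: Dict[str, Optional[TagValue]]) -> bool:
    """Every filter key must match (AND); within a key, any shared value matches (OR)."""
    for key, wanted in tag_filter.items():
        if key not in node_tags:
            return False
        if wanted is None:
            continue  # None means "the tag only has to be present"
        have = node_tags[key]
        have_set = {have} if isinstance(have, str) else set(have)
        wanted_set = {wanted} if isinstance(wanted, str) else set(wanted)
        if not have_set & wanted_set:
            return False
    return True


# Hypothetical stand-in for the three tagged functions above.
NODES = {
    "combo_node": {"business_value": ["CA", "US"]},
    "only_us_node": {"business_value": ["US"]},
    "only_ca_node": {"business_value": ["CA"], "some_other_tag": "BAR"},
}


def select(tag_filter: Dict[str, Optional[TagValue]]) -> List[str]:
    """Return the sorted names of nodes whose tags satisfy the filter."""
    return sorted(name for name, tags in NODES.items() if matches(tags, tag_filter))
```

Under these rules `select(dict(business_value="US", some_other_tag="FOO"))` comes back empty, because no node carries both tags with those values.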
Other updates:
• We’re revamping the main docs a little, trying to simplify it — if you find something/have thoughts, let us know (thanks @Thierry Jean).
• You can now watch my talk on . This would be a good talk to share with those who like software engineering principles.
• @Elijah Ben Izzy wrote a post discussing the trade-offs of how to structure your code, and how Hamilton helps here. This is a good post for those who think that Hamilton is too much structure, or want framing as to what a “platform” should ultimately be doing.
• If you’re interested in customizing Hamilton’s visualization, you might want to chime in on this discussion.
Thanks all, and have a great weekend!
Elijah Ben Izzy
01/16/2024, 9:40 PM
`pip install sf-hamilton==1.45.0`
What’s new?
• Added lifecycle validators to enable static (node/graph level) validations. Use these as you would a lifecycle adapter.
• Added the new `HamiltonNode` and `HamiltonGraph` objects so you have a publicly available way to browse/manage the DAG.
• Added a progress bar — this is another use of lifecycle customization. Thanks @emily rexer for the suggestion!
We also wrote about lifecycle adapters in this post. Currently it’s probably the best place to get started. There’s an overview of the architecture/design with some examples.
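If the hook idea is new to you, here is a toy sketch of the pattern in plain Python. Everything in it is hypothetical (`ProgressHook`, `before_node`, `after_node`, `run`). It only illustrates "call user code before and after each node" and does not use Hamilton's lifecycle API.

```python
from typing import Callable, Dict, List


class ProgressHook:
    """Hypothetical hook that counts finished nodes and records progress lines."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.done = 0
        self.messages: List[str] = []

    def before_node(self, node_name: str) -> None:
        pass  # nothing to do before a node runs

    def after_node(self, node_name: str) -> None:
        self.done += 1
        self.messages.append(f"[{self.done}/{self.total}] {node_name}")


def run(nodes: Dict[str, Callable[[], object]], hooks: list) -> Dict[str, object]:
    """Toy executor: runs zero-argument nodes in order, calling hooks around each."""
    results = {}
    for name, fn in nodes.items():
        for hook in hooks:
            hook.before_node(name)
        results[name] = fn()
        for hook in hooks:
            hook.after_node(name)
    return results
```

A progress bar, a tracer, or a validator would all plug into the same two call sites, which is why one mechanism can cover so many customizations.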
To add to the exciting news, we’re hosting a Hamilton meetup in February! Please sign up at https://www.meetup.com/global-hamilton-open-source-user-group-meetup/ if you’re interested.
Stefan Krawczyk
01/23/2024, 4:16 PM
`pip install sf-hamilton==1.46.0`
What’s new?
• Datadog integration! You can now easily track trace spans corresponding to your Hamilton code. See the blog section below for the write-up. It builds on the new lifecycle APIs and is a one-line addition:
from hamilton.plugins import h_ddog
from hamilton import driver

datadog_hook = h_ddog.DDOGTracer(root_name="hamilton_dag_trace")
dr = (
    driver
    .Builder()
    .with_modules(...)
    .with_adapters(datadog_hook)
    .build()
)
• We have an `export_execution()` on the driver thanks to @Alec Hewitt (his first contribution 🍾). This allows you to export a JSON representation of what Hamilton is going to execute.
• For creating image files of your DAG, you no longer have to specify an output format; we’ll try to infer it from the suffix of the file path provided, and default to `png`. The output file format is determined through the following steps, each one overriding the previous one:
1. if `output_file_path` has no file extension, a PNG file is generated (e.g., `/path/to/file -> /path/to/file.png`)
2. if `output_file_path` has a file extension, graphviz will use the specified format (e.g., `/path/to/file.svg -> /path/to/file.svg`)
3. if a format value is specified for `render_kwargs={"format": "pdf"}`, it overrides any other inputs (e.g., `/path/to/file.svg -> /path/to/file.pdf`)
⚠️ If you used the `.dot` file you may have to change things. Please reach out if this impacts you negatively.
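The three precedence steps above can be sketched as a small helper. This is an illustration of the documented rules only; `resolve_output` is a hypothetical function, not part of Hamilton, and it uses POSIX-style paths to match the examples.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def resolve_output(output_file_path: str, render_kwargs: Optional[dict] = None) -> Tuple[str, str]:
    """Return (final_path, format) following the three precedence steps."""
    path = PurePosixPath(output_file_path)
    fmt = "png"  # step 1: default when there is no file extension
    if path.suffix:
        fmt = path.suffix[1:]  # step 2: the file extension picks the format
    if render_kwargs and "format" in render_kwargs:
        fmt = render_kwargs["format"]  # step 3: an explicit format wins
    return str(path.with_suffix(f".{fmt}")), fmt
```

For example, `resolve_output("/path/to/file.svg", {"format": "pdf"})` yields `("/path/to/file.pdf", "pdf")`, matching step 3.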
What’s updated?
• We’re adding a dedicated “integrations” section in the docs. This is to help make it simpler and faster to determine how to use Hamilton with something.
◦ Check out the FastAPI integration notes.
◦ Check out the Streamlit integration notes.
New Blog:
• To accompany the Datadog integration, we wrote a post about it.
Find full release details here.
Stefan Krawczyk
02/07/2024, 7:01 PM
`pip install sf-hamilton==1.48.0`
What’s new?
• Experiment Tracker and UI — see @Thierry Jean’s post in general.
• Adds GraphConstructionHook.
• Truncates inputs when errors are encountered. h/t @Michal Siedlaczek for flagging.
• Adds `bypass_validation` keyword argument for visualization functions, so you can visualize things without having to provide inputs.
• Materializers/data-savers: fixes all local file loader metadata to have a uniform shape
• Fixes regression in visualization functions that stripped the path of the file. h/t @Roel Bertens for finding the 🐛 .
• Adds string contains and not contains validators for check output.
What’s updated?
• we added comparisons between LangChain’s LCEL and Hamilton to the docs.
• the hub now has:
◦ A conversational RAG example
◦ A FAISS RAG example
◦ A simple LLM evaluation grader example
New Blogs:
• Thierry’s post on building a lightweight experimentation tracking tool with Hamilton.
Stefan Krawczyk
02/13/2024, 3:30 PM
`pip install sf-hamilton==1.49.1`
What’s new?
• Adds saver/loader for excel in pandas extensions (#342) by @tyapochkin in #683
◦ this adds to our pandas “materializers”, so now you can inject a `to.excel` call into your DAG.
• Fix path metadata by @elijahbenizzy in #686
• fix: fix typo in extract_columns decorator example by @ninoseki in #691
• Adds first pass at jupyter magic by @skrawcz in #689
◦ You can now more ergonomically iterate in a notebook with our very first Jupyter Notebook Magic!
# load the extension
%load_ext hamilton.plugins.jupyter_magic
Then in every cell you want to create a module on the fly:
%%cell_to_module -m MODULE_NAME --display --rebuild-drivers
def my_funcs():
    ...
This should improve the ergonomics of developing in a notebook with Hamilton, because the magic will:
1. create a python module called `MODULE_NAME` on the fly.
2. inject a graphviz picture of it at the bottom, because `--display` was used.
3. auto rebuild drivers that depend on this module, because `--rebuild-drivers` was used.
4. if you need to see more arguments, try `%%cell_to_module --help` to list them.
5. when you’re done, you can then save that cell to a module by swapping out the magic for `%%writefile MODULE_NAME.py`!
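For the curious, "creating a module on the fly" can be approximated with the standard library alone. This is a minimal sketch of the general technique, not the magic's actual implementation; `module_from_source` is a hypothetical helper.

```python
import sys
import types


def module_from_source(name: str, source: str) -> types.ModuleType:
    """Build an importable module object from a string of Python source."""
    module = types.ModuleType(name)
    # Run the source inside the new module's namespace.
    exec(compile(source, f"<{name}>", "exec"), module.__dict__)
    # Register it so a later `import name` finds the same object.
    sys.modules[name] = module
    return module
```

Calling it again with the same name and new source replaces the registered module, which is roughly the behavior you want when re-running a notebook cell.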
You can play with the example here, or read about it in the docs. Thanks to @Thierry Jean for inspiring this new feature. A blog on this will drop this week, otherwise we’re excited for feedback and ideas on how to extend or improve it further.
🍾 New Contributors 🎇
• @Konstantin Tyapochkin ( @tyapochkin) made their first contribution in #683
• @Manabu Niseki (@ninoseki) made their first contribution in #691
Thank you for taking the time to improve Hamilton!
Reminder: Meetup next week!
Just a quick reminder about the meetup next week. We’re excited to learn from @Arthur Andres, as well as deep dive on a Hamilton topic or two. Please comment in this thread or DM @Elijah Ben Izzy or myself for anything specific you’d like covered.
Full Changelog: sf-hamilton-1.48.0...sf-hamilton-1.49.1
Stefan Krawczyk
02/14/2024, 7:13 PM
`pip install sf-hamilton==1.49.2`
What’s in the patch:
• fix for @tag_outputs to tag intermediate “nodes”.
• enables JSON materializer to handle a top level list.
What’s in the blog post?
• This is a blog on creating and using Jupyter Magics to improve the notebook experience. This complements the new magic we pushed out yesterday for Hamilton and is an explainer on how to build one.
https://blog.dagworks.io/p/using-ipython-jupyter-magic-commands?r=2cg5z1&utm_campaign=post&utm_medium=web
Stefan Krawczyk
02/20/2024, 3:30 PM
`pip install sf-hamilton==1.50.1`!
What’s new?
• A new caching adapter under `hamilton.plugins.h_diskcache`. This one hashes source code and inputs (i.e. fingerprints things) and uses the diskcache library to pickle everything to disk. Thanks @Thierry Jean! How do you use this? Well, here’s some code (see full example):
from hamilton import driver
from hamilton.plugins import h_diskcache # <--- this is what you import
import functions # your modules
# get the logger to view cache retrieval logging
import logging
logger = logging.getLogger("hamilton.plugins.h_diskcache")
logger.setLevel(logging.DEBUG)  # or logging.INFO
logger.addHandler(logging.StreamHandler())
# build driver with cache hook
dr = (
    driver.Builder()
    .with_modules(functions)
    .with_adapters(h_diskcache.DiskCacheAdapter())  # <--- add it here
    .build()
)
# use execute or materialize as usual
dr.execute(["C"])
# then run it again -- it will be cached
dr.execute(["C"])
# then go change some code -- you'll see only things that have changed will be recomputed..
dr.execute(["C"])
How does this differ from the CachingGraphAdapter? The existing CachingGraphAdapter requires you to tag functions and specify the format to serialize things into, e.g. `tag(cache="parquet")`, and you have to manage the cache state yourself, dropping it when code or inputs change.
Both ways to cache have their sweet spots, and we’d love feedback and we’re open to improving either of them!
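To illustrate what "fingerprinting" code and inputs means, here is a rough stdlib-only sketch. It is not how `h_diskcache` is implemented: the names are hypothetical, compiled bytecode stands in for source code, and an in-memory dict stands in for the on-disk store.

```python
import hashlib
import pickle
from typing import Any, Callable, Dict


def fingerprint(fn: Callable, inputs: Dict[str, Any]) -> str:
    """Hash a function's compiled code together with its (picklable) inputs."""
    digest = hashlib.sha256()
    code = fn.__code__
    digest.update(code.co_code)  # bytecode, as a stand-in for the source text
    digest.update(repr(code.co_consts).encode())  # literal constants used by the code
    digest.update(pickle.dumps(sorted(inputs.items())))
    return digest.hexdigest()


class FingerprintCache:
    """Recompute only when the code or the inputs change (in-memory stand-in)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.hits = 0

    def call(self, fn: Callable, **inputs: Any) -> Any:
        key = fingerprint(fn, inputs)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = fn(**inputs)
        self._store[key] = result
        return result
```

Because the key covers both code and inputs, editing a function body naturally invalidates its cached results without any manual cache management.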
• We also have updated the main hamilton docs to more clearly explain the basic constructs:
◦ functions & nodes
◦ driver
◦ visualization
◦ materialization
◦ function modifiers
◦ driver Builder object
Reminder
• 🎤 meet-up today - sign-up here.
◦ We’re excited to hear from @Arthur Andres, doing a deep dive on structuring projects, an overview of `@subdag`, and then chatting roadmap!
Stefan Krawczyk
02/27/2024, 3:44 PM
`pip install sf-hamilton==1.51.1`!
What’s new?
• 📢 Announcing Office hours - roughly every Tuesday 9:30am PT, apart from when we have our meetup. We’ll throw a link in the #hamilton-help channel.
• 🔎 Vaex decorator support. Thanks @Konstantin Tyapochkin!
• 🔎 Hamilton CLI. Thanks @Thierry Jean!
◦ Along with an accompanying blog!
• 🪄 Jupyter magic for Hamilton now displays the DAG correctly in a databricks notebook!
• 🤝 Meet-up for March.
📢 Office hours!
We’re excited to have an hour a week that anyone can drop in on. It’ll be roughly every Tuesday at 9:30am Pacific Time for about an hour, apart from when we’re holding our meet-up. To join, we’ll drop a google meet link in the #hamilton-help channel. It’s starting today!
For those in the community for whom this falls in the middle of the night, reach out and we could do something ad-hoc for you.
🔎 Details - Vaex:
Vaex is another dataframe library. We’ve now got decorator support for it, so you can use it with @extract_columns. In addition we’ve added a basic vaex result builder.
Check out the repository example here.
🔎 Details - Hamilton CLI:
You’ll need to install the extra package
pip install sf-hamilton[cli]
Then verify it installed:
hamilton --help
Things you can do:
• “build” a module via the command line, e.g. hamilton build module_v1.py
• “build and view a module” via the command line, e.g. hamilton view --output ./dag.png module_v1.py
• get the diff between the python module now, and some git reference (default is commit prior) — hamilton diff --view --output ./diff.png module_v1.py
Read more about it in the accompanying blog post, and docs. We see the CLI as a great tool to add to your CI step to help understand and see changes.
🪄 Jupyter magic for databricks notebooks
The recent ipython jupyter magic is now extended to display properly in a databricks notebook. Databricks notebooks didn’t natively display graphviz objects, so we had to adjust the code. There’s no change needed on your part: use it as usual, and the graph will now be displayed.
🤝 Meet-up for March
Sign up for the next meetup to be held on March 19th. @Roel Bertens will be giving a talk in the community spotlight corner about feature engineering.
For the deep dive section, we’re still taking suggestions. So if you’d like to know more about Hamilton, let us know.
Stefan Krawczyk
02/27/2024, 5:30 PM
Stefan Krawczyk
03/05/2024, 5:31 PM
Stefan Krawczyk
03/12/2024, 3:42 PM
`1.53.0` and office hours in ~45 mins at 9:30am Pacific Time (meet.google.com/enx-bhus-fae).
What’s new
• Adds target_ parameter to save_to by @elijahbenizzy in #744
• `cli` added config support and validate command by @zilto in #729
• Updates docstring of data adapters to be public facing by @elijahbenizzy in #752
• Adds FunctionInputOutputTypeChecker by @skrawcz in #757
Showcasing an example of the Lifecycle APIs we released, there is now an adapter you can add that will validate at runtime that the types of the inputs & outputs match the annotated types on functions. To add it to your driver:
from hamilton import driver, lifecycle
import my_functions  # your module(s)

dr = (
    driver.Builder()
    .with_config({})
    .with_modules(my_functions)
    .with_adapters(
        # this is a strict type checker for the input and output of each function.
        lifecycle.FunctionInputOutputTypeChecker(),
    )
    .build()
)
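As a rough idea of what a runtime input/output type check involves, here is a small standalone decorator. It is a sketch under simplifying assumptions, not the adapter's implementation: `check_types` is hypothetical and only checks plain classes such as `int` or `str`, skipping generics like `List[int]`.

```python
import functools
import inspect
from typing import get_origin, get_type_hints


def _is_plain_class(annotation) -> bool:
    """True for annotations like int or str; False for generics such as List[int]."""
    return get_origin(annotation) is None and isinstance(annotation, type)


def check_types(fn):
    """Validate annotated argument and return types each time fn is called."""
    hints = get_type_hints(fn)
    signature = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            if _is_plain_class(expected) and not isinstance(value, expected):
                raise TypeError(f"{fn.__name__}: input '{name}' is not {expected.__name__}")
        result = fn(*args, **kwargs)
        expected = hints.get("return")
        if _is_plain_class(expected) and not isinstance(result, expected):
            raise TypeError(f"{fn.__name__}: output is not {expected.__name__}")
        return result

    return wrapper
```

Inputs are checked before the function runs and the output afterwards, so a bad argument fails fast without executing the function body.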
Documentation updates:
• Adds Parallelism Caveats to documentation by @skrawcz in #745
• Adds more to parallel caveats by @skrawcz in #746
• `docs/how-tos/` pre-commit by @zilto in #750
hub.dagworks.io and `examples/` updates:
• Examples: Example with pandas for split apply combine by @nhuray in #753 (see README)
• Adds document chunking example to hub by @skrawcz in #755
Blog:
• we have a new blog on using Hamilton for the ingestion part of RAG pipelines and then scaling that to Ray, Dask, and PySpark.
New contributor 🚀:
• @Nicolas Huray added the split-apply-combine example. Thank you! 🙏
Meet-up Next week
• Don’t forget to sign up for the meet-up next week
◦ @Roel Bertens will show feature engineering, while the deep-dive will be on parameterization/re-use of DAGs.
Full Changelog: sf-hamilton-1.52.0...sf-hamilton-1.53.0
Elijah Ben Izzy
03/19/2024, 4:12 PM
`sf-hamilton==1.54.0`
What’s new?
• Improvements to visualization
• new node versioning API — allows you to get node versions (stable hashes)
• New caching adapter that uses just stdlib (`shelve`). Example:
from hamilton import driver
from hamilton.lifecycle.default import CacheAdapter

dr = (
    driver
    .Builder()
    .with_modules(features, model, evaluation)
    .with_adapters(
        # now everything will be cached!
        CacheAdapter()
    )
    .build()
)
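If you haven't used `shelve` before, here is a minimal sketch of the kind of persistent key/value caching it enables. This is not the `CacheAdapter` code: `cached_call` is a hypothetical helper, and the cache key here is just a string you choose.

```python
import shelve
from typing import Any, Callable, Tuple


def cached_call(cache_path: str, key: str, compute: Callable[[], Any]) -> Tuple[Any, bool]:
    """Return (value, was_cached), persisting new values in a shelve file."""
    with shelve.open(cache_path) as db:
        if key in db:
            return db[key], True
        value = compute()
        db[key] = value  # pickled to disk by shelve
        return value, False
```

Since the store lives on disk, a second process or a later run pointed at the same `cache_path` would see the cached value too.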
Towards Data Science writeup on pre-commit hooks by @Thierry Jean!
• Uses Hamilton pre-commit hooks as an example
• https://towardsdatascience.com/custom-pre-commit-hooks-for-safer-code-changes-d8b8aa1b2ebb
Excited to see you all at 9:30 PT!
Stefan Krawczyk
03/28/2024, 6:01 PM
`1.55.1`.
This includes a fix for graph.version which is used in the experiment tracker adapter.
Thierry Jean
04/23/2024, 2:17 PM
Stefan Krawczyk
04/30/2024, 4:58 PM
`fix` closed `CacheAdapter` by @zilto in #847
• Changes jupyter magic to create temporary files by @skrawcz in #855
📚 Documentation / Examples:
• Update documentation to resolve a small typo in glossary by @bustosalex1 in #857
• Update link in glossary to use reST formatting by @bustosalex1 in #858
----------------------------
Reminder: Office hours
----------------------------
• Tuesday, April 30th · 9:30 – 10:30am; Time zone: America/Los_Angeles
• Come ask questions/get help/etc.
• Use this link to join.
--------------
Blog Posts
--------------
If you want to know more about our motivations and features of the UI we direct you to this blog post.
Stefan Krawczyk
05/01/2024, 8:43 PM
Stefan Krawczyk
05/02/2024, 2:49 PM