Elijah Ben Izzy
01/02/2024, 8:15 PM
Patch release for the @schema.output decorator. Thanks @Roel Bertens for flagging!
pip install sf-hamilton==1.43.1
Release notes: https://github.com/DAGWorks-Inc/hamilton/releases/tag/sf-hamilton-1.43.1
Stefan Krawczyk
01/12/2024, 2:30 PM
1.44.0
What’s new:
• The ability to
◦ (a) pass in lists of strings for tags
◦ (b) pass in a “query” to filter what is returned from dr.list_available_variables()
So you can now do this:
@tag(business_lines=["CA", "US"])
def combo_node():
    ...
@tag(business_lines=["US"])
def only_us_node():
    ...
@tag(business_lines=["CA"], some_other_tag="BAR")
def only_ca_node():
    ...
and then filter what’s returned based on them:
# case A: returns only_ca_node() and combo_node()
dr.list_available_variables(tag_filter=dict(business_lines=["CA"]))
# case B: returns only_us_node() and combo_node()
dr.list_available_variables(tag_filter=dict(business_lines="US"))
# case C: returns all 3 nodes
dr.list_available_variables(tag_filter=dict(business_lines=["CA", "US"] ))
# case D: returns all 3 nodes
dr.list_available_variables(tag_filter=dict(business_lines=None ))
# case E: returns 0 nodes
dr.list_available_variables(tag_filter=dict(business_lines="UK" ))
# case F: returns 0 nodes
dr.list_available_variables(tag_filter=dict(business_lines="US", some_other_tag="FOO"))
# case G: returns 1 node - only_ca_node()
dr.list_available_variables(tag_filter=dict(business_lines="CA", some_other_tag="BAR"))
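The matching behavior in cases A–G can be sketched in plain Python. This is a simplified model of the semantics described above, not Hamilton's implementation (`matches` and the `nodes` dict are our own illustrative names):

```python
def matches(node_tags: dict, tag_filter: dict) -> bool:
    """Simplified model: a node matches when, for every key in the filter,
    its tag value overlaps the requested value(s). A filter value of None
    only requires that the tag key be present."""
    for key, wanted in tag_filter.items():
        if key not in node_tags:
            return False
        if wanted is None:  # presence of the tag key is enough
            continue
        wanted_set = {wanted} if isinstance(wanted, str) else set(wanted)
        actual = node_tags[key]
        actual_set = {actual} if isinstance(actual, str) else set(actual)
        if not wanted_set & actual_set:  # no overlap -> no match
            return False
    return True

nodes = {
    "combo_node": {"business_lines": ["CA", "US"]},
    "only_us_node": {"business_lines": ["US"]},
    "only_ca_node": {"business_lines": ["CA"], "some_other_tag": "BAR"},
}

# case A: only_ca_node and combo_node match
print([n for n, t in nodes.items() if matches(t, {"business_lines": ["CA"]})])
```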
Thanks to @miek for the feature request.
Other updates:
• We’re revamping the main docs a little, trying to simplify it — if you find something/have thoughts, let us know (thanks @Thierry Jean).
• You can now watch my talk. This would be a good talk to share with those who like software engineering principles.
• @Elijah Ben Izzy wrote a post discussing the trade-offs of how to structure your code, and how Hamilton helps here. This is a good post for those who think that Hamilton is too much structure, or want framing as to what a “platform” should ultimately be doing.
• If you’re interested in customizing Hamilton’s visualization, you might want to chime in on this discussion.
Thanks all, and have a great weekend!
Elijah Ben Izzy
01/16/2024, 9:40 PM
pip install sf-hamilton==1.45.0
What’s new?
• Added lifecycle validators to enable static (node/graph-level) validations. Use these as you would a lifecycle adapter.
• Added the new HamiltonNode and HamiltonGraph objects so you have a publicly available way to browse/manage the DAG.
• Added a progress bar — this is another use of lifecycle customization. Thanks @emily rexer for the suggestion!
We also wrote about lifecycle adapters in this post - currently it’s probably the best place to get started. There’s an overview of the architecture/design with some examples.
To add to the exciting news, we’re hosting a Hamilton meetup in February! Please sign up at https://www.meetup.com/global-hamilton-open-source-user-group-meetup/ if you’re interested.
Stefan Krawczyk
01/23/2024, 4:16 PM
pip install sf-hamilton==1.46.0
What’s new?
• Datadog integration! You can now easily get trace spans tracked corresponding to your Hamilton code. See the blog section below for the write-up. It makes use of the new lifecycle APIs, and it’s a one-line addition to use:
from hamilton.plugins import h_ddog
from hamilton import driver
datadog_hook = h_ddog.DDOGTracer(root_name="hamilton_dag_trace")
dr = (
    driver
    .Builder()
    .with_modules(...)
    .with_adapters(datadog_hook)
    .build()
)
• We have an export_execution()
on the driver thanks to @Alec Hewitt (his first contribution 🍾). This allows you to export a JSON representation of what Hamilton is going to execute.
• For creating image files of your DAG, you no longer have to specify an output format; we’ll try to infer it from the suffix of the provided file path, defaulting to png. The output file format is determined through the following steps, each one overriding the previous:
1. if `output_file_path` has no file extension, a PNG file is generated (e.g., `/path/to/file -> /path/to/file.png`)
2. if `output_file_path` has a file extension, graphviz will use the specified format (e.g., `/path/to/file.svg -> /path/to/file.svg`)
3. if a format value is specified via `render_kwargs={"format": "pdf"}`, it overrides any other inputs (e.g., `/path/to/file.svg -> /path/to/file.pdf`)
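The precedence above can be sketched as a small helper. This is our own illustration of the rules (the function name and return shape are hypothetical, not Hamilton's actual code):

```python
from pathlib import Path

def resolve_output_format(output_file_path, render_kwargs=None):
    """Sketch of the precedence rules: extension wins over the PNG default,
    and render_kwargs={"format": ...} wins over both. Returns (path, format)."""
    render_kwargs = render_kwargs or {}
    suffix = Path(output_file_path).suffix
    fmt = suffix.lstrip(".") if suffix else "png"  # steps 1 & 2
    if "format" in render_kwargs:                  # step 3 overrides everything
        fmt = render_kwargs["format"]
    path = str(Path(output_file_path).with_suffix("." + fmt))
    return path, fmt

print(resolve_output_format("/path/to/file"))                         # ('/path/to/file.png', 'png')
print(resolve_output_format("/path/to/file.svg"))                     # ('/path/to/file.svg', 'svg')
print(resolve_output_format("/path/to/file.svg", {"format": "pdf"}))  # ('/path/to/file.pdf', 'pdf')
```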
⚠️ If you used the .dot
file you may have to change things. Please reach out if this impacts you negatively.
What’s updated?
• We’re adding a dedicated “integrations” section in the docs. This is to help make it simpler and faster to determine how to use Hamilton with something.
◦ Check out the FastAPI integration notes.
◦ Check out the Streamlit integration notes.
New Blog:
• To accompany the Datadog integration, we wrote a post about it.
Find full release details here.
Stefan Krawczyk
02/07/2024, 7:01 PM
pip install sf-hamilton==1.48.0
What’s new?
• Experiment Tracker and UI — see @Thierry Jean’s post.
• Adds GraphConstructionHook.
• Truncates inputs when errors are encountered. h/t @Michal Siedlaczek for flagging.
• Adds a bypass_validation keyword argument for visualization functions, so you can visualize things without having to provide inputs.
• Materializers/data-savers: fixes all local file loader metadata to have a uniform shape.
• Fixes regression in visualization functions that stripped the path of the file. h/t @Roel Bertens for finding the 🐛 .
• Adds string contains and not contains validators for check output.
What’s updated?
• We added comparisons between LangChain’s LCEL and Hamilton to the docs.
• The hub now has:
◦ A conversational RAG example
◦ A FAISS RAG example
◦ A simple LLM evaluation grader example
New Blogs:
• Thierry’s post on building a lightweight experimentation tracking tool with Hamilton.
Stefan Krawczyk
02/13/2024, 3:30 PM
pip install sf-hamilton==1.49.1
What’s new?
• Adds saver/loader for excel in pandas extensions (#342) by @tyapochkin in #683
◦ this adds to our pandas “materializers”, so now you can inject a to.excel call into your DAG.
• Fix path metadata by @elijahbenizzy in #686
• fix: fix typo in extract_columns decorator example by @ninoseki in #691
• Adds first pass at jupyter magic by @skrawcz in #689
◦ You can now more ergonomically iterate in a notebook with our very first Jupyter Notebook Magic!
# load the extension
%load_ext hamilton.plugins.jupyter_magic
Then in every cell you want to create a module on the fly:
%%cell_to_module -m MODULE_NAME --display --rebuild-drivers
def my_funcs()...
This will now hopefully improve the ergonomics of developing in a notebook with Hamilton, because the magic will:
1. create a python module called MODULE_NAME on the fly
2. it will then inject a graphviz picture of it at the bottom because --display was used.
3. it will auto-rebuild drivers that depend on this module because --rebuild-drivers was used.
4. if you need to see more arguments, try %%cell_to_module --help to list them.
5. when you’re done, you can save that cell to a module by swapping out the magic for %%writefile MODULE_NAME.py!
You can play with the example here, or read about it in the docs. Thanks to @Thierry Jean for inspiring this new feature. A blog on this will drop this week; in the meantime, we’re excited for feedback and ideas on how to extend or improve it further.
🍾 New Contributors 🎇
• @Konstantin Tyapochkin ( @tyapochkin) made their first contribution in #683
• @Manabu Niseki (@ninoseki) made their first contribution in #691
Thank you for taking the time to improve Hamilton!
Reminder: Meetup next week!
Just a quick reminder about the meetup next week. We’re excited to learn from @Arthur Andres, as well as deep dive on a Hamilton topic or two. Please comment in this thread or DM @Elijah Ben Izzy or myself about anything specific you’d like covered.
Full Changelog: sf-hamilton-1.48.0...sf-hamilton-1.49.1
Stefan Krawczyk
02/14/2024, 7:13 PM
pip install sf-hamilton==1.49.2
What’s in the patch:
• fix for @tag_outputs to tag intermediate “nodes”.
• enables JSON materializer to handle a top level list.
What’s in the blog post?
• This is a blog on creating and using Jupyter Magics to improve the notebook experience. This complements the new magic we pushed out yesterday for Hamilton and is an explainer on how to build one.
https://blog.dagworks.io/p/using-ipython-jupyter-magic-commands?r=2cg5z1&utm_campaign=post&utm_medium=web
Stefan Krawczyk
02/20/2024, 3:30 PM
pip install sf-hamilton==1.50.1
What’s new?
• A new caching adapter under hamilton.plugins.h_diskcache. This one hashes source code and inputs (i.e. fingerprints things) and uses the diskcache library to pickle everything to disk. Thanks @Thierry Jean! How do you use this? Well here’s some code (see full example):
from hamilton import driver
from hamilton.plugins import h_diskcache  # <--- this is what you import
import functions  # your modules

# get the logger to view cache retrieval logging
import logging
logger = logging.getLogger("hamilton.plugins.h_diskcache")
logger.setLevel(logging.DEBUG)  # or logging.INFO
logger.addHandler(logging.StreamHandler())

# build driver with cache hook
dr = (
    driver.Builder()
    .with_modules(functions)
    .with_adapters(h_diskcache.DiskCacheAdapter())  # <--- add it here
    .build()
)

# use execute or materialize as usual
dr.execute(["C"])
# then run it again -- it will be cached
dr.execute(["C"])
# then change some code -- only things that have changed will be recomputed
dr.execute(["C"])
How does this differ from the CachingGraphAdapter? The existing CachingGraphAdapter requires you to tag functions and specify the format to serialize things into, e.g. @tag(cache="parquet"), and you have to manage the cache state yourself, dropping it when code or inputs change.
Both ways to cache have their sweet spots, and we’d love feedback and we’re open to improving either of them!
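The fingerprinting idea behind the new adapter can be sketched with just the stdlib. This is a toy model of the approach, not the h_diskcache implementation (we hash the function's bytecode as a stand-in for hashing source, and `fingerprint`/`cached_call` are our own illustrative names):

```python
import hashlib
import pickle

def fingerprint(fn, inputs: dict) -> str:
    """Cache key combining a hash of the function's code with its inputs,
    so editing the function or changing an input invalidates the entry."""
    h = hashlib.sha256()
    h.update(fn.__code__.co_code)        # code changes bust the cache
    for name in sorted(inputs):          # input changes bust it too
        h.update(name.encode())
        h.update(pickle.dumps(inputs[name]))
    return h.hexdigest()

cache = {}

def cached_call(fn, **inputs):
    key = fingerprint(fn, inputs)
    if key not in cache:
        cache[key] = fn(**inputs)  # recompute only on a miss
    return cache[key]

def C(a: int) -> int:
    return a + 1

cached_call(C, a=1)  # computed
cached_call(C, a=1)  # served from the cache
```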
• We also have updated the main hamilton docs to more clearly explain the basic constructs:
◦ functions & nodes
◦ driver
◦ visualization
◦ materialization
◦ function modifiers
◦ driver Builder object
Reminder
• 🎤 meet-up today - sign-up here.
◦ We’re excited to hear from @Arthur Andres, doing a deep dive on structuring projects, an overview of @subdag, and then chatting roadmap!
Stefan Krawczyk
02/27/2024, 3:44 PM
pip install sf-hamilton==1.51.1
What’s new?
• 📢 Announcing Office hours - roughly every Tuesday 9:30am PT, apart from when we have our meetup. We’ll throw a link in the #hamilton-help channel.
• 🔎 Vaex decorator support. Thanks @Konstantin Tyapochkin!
• 🔎 Hamilton CLI. Thanks @Thierry Jean!
◦ Along with an accompanying blog!
• 🪄 Jupyter magic for Hamilton now displays the DAG correctly in a databricks notebook!
• 🤝 Meet-up for March.
📢 Office hours!
We’re excited to have an hour a week anyone can drop in. It’ll be roughly every Tuesday at 9:30am Pacific Time, apart from when we’re holding our meet-up. To join, we’ll drop a Google Meet link in the #hamilton-help channel. It’s starting today!
For those in the community for whom this is the middle of the night, reach out and we can do something ad hoc for you.
🔎 Details - Vaex:
Vaex is another dataframe library. We’ve now got decorator support for it, so you can use it with @extract_columns. In addition, we’ve added a basic Vaex result builder.
Check out the repository example here.
🔎 Details - Hamilton CLI:
You’ll need to install the extra package
pip install sf-hamilton[cli]
Then verify it installed:
hamilton --help
Things you can do:
• “build” a module via the command line, e.g. hamilton build module_v1.py
• “build and view a module” via the command line, e.g. hamilton view --output ./dag.png module_v1.py
• get the diff between the python module now, and some git reference (default is commit prior) — hamilton diff --view --output ./diff.png module_v1.py
Read more about it in the accompanying blog post, and docs. We see the CLI as a great tool to add to your CI step to help understand and see changes.
🪄 Jupyter magic for databricks notebooks
The recent ipython jupyter magic is now extended to display properly in a databricks notebook. Databricks notebooks didn’t natively display graphviz objects, so we had to adjust the code. No change needed on your part: use it as usual, and the graph will now be displayed.
🤝 Meet-up for March
Sign up for the next meetup to be held on March 19th. @Roel Bertens will be giving a talk in the community spotlight corner about feature engineering.
For the deep dive section, we’re still taking suggestions. So if you’d like to know more about Hamilton, let us know.
Stefan Krawczyk
02/27/2024, 5:30 PM
Stefan Krawczyk
03/05/2024, 5:31 PM
Stefan Krawczyk
03/12/2024, 3:42 PM
1.53.0 and office hours in ~45 mins at 9:30am Pacific Time (meet.google.com/enx-bhus-fae).
What’s new
• Adds target_ parameter to save_to by @elijahbenizzy in #744
• cli: added config support and validate command by @zilto in #729
• Updates docstring of data adapters to be public facing by @elijahbenizzy in #752
• Adds FunctionInputOutputTypeChecker by @skrawcz in #757
Showcasing the Lifecycle APIs we released, there is now an adapter you can add that will, at runtime, validate that the types of the inputs & outputs of each function match its annotations. To use it, add it to your driver like so:
from hamilton import driver, lifecycle

dr = (
    driver.Builder()
    .with_config({})
    .with_modules(my_functions)
    .with_adapters(
        # this is a strict type checker for the input and output of each function.
        lifecycle.FunctionInputOutputTypeChecker(),
    )
    .build()
)
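As an illustration of what such runtime checking does, here is a toy version in plain Python. This is a simplified sketch, not the actual FunctionInputOutputTypeChecker (which handles many more cases, e.g. generics); `check_types` and `double` are our own illustrative names:

```python
import typing

def check_types(fn, **kwargs):
    """Validate inputs against the function's annotations, call it,
    then validate the return value against the return annotation."""
    hints = typing.get_type_hints(fn)
    for name, value in kwargs.items():
        expected = hints.get(name)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(f"{name}={value!r} is not a {expected.__name__}")
    result = fn(**kwargs)
    expected = hints.get("return")
    if expected is not None and not isinstance(result, expected):
        raise TypeError(f"return value {result!r} is not a {expected.__name__}")
    return result

def double(x: int) -> int:
    return x * 2

check_types(double, x=2)      # ok, returns 4
# check_types(double, x="2")  # raises TypeError
```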
Documentation updates:
• Adds Parallelism Caveats to documentation by @skrawcz in #745
• Adds more to parallel caveats by @skrawcz in #746
• docs/how-tos/: pre-commit by @zilto in #750
hub.dagworks.io and examples/ updates:
• Examples: Example with pandas for split apply combine by @nhuray in #753 (see README)
• Adds document chunking example to hub by @skrawcz in #755
Blog:
• we have a new blog on using Hamilton for the ingestion part of RAG pipelines and then scaling that to Ray, Dask, and PySpark.
New contributor 🚀:
• @Nicolas Huray added the split-apply-combine example. Thank you! 🙏
Meet-up Next week
• Don’t forget to sign up for the meet-up next week
◦ @Roel Bertens will show feature engineering, while the deep-dive will be on parameterization/re-use of DAGs.
Full Changelog: sf-hamilton-1.52.0...sf-hamilton-1.53.0
Elijah Ben Izzy
03/19/2024, 4:12 PM
sf-hamilton==1.54.0
What’s new?
• Improvements to visualization
• new node versioning API — allows you to get node versions (stable hashes)
• New caching adapter — uses just the stdlib (shelve). Example:
from hamilton import driver
from hamilton.lifecycle.default import CacheAdapter

dr = (
    driver
    .Builder()
    .with_modules(features, model, evaluation)
    .with_adapters(
        # now everything will be cached!
        CacheAdapter()
    )
    .build()
)
Towards Data Science writeup on pre-commit hooks by @Thierry Jean!
• Uses Hamilton pre-commit hooks as an example
• https://towardsdatascience.com/custom-pre-commit-hooks-for-safer-code-changes-d8b8aa1b2ebb
Excited to see you all at 9:30 PT!
Stefan Krawczyk
03/28/2024, 6:01 PM
1.55.1
This includes a fix for graph.version, which is used in the experiment tracker adapter.
Thierry Jean
04/23/2024, 2:17 PM
Stefan Krawczyk
04/30/2024, 4:58 PM
• fix: closed CacheAdapter by @zilto in #847
• Changes jupyter magic to create temporary files by @skrawcz in #855
📚 Documentation / Examples:
• Update documentation to resolve a small typo in glossary by @bustosalex1 in #857
• Update link in glossary to use reST formatting by @bustosalex1 in #858
----------------------------
Reminder: Office hours
----------------------------
• Tuesday, April 30th · 9:30 – 10:30am; Time zone: America/Los_Angeles
• Come ask questions/get help/etc.
• Use this link to join.
--------------
Blog Posts
--------------
If you want to know more about our motivations and the features of the UI, we direct you to this blog post.
Stefan Krawczyk
05/01/2024, 8:43 PM
Stefan Krawczyk
05/02/2024, 2:49 PM
Stefan Krawczyk
05/07/2024, 4:40 PM
%%incr_cell_to_module doc_pipeline -i 1 --display
# module name, identifier to index this cell by, arguments for display
• No more dot files by default when creating images.
◦ When creating static images with Hamilton, dot files are now not created. If you’d like to have them, set keep_dot=True.
◦ If you were one of the few using the dot files, apologies, but feedback was overwhelmingly that nobody wanted them by default.
📚 Documentation / Examples:
• We added an example that guides a user through a simple document processing pipeline for a RAG system.
• We added an example that uses the Hamilton SDK to integrate with the new Hamilton UI.
◦ for an overview of this example.
----------------------------
Reminder: Office hours
----------------------------
• Tuesday, May 7th · 9:30 – 10:30am; Time zone: America/Los_Angeles
• Come ask questions/get help/etc.
• Use this link to join <--- this is happening now.
-------------------------------------------
Random: we have a #ask-ai channel
-------------------------------------------
If you’d like to ask questions to a bot about Hamilton try the #ask-ai channel.
We’re using it to learn whether a bot experience makes sense. Note: part of it is powered by Hamilton 😉
Thierry Jean
05/09/2024, 8:25 PM
Elijah Ben Izzy
05/13/2024, 11:09 PM
HAMILTON_AUTH_MODE=permissive as a variable!
Stefan Krawczyk
05/15/2024, 2:00 PM
• module_name param to module_from_source() by @zilto in #894
◦ You can now create temporary modules with a specific name. We use this in the jupyter magic.
🚀 Hamilton UI & SDK (0.4.0) Features 🍾 :
• New docker containers have been pushed for the Hamilton UI! What’s in it:
• Updates JSON view to be cleaner + ensures table index column has fixe… by @elijahbenizzy in #901
◦ When viewing captured data output, the JSON viewer is nicer.
• Hamilton restructure on FE code + removes enterprise from main repository by @elijahbenizzy in #900
◦ We moved the EE features into their own repository to keep this repository purely BSD-3 and avoid any confusion.
• Add namedtuple, pyspark, ibis, lc to SDK coverage by @skrawcz in #895
◦ We now capture more information. If you’re using Ibis or PySpark, for example, the schema is captured as your Hamilton code runs. We’d love your feedback on how we could expose this information more.
🐛 Fixes:
• sf-hamilton: added missing config from legend by @zilto in #899
◦ fixes up static display to show config in the legend.
📚 Documentation / Examples:
• Add RAG doc example by @skrawcz in #891
◦ Want to know how to create a simple document processing pipeline? This contains a notebook that walks you through building one, with the goal of feeding it into a RAG system.
• examples/ and docs/code-comparison: Kedro by @zilto in #896
◦ We added a comparison with Kedro.
----------------------------
Reminder: Office hours
----------------------------
• They’re on most Tuesdays at 9:30am PT.
--------------------------------
Reminder: Meet-up Group
--------------------------------
• We have June & August scheduled.
◦ If you’d like to speak let me know. We’d love to showcase what everyone is building with Hamilton.
• Sign up here
-----------------------------
Blog Post & Recording
------------------------------
In case you missed it, we’ve got a short write-up on the blog.
We think this is a great way to introduce some of the motivation and where Hamilton shines for data work.
If you’re trying to convert colleagues or others to try Hamilton, send them the blog/video.
Stefan Krawczyk
05/22/2024, 7:06 PM
.execute().
• Adds HuggingFace DataSet Data Loader by @skrawcz in #912
# using the decorator syntax:
@load_from.hf_dataset(
    path=value("fabiochiu/medium-articles"),
    data_files=value("medium_articles.csv"),
    split=value("train"),
)
def medium_articles(dataset: Dataset) -> Dataset:
    """Loads medium dataset into a hugging face dataset"""
    return dataset

# passing it in at driver time
from_.hf_dataset(
    target="NAME",
    path="fabiochiu/medium-articles",
    data_files="medium_articles.csv",
    split="train",
)
• Adds HuggingFace DataSet Parquet Data Saver by @skrawcz in #912
◦ if your function outputs a HuggingFace Dataset, you can now write it easily to parquet.
# passing it in at driver time
to.parquet(id="parquet_saver", path=data_path) ...
• Adds HuggingFace DataSet to LanceDB Data Saver by @skrawcz in #912
# passing it in at driver time
to.lancedb(
    id="load_into_lancedb",
    db_client=...,
    table_name=...,
    columns_to_write=...,
)

# using the decorator syntax
@save_to.lancedb(
    db_client=source("db_client"),
    table_name=source("table_name"),
    columns_to_write=source("columns_of_interest"),
    output_name_="load_into_lancedb",
)
def final_dataset(
    sampled_articles: Dataset,
    retriever: SentenceTransformer,
    ner_pipeline: base.Pipeline,
) -> Dataset:
    ...
🐛 Fixes:
• sf-hamilton: added missing config from legend by @zilto in #899
• hamilton-ui: fixes datetimes to be generous on either side for default run view by @elijahbenizzy in #909
📚 Documentation / Examples:
• Updated materializer documentation by @Thierry Jean
◦ it’s easier to see all the ways you can read/write data with Hamilton now.
• Adds LanceDB NER example using HuggingFace data & models @skrawcz in #912
◦ Repository example; open the notebook in Google Colab.
◦ This complements the below blog post.
----------------------------
Reminder: Office hours
----------------------------
• They’re on most Tuesdays at 9:30am PT.
--------------------------------
Reminder: Meet-up Group
--------------------------------
• We have June & August scheduled.
◦ If you’d like to speak let me know. We’d love to showcase what everyone is building with Hamilton.
• Sign up here
-----------------------------
Blog Post
------------------------------
Title: NER-powered Semantic Search using LanceDB + Hamilton + HuggingFace
What is it? A blueprint for building your own modular, maintainable, self-documenting, processing pipeline to extract entities for use in search and RAG contexts.
This shows how you can use Hamilton to build a pipeline loading data into lancedb, and also query over it.
https://blog.dagworks.io/p/ner-powered-semantic-search-using
Stefan Krawczyk
06/05/2024, 6:25 PM
# ---- my_module.py
# custom exception
class DoNotProceed(Exception):
    pass

def wont_proceed() -> int:
    raise DoNotProceed()

def will_proceed() -> int:
    return 1

def never_reached(wont_proceed: int) -> int:
    return 1  # this should not be reached

# ---- driver code
import my_module
from hamilton import driver
from hamilton.lifecycle import default

dr = (
    driver.Builder()
    .with_modules(my_module)
    .with_adapters(
        default.GracefulErrorAdapter(
            error_to_catch=my_module.DoNotProceed,
            sentinel_value=None,
        )
    )
    .build()
)
dr.execute(["will_proceed", "never_reached"])  # will return {'will_proceed': 1, 'never_reached': None}
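The sentinel behavior can be modeled in a few lines of plain Python. This is a simplified sketch of the idea, not Hamilton's execution engine (`run_dag` and its node-table shape are our own illustrative constructs):

```python
# When a node raises the configured exception, record the sentinel and
# skip every downstream node that receives a sentinel input.
class DoNotProceed(Exception):
    pass

SENTINEL = None

def run_dag(nodes: dict, order: list) -> dict:
    """nodes maps name -> (fn, [dependency names]); `order` must be
    topologically sorted. Returns a results dict, with SENTINEL for
    failed or skipped nodes."""
    results = {}
    for name in order:
        fn, deps = nodes[name]
        args = [results[d] for d in deps]
        if any(a is SENTINEL for a in args):
            results[name] = SENTINEL  # propagate, don't execute
            continue
        try:
            results[name] = fn(*args)
        except DoNotProceed:
            results[name] = SENTINEL
    return results

def wont_proceed():
    raise DoNotProceed()

dag = {
    "wont_proceed": (wont_proceed, []),
    "will_proceed": (lambda: 1, []),
    "never_reached": (lambda x: x + 1, ["wont_proceed"]),
}
print(run_dag(dag, ["wont_proceed", "will_proceed", "never_reached"]))
# -> {'wont_proceed': None, 'will_proceed': 1, 'never_reached': None}
```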
Hamilton UI Update:
• When creating “an account” we made it clear it doesn’t need to be an email if you’re running it locally without any authentication.
🐛 Fixes:
• N/A this week
📚 Documentation / Examples:
• @Thierry Jean recorded a video. If you’re looking to learn about the new Jupyter Magic updates, then this is a good video for you to watch!
----------------------------
Reminder: Office hours
----------------------------
• They’re on most Tuesdays at 9:30am PT.
--------------------------------
Reminder: Meet-up Group
--------------------------------
• We have June & August scheduled.
◦ If you’d like to speak let me know. We’d love to showcase what everyone is building with Hamilton.
◦ We’ll have June’s topic decided soon. We’re thinking of showing how Hamilton fits in with building a pipeline to ingest documents for retrieval augmented generation.
• Sign up here
There will be a blog post this week, so I’ll post an update here when it’s published!
Stefan Krawczyk
06/13/2024, 6:10 PM
from hamilton import driver
from hamilton.io.materialization import to
from hamilton.plugins.h_mlflow import MLFlowTracker
dr = (
    driver.Builder()
    .with_modules(model_training_2)
    .with_adapters(MLFlowTracker())  # <-------- add this and it'll auto log a lot to MLFlow
    .with_materializers(
        to.mlflow(
            id="trained_model__mlflow",
            dependencies=["trained_model"],
            register_as="my_new_model",
        ),
    )
    .build()
)
For more details, see this tutorial notebook.
🚀 Hamilton UI Update:
• Before you needed to have Docker installed to run the UI. Now you don’t!
◦ pip install "sf-hamilton[ui]"
◦ Then hamilton ui
to start it.
◦ This should enable you to quickly and easily explore your Hamilton DAGs — just add the adapter to your driver (follow the instructions in the UI) and then it’ll log to it; you don’t need to execute it to be able to see it in the UI.
🐛 Fixes:
• SDK: now has better guards around JSON-serializable inputs.
• Hamilton: fix for parallelizable 🐛 . Thanks to @Volker Lorrmann for raising.
• Inputs can now be outputs, without them being defined in the DAG. Thanks to the team at RTVEuroAGD for raising. This is useful, for example, if you want to pass in extra columns to add to the output when creating a pandas dataframe.
📚 Documentation / Examples:
• MLFlow tracker & data saver/loader example
• We’ve added links to running notebooks in google colab where it makes sense.
----------------------------
Reminder: Office hours
----------------------------
• They’re on most Tuesdays at 9:30am PT, except next week.
--------------------------------
Reminder: Meet-up Group
--------------------------------
• It’s next week! For next week we’ll go over:
◦ New functionality:
▪︎ Kedro Adapter
▪︎ MLFlow Tracker
▪︎ Locally running the Hamilton UI
◦ The deeper dive will be an introduction on “How to use Hamilton in a RAG context”, e.g. for document ingestion
• Sign up here
--------------------------------
Blog Post
--------------------------------
Title: Lean Data Automation: A Principal Components Approach
What is it about?
This week’s blog post was a collaboration with Runhouse. In it we describe how GitHub Actions + Hamilton + something like Runhouse can get you very far in terms of data/ETL/ML work; no need to reach for a heavyweight orchestrator if you don’t need one.
Stefan Krawczyk
06/20/2024, 7:34 PM
from hamilton import driver
from hamilton.plugins import h_schema
import json

# add to driver
validator_adapter = h_schema.SchemaValidator("./schemas")
dr = (
    driver.Builder()
    .with_modules(*my_modules)
    .with_adapters(validator_adapter)  # <--- add it here
    .build()
)

# execute as you normally would
res = dr.execute(["OUTPUT_1", "OUTPUT_2"], inputs=...)

# print & extract the schemas captured
print(json.dumps(validator_adapter.json_schemas, indent=2))
See this example for a demonstration.
🚀 Hamilton UI & SDK Update:
• Thanks to @Seth Stokes for flagging a few bugs and issues. We have squashed them; if you find more, let us know.
🐛 Fixes:
• SDK: Fixes JSON serializability edge case for dataframe inputs.
• UI: fixes null run IDs
• Hamilton: check_output_custom validator can now take multiple validators of the same class.
• Hamilton: fixes link to privacy/telemetry that was printed out.
📚 Documentation / Examples:
• Added schema validator example.
----------------------------
Reminder: Office hours
----------------------------
• They’re on most Tuesdays at 9:30am PT.
--------------------------------
Reminder: Meet-up Group
--------------------------------
If you missed our meet-up this past week, you can catch the recording & slides.
Watch the recording to see:
◦ Some coverage / overview of new functionality:
▪︎ Kedro Adapter
▪︎ MLFlow Tracker
▪︎ GracefulError Adapter
▪︎ Locally running the Hamilton UI
◦ The deeper dive was an introduction on “How to use Hamilton in a RAG context”, e.g. for document ingestion
• Sign up here for August's meet-up
--------------------------------
Blog Post
--------------------------------
Title: Building a conversational GraphDB RAG agent with Hamilton, Burr, and FalkorDB
What is it about?
If you don’t know what a Graph DB is, this post is a light introduction to them. More specifically it shows how you could build an end to end retrieval augmented generation system using one. This is topical, since Vector DBs aren’t going to solve all your #RAG problems.
This post was done in collaboration with FalkorDB , specifically thanks to Roi Lipman for helping set up the example.
Even though the post is just an introduction, the path to production here is actually pretty straightforward. Hamilton & Burr give you the flexibility and the hooks to customize this for your production needs, unlike the headaches of other alternatives 😉.
Stefan Krawczyk
06/28/2024, 12:06 AM
from hamilton import async_driver
from hamilton_sdk import adapters

# initialize async tracker
tracker = adapters.AsyncHamiltonTracker(
    project_id=9,
    username="elijah@dagworks.io",
    dag_name="async_dag",
)

# use builder to construct driver
adr = (
    await async_driver.Builder()
    .with_config(config)
    .with_modules(async_module)
    .with_adapters(tracker)  # add it here
    .build()
)

# execute like normal
result = await adr.execute(outputs, inputs=...)
To get the async tracker you’ll need to do pip install --upgrade sf-hamilton-sdk
🚀 Hamilton UI Update: pip install --upgrade sf-hamilton-ui
• We’ve simplified navigation. Specifically, the left-hand side is simplified and renamed.
• Added a few more function/node result summary renderings.
◦ e.g. we now track more / render scikit-learn related objects.
📚 🐛 Documentation / Examples / Fixes:
• Thanks to Alexander Cai, Nils Müller-Wendt, and Paul Larsen for helping fix up various typos.
----------------------------
Reminder: Office hours
----------------------------
• They’re on most Tuesdays at 9:30am PT.
--------------------------------
Reminder: Meet-up Group
--------------------------------
• Sign up here for August's meet-up
--------------------------------
Blog Post
--------------------------------
Title: Tracking Pipelines with MLFLow & Hamilton
What is it about?
This follows on from our announcement during Databricks Summit week, that Hamilton has an MLFlow integration. In this post we explain the two technologies, and then how Hamilton can help simplify your MLFlow integration pains!
Why did we build this? As a platform person, one of the most common forms of technical debt was always “platform integrations”. For example, to run your code you always need that platform tool, e.g. MLFlow. When you need to change or update it, you then need to find the 1000s of places where that integration lives. This is painful. With Hamilton and the new MLFlow adapter, that is no longer the case!
How does it work? With our integration, MLFlow is now a “layer”, or in Hamilton speak an adapter, that can be added to and removed from a Hamilton Driver easily. This means your logic, e.g. your pipeline, is agnostic to the existence of MLFlow: it’s reusable and changeable without having to touch MLFlow, while the MLFlow coupling lives in one central place per pipeline. If you wanted to switch out systems, or change something that impacts everyone, there is an order of magnitude less work to do; in the case of Hamilton it’s a one-line code change.
Stefan Krawczyk
07/02/2024, 1:30 PM
• sf-hamilton==1.69.0
• sf-hamilton-sdk==0.5.1
• sf-hamilton-ui==0.0.9
• sf-hamilton-lsp==0.1.0
---------------------------------------------------------------
TL;DR - some highlights:
• We have our VSCode extension in Alpha — we’d love your feedback.
• We’ve added a Narwhals extension
• Hamilton UI in local mode now is more configurable
• Polars 1.0 fixes
⭐️ Hamilton’s VSCode Extension Alpha
@Thierry Jean has been hard at work building out a VSCode plugin for Hamilton.
Current features:
• Dataflow visualization via Graphviz of your current module in view.
• Code completion suggestions when creating new functions
• Symbol navigation
We’re excited for what it can do, but we’ve still got plenty more left to build. Here’s two notable current limitations:
• Doesn’t cover all decorators, e.g. the visualization doesn’t handle @config.when just yet.
• No click to definition ability.
See screenshot for an example of what it looks like. Read the documentation here.
Ask: if you use VSCode please try it out. We’d love the feedback.
🎉 sf-hamilton 1.69.0 new features! 🍾
• Narwhals now has a lifecycle adapter & result builder.
◦ Narwhals is a library that lets you express dataframe transformations agnostically of the underlying dataframe library.
◦ We now have an integration with it - see example:
from hamilton import driver
from hamilton.plugins import h_narwhals, h_polars

import example  # <-- your logic written in the narwhals way

# polars
dr = (
    driver.Builder()
    .with_config({"load": "polars"})
    .with_modules(example)
    .with_adapters(
        h_narwhals.NarwhalsAdapter(),  # <--- add these two lines
        h_narwhals.NarwhalsDataFrameResultBuilder(h_polars.PolarsDataFrameResult()),
    )
    .build()
)
result = dr.execute([example.group_by_mean, example.example1], inputs={"col_name": "a"})
Polars 1.0:
• Polars 1.0 was just released. We have updated Hamilton to be compatible with pre- and post-1.0 Polars. If you find something broken, please create an issue. Otherwise, our testing going forward will only support Polars 1.0.
🚀 Hamilton UI Update: pip install --upgrade sf-hamilton-ui
• The local mode version now allows you to pass in a hostname, so you can easily run it behind a domain name.
◦ To use it: HAMILTON_ALLOWED_HOSTS=MY_HOST_NAME hamilton ui
🐛 Fixes:
• Hamilton: data loaders now display the same as data savers by @zilto in #991
• Hamilton: .with_materializers() added a helpful error message by @zilto in #986
• Hamilton: Adds extra guards for plugin imports by @skrawcz in #994
• SDK: Gates checking git commit in SDK by @skrawcz in #1000
📚 Documentation / Examples / Fixes:
• Adds notebook to pandas-split-apply example by @skrawcz in #995
• VSCode docs
----------------------------
Reminder: Office hours
----------------------------
• They’re on most Tuesdays at 9:30am PT.
• Join today’s in a couple of hours at meet.google.com/enx-bhus-fae
--------------------------------
Reminder: Meet-up Group
--------------------------------
• Sign up here for August's meet-up
Elijah Ben Izzy
07/09/2024, 6:40 PM
• sf-hamilton==1.70.0
• sf-hamilton-sdk==0.5.3
• sf-hamilton-ui==0.0.11
---------------------------------------------------------
TL;DR - improvements!
• ✨ Added an adapter in the Spark plugin that allows it to treat Spark classic and Spark Connect dataframes the same (this is particularly important on top of Databricks)
• 👯 Upgraded the Graceful Error adapter to handle parallelism (thanks James Arruda!)
• 👓 Improvements to the async tracker for Hamilton
• 🐛 Fixes for the group/inject pattern (thanks to @Alexander Cai!)
• 💻 Multiple improvements/fixes for the local mode in the UI (both the UI + deployment mode)
⭐ Graceful Failure
This was introduced earlier, but we’ve significantly improved it recently (all a user contribution, thanks James!). The high level is that you can use it to keep the DAG from failing when an upstream node fails, bypassing all downstream nodes instead. Here’s a simple example: you define an error to catch (so you don’t catch everything) as well as a sentinel value that gets cascaded through. Execution continues as normal, but any node that detects an upstream failure is bypassed and emits the sentinel itself.
from hamilton import driver
from hamilton.lifecycle import default

dr = (
    driver.Builder()
    .with_modules(my_module)
    .with_adapters(
        default.GracefulErrorAdapter(
            error_to_catch=DoNotProceed,
            sentinel_value=None,
        )
    )
    .build()
)
dr.execute(["will_proceed", "never_reached"])  # returns {'will_proceed': value, 'never_reached': None}
It now works with the `Parallel[]`/`Collect[…]` constructs and has a few different toggles. Read the docs here. There is also a new decorator, @accept_error_sentinels, that allows you to receive sentinel values and handle errors in your own way.
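To make the bypass semantics concrete, here is a plain-Python sketch of how a sentinel cascades (illustrative only, not Hamilton’s internals): a node raising the declared error emits the sentinel, and any node whose input is the sentinel is skipped and emits the sentinel in turn.

```python
# Plain-Python sketch of sentinel cascading (illustrative, NOT Hamilton internals).
SENTINEL = None


class DoNotProceed(Exception):
    pass


def run_graph(nodes):
    """nodes: list of (name, fn, upstream_names), topologically sorted."""
    results = {}
    for name, fn, upstream in nodes:
        inputs = [results[u] for u in upstream]
        if any(v is SENTINEL for v in inputs):
            results[name] = SENTINEL  # bypass: an upstream node already failed
            continue
        try:
            results[name] = fn(*inputs)
        except DoNotProceed:
            results[name] = SENTINEL  # catch only the declared error type
    return results


def fails():
    raise DoNotProceed()


graph = [
    ("will_proceed", lambda: 42, []),
    ("fails", fails, []),
    ("never_reached", lambda x: x + 1, ["fails"]),
]
results = run_graph(graph)
# "will_proceed" computes normally; "fails" and "never_reached" hold the sentinel
```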
⏩ Hamilton Async
We’ve been investing a lot in Hamilton’s async integration. You can read about how our friends at wren.ai leverage it to scale up to 1500+ concurrent requests!
• Their original writeup: https://blog.getwren.ai/how-do-we-rewrite-wren-ai-llm-service-to-support-1500-concurrent-users-online-9ba5c121afc3
• Our writeup on hamilton + async: https://blog.dagworks.io/p/async-dataflows-in-hamilton
----------------------------
Reminder: Office hours
----------------------------
• They’re on most Tuesdays at 9:30am PT.
• Join today’s in a couple of hours at meet.google.com/enx-bhus-fae
--------------------------------
Reminder: Meet-up Group
--------------------------------
• Sign up here for August's meet-up
Stefan Krawczyk
07/23/2024, 1:30 PM
• sf-hamilton==1.72.1
• sf-hamilton-ui==0.0.14
---------------------------------------------------------------
TL;DR - highlights:
• A new lightweight way to expose metadata when saving & loading data, via two new decorators.
🎉 sf-hamilton 1.72.0 new features! 🍾
• Feature: @datasaver & @dataloader by @skrawcz in #983
With regular data savers/loaders (i.e. materializers), that code is abstracted away from your Hamilton dataflow. However, if you’re not at the stage where you need to centralize, that is a little too much engineering. With this release we introduce two new decorators: @datasaver() and @dataloader(). They let you keep the saving and loading code in your DAG while also exposing extra metadata, so that it displays in the visualization and can be captured by the HamiltonTracker for display in the UI.
Example code:
import pandas as pd
from sklearn import datasets

from hamilton.function_modifiers import dataloader, datasaver
from hamilton.io import utils as io_utils


@dataloader()  # <---
def raw_data() -> tuple[pd.DataFrame, dict]:
    data = datasets.load_digits()
    df = pd.DataFrame(data.data, columns=[f"feature_{i}" for i in range(data.data.shape[1])])
    metadata = io_utils.get_dataframe_metadata(df)
    return df, metadata


def transformed_data(raw_data: pd.DataFrame) -> pd.DataFrame:
    """We can depend on the dataframe portion of raw_data here without issue."""
    return raw_data


@datasaver()  # <---
def saved_data(transformed_data: pd.DataFrame, filepath: str) -> dict:
    transformed_data.to_csv(filepath)
    metadata = io_utils.get_file_and_dataframe_metadata(filepath, transformed_data)
    return metadata
The decorators assume & enforce types, and ensure that functions downstream of loading (transformed_data() above) only need to depend on the first element’s type, not the whole tuple.
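The (value, metadata) split can be sketched in plain Python (illustrative only, not Hamilton’s implementation): the decorator unpacks the tuple so downstream code only ever sees the value, while the metadata is stashed for the tracker.

```python
# Illustrative sketch of the (value, metadata) split (NOT Hamilton's implementation).
import functools

CAPTURED_METADATA = {}  # stand-in for what the HamiltonTracker would record


def dataloader_sketch(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        value, metadata = fn(*args, **kwargs)  # enforce the tuple convention
        CAPTURED_METADATA[fn.__name__] = metadata
        return value  # downstream functions only ever see the value
    return wrapper


@dataloader_sketch
def raw_numbers() -> tuple[list, dict]:
    data = [1, 2, 3]
    return data, {"rows": len(data), "source": "inline"}


def doubled(raw: list) -> list:
    return [x * 2 for x in raw]


result = doubled(raw_numbers())  # depends only on the list, never the metadata
```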
🚀 Hamilton UI Update: pip install --upgrade sf-hamilton-ui
• We fixed a small regression in the UI from the last release - some navigation links were broken.
🐛 Fixes:
• Removes print statements in pickle serializer - thanks to @kemaleren in #1049; their first contribution 🎆 .
📚 Documentation / Examples :
• Adds Hamilton MPG Translation Tutorial by @skrawcz in #1050
----------------------------
Reminder: Office hours
----------------------------
• They’re on most Tuesdays at 9:30am PT. There’s one today in two hours.
--------------------------------
Reminder: Meet-up Group
--------------------------------
• Sign up here for August's meet-up