# hamilton-help
e
Yes! This is a feature we just released as a supported API; it's called the lifecycle API. We have a blog post in flight that'll document it well, and we're still improving the docs, but you can:
1. Implement the class referenced here: https://hamilton.dagworks.io/en/latest/concepts/customizing-execution/#execution-hooks
2. Look at the `PrintLnHook` example here: https://github.com/DAGWorks-Inc/hamilton/blob/main/hamilton/lifecycle/default.py
3. Pass it into the driver as part of a list via `adapter=`, or use `with_adapters` and pass it in as `*args` if you're using the new driver builder (see the sketch below).

Should take 5 minutes, although the blog post will make it much simpler (we just made the API public-facing). If you're having trouble I'm happy to share the draft of the post; it walks you through this.
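A minimal sketch of step 3 with the new driver builder (`my_dataflow_module` and `MyHook` are hypothetical names; the hook bodies are stubs):

```python
# Sketch only: wiring a lifecycle hook into the new driver builder.
from typing import Any

from hamilton import driver
from hamilton.lifecycle import NodeExecutionHook

import my_dataflow_module  # hypothetical module of Hamilton functions


class MyHook(NodeExecutionHook):
    def run_before_node_execution(self, **future_kwargs: Any) -> None:
        pass  # e.g. start a timer

    def run_after_node_execution(self, **future_kwargs: Any) -> None:
        pass  # e.g. log the result


dr = (
    driver.Builder()
    .with_modules(my_dataflow_module)
    .with_adapters(MyHook())  # takes *args, so several hooks/adapters can go here
    .build()
)
```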
a
ah thanks, let me give it a try. I need to move to the driver builder API as well, it’s much better.
👍 1
e
Yep! To be clear, that's not necessary for this, but you absolutely should; it's cleaner. Lmk if you have any problems with the API for hooks. It's still pretty new, but well-tested and quite powerful.
a
How do I pass the execution hook to the driver? I'm already using a custom-ish adapter: `.with_adapter(SimplePythonGraphAdapter(DictResult()))`
oh got it, just pass a list of adapters / hooks
e
Yep! You shouldn't need the simple graph adapter btw; it's the default one. And you'll want to use `with_adapters` (pluralized), it's a bit more ergonomic.
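A sketch of the before/after (untested; `my_module` is hypothetical, `MyHook` is from the sketch above):

```python
from hamilton import base, driver

# Before (legacy): explicit graph adapter wrapping a result builder.
dr = (
    driver.Builder()
    .with_modules(my_module)
    .with_adapter(base.SimplePythonGraphAdapter(base.DictResult()))
    .build()
)

# After: dict results are the default, so just pass the hook(s).
dr = (
    driver.Builder()
    .with_modules(my_module)
    .with_adapters(MyHook())
    .build()
)
```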
a
Thanks for your help, I've managed to do what I wanted. I can log the shape and memory usage of Arrow tables:
```python
import logging
from typing import Any, Dict, Optional

import pyarrow as pa
import humanize
from hamilton.lifecycle import NodeExecutionHook

logger = logging.getLogger(__name__)

class LogTableStatsNodeExecutionHook(NodeExecutionHook):
    def run_before_node_execution(
        self,
        *,
        node_name: str,
        node_tags: Dict[str, Any],
        node_kwargs: Dict[str, Any],
        node_return_type: type,
        task_id: Optional[str],
        **future_kwargs: Any,
    ):
        pass

    def run_after_node_execution(
        self,
        *,
        node_name: str,
        node_tags: Dict[str, Any],
        node_kwargs: Dict[str, Any],
        node_return_type: type,
        result: Any,
        error: Optional[Exception],
        success: bool,
        task_id: Optional[str],
        **future_kwargs: Any,
    ):
        if isinstance(result, pa.Table):
            logger.info(
                f"Table {node_name}: {result.num_rows:_d} * {result.num_columns}"
                f" = {humanize.naturalsize(result.nbytes)}"
            )
```
🙌 2
🔥 1
e
Love it! We can easily add this to Hamilton as part of a pyarrow plugin if you think others will use it. Quick tip: you don't have to include any params you don't need; `**future_kwargs` will handle that.
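In other words, the hook above could slim down to something like this (a sketch, reusing the imports from the snippet above):

```python
# Same hook, keeping only the parameters it actually uses;
# **future_kwargs absorbs everything else the framework passes in.
class LogTableStatsNodeExecutionHook(NodeExecutionHook):
    def run_before_node_execution(self, **future_kwargs: Any) -> None:
        pass

    def run_after_node_execution(
        self, *, node_name: str, result: Any, **future_kwargs: Any
    ) -> None:
        if isinstance(result, pa.Table):
            logger.info(
                f"Table {node_name}: {result.num_rows:_d} * {result.num_columns}"
                f" = {humanize.naturalsize(result.nbytes)}"
            )
```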
s
@Arthur Andres a pyarrow plugin would be great - it would end up here https://hamilton.dagworks.io/en/latest/reference/lifecycle-hooks/#available-adapters It also seems like we should add support for pyarrow in general.
a
sure, though I'm not sure exactly what the plugin would do. At the moment I'm happy to use hamilton in plain mode. Each node outputs a `pyarrow.Table` that I then join together myself.
BTW we've put hamilton in our production pipeline. Thanks for your help. 🙏
e
That's awesome! Glad to hear it made it in; thanks for your feedback/thoughts along the way! A plugin is pretty general, although we do have specific constructs. Some ideas (can share what code they would involve writing in a bit):
1. We have specific types for dataframes/series, and a way of registering them so `extract_columns` works. I think this is a little less relevant to pyarrow (most people use tables), but might be worth exploring.
2. Moving your pyarrow hooks into something like `plugins.h_pyarrow`.
3. Adding a result builder to do the joining, if it's common logic (see the sketch below).
Basically any bespoke piece that you wrote, we might be able to fit into a plugin.
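A rough sketch of idea 3, assuming the legacy `base.ResultMixin` interface; the column-wise combination (equal row counts, unique column names) is an assumption, and real join logic would need keys:

```python
from typing import Any

import pyarrow as pa

from hamilton import base


class PyarrowTableResult(base.ResultMixin):
    """Combine the pyarrow Table output of each node into a single Table."""

    @staticmethod
    def build_result(**outputs: Any) -> pa.Table:
        columns = {}
        for node_name, value in outputs.items():
            if not isinstance(value, pa.Table):
                raise ValueError(f"{node_name} did not return a pyarrow.Table")
            for name in value.column_names:
                columns[name] = value.column(name)  # ChunkedArray
        return pa.table(columns)
```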
a
Besides the adapter that logs table size stats, I haven't written anything arrow-related in the hamilton framework. We do have some internal tooling that we use for pyarrow. For example, we have a decorator to enforce schemas of output columns (I believe you have something similar). But it is very opinionated, e.g.: what do you do with missing columns, extra columns, columns that you could cast, do you fill non-nullable nulls with empty values, etc.? And it is independent from hamilton, though maybe it could be interesting: hamilton could leverage it to document the schema of the output. I'm not sure where you want to draw the line between the framework and users' custom business logic. I guess maybe you'd want to reimplement what you've done for pandas, but for pyarrow?
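For illustration, a minimal sketch of what such a schema-enforcing decorator might look like (hypothetical, and it picks one opinionated policy: fail on missing columns, drop extras, cast the rest):

```python
import functools

import pyarrow as pa


def enforce_schema(schema: pa.Schema):
    """Hypothetical decorator enforcing an output schema on a pyarrow Table."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            table = fn(*args, **kwargs)
            missing = set(schema.names) - set(table.column_names)
            if missing:
                raise ValueError(f"{fn.__name__}: missing columns {sorted(missing)}")
            # Drop extra columns, reorder to the schema, and cast declared types.
            return table.select(schema.names).cast(schema)
        return wrapper
    return decorator


@enforce_schema(pa.schema([("id", pa.int64()), ("amount", pa.float64())]))
def my_node(raw: pa.Table) -> pa.Table:  # hypothetical Hamilton node
    return raw
```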
s
I think those all sound like reasonable features to have!
> I guess maybe you'd want to reimplement what you've done for pandas, but for pyarrow?

yep, supporting all "table"-like data types is something we should have.