Hi folks I m new to Hamilton and recently started using it i Hamilton Open Source #hamilton-help

Hi folks, I'm new to Hamilton and recently started...

Dhruv Sahi

03/26/2024, 11:43 AM

Hi folks, I'm new to Hamilton and recently started using it in a project. I'm running into problems with my project when I group my functions into a class. I've looked around the documentation and searched around here and have been unable to come up with an answer on how to include class instance methods as part of hamilton's function discovery. Would love any help if possible! Thank you. Here's a simple example: My main.py looks like this:

Copy code

# main.py
from typing import Optional, List
import polars as pl
from hamilton import driver, log_setup
import logging

from stages.raw.load_raw_data import RawDataStage
from stages.intermediate import transaction_clean, jurisdiction_clean
from config import get_config, PipelineConfig

class Pipeline:
    def __init__(self, _cfg: PipelineConfig) -> None:
        self._cfg = _cfg
        self.raw_data_stage = RawDataStage(_cfg)
        self.dr = driver.Driver(
            {},
            self.raw_data_stage,
            adapter=base.SimplePythonGraphAdapter(base.DictResult)
        )
        

    def run(self) -> None:
        result = self.dr.execute(final=vars=[self.raw_data_stage])

if __name__ == "__main__":
    _cfg = get_config()
    pipeline = Pipeline(_cfg=_cfg)
    pipeline.run()

My load_raw_data.py looks like this:

Copy code

from typing import List
import polars as pl
from hamilton.function_modifiers import tag
import logging

from config import PipelineConfig

logger = logging.getLogger(__name__)

INPUT_TABLES = [
    "table_1",
    "table_2",
]

class RawDataStage:
    
    def __init__(self, config: PipelineConfig) -> None:
        self._cfg = config
    
    @staticmethod
    def load_parquet(paths: List[str]) -> pl.LazyFrame:
        return pl.scan_parquet(paths)

    def run(self):
        data = {}
        for table in INPUT_TABLES:
            file_paths = self._cfg.file_paths
            data[table] = getattr(self, f"_read_{table}")(paths=file_paths)
        return data
    
    @tag(stage="load", input_type="table_1")
    def table_1(self, paths=List[str]) -> pl.LazyFrame:

        return self.load_parquet(paths=paths)

    @tag(stage="load", input_type="table_2")
    def table_2(self, paths=List[str]) -> pl.LazyFrame:
        return self.load_parquet(paths=paths)

Elijah Ben Izzy

03/26/2024, 12:54 PM

Welcome! So Hamilton does not (at the moment) allow for functions grouped into classes. They have to be grouped into modules. As a hint, your parameters will also want to be fully type-hinted (you’ll want

paths: List[str]

rather than

paths=List[str]

) For reasoning, classes have two purposes: 1. They hold state (E.G.

_cfg

above) 2. They group functions Hamilton DAGs are stateless, meaning that everything in the function is a parameter-level input. This encourages functions to be clearer to read/track — we know everything it takes in and thus don’t have to worry about both the way it was instantiated and the way it was called. I’m not sure what the

.run

step does (and where the

_read_table

function that is dynamically referred to, but your pipeline should look something like this (not fully tested)

Copy code

# my_module
def _load_parquet(paths: List[str]) -> pl.LazyFrame:
    return pl.scan_parquet(paths)

@tag(stage="load", input_type="table_1")
def table_1(table_1_paths: List[str]) -> p.LazyFrame:
    return _load_parquet(paths)

@tag(stage="load", input_type="table_2")
def table_2(table_2_paths: List[str]) -> p.LazyFrame:
    return _load_parquet(paths)

...
dr = driver.Builder().with_modules(my_module).build()
results = dr.execute(['table_1', 'table_2'], inputs={'paths_table_1' : ..., 'paths_table_2' : ...})

👍 1

Dhruv Sahi

03/26/2024, 1:02 PM

Thanks for the answer@Elijah Ben Izzy, that was really useful! I'll rework my code outside of classes. Does the Hamilton roadmap include adding this functionality?

Elijah Ben Izzy

03/26/2024, 1:05 PM

It’s a possibility but as of now, unlikely. There’s a direct equivalence between what you can do with classes and functions, and classes hide state and would be a second way to do things (we try to limit possible interfaces so there’s one way). That said, it’s not an uncommon request, so we’ll keep an open mind!

👍 1

Stefan Krawczyk

03/26/2024, 4:23 PM

@Dhruv Sahi just to add, we find that organizing functions into modules are (a) simpler than classes, (b) ensure that code is more decoupled from the context it is used, which then (c) leads to faster iterations/simpler maintenance cycles. But if you find otherwise we’d love the feedback.

Dhruv Sahi

03/27/2024, 11:14 AM

Understood Stefan! I think there may be a case but before I'm sure, I'd like to get more familiar with Hamilton's approach. Thanks for your response 🙂

Open in Slack

Previous Next