Hello I wondered whether it is possible to exclude functions Hamilton Open Source #hamilton-help

Alex Pavlides

04/18/2024, 9:02 AM

Hello. I wondered whether it is possible to exclude functions from using the Ray backend. I had a single pipeline with Elasticsearch queries and transformation code. I had to separate out the Elasticsearch queries because they didn't work correctly with Ray. Now I have two services: the Elasticsearch queries running with the default driver and the transformation code running with the Ray driver. I believe Hamilton works on the backend by adding a Ray decorator to each function. Is it possible to exclude functions from getting this decorator? This is more for future reference.

Elijah Ben Izzy

04/18/2024, 1:07 PM

Hey! So currently this is the case (that they’re all running on ray), however, it should be an easy(ish) fix. First, do you mind sharing how you’re calling your code? I assume you’re using the Ray Graph Adapter?

Elijah Ben Izzy

04/18/2024, 1:28 PM

At a high level, if you’re using the `RayGraphAdapter` we could easily add a flag `only_use_ray_if_decorated` (with a better name), then change this line to respect it, running locally if we haven’t gotten anything from the ray decorator: https://github.com/DAGWorks-Inc/hamilton/blob/d89b03e059143eb9581c50b624265185848ca782/hamilton/plugins/h_ray.py#L122. It feels like a nice extension. We also have the task-based orchestration, which can help group/assign different ones, but it’s a bit more complex.
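To make the proposed flag concrete, here is a minimal sketch of the opt-in dispatch idea using only the standard library. It is not Hamilton's or Ray's API: `run_remote`, `SelectiveAdapter`, and `only_remote_if_decorated` are hypothetical names, and a thread pool stands in for Ray. The point is only the branching logic: undecorated functions run locally, decorated ones get submitted to the backend.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict

# Hypothetical marker attribute; not part of Hamilton or Ray.
REMOTE_MARKER = "_run_remote"


def run_remote(fn: Callable) -> Callable:
    """Hypothetical decorator: opt a function in to remote execution."""
    setattr(fn, REMOTE_MARKER, True)
    return fn


class SelectiveAdapter:
    """Toy stand-in for a graph adapter with an opt-in flag (assumption)."""

    def __init__(self, only_remote_if_decorated: bool = False, max_workers: int = 2):
        self.only_remote_if_decorated = only_remote_if_decorated
        # A thread pool stands in for the Ray backend in this sketch.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def execute_node(self, fn: Callable, kwargs: Dict[str, Any]) -> Any:
        decorated = getattr(fn, REMOTE_MARKER, False)
        if self.only_remote_if_decorated and not decorated:
            # Flag is on and the function did not opt in: run it locally.
            return fn(**kwargs)
        # Otherwise hand the call to the backend and return a Future.
        return self._pool.submit(fn, **kwargs)

    @staticmethod
    def resolve(value: Any) -> Any:
        """Unwrap a Future; pass plain values through unchanged."""
        return value.result() if isinstance(value, Future) else value

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


def query_elastic(index: str) -> list:
    # Placeholder for I/O that should stay in the driver process.
    return [1, 2, 3]


@run_remote
def transform(rows: list) -> int:
    return sum(r * 2 for r in rows)


adapter = SelectiveAdapter(only_remote_if_decorated=True)
rows = adapter.execute_node(query_elastic, {"index": "events"})  # plain list
total = adapter.execute_node(transform, {"rows": SelectiveAdapter.resolve(rows)})  # Future
result = SelectiveAdapter.resolve(total)
adapter.shutdown()
```

With the flag left at `False`, every call goes to the pool, which mirrors the current behaviour described above where all functions run on Ray.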

Alex Pavlides

04/18/2024, 1:36 PM

Hi Elijah, this is the relevant part of my run script:

@main.command()
def run():

    if config["ray_backend"]:
        output_type = h_ray.RayGraphAdapter(result_builder=base.PandasDataFrameResult())
    else:
        output_type = base.SimplePythonDataFrameGraphAdapter()

    logger_hook = lifecycle.default.PrintLn(print_fn=logger.info)

    dr = (
        driver.Builder()
        .with_modules(
            pipe_load_data,
            pipe_prep_data,
            pipe_entity_features,
            pipe_add_features,
            pipe_risk_metric,
        )
        .with_config(config)
        .with_adapters(logger_hook, output_type)
        .build()
    )

Alex Pavlides

04/18/2024, 1:37 PM

What you suggest sounds great.

Alex Pavlides

04/18/2024, 1:37 PM

I don't think I have looked at "task-based orchestration", will take a look at docs

Elijah Ben Izzy

04/18/2024, 1:38 PM

Got it! OK, so yeah, this should be an easy change. If you’re interested in contributing we’d love contributions, but we can also take a stab at it at some point soon. FWIW I think this is a common pattern, and it nicely extends what @Fran Boon did. Task-based orchestration is much more powerful but less well documented, TBH.

Elijah Ben Izzy

04/18/2024, 1:39 PM

Task-based uses the following: https://hamilton.dagworks.io/en/latest/concepts/parallel-task/. But it doesn’t need a dynamic # of tasks. I think that the ray one is good unless you really want something more powerful TBH (or need a dynamic set of tasks/nodes).

Alex Pavlides

04/18/2024, 1:45 PM

Thanks Elijah, I will definitely take a look at the task-based docs soon. I'd like to contribute if I find some time. I am currently deep into some integration testing but will return to this when I have some headspace.

Elijah Ben Izzy

04/18/2024, 1:46 PM

Great! Just reach out when you need it and we can help you out.

🙌 1
