# hamilton-help
e
Welcome! I actually think that’s quite a clever solution, TBH. It depends a bit on how you’re doing it, e.g. how many map operations there are and how many filter operations you have. If you have a few filter operations on a few specific features, you can filter each of them, join the results into a dataframe, then use `extract_columns` on that. From there you can run the rest of the map operations.
That said, I’d actually keep doing it similarly to how you’re doing it: it makes a lot of sense for the operations to share the same index, especially if you filter on, say, some sentinel value. In that case, you could:
1. Write a custom `results_builder` that joins them and filters, handling the index specially (a minimal sketch follows below), or
2. Do that within a node, i.e. write a dataframe join that does the filtering.
You could pass along the filter values, or just use the index to do the join. Does that make sense?
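A minimal sketch of option 1, assuming the older-style `SimplePythonGraphAdapter` setup; the module name `features`, the node names, and the sentinel value are all hypothetical:

```python
import pandas as pd

from hamilton import base, driver

import features  # hypothetical module holding the map/filter feature functions


class SentinelFilteringResult(base.ResultMixin):
    """Joins the requested outputs on their shared index, then drops
    any row where a column still holds the sentinel value."""

    def __init__(self, sentinel: int = -1):
        self.sentinel = sentinel

    def build_result(self, **outputs) -> pd.DataFrame:
        # Concatenate all output series on the index so nothing is silently
        # dropped, then filter in one place, keeping the index intact.
        df = pd.concat(outputs, axis=1)
        return df[(df != self.sentinel).all(axis=1)]


adapter = base.SimplePythonGraphAdapter(SentinelFilteringResult(sentinel=-1))
dr = driver.Driver({}, features, adapter=adapter)
df = dr.execute(["feature_a", "feature_b"])  # hypothetical node names
```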
s
One clarifying question: does everything fit in memory? Or is that a concern?
m
Thank you both for the prompt responses! The data does fit into memory, so no complications expected there. I think a custom `results_builder` is exactly what I was missing. Thank you for the pointer!
s
Cool. I would recommend pairing the custom result builder with the new materializer functionality. I’ll link it once I’m at a keyboard, but check out the materialization example in the examples folder.
Here’s the example: https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/materialization. This lets you create one driver and then swap out or add how the final dataframe is “materialized” more easily and flexibly.
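A minimal sketch of that pattern, reusing the hypothetical `features` module from above; the id, path, and node names are made up for illustration:

```python
from hamilton import base, driver
from hamilton.io.materialization import to

import features  # hypothetical module, as above

dr = driver.Driver({}, features)

# Each materializer describes one way to combine and save outputs; you can
# add or swap these per materialize() call without rebuilding the driver.
metadata, _ = dr.materialize(
    to.csv(
        id="final_df_to_csv",  # unique id for this materializer
        dependencies=["feature_a", "feature_b"],  # nodes to combine and save
        combine=base.PandasDataFrameResult(),  # or the custom result builder sketched above
        path="./final_df.csv",
    ),
)
```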