Jesuseyi Fasuyi
05/29/2024, 3:47 AM
Elijah Ben Izzy
05/29/2024, 4:49 AM
Stefan Krawczyk
05/29/2024, 5:27 AM
Jesuseyi Fasuyi
05/29/2024, 1:12 PM
Jesuseyi Fasuyi
05/29/2024, 3:12 PM
Elijah Ben Izzy
05/29/2024, 3:55 PM
null, then you just capture the null data (works if they’re directly user-facing; propagating null can be really ugly, though, if they’re not)
2. Run through the whole thing with data validators: capture the results of the nodes plus the data validator results, then select which ones to include (this is inefficient if there’s further computation, which is my guess, but worth thinking through)
3. Run the DAG in two pieces: (a) the data loading that needs validation, and (b) the metrics transforms. Only request the outputs corresponding to the inputs that validate. A little less explicit/hamiltonic, but could definitely work
4. Create a custom lifecycle adapter to catch, propagate, and discard the results of future errors.
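To make option (3) concrete, here is a minimal plain-Python sketch of the two-piece run. It does not use Hamilton's API: the `loaders` dict, the `is_valid` rule, and the `mean` metric are all hypothetical stand-ins. In Hamilton this would presumably become two separate executions, with the second requesting only the outputs whose inputs validated.

```python
from typing import Callable, Dict, List


def is_valid(rows: List[float]) -> bool:
    """Hypothetical validation rule: non-empty and no negative values."""
    return len(rows) > 0 and all(r >= 0 for r in rows)


def mean(rows: List[float]) -> float:
    """Hypothetical metric transform; would divide by zero on empty input."""
    return sum(rows) / len(rows)


# Hypothetical data sources. "refunds" returns nothing, so it fails validation.
loaders: Dict[str, Callable[[], List[float]]] = {
    "clicks": lambda: [3.0, 5.0, 4.0],
    "refunds": lambda: [],
}

# Piece (a): run only the data loading that needs validation.
loaded = {name: load() for name, load in loaders.items()}
valid = {name: rows for name, rows in loaded.items() if is_valid(rows)}

# Piece (b): request only the metrics whose inputs validated.
outputs = {f"mean_{name}": mean(rows) for name, rows in valid.items()}
print(outputs)
```

The trade-off is the one noted above: the selection logic lives outside the DAG, so the dependency between validation and metrics is less explicit.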
(4) is a bit complicated but potentially powerful. The way it would work is:
1. Your nodes that “gracefully fail” throw a specific exception
2. You implement a NodeExecutionMethod (https://hamilton.dagworks.io/en/latest/reference/lifecycle-hooks/NodeExecutionMethod/) to catch that exception and return a sentinel value
3. That hook also checks whether any input is that sentinel value and, if so, returns the sentinel as well
4. You pass it to the driver and it replaces execution
This way you can short-circuit anything in the future that depends on busted data. Then at the end you get a dict back and handle it.
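The steps above can be sketched without Hamilton at all, just to show the short-circuit logic. This is a toy executor under stated assumptions, not the NodeExecutionMethod API: `GracefulFailure`, `SENTINEL`, `run_dag`, and the example nodes are all hypothetical names, and the DAG dict is assumed to be listed in topological order.

```python
from typing import Any, Callable, Dict, List, Tuple


class GracefulFailure(Exception):
    """Hypothetical exception a node raises to fail gracefully."""


SENTINEL = object()  # marks a result that should be discarded at the end

# Toy DAG format: node name -> (function, names of upstream nodes).
Node = Tuple[Callable[..., Any], Tuple[str, ...]]


def run_dag(nodes: Dict[str, Node]) -> Dict[str, Any]:
    """Run nodes in insertion order, propagating SENTINEL downstream."""
    results: Dict[str, Any] = {}
    for name, (fn, deps) in nodes.items():
        inputs = [results[dep] for dep in deps]
        # Step 3: if any input is the sentinel, skip the node and pass it on.
        if any(value is SENTINEL for value in inputs):
            results[name] = SENTINEL
            continue
        # Step 2: catch the specific exception and return the sentinel.
        try:
            results[name] = fn(*inputs)
        except GracefulFailure:
            results[name] = SENTINEL
    return results


def load_a() -> List[int]:
    return [1, 2, 3]


def load_b() -> List[int]:
    # Step 1: the node signals a graceful failure.
    raise GracefulFailure("source b returned no data")


def mean_of(rows: List[int]) -> float:
    return sum(rows) / len(rows)


dag: Dict[str, Node] = {
    "a": (load_a, ()),
    "b": (load_b, ()),
    "mean_a": (mean_of, ("a",)),
    "mean_b": (mean_of, ("b",)),
}

results = run_dag(dag)
# Handle the returned dict: keep only the nodes that actually produced data.
usable = {name: value for name, value in results.items() if value is not SENTINEL}
print(usable)
```

Here `mean_b` never runs because its input `b` failed, while `a` and `mean_a` are unaffected.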
I really like this approach, though it still needs to be proven out. Happy to sketch out some code, and if it’s good we can add it to the std lib!
Elijah Ben Izzy
05/29/2024, 4:29 PM
Jesuseyi Fasuyi
05/30/2024, 4:17 PM
Elijah Ben Izzy
05/30/2024, 4:17 PM
Elijah Ben Izzy
05/30/2024, 4:18 PM
Stefan Krawczyk
06/05/2024, 6:05 PM
1.65.0 that has this adapter if you wanted to try it out.
Docs:
https://hamilton.dagworks.io/en/latest/reference/lifecycle-hooks/GracefulErrorAdapter/