Carl Trachte
07/04/2024, 11:39 PM
Carl Trachte
07/06/2024, 1:22 AM
Carl Trachte
07/07/2024, 5:33 PM
Vadim Ogranovich
07/29/2024, 7:08 PM
final_vars in the Builder, something like driver.Builder().with_final_vars(...)...
I understand I can achieve the same effect via dr.execute(final_vars=final_vars); however, providing final_vars at the build stage has the potential to greatly reduce the build time, or doesn't it? What am I missing?
Carl Trachte
08/10/2024, 1:58 AM
Volker Lorrmann
08/20/2024, 3:31 PM
Carl Trachte
08/26/2024, 7:42 PM
Volker Lorrmann
08/29/2024, 12:10 PM
Carl Trachte
08/31/2024, 6:41 AM
Carl Trachte
08/31/2024, 1:49 PM
Carl Trachte
09/27/2024, 8:21 PM
Cooper Snyder
09/30/2024, 12:31 PM
Cooper Snyder
10/07/2024, 9:48 PM
from pydantic import BaseModel

class OrchestratableTask(BaseModel):
    def setup(self, *args, **kwargs):
        # environment-, application-, and runtime-specific setup
        ...
    def extract(self, *args, **kwargs):
        # external state and external data from the target system
        ...
    def run_pure_transform(self, *args, **kwargs):
        # pure, deterministic (enough) function based on inputs
        ...
    def load(self, *args, **kwargs):
        # load results to an external database
        ...
    def run_transform_w_io_side_effects(self, *args, **kwargs):
        extracted_data = self.extract()
        transformed_data = self.run_pure_transform(extracted_data)
        self.load(transformed_data)  # load the transformed result, not the raw extract

if __name__ == "__main__":
    # add arg parser
    args, kwargs = (), {}  # placeholders until the arg parser is added
    task = OrchestratableTask()
    task.setup(*args, **kwargs)
    task.run_transform_w_io_side_effects(*args, **kwargs)
where I'd have something like a command/strategy pattern, with args and kwargs controlling the flow of the functions (I know that would go into the config.when decorators), and have whatever business logic right there in the transform flow. But I'm running into the code smell of mixing object-oriented with functional: doing something like a Hamilton DAG for each step and then another Hamilton DAG for those DAGs (I don't think this works well...).
I'm feeling a bit of analysis paralysis; has anyone run into this idea or anything like it? Any criticism of that design? From reading the docs, I feel that idiomatically you'd just make it one Hamilton DAG with the dataloader, datasaver, and config.when decorators, but I REALLY wanted to make it obvious to developers that those are the main 4 abstractions required for a single OrchestratableTask, and to let someone pip install a package that houses all of the subclass tasks and run the pure function however they like in a discovery environment like a notebook.
Is this overcomplicating it with the Task class?
Thank you!
David Medinets
10/11/2024, 3:43 AM
Jonas Meyer-Ohle
10/16/2024, 3:20 PM
Actual error: No registered subclass of BaseDefaultValidator is available for arg: schema and type <class 'polars.dataframe.frame.DataFrame'>. This either means (a) this arg-type contribution isn't supported or (b) this has not been added yet (but should be). In the case of (b), we welcome contributions. Get started at http://github.com/dagworks-inc/hamilton.
I stepped through the following file a bit: https://github.com/DAGWorks-Inc/hamilton/blob/main/hamilton/data_quality/pandera_validators.py#L9
And it seems like the polars plugin isn't part of the supported extensions, I'm assuming this is the issue?
Thanks!
Justin Donaldson
10/21/2024, 5:51 PM
Volker Lorrmann
10/28/2024, 8:40 AM
Andres MM
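10/29/2024, 10:08 AM
import typing as t
import pandas as pd

def bar_union(x: pd.Series) -> t.Union[int, pd.Series]:
    try:
        return x
    except Exception:  # assumed fallback branch; the original snippet omits it
        return 0

def foo_bar(bar_union: int) -> int:
    return bar_union + 1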
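To make the mismatch concrete: the upstream node advertises a Union return type while the downstream parameter asks for a plain int. Below is a minimal sketch, using only the standard typing module, of how one could compare the two annotations. It is not Hamilton's actual type-checking logic; the Series class is a hypothetical stand-in for pd.Series, and union_members is a helper made up for this illustration.

```python
import typing as t


class Series:
    """Hypothetical stand-in for pd.Series so the sketch needs no third-party imports."""


def bar_union(x: Series) -> t.Union[int, Series]:
    return x


def foo_bar(bar_union: int) -> int:
    return bar_union + 1


def union_members(annotation: t.Any) -> t.Tuple[t.Any, ...]:
    """Return the members of a Union annotation, or the annotation itself as a 1-tuple."""
    if t.get_origin(annotation) is t.Union:
        return t.get_args(annotation)
    return (annotation,)


# What the upstream node may return vs. what the downstream parameter accepts.
produced = union_members(t.get_type_hints(bar_union)["return"])
accepted = union_members(t.get_type_hints(foo_bar)["bar_union"])

# Strict check: every type the upstream can produce must be accepted downstream.
strictly_compatible = all(p in accepted for p in produced)
# Lenient check: at least one produced type is accepted downstream.
partially_compatible = any(p in accepted for p in produced)

print(produced, accepted, strictly_compatible, partially_compatible)
```

Under the strict reading the edge is rejected because the Series branch has nowhere to go; under the lenient reading it is allowed and the mismatch would only surface at runtime.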
Viktor
11/28/2024, 6:14 PM
Justin Donaldson
12/04/2024, 10:58 PM
Volker Lorrmann
12/10/2024, 9:51 AM
Paul
01/28/2025, 11:13 AM
from hamilton import driver
import ourlib.modules as mod

my_modules = [mod.load_csv, mod.compute1, mod.compute2, mod.save_csv]
dr = (
    driver.Builder()
    .with_modules(*my_modules)
    .with_config(my_config)
    .build()
)
# where the file system is structured as:
- ourlib
  - modules
    - load_csv.py
    - compute1.py
    - compute2.py
    - save_csv.py
Keshav Ravi
01/28/2025, 11:14 AM
Trần Hoàng Nguyên
02/17/2025, 6:40 AM
Elijah Ben Izzy
02/27/2025, 3:56 PM
Slackbot
02/27/2025, 9:51 PM
Evan Lutins
03/12/2025, 2:43 PM
ValidationResult nodes from the DAG when calling visualize_execution()? I tried passing in the bypass_validation=True argument, but it didn't work. Here is the code used to generate my DAG:
dr.visualize_execution(
    final_vars=outputs,
    inputs=inputs,
    bypass_validation=True
)
The returned DAG contains a {node_name}_raw and a {node_name}_validator for each node decorated with @check_output. Ideally I would just like a single {node_name} represented in the DAG.
Victor Bouzas
03/14/2025, 10:07 AM
Volker Lorrmann
03/26/2025, 10:06 AM
Bob Gregory
04/09/2025, 5:47 PM
parameterize_value? The "reusing_functions" example in your repo focuses on subdags instead.