Roy Kid
04/29/2024, 12:45 PM
I used parameterize_extract_columns, and I got an error about result building, I think. My code looks like this:
@resolve(
    when=ResolveAt.CONFIG_AVAILABLE,
    decorate_with=lambda: parameterize_extract_columns(
        *[ParameterizedExtract(tuple(exp.name), {"exp": value(exp)}) for exp in self]
    ),
)
def mapper(exp: Experiment) -> dict:
    os.chdir(exp["run_dir"])
    dr = driver.Builder().with_modules(*modules).build()
    result = dr.materialize(*materializers, inputs=exp.param)
    # result = dr.execute(inputs=exp.param, final_vars=["load_sin"])
    return result
and the driver is like this:
dr = (
    driver.Builder()
    .with_modules(tracker)
    .enable_dynamic_execution(allow_experimental_mode=True)
    .with_execution_manager(execution_manager)
    .with_adapter(base.SimplePythonGraphAdapter(base.DictResult()))
    .with_config({settings.ENABLE_POWER_USER_MODE: True})
    .build()
)
os.chdir(root)
results = dr.execute(
    final_vars=[name for name in parameters],
)
I don't know how to register types with the result builder, or does parameterize_extract_columns only support pd.DataFrame?
Elijah Ben Izzy
04/29/2024, 2:16 PM
Roy Kid
04/29/2024, 2:19 PMFAILED tests/test_proj.py::TestProject::test_map_reduce - NotImplementedError: Cannot get column type for [<class 'dict'>]. Registered types are {'pandas': {'dataframe_type': <class 'pandas.core.frame.DataFrame'>, 'column_type': <class 'pandas.core.series.Series'>}}
Roy Kid
04/29/2024, 2:21 PM
Each Experiment has a different name; how should I deal with this decorator? (I have totally no clue, the "four columns, two for each parameterization" example is too difficult to understand...)
Elijah Ben Izzy
04/29/2024, 2:52 PM
parameterize_extract_columns basically does both parameterize and extract_columns:
1. parameterize over inputs
2. extract columns for each of those inputs
It does this with a dataclass for each one. Note this is… quite complex. However, the core problem is that you're returning a dict from the function, and it has to be a dataframe (of whatever type). Specifically, it has to be a dataframe because we run one node that returns that dataframe for each parameterization, and then extract columns from each of these. So the function gets repeated n times.
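A minimal sketch of that shape (with hypothetical experiment names, not your actual Experiment objects), assuming Hamilton's function_modifiers API: the decorated function returns a pd.DataFrame per parameterization, and the listed columns are extracted from each one.

import pandas as pd
from hamilton.function_modifiers import ParameterizedExtract, parameterize_extract_columns, value

@parameterize_extract_columns(
    # hypothetical experiment names standing in for the Experiment objects above
    ParameterizedExtract(("exp_a_result",), {"exp": value("exp_a")}),
    ParameterizedExtract(("exp_b_result",), {"exp": value("exp_b")}),
)
def mapper(exp: str) -> pd.DataFrame:
    # one node per ParameterizedExtract; the named column(s) are extracted
    # from the returned dataframe, so the outputs get wrapped in a DataFrame
    return pd.DataFrame({"result": [42]})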
Does this make sense? TBH there are likely better ways of doing this (it's probably a little more complex than necessary), but I don't have full context into what you're working on…
Roy Kid
04/29/2024, 3:05 PM
parameterize vs. Collect[] for parallel execution: if I want to run a DAG with different arguments several times, which one is better? You can find the source code here: https://github.com/MolCrafts/molexp/blob/master/src/molexp/tracker.py. After parallel execution I want to reduce the results from the different experiments.
Roy Kid
04/29/2024, 3:20 PM
With MultiThreadingExecutor, the parameterize part still runs one by one, which makes me confused... https://github.com/MolCrafts/molexp/blob/bbd126c53fcb402e04e3ca8f4e8bf153a43a4a1c/src/molexp/project.py#L57
Elijah Ben Izzy
04/29/2024, 4:23 PM
What you're describing is basically @parameterize_extract_fields, which we don't have (but open up an issue?). I'm curious — do you need one node for each field, or one node for each dict? If it's one node for each dict you'll want to use @parameterize.
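For the one-node-per-dict case, a rough @parameterize sketch (hypothetical experiment parameters) might look like this:

from hamilton.function_modifiers import parameterize, value

@parameterize(
    # each entry becomes its own node; the values here are made-up parameters
    exp_a_result={"exp": value({"name": "exp_a", "temperature": 300})},
    exp_b_result={"exp": value({"name": "exp_b", "temperature": 350})},
)
def mapper(exp: dict) -> dict:
    # returns a plain dict, since nothing needs to be extracted column by column
    return {"params": exp, "score": 1.0}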
Elijah Ben Izzy
04/29/2024, 4:25 PM
1. `@parameterize` is static/fixed.
2. `Collect[…]`/`Parallelizable[…]` is dynamic, decided at runtime. This uses a runtime-assigned (dynamic) output for each run of the block.
The executor isn't smart enough to go over @parameterize — it can just repeat blocks between the Parallelizable[…] and Collect[…] nodes.
So, what I think you want is the parallelizable construct — have one node list out your combinations and declare Parallelizable, and have another node do Collect.
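Roughly, with made-up node names, that shape might be as follows (this assumes the driver already has enable_dynamic_execution and an executor configured, as in the builder above):

from hamilton.htypes import Collect, Parallelizable

def experiment_params(params_list: list) -> Parallelizable[dict]:
    # each yielded dict spawns its own run of the block below
    for params in params_list:
        yield params

def experiment_result(experiment_params: dict) -> dict:
    # runs once per yielded parameter set, in parallel given a suitable executor
    return {"params": experiment_params, "score": 1.0}

def reduced(experiment_result: Collect[dict]) -> list:
    # the reduce step: gathers every parallel result back into one node
    return list(experiment_result)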
Roy Kid
04/29/2024, 4:47 PM
Elijah Ben Izzy
04/29/2024, 4:53 PM