Slackbot
12/01/2023, 3:29 PM
miek
12/01/2023, 3:31 PM
Thierry Jean
12/01/2023, 4:57 PM
Thierry Jean
12/01/2023, 5:01 PM
# module_a.py
import pandas as pd

# is a util function (leading underscore, so Hamilton does not treat it as a node)
def _string_to_lowercase(string):
    return ...

# is a node
def set_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    return ...

# module_b.py
import pandas as pd
import module_a

def load_data(path: str) -> pd.DataFrame:
    df = pd.read_parquet(path)
    return module_a.set_dtypes(df)

def get_user_id(...) -> str:
    user_id = ...
    return module_a._string_to_lowercase(user_id)
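A quick sketch of the naming convention the comments above rely on: functions with a leading underscore are helpers, everything else is a candidate node. This only illustrates the filtering idea with the standard library; `list_node_candidates` and `module_a_demo` are made-up names, and this is not Hamilton's actual collection logic.

```python
import inspect
import types
from typing import List


def list_node_candidates(module: types.ModuleType) -> List[str]:
    """Return the names of functions in `module` that do not start with an underscore."""
    return sorted(
        name
        for name, _ in inspect.getmembers(module, inspect.isfunction)
        if not name.startswith("_")
    )


# Hypothetical stand-in for module_a, built in memory so the sketch runs without pandas.
demo = types.ModuleType("module_a_demo")
exec(
    "def _string_to_lowercase(string):\n"
    "    return string.lower()\n"
    "def set_dtypes(df):\n"
    "    return df\n",
    demo.__dict__,
)

print(list_node_candidates(demo))  # ['set_dtypes']
```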
miek
12/01/2023, 5:01 PM
miek
12/01/2023, 5:01 PM
Elijah Ben Izzy
12/01/2023, 5:02 PM
Thierry Jean
12/01/2023, 5:03 PM
[…] importlib […] into an ad-hoc module and registering it in sys
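A minimal sketch of what "an ad-hoc module registered in sys" could look like, assuming the goal is to bundle a handful of functions into one module object that can then be imported by name. `register_adhoc_module`, `adhoc_nodes`, `foo`, and `bar` are hypothetical names; rewriting `__module__` is an assumption about what a tool that filters functions by their defining module would need.

```python
import importlib
import sys
import types
from typing import Callable


def register_adhoc_module(name: str, *functions: Callable) -> types.ModuleType:
    """Create an empty module, attach the functions to it, and register it in sys.modules."""
    module = types.ModuleType(name)
    for fn in functions:
        # Assumption: point the function at its new home so module-based filters accept it.
        fn.__module__ = name
        setattr(module, fn.__name__, fn)
    sys.modules[name] = module
    return module


def foo() -> int:
    return 1


def bar(foo: int) -> int:
    return foo + 1


adhoc = register_adhoc_module("adhoc_nodes", foo, bar)

# Once registered, the normal import machinery returns the same object.
assert importlib.import_module("adhoc_nodes") is adhoc
```

The returned module can then be passed around like any imported module.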
Thierry Jean
12/01/2023, 5:04 PM
Thierry Jean
12/01/2023, 5:04 PM
Elijah Ben Izzy
12/01/2023, 5:05 PM
miek
12/01/2023, 5:12 PM
miek
12/01/2023, 5:13 PM
Elijah Ben Izzy
12/01/2023, 5:13 PM
Elijah Ben Izzy
12/01/2023, 5:13 PM
miek
12/01/2023, 5:14 PM
miek
12/02/2023, 6:18 PM
# nodes.py
from moduleA import *
from moduleB import *

Then call the driver with

import nodes
dr = driver(…, nodes, …)
lst = dr.list_all_variables()

The lst does NOT contain the nodes from moduleA and moduleB… so somehow, under the hood, it doesn’t seem to follow the imports here. It feels like it should work, but it doesn’t :-(
Will dig a bit more, just wanted to post an update here (AFK today)
Elijah Ben Izzy
12/02/2023, 6:19 PM
Elijah Ben Izzy
12/02/2023, 6:20 PM
miek
12/02/2023, 6:20 PM
Elijah Ben Izzy
12/02/2023, 6:21 PM
Elijah Ben Izzy
12/02/2023, 6:22 PM
miek
12/02/2023, 6:23 PM
miek
12/02/2023, 6:26 PM
miek
12/02/2023, 6:26 PM
Elijah Ben Izzy
12/02/2023, 6:37 PM
miek
12/02/2023, 6:42 PM
Elijah Ben Izzy
12/02/2023, 6:53 PM
from hamilton import driver
import sample_module
from types import ModuleType
from typing import List
import pkgutil
import importlib

def import_all(base_module: ModuleType) -> List[ModuleType]:
    # Import every direct submodule of the package (not recursive).
    modules = []
    for module_info in pkgutil.iter_modules(base_module.__path__):
        module_name = f"{base_module.__name__}.{module_info.name}"
        module = importlib.import_module(module_name)
        modules.append(module)
    return modules

all_modules = import_all(sample_module)
dr = driver.Driver({}, *all_modules)
print(dr.execute(["foo", "bar"]))

This is import_all.py. Overall structure is: module_1 contains def foo, module_2 contains def bar. Haven’t tried it recursively, but might be easy enough.

.
├── import_all.py
└── sample_module
    ├── __init__.py
    ├── module_1.py
    └── module_2.py
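For the recursive case mentioned above, one possible sketch swaps `pkgutil.iter_modules` for `pkgutil.walk_packages`, which also descends into nested packages. `import_all_recursive` is a hypothetical name, and this version has not been tried against a Hamilton driver; it only shows the module discovery part.

```python
import importlib
import pkgutil
from types import ModuleType
from typing import List


def import_all_recursive(base_module: ModuleType) -> List[ModuleType]:
    """Import every module and subpackage found under a package, at any depth."""
    modules = []
    prefix = f"{base_module.__name__}."
    # walk_packages yields fully qualified names because of the prefix,
    # and imports each subpackage it finds in order to look inside it.
    for module_info in pkgutil.walk_packages(base_module.__path__, prefix=prefix):
        modules.append(importlib.import_module(module_info.name))
    return modules
```

Note that subpackages themselves (their `__init__.py`) end up in the returned list too, so any functions defined there would be included as well.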
miek
12/02/2023, 6:58 PM
Elijah Ben Izzy
12/02/2023, 6:59 PM
miek
12/02/2023, 8:25 PM
miek
12/02/2023, 9:22 PM
Elijah Ben Izzy
12/02/2023, 9:31 PM
miek
12/06/2023, 3:29 AM
miek
12/06/2023, 3:31 AM
miek
12/06/2023, 3:35 AM
Elijah Ben Izzy
12/06/2023, 3:41 AM
[…] @subdag to stitch them together, which essentially looks the same, but with a little more complexity, e.g. a subdag that specifies granularity, as well as some config stuff. So, doable to represent it in Hamilton.
Elijah Ben Izzy
12/06/2023, 3:42 AM
from hamilton import base
from hamilton.io.materialization import to

dr.materialize(
    to.snowflake(
        id="save_to_snowflake",
        dependencies=["metric_1", "metric_2", ...],
        table="...",
        combine=base.PandasDataFrameResult()
    ),
    inputs={...}
)

The cool thing about this is that it's represented centrally (I/O is not included in the DAG), it's customizable (you write the snowflake adapter and register it), but it actually does do DAG operations: it's effectively appending a save_to_snowflake node to the end of the DAG and calling that (you can see this with the corresponding call, visualize_materialization).
Elijah Ben Izzy
12/06/2023, 3:46 AM
miek
12/06/2023, 3:53 AM
class HamiltonTable(
    table_name="schema.xyz",
    nodes_to_incl=[ all nodes that go into the table here ],
    # some more meta data field here
    inputs=…,
    config=…
)

And then you have a method like .run(persist=True). If persist=True, it would call your materialize wrapper.
Looks like I’m on the right track here
miek
12/06/2023, 4:28 AM
Elijah Ben Izzy
12/06/2023, 4:38 AM
Elijah Ben Izzy
12/06/2023, 4:39 AM
[…] nodes_to_incl. You can also do automated schema inspection/documentation that way…
miek
12/06/2023, 4:49 AM
Stefan Krawczyk
12/06/2023, 6:10 AM
miek
12/07/2023, 3:07 AM
Elijah Ben Izzy
12/07/2023, 4:01 PM
[…] @subdag, which is a slightly more advanced concept in Hamilton, so it's worth thinking about the right way to expose that to your users!
miek
12/07/2023, 5:00 PM
Elijah Ben Izzy
12/07/2023, 5:21 PM