Slackbot
11/10/2023, 11:11 PM
Alec Hewitt
11/10/2023, 11:15 PM
injector. E.g.:
driver = Driver({'db_connection': injector.get(DbConnection)}, [data_loaders])
However, if there are a lot of variables, this will make the config quite long.
Elijah Ben Izzy
11/10/2023, 11:25 PM
driver = Driver({...}, [data_loaders])
driver.execute([...], inputs={'db_connection': injector.get(DbConnection)})
This is similar to what you suggested. If you have a few simple dependencies this is easy, but you’ll have to pass in a lot at runtime. If you want to instantiate it once then run it many times, you could also pass it in as config, if that works.
The next (slightly more complex) solution is pretty nifty IMO. You can define a context.py module (or something like that) with all the variables you’ll want. This effectively replaces the complexity of the injector with Python functions.
# context.py
def db_connection(env: str) -> DBConnection:
    if env == "prod":
        return create_prod_connection(...)
    else:
        return create_staging_connection(...)

def other_thing_that_you_might_need(...) -> ...:
    ...
...
This way things get loaded dynamically (only if you need them). All you have to do is ensure that the right module gets added:
driver = Driver({'env': 'staging'}, [data_loaders, context])
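To make the "only if you need them" point concrete, here is a minimal sketch of lazy, name-based resolution using only the standard library. It is not how Hamilton is implemented; `resolve`, `expensive_client`, `row_count`, and the string placeholders for connections are all hypothetical names made up for this illustration.

```python
import inspect
from typing import Any, Callable, Dict, List

calls: List[str] = []  # records which node functions actually ran


def db_connection(env: str) -> str:
    calls.append("db_connection")
    return f"connection-to-{env}"  # placeholder for a real connection object


def expensive_client(env: str) -> str:
    calls.append("expensive_client")
    return f"client-for-{env}"  # never requested below, so never built


def row_count(db_connection: str) -> int:
    calls.append("row_count")
    return len(db_connection)  # placeholder for a real query


def resolve(name: str, functions: Dict[str, Callable[..., Any]], values: Dict[str, Any]) -> Any:
    """Compute `name`, recursively computing only the parameters it needs."""
    if name not in values:
        fn = functions[name]
        kwargs = {p: resolve(p, functions, values) for p in inspect.signature(fn).parameters}
        values[name] = fn(**kwargs)
    return values[name]


functions = {f.__name__: f for f in (db_connection, expensive_client, row_count)}
# `env` plays the role of the config value passed to the Driver.
result = resolve("row_count", functions, {"env": "staging"})
```

After this runs, `calls` contains `db_connection` and `row_count` but not `expensive_client`, which is the behavior you want from a context module: define everything once, pay only for what a given run asks for.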
Note these don’t use the injector library at all, but you could easily adapt it to do what you want, exactly as you suggested: use the injector to pass it in. While I agree that it's generally bad practice to use injector just to pass it along, I think in this case it's purely for bridging the two. That said, it could easily be somewhat verbose. You could use a parameterization to get this to be one function (just have one for each field):
@parameterize(
    db_connection={"field": value("db_connection")},
    second_thing_from_injector={"field": value("second_thing_from_inject")},
    and_so_on=...
)
def injected_fields(injector: Injector) -> Any:
    ...
And extract_columns will do the same thing; it’ll just resolve them non-dynamically. It would allow you to do different types, I think.
Elijah Ben Izzy
11/10/2023, 11:33 PM
context module.
All this said, I think you could do another pretty easy approach that you might like:
vars_ = dr.list_available_variables()
normal_inputs = {...} # your current inputs
external_inputs = [item for item in vars_ if item.is_external_input]
inject = {
    item.name: injector.get(item.type) for item in external_inputs
}
dr.execute([...], inputs={**inject, **normal_inputs})
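A self-contained sketch of the same idea, with an explicit check for types that have no injection. Everything here is a stand-in: `Variable` mimics only the three attributes used above (`name`, `type`, `is_external_input`), and `providers` is a hypothetical dict of factories playing the role of `injector.get`. The sketch also skips inputs you already supply yourself, which is an assumption about what you'd want rather than something from the snippet above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


class DbConnection:
    """Hypothetical injectable type."""


@dataclass
class Variable:
    """Stand-in for the items returned by dr.list_available_variables()."""
    name: str
    type: type
    is_external_input: bool


def build_inputs(
    variables: List[Variable],
    providers: Dict[type, Callable[[], Any]],
    normal_inputs: Dict[str, Any],
) -> Dict[str, Any]:
    """Fill in external inputs by type, failing loudly on unknown types."""
    inject: Dict[str, Any] = {}
    for item in variables:
        # Only external inputs need filling; skip ones the caller already passes.
        if not item.is_external_input or item.name in normal_inputs:
            continue
        if item.type not in providers:
            raise LookupError(
                f"no provider for input {item.name!r} of type {item.type.__name__}"
            )
        inject[item.name] = providers[item.type]()
    return {**inject, **normal_inputs}


variables = [
    Variable("db_connection", DbConnection, True),
    Variable("start", str, True),
    Variable("load_data", dict, False),  # computed by the DAG, not an input
]
inputs = build_inputs(variables, {DbConnection: DbConnection}, {"start": "2023-11-10"})
```

With this shape, a missing injection shows up as a `LookupError` naming the offending input before execution starts, rather than as a confusing failure somewhere inside the run.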
And bam! You just have to be careful to ensure that you’re only asking for types you have injections for.
Alec Hewitt
11/10/2023, 11:39 PM
@parameterize(
    db_connection={"field": value(DbConnection)},
    table_name={"field": value(TableName)},
)
def injected_fields(injector: Injector, field: Any) -> Any:
    return injector.get(field)
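For readers unfamiliar with the pattern, here is a rough standard-library sketch of what "one function, one output per field" amounts to. It is not Hamilton's `@parameterize` implementation; `expand` and `FakeInjector` are hypothetical helpers invented for this illustration, and `TableName` is assumed to be a simple string subtype.

```python
from functools import partial
from typing import Any, Callable, Dict


def expand(template: Callable[..., Any], **bound: Dict[str, Any]) -> Dict[str, Callable[..., Any]]:
    """Return one callable per keyword, each with its own arguments pre-bound."""
    return {name: partial(template, **kwargs) for name, kwargs in bound.items()}


class FakeInjector:
    """Hypothetical stand-in for injector.Injector: maps a type to an instance."""

    def __init__(self, bindings: Dict[type, Any]) -> None:
        self._bindings = bindings

    def get(self, key: type) -> Any:
        return self._bindings[key]


class DbConnection:
    """Hypothetical injectable type."""


class TableName(str):
    """Hypothetical injectable type."""


def injected_fields(injector: FakeInjector, field: type) -> Any:
    return injector.get(field)


# One named output per field, all sharing the same function body.
nodes = expand(
    injected_fields,
    db_connection={"field": DbConnection},
    table_name={"field": TableName},
)
injector = FakeInjector({DbConnection: DbConnection(), TableName: TableName("events")})
resolved = {name: fn(injector=injector) for name, fn in nodes.items()}
```

Each generated callable returns a different type even though the template is annotated `-> Any`, which is the part that may or may not play well with type checking in the real library.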
Elijah Ben Izzy
11/10/2023, 11:41 PM
Any perfectly, but it’s worth a try, and we’re more than happy to fix it if the behavior is weird. Otherwise yeah, I think doing it before execution makes a lot of sense. It depends on whether you want it to be lazily executed at all.
Alec Hewitt
11/10/2023, 11:47 PM
Elijah Ben Izzy
11/10/2023, 11:51 PM
source/value, so you can pass in the db connections, etc. And you can create your own.
https://hamilton.dagworks.io/en/latest/reference/io/available-data-adapters/
Alec Hewitt
11/10/2023, 11:59 PM
Alec Hewitt
11/22/2023, 10:31 PM
inject decorator to be used with the value decorator.
E.g.:
# my_module/main_injector.py
from injector import Injector
from my_module import DbConnectionModule

injector = Injector([DbConnectionModule])

# my_module/data_loaders.py
import pandas as pd
from hamilton.function_modifiers import inject, value
from my_module.main_injector import injector

@inject(db_connection=value(injector.get(DbConnection)))
def load_data(db_connection: DbConnection) -> pd.DataFrame:
    # Load data from the DbConnection
    ...

# my_module/main.py
from hamilton.driver import Driver
from my_module import data_loaders

driver = Driver({}, [data_loaders])
result = driver.execute(...)
Thank you again for your help on this. Really appreciate how responsive you are on this Slack channel.
Stefan Krawczyk
11/22/2023, 10:43 PM
# my_module/main_injector.py
from injector import Injector
from my_module import DbConnectionModule

injector = Injector([DbConnectionModule])

# my_module/data_loaders.py
import pandas as pd
from hamilton.function_modifiers import inject, value

def load_data(db_connection: DbConnection) -> pd.DataFrame:
    # Load data from the DbConnection
    ...

# my_module/main.py
from hamilton.driver import Driver
from my_module import data_loaders
from my_module.main_injector import injector

driver = Driver({}, [data_loaders])
result = driver.execute(
    ...,
    inputs={"db_connection": injector.get(DbConnection)},
)
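One practical difference between the decorator version and the `inputs=` version is when the lookup happens: arguments to a decorator are evaluated when the module is imported, while `inputs=` is evaluated when you call `execute`. The sketch below shows just that timing with plain Python. `bind` is a toy decorator, not Hamilton's `@inject`, and `get_connection` is a hypothetical stand-in for `injector.get(DbConnection)`.

```python
from typing import Any, Callable, List

events: List[str] = []  # ordered log of what happened when


def get_connection() -> str:
    """Hypothetical stand-in for injector.get(DbConnection)."""
    events.append("connection created")
    return "connection"


def bind(**fixed: Any) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Toy decorator that pre-binds keyword arguments to a function."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        def wrapper(**kwargs: Any) -> Any:
            return fn(**fixed, **kwargs)
        return wrapper
    return decorator


@bind(db_connection=get_connection())  # runs now, while the module is being defined
def load_data_eager(db_connection: str) -> str:
    return f"loaded via {db_connection}"


def load_data_deferred(db_connection: str) -> str:
    return f"loaded via {db_connection}"


events.append("module defined")

eager = load_data_eager()
# Here the connection is only created at call time, like passing inputs= to execute.
deferred = load_data_deferred(db_connection=get_connection())
```

In the eager case the first "connection created" entry lands in `events` before "module defined", so merely importing such a module would open the connection. Whether that matters depends on how expensive or environment-specific the injected object is.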
Alec Hewitt
11/22/2023, 10:51 PM
@inject(db_connection=value(injector.get(DbConnection)))
def load_data(start: datetime, db_connection: DbConnection) -> pd.DataFrame:
    # Load data from the DbConnection
    ...
Here it is quite obvious that start is a parameter passed through from Hamilton, whereas db_connection is coming from the injector.
Stefan Krawczyk
11/22/2023, 11:41 PM
Alec Hewitt
11/22/2023, 11:41 PM
Stefan Krawczyk
11/22/2023, 11:41 PM
Elijah Ben Izzy
11/22/2023, 11:42 PM
Alec Hewitt
11/22/2023, 11:43 PM