# hamilton-help
s
hey @Amos! To confirm my understanding of your problem:
• you have a FastAPI endpoint
• for each request you will be running execute on a driver
• the problem is that the model appears to be loaded multiple times, rather than once for the life of the FastAPI server (?)
• any solution you’d want should also work when the Hamilton code is run via click.
Do I have that right?
If so: I would suggest loading the model in FastAPI’s lifespan handler, caching it, and then injecting it as an input to Hamilton execution. If you load the model as part of the Hamilton DAG, then I would run the DAG in the lifespan part, exercising it only to load the model, and then “cache” the result. Then when you come to execute, you can pass the cached value in as an override.
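A minimal sketch of that pattern, with FastAPI stubbed out so it runs standalone. `load_model`, `MODEL_CACHE`, and the `handle_request` wiring are illustrative names (not Hamilton or FastAPI APIs); the real endpoint would call `driver.execute(..., overrides={"model": MODEL_CACHE["model"]})`:

```python
import asyncio
from contextlib import asynccontextmanager

MODEL_CACHE: dict = {}  # hypothetical process-wide cache

def load_model():
    # stand-in for an expensive load, e.g. a transformers pipeline
    return object()

@asynccontextmanager
async def lifespan(app):
    # FastAPI runs this once for the life of the server:
    # load the model at startup, release it at shutdown
    MODEL_CACHE["model"] = load_model()
    yield
    MODEL_CACHE.clear()

def handle_request(execute):
    # inside an endpoint: inject the cached model as an override
    # instead of letting the DAG re-load it on every request
    return execute(overrides={"model": MODEL_CACHE["model"]})

async def demo():
    async with lifespan(app=None):
        # `execute` here stands in for a Hamilton driver's execute
        out = handle_request(lambda overrides: overrides["model"])
        assert out is MODEL_CACHE["model"]

asyncio.run(demo())
```

The key property is that `load_model()` runs exactly once per server process, while each request only does a dictionary lookup.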
a
Hi mate, great to see so much activity in this space. Yes, that's essentially right. There's a pattern we've used for customising DAGs/drivers via a simple factory function that essentially hands the user a callable for generating some data product based on a small config dictionary. We're repurposing that here by putting the call behind a FastAPI endpoint and, in this case, a CLI too. The issue is managing the model loading inside the async app while also handling scenarios where the DAG config means the model isn't needed at all. I was looking at this kind of async event queue https://huggingface.co/docs/transformers/pipeline_webserver but maybe the lifespan events you pointed out are a better option? If I understand correctly, you're essentially saying to treat the model as an input or node and supply an override to the execute call when using FastAPI. I guess one would then also need some guards to not supply the override when the config has decided the node that uses the model doesn't need to be in the DAG. Is there a smart way of doing that? Hope all's well with you guys…
s
@Amos you could ask the constructed driver whether it’s required? E.g. tag the node, then create the driver and ask: are there any nodes with the model tag? If so, load them up and cache them. Then when you go to execute you can ask the same question, or just know that if something is cached, it’s required… How many permutations of the config/driver would there be? Are you instantiating a new driver on a per-request basis? Or?
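A sketch of that check, with the driver stubbed so it runs standalone. In real code you'd tag the node with Hamilton's `@tag(...)` decorator and call `list_available_variables()` on the constructed driver; the `loader: model` tag name here is just an illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Variable:
    # mirrors the shape of what list_available_variables() hands back
    name: str
    tags: dict = field(default_factory=dict)

class StubDriver:
    # stand-in for a constructed Hamilton driver; a config that drops the
    # model node would simply not include the tagged variable
    def list_available_variables(self):
        return [
            Variable("model", tags={"loader": "model"}),
            Variable("summary"),
        ]

def model_nodes(dr):
    """Names of nodes tagged as model loaders in this driver's DAG."""
    return [v.name for v in dr.list_available_variables()
            if v.tags.get("loader") == "model"]

dr = StubDriver()
if model_nodes(dr):
    # only now pay the cost of loading + caching the model,
    # and only then supply the override at execute time
    pass
```

This also answers the guard question from earlier: if `model_nodes(dr)` is empty for a given config, you skip both the load and the override.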
You could use a queue to front things like in that example, but that’s just trying to make sure you instantiate things only once, plus giving you the possibility of batching requests. Whether you can batch depends on what you’re doing, etc., and the “things to consider” bit at the bottom of the HF link is pretty spot on with the caveats.
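A rough stdlib-only sketch of that queue-fronting idea: one consumer task owns the (stubbed) model, and whatever requests are already waiting get handled as a best-effort batch. This deliberately glosses over the caveats from the HF page (batch-size limits, timeouts, error handling):

```python
import asyncio

async def consumer(queue):
    # a single task "owns" the model; requests arrive over the queue
    while True:
        payload, fut = await queue.get()
        if payload is None:            # shutdown signal
            fut.set_result(None)
            return
        batch = [(payload, fut)]
        while not queue.empty():       # best-effort batching: drain waiters
            batch.append(queue.get_nowait())
        for p, f in batch:
            f.set_result(f"processed:{p}")   # stand-in for model inference

async def ask(queue, payload):
    # callers hand over a payload and await a future for the result
    fut = asyncio.get_running_loop().create_future()
    await queue.put((payload, fut))
    return await fut

async def main():
    q = asyncio.Queue()
    worker = asyncio.create_task(consumer(q))
    results = await asyncio.gather(ask(q, "a"), ask(q, "b"))
    await ask(q, None)                 # shut the consumer down
    await worker
    return results

print(asyncio.run(main()))
```

Because only the consumer ever touches the model, it is instantiated once no matter how many concurrent requests arrive.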
a
Hi @Stefan Krawczyk, that's helpful, thanks. I'm running into an async issue with the experimental driver. Any thoughts?
```
File "…python3.10/site-packages/hamilton/experimental/h_async.py", line 74, in execute_node
    task = asyncio.create_task(coroutine)
  File "/usr/lib/python3.10/asyncio/tasks.py", line 336, in create_task
    loop = events.get_running_loop()
RuntimeError: no running event loop
ERROR    hamilton.experimental.h_async:h_async.py:171 -------------------------------------------------------------------
```
The model-loading node has no inputs and is essentially just meant to run `model = pipeline(pipeline_type, model=model_name)`.
Actually, I think it may be something with the test client, since it seems to be working okay in the live app.
s
@Amos yeah, for our async tests we need to mark them with `@pytest.mark.asyncio` … otherwise, yeah, that stack trace is saying it’s not being run within an event loop.
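For reference, a minimal stdlib reproduction of what the trace is complaining about: `asyncio.create_task` only works while a loop is running, which is exactly what `pytest.mark.asyncio` provides to the test body:

```python
import asyncio

async def coro():
    return 42

# outside any event loop, create_task raises the error from the trace
c = coro()
try:
    asyncio.create_task(c)
except RuntimeError as err:
    failure = str(err)   # "no running event loop"
finally:
    c.close()            # avoid a "coroutine was never awaited" warning

async def main():
    # inside a running loop, the same call is fine
    task = asyncio.create_task(coro())
    return await task

result = asyncio.run(main())   # 42
```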
a
That's what I did wrong … I swapped the `asyncio` mark for an `anyio` mark and forgot to check if it broke anything.