# hamilton-help
s
This message was deleted.
👀 1
s
Good question! Hamilton’s base assumption is that your code is not asyncio-based, but you should be able to run it with some caveats. There are a few ways that come to mind. Let me write some code and get back to you in a bit.
s
Thanks @Stefan Krawczyk 🙏
e
Super excited to see this use-case btw — we’ve been talking about how we could run hamilton in an online setting but were looking for more real-world examples 🙂
🙌 1
s
Okay, I think the quickest option to get unblocked is to split the I/O (which is what I’m assuming requires asyncio) from the Hamilton computation. E.g. within a FastAPI app:
```python
from hamilton import driver

dag = driver.Driver({...}, modules, adapter=...)

@app.get("/endpoint")
async def compute(...):
    data = await pull_from_db(...)
    result = dag.execute([output], inputs=data)
    # transform result for fastapi
    return result
```
s
@Elijah Ben Izzy happy to provide one 😀 Came across Hamilton once more thanks to @Stefan Krawczyk’s MLOps World talk and realized it has the potential to be a great solution for us. Happy to support where I can!
🙌 1
s
To run Hamilton in an asyncio-based way, with async functions inside a running event loop, requires an async-based driver at least — doesn’t seem hard to do. If running Hamilton outside an event loop, then it’s easy to call asyncio-based operations within the function itself via `asyncio.run(...)`. Otherwise Hamilton can be run within a running event loop only if the underlying DAG has no async functions - which is what I’m suggesting as the stopgap measure.
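To illustrate that second case, here is a minimal sketch, assuming Hamilton is being driven from a plain (non-async) script; `fetch_rows_async` and the node names are illustrative stand-ins, not anything from Hamilton itself:
```python
import asyncio

import pandas as pd


async def fetch_rows_async(table_name: str) -> pd.DataFrame:
    """Illustrative async DB call (e.g. via an async DB client)."""
    return pd.DataFrame({"value": [1, 2, 3]})


def raw_data(table_name: str) -> pd.DataFrame:
    # Safe only when the Hamilton driver is NOT running inside an event loop:
    # asyncio.run() starts (and closes) its own loop just for this call.
    return asyncio.run(fetch_rows_async(table_name))


def cleaned_data(raw_data: pd.DataFrame) -> pd.DataFrame:
    return raw_data.dropna()
```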
@Simon Helmig created https://github.com/stitchfix/hamilton/issues/167 to track this.
🙏 1
@Simon Helmig is splitting the I/O out a possibility for you?
s
@Stefan Krawczyk that makes sense thank you! In principle it is possible; we do have a setup where we end up making DB calls quite deep within the logic, not just to read data but also to log outputs. No reason we couldn't extract this out other than having to refactor; from a performance perspective it would be better to limit round trips anyhow. I think using asyncio.run as you suggest for the I/O could also be a good solution here; I'll have to try this tomorrow and see if it works! In general it would be ideal if we could retain the input operations as function args similar to how Hamilton usually operates, but for output operations calling them explicitly is no trouble at all.
e
So yeah, I think the first two ideas are good short-term/getting unblocked. One thing that’s interesting re: db/asyncio is that you can use overrides to replace it. So, if you have a hamilton DAG in which one step yields the DB call, you can use an override to replace that with the outside-of-hamilton async db call. This is cool cause it allows:
1. The ability to unit test your DAGs by injecting data instead of that external operation
2. The ability to switch data providers (e.g. from sync outside of a service context to async inside one)
3. The ability to generate data as your service would in bulk in a natural way
Not sure if (3) is useful but this is something that we’ve thought about
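To make the override idea concrete, a small sketch: the module `my_dag`, the nodes `db_rows`/`enriched_rows`, and `pull_from_db_async` are all made-up names for illustration, while `overrides` is a real parameter of `Driver.execute`:
```python
import pandas as pd
from hamilton import driver

import my_dag  # hypothetical module defining db_rows() and enriched_rows(db_rows)

dag = driver.Driver({}, my_dag)


async def pull_from_db_async(table_name: str) -> pd.DataFrame:
    """Illustrative async read, e.g. via an async DB client."""
    return pd.DataFrame({"value": [1, 2, 3]})


async def handle_request(table_name: str) -> pd.DataFrame:
    data = await pull_from_db_async(table_name)  # async I/O happens outside Hamilton
    return dag.execute(
        ["enriched_rows"],
        overrides={"db_rows": data},  # the db_rows node is never executed
    )
```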
s
@Elijah Ben Izzy good point! For the data input steps this is a great solution. This, combined with calling the write operations with asyncio.run, seems like it should cover things! Actually, (3) is useful! Being able to replay requests and inject data to the service is a key consideration for us, which the override definitely facilitates.
🎉 1
s
@Simon Helmig just to clarify, using `asyncio.run()` when running within the FastAPI app won’t work — you’ll get
```
RuntimeError: asyncio.run() cannot be called from a running event loop
```
I’ll add some code examples to the github issue for clarity on what does and does not work.
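For reference, a minimal illustration of that failure mode (nothing Hamilton-specific, just how asyncio behaves; names are illustrative):
```python
import asyncio


async def fetch() -> int:
    return 42


def my_node() -> int:
    # If this function runs while an event loop is already running in this
    # thread (e.g. dag.execute() called from inside an async FastAPI endpoint),
    # asyncio.run() raises:
    #   RuntimeError: asyncio.run() cannot be called from a running event loop
    return asyncio.run(fetch())
```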
s
@Stefan Krawczyk ahh right, yeah that makes sense… thanks for the clarification and code examples!
s
@Simon Helmig is the logging back to the DB critical to the application and does it need to be on the request path to return a web response?
s
@Stefan Krawczyk As it stands, yes, the logging is critical, since we want to be certain of when particular components of our pipeline were executed. In principle we could log these start and stop times without writing to the DB, but this would require us to refactor the components to ensure they always return, so that we can write these outputs to the DB in batch at the end of the computation and ostensibly outside of Hamilton. In any case, we would need to write to the DB at some point during the request no matter what.
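One hedged sketch of what that batching refactor could look like, assuming each component returns its timings alongside its result so a single DB write can happen at the end of the request, outside of Hamilton (all names here are illustrative):
```python
import time


def component1(request_data: dict) -> dict:
    start = time.time()
    result = {"features": [1, 2, 3]}  # the actual computation
    return {"result": result, "timing": ("component1", start, time.time())}


def component2(component1: dict) -> dict:
    start = time.time()
    result = {"score": len(component1["result"]["features"])}
    return {"result": result, "timing": ("component2", start, time.time())}


# In the endpoint, after dag.execute([...]) returns, collect the "timing"
# entries from each node's output and await one bulk DB insert outside Hamilton.
```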
s
@Simon Helmig makes sense, thanks. Sounds like the request doesn't need to be fast, or does it? Either way, could you add some toy code to the issue that mirrors the structure of your DAG please? That way I can have a realistic example to develop against.
s
@Stefan Krawczyk Well it didn’t originally, but we’re trying to improve performance at the moment; that’s in part why we’re shifting to async-first. Absolutely, I’ll provide some today!
gratitude thank you 1
@Stefan Krawczyk here’s a Gist I put together that should give a flavour of what I mean. Essentially we have some top-level function `pipeline`, which orchestrates `component1` and `component2`, either of which could be requesting data from external sources, and all of which log data into the db at the beginning and end of each function. Hope it makes sense, let me know if anything is unclear / I can support somehow!
👍 1
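For readers without access to the gist, a rough sketch of the shape being described; the names and the logging helper are illustrative, not the actual gist:
```python
import asyncio


async def log_event(component: str, event: str) -> None:
    """Illustrative async DB write recording start/stop events."""
    ...


async def fetch_external_source(request_data: dict) -> list:
    """Illustrative async call to an external data source."""
    return [1, 2, 3]


async def component1(request_data: dict) -> dict:
    await log_event("component1", "start")
    features = await fetch_external_source(request_data)
    await log_event("component1", "end")
    return {"features": features}


async def component2(component1_result: dict) -> dict:
    await log_event("component2", "start")
    result = {"score": len(component1_result["features"])}
    await log_event("component2", "end")
    return result


async def pipeline(request_data: dict) -> dict:
    c1 = await component1(request_data)
    return await component2(c1)
```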
e
Hey, taking a quick peek at the code. Some qs:
• How deep will the pipeline be? The one you have is like 2-3 nodes deep — curious if that’s representative of what you’ve got or it’ll get more complex.
• In `computation1` (and presumably `computation2`), you do three things — log, calculate, then log. Is the idea of breaking up logging into two pieces for the purpose of measuring (e.g. time), or could it be one piece?
Likely going to try to prototype an async hamilton implementation (I’ve actually done it in the past so I have some code I can dig up), but it will add a bit of complexity, so it’ll be on a branch for now (happy to walk you through installing it). OTOH, separating the compute from the externalities might help out with your use-case, especially when you want to log differently in bulk…
But obviously I’m not an expert on your code, so kinda just poking around now 🙂
Also thanks for the gist! Makes things super clear 🙌
Hey! Had some fun hacking -- would love your feedback. Basically, I've gotten (a slightly modified version of) your gist to work and I'm pretty happy with the solution. Slightly experimental, but it turned out to be a few simple class additions! Note it's not yet unit tested (coming soon), but it has an example + a README to play around with: https://github.com/stitchfix/hamilton/pull/171. To install, you can either:
• check out and do `pip install -e .`
• install directly from git: `pip install "git+https://github.com/stitchfix/hamilton@async-prototype"`
Next up I'll be adding some unit tests to ensure it won't break as we change things, but I'm feeling pretty confident it works.
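For context, a rough sketch of driving an async DAG the way the prototype intends; the `h_async.AsyncDriver` import path matches what later shipped as Hamilton's experimental async driver, and the node names are illustrative, so check the PR's README/example for the exact API on the branch:
```python
import asyncio
import sys

from hamilton.experimental import h_async  # experimental async driver


# Illustrative async DAG nodes, defined in this module for brevity.
async def raw_data(request_data: dict) -> dict:
    await asyncio.sleep(0)  # stand-in for an async DB/API call
    return request_data


async def score(raw_data: dict) -> int:
    return len(raw_data)


async def _main() -> None:
    # Pass this module as the DAG definition; execute() is a coroutine here,
    # so it is awaited rather than called synchronously.
    dr = h_async.AsyncDriver({}, sys.modules[__name__])
    result = await dr.execute(["score"], inputs={"request_data": {"a": 1}})
    print(result)


if __name__ == "__main__":
    asyncio.run(_main())
```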
@Simon Helmig hey! This releases the `async` implementation! Looking forward to hearing how it works for your case. https://hamilton-opensource.slack.com/archives/C03M34FM058/p1660621246625529