# hamilton-help
s
This message was deleted.
👀 1
s
Good question! Hamilton’s base assumption is that your code is not asyncio-based, but you should be able to run it with some caveats. There are a few ways that come to mind. Let me write some code and get back to you in a bit.
s
Thanks @Stefan Krawczyk 🙏
e
Super excited to see this use-case btw — we’ve been talking about how we could run hamilton in an online setting but were looking for more real-world examples 🙂
🙌 1
s
Okay, I think the quickest option to get unblocked is to split the I/O (which is what I’m assuming requires asyncio) from the Hamilton computation. E.g. within a FastAPI app:
```python
from hamilton import driver

dag = driver.Driver({...}, modules, adapter=...)

@app.get("/endpoint")
async def compute(...):
    data = await pull_from_db(...)
    result = dag.execute([output], inputs=data)
    # transform result for fastapi
    return result
```
s
@Elijah Ben Izzy happy to provide one 😀 Came across Hamilton once more thanks to @Stefan Krawczyk’s MLOps World talk and realized it has the potential to be a great solution for us. Happy to support where I can!
🙌 1
s
To run Hamilton in an asyncio-based way, with async functions inside a running event loop, requires an async-based driver at least — doesn’t seem hard to do. If running Hamilton outside an event loop, then it’s easy to call asyncio-based operations within the function itself via `asyncio.run(...)`. Otherwise Hamilton can be run within a running event loop only if the underlying DAG has no async functions - which is what I’m suggesting as the stopgap measure.
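To illustrate that second case, here is a minimal sketch, assuming Hamilton is being driven from a plain (non-async) script; `fetch_rows_async` and the node names are illustrative stand-ins, not anything from Hamilton itself:
```python
import asyncio

import pandas as pd


async def fetch_rows_async(table_name: str) -> pd.DataFrame:
    """Illustrative async DB call (e.g. via an async DB client)."""
    return pd.DataFrame({"value": [1, 2, 3]})


def raw_data(table_name: str) -> pd.DataFrame:
    # Safe only when the Hamilton driver is NOT running inside an event loop:
    # asyncio.run() starts (and closes) its own loop just for this call.
    return asyncio.run(fetch_rows_async(table_name))


def cleaned_data(raw_data: pd.DataFrame) -> pd.DataFrame:
    return raw_data.dropna()
```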
@Simon Helmig created https://github.com/stitchfix/hamilton/issues/167 to track this.
🙏 1
@Simon Helmig is splitting the I/O out a possibility for you?
s
@Stefan Krawczyk that makes sense thank you! In principle it is possible; we do have a setup where we end up making DB calls quite deep within the logic, not just to read data but also to log outputs. No reason we couldn't extract this out other than having to refactor; from a performance perspective it would be better to limit round trips anyhow. I think using asyncio.run as you suggest for the I/O could also be a good solution here; I'll have to try this tomorrow and see if it works! In general it would be ideal if we could retain the input operations as function args similar to how Hamilton usually operates, but for output operations calling them explicitly is no trouble at all.
e
So yeah, I think the first two ideas are good short-term/getting unblocked. One thing that’s interesting re: db/asyncio is that you can use overrides to replace it. So, if you have a hamilton DAG in which one step yields the DB call, you can use an override to replace that with the outside-of-hamilton async db call. This is cool cause it allows:
1. The ability to unit test your DAGs by injecting data instead of that external operation
2. The ability to switch data providers (e.g. from sync outside of a service context to async inside one)
3. The ability to generate data as your service would in bulk in a natural way
Not sure if (3) is useful but this is something that we’ve thought about
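To make the override idea concrete, a small sketch: the module `my_dag`, the nodes `db_rows`/`enriched_rows`, and `pull_from_db_async` are all made-up names for illustration, while `overrides` is a real parameter of `Driver.execute`:
```python
import pandas as pd
from hamilton import driver

import my_dag  # hypothetical module defining db_rows() and enriched_rows(db_rows)

dag = driver.Driver({}, my_dag)


async def pull_from_db_async(table_name: str) -> pd.DataFrame:
    """Illustrative async read, e.g. via an async DB client."""
    return pd.DataFrame({"value": [1, 2, 3]})


async def handle_request(table_name: str) -> pd.DataFrame:
    data = await pull_from_db_async(table_name)  # async I/O happens outside Hamilton
    return dag.execute(
        ["enriched_rows"],
        overrides={"db_rows": data},  # the db_rows node is never executed
    )
```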
s
@Elijah Ben Izzy good point! For the data input steps this is a great solution. This, combined with calling the write operations with asyncio.run, seems like it should cover things! Actually, (3) is useful! Being able to replay requests and inject data to the service is a key consideration for us, which the override definitely facilitates.
🎉 1
s
@Simon Helmig just to clarify, using `asyncio.run()` when running within the FastAPI app won’t work — you’ll get
```
RuntimeError: asyncio.run() cannot be called from a running event loop
```
I’ll add some code examples to the github issue for clarity on what does and does not work.
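For reference, a minimal illustration of that failure mode (nothing Hamilton-specific, just how asyncio behaves; names are illustrative):
```python
import asyncio


async def fetch() -> int:
    return 42


def my_node() -> int:
    # If this function runs while an event loop is already running in this
    # thread (e.g. dag.execute() called from inside an async FastAPI endpoint),
    # asyncio.run() raises:
    #   RuntimeError: asyncio.run() cannot be called from a running event loop
    return asyncio.run(fetch())
```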
s
@Stefan Krawczyk ahh right, yeah that makes sense… thanks for the clarification and code examples!
s
@Simon Helmig is the logging back to the DB critical to the application and does it need to be on the request path to return a web response?
s
@Stefan Krawczyk As it stands, yes, the logging is critical, since we want to be certain of when particular components of our pipeline were executed. In principle we could log these start and stop times without writing to the DB, but this would require us to refactor the components to ensure they always return, so that we can write these outputs to the DB in batch at the end of the computation and ostensibly outside of Hamilton. In any case, we would need to write to the DB at some point during the request no matter what.
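One hedged sketch of what that batching refactor could look like, assuming each component returns its timings alongside its result so a single DB write can happen at the end of the request, outside of Hamilton (all names here are illustrative):
```python
import time


def component1(request_data: dict) -> dict:
    start = time.time()
    result = {"features": [1, 2, 3]}  # the actual computation
    return {"result": result, "timing": ("component1", start, time.time())}


def component2(component1: dict) -> dict:
    start = time.time()
    result = {"score": len(component1["result"]["features"])}
    return {"result": result, "timing": ("component2", start, time.time())}


# In the endpoint, after dag.execute([...]) returns, collect the "timing"
# entries from each node's output and await one bulk DB insert outside Hamilton.
```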
s
@Simon Helmig makes sense, thanks. Sounds like the request doesn't need to be fast, or does it? Either way, could you add some toy code to the issue that mirrors the structure of your DAG please? That way I can have a realistic example to develop against.
s
@Stefan Krawczyk Well it didn’t originally, but we’re trying to improve performance at the moment; that’s in part why we’re shifting to async-first. Absolutely, I’ll provide some today!
gratitude thank you 1
@Stefan Krawczyk here’s a Gist I put together that should give a flavour of what I mean. Essentially we have some top-level function `pipeline`, which orchestrates `component1` and `component2`, either of which could be requesting data from external sources, and all of which log data into the db at the beginning and end of each function. Hope it makes sense, let me know if anything is unclear / I can support somehow!
👍 1
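For readers without access to the gist, a rough sketch of the shape being described; the names and the logging helper are illustrative, not the actual gist:
```python
import asyncio


async def log_event(component: str, event: str) -> None:
    """Illustrative async DB write recording start/stop events."""
    ...


async def fetch_external_source(request_data: dict) -> list:
    """Illustrative async call to an external data source."""
    return [1, 2, 3]


async def component1(request_data: dict) -> dict:
    await log_event("component1", "start")
    features = await fetch_external_source(request_data)
    await log_event("component1", "end")
    return {"features": features}


async def component2(component1_result: dict) -> dict:
    await log_event("component2", "start")
    result = {"score": len(component1_result["features"])}
    await log_event("component2", "end")
    return result


async def pipeline(request_data: dict) -> dict:
    c1 = await component1(request_data)
    return await component2(c1)
```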
e
Hey, taking a quick peek at the code. Some qs:
• How deep will the pipeline be? The one you have is like 2-3 nodes deep — curious if that’s representative of what you’ve got or it’ll get more complex.
• In `computation1` (and presumably `computation2`), you do three things — log, calculate, then log. Is the idea of breaking up logging into two pieces for the purpose of measuring (e.g. time), or could it be one piece?
Likely going to try to prototype an async hamilton implementation (I’ve actually done it in the past so I have some code I can dig up), but it will add a bit of complexity, so it’ll be on a branch for now (happy to walk you through installing it). OTOH, separating the compute from the externalities might help out with your use-case, especially when you want to log differently in bulk…
But obviously I’m not an expert on your code, so kinda just poking around now 🙂
Also thanks for the gist! Makes things super clear 🙌
Hey! Had some fun hacking -- would love your feedback. Basically, I've gotten (a slightly modified version of) your gist to work and I'm pretty happy with the solution. Slightly experimental, but it turned out to be a few simple class additions! Note it's not yet unit tested (coming soon), but it has an example + a README to play around with: https://github.com/stitchfix/hamilton/pull/171. To install, you can either:
• check out and do `pip install -e .`
• install directly from git: `pip install "git+https://github.com/stitchfix/hamilton@async-prototype"`
Next up I'll be adding some unit tests to ensure it won't break as we change things, but I'm feeling pretty confident it works.
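For context, a rough sketch of driving an async DAG the way the prototype intends; the `h_async.AsyncDriver` import path matches what later shipped as Hamilton's experimental async driver, and the node names are illustrative, so check the PR's README/example for the exact API on the branch:
```python
import asyncio
import sys

from hamilton.experimental import h_async  # experimental async driver


# Illustrative async DAG nodes, defined in this module for brevity.
async def raw_data(request_data: dict) -> dict:
    await asyncio.sleep(0)  # stand-in for an async DB/API call
    return request_data


async def score(raw_data: dict) -> int:
    return len(raw_data)


async def _main() -> None:
    # Pass this module as the DAG definition; execute() is a coroutine here,
    # so it is awaited rather than called synchronously.
    dr = h_async.AsyncDriver({}, sys.modules[__name__])
    result = await dr.execute(["score"], inputs={"request_data": {"a": 1}})
    print(result)


if __name__ == "__main__":
    asyncio.run(_main())
```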
@Simon Helmig hey! This releases the `async` implementation! Looking forward to hearing how it works for your case. https://hamilton-opensource.slack.com/archives/C03M34FM058/p1660621246625529