This message was deleted Hamilton Open Source #hamilton-help


# hamilton-help

Slackbot

06/06/2023, 1:45 PM

This message was deleted.

Culver McWhirter

06/06/2023, 1:48 PM

some other stuff:
• we'd like to avoid just combining all the SQLs into 1 big scary SQL, since our data scientists may not always need all of those columns, and we're also switching to Hamilton to move away from our old featurestore that is just 1 giant scary SQL
• I saw there was an async decorator, but the README said async doesn't play well with other decorators, and we're using extract_columns to break the dataframes from queries into individual columns

Elijah Ben Izzy

06/06/2023, 2:55 PM

Good morning! So yes, this is a pretty common use-case. Just to be clear: you run a bunch of SQL operations that each load a dataframe of sorts, then join/manipulate them in some way, correct?

async could work (although it's still a little undeveloped). Our approach for parallelization has generally been to delegate to other frameworks. So, the ray and dask graph adapters both naturally do horizontal parallelism. The idea is it's a quick swap for the driver, and you get the power of distributed systems. Some resources:
• Quick post about scaling with ray
• More information about horizontal scaling with ray/dask
• dask hello_world
• ray hello_world

Elijah Ben Izzy

06/06/2023, 2:55 PM

I think this should happily cover your case: both can be set up pretty easily to run on whatever cores/compute you have. That said, we’re also thinking of having another simple multiprocessing adapter.

Stefan Krawczyk

06/06/2023, 5:13 PM

@Culver McWhirter another idea (as a stop gap) would be to split things into two drivers:
1. One that uses Ray/Dask to parallelize and load the data.
2. Then one that does the downstream computation, passing in the output of the first driver.
It is not always ideal to use Ray/Dask because the serialization cost between processes can outweigh any parallelization benefits. We currently don’t support arbitrary parallelization of a DAG, but your use case is definitely a motivating one to provide functionality for.
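The two-stage split described above can be sketched without Hamilton at all. This is only an illustration of the shape of the idea: it swaps Ray/Dask for a standard-library thread pool, and `load_users`, `load_spend`, `stage_one_load`, and `stage_two_compute` are hypothetical names, not part of Hamilton. In a real setup each stage would presumably be its own driver, with the first stage's output passed in as inputs to the second.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for blocking SQL loaders; each returns a dict of columns.
def load_users() -> dict:
    time.sleep(0.1)  # simulates waiting on the database
    return {"user_id": [1, 2, 3], "age": [30, 41, 25]}

def load_spend() -> dict:
    time.sleep(0.1)
    return {"user_id": [1, 2, 3], "spend": [10.0, 0.0, 99.5]}

def stage_one_load(loaders: dict) -> dict:
    """Stage 1: run every loader concurrently and collect the results by name."""
    with ThreadPoolExecutor(max_workers=len(loaders)) as pool:
        futures = {name: pool.submit(fn) for name, fn in loaders.items()}
        return {name: fut.result() for name, fut in futures.items()}

def stage_two_compute(loaded: dict) -> list:
    """Stage 2: downstream computation on data that is already in memory."""
    ages = loaded["users"]["age"]
    spend = loaded["spend"]["spend"]
    return [s / a for s, a in zip(spend, ages)]

loaded = stage_one_load({"users": load_users, "spend": load_spend})
spend_per_year_of_age = stage_two_compute(loaded)
```

Threads suit this case because the loaders spend their time waiting on I/O, and nothing has to be serialized between processes before stage two runs.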

Stefan Krawczyk

06/06/2023, 5:15 PM

Would you mind creating a github issue with your use case and what you’d like to see happen please? That would help.

Culver McWhirter

06/13/2023, 5:14 PM

sorry for the very late response, i wasn't sure exactly what i wanted, so i wanted to put in more effort and see if i could actually get something working. this is what i ended up doing (i'm sure it's not ideal/perfect, but it does work):

hamilton_async_sql.py

Culver McWhirter

06/13/2023, 5:18 PM

^ this is pretty specific to our code, but i wonder if it could be made more general by throwing each Hamilton func into a new thread with this?

loop = asyncio.get_running_loop()

loop.run_in_executor(THREAD_POOL, some_func)

would love to hear your thoughts
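One way the "throw each func into a thread" idea might be generalized is a small decorator built on the two lines above. This is a sketch, not something from the thread's attached files: `run_in_thread` is a hypothetical helper name, `some_func` here is a toy blocking function, and whether Hamilton's own decorators compose cleanly with such a wrapper is not verified here.

```python
import asyncio
import functools
import time
from concurrent.futures import ThreadPoolExecutor

THREAD_POOL = ThreadPoolExecutor(max_workers=4)

def run_in_thread(func):
    """Wrap a blocking function so that calling it returns an awaitable."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        # run_in_executor forwards positional args only, so bind everything first.
        call = functools.partial(func, *args, **kwargs)
        return await loop.run_in_executor(THREAD_POOL, call)
    return wrapper

@run_in_thread
def some_func(x: int) -> int:
    time.sleep(0.05)  # hypothetical blocking work, e.g. a synchronous query
    return x * 2

async def main() -> list:
    # The three calls wait in worker threads at the same time.
    return await asyncio.gather(some_func(1), some_func(2), some_func(3))

results = asyncio.run(main())
```

`functools.wraps` keeps the original function's name and docstring on the wrapper, which matters for any framework that identifies functions by name.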

Elijah Ben Izzy

06/13/2023, 5:25 PM

Nice! Glad it works. OK, a bit confused — are you using the async decorator? If so, then why are you getting the event loop and adding it into another thread? Shouldn’t you get the benefits of async on a single thread?

Elijah Ben Izzy

06/13/2023, 5:26 PM

Also, if you want, we’d be happy to get on a call and talk through your use-case!

Stefan Krawczyk

06/13/2023, 5:36 PM

yeah @Culver McWhirter we also have an AsyncDriver for Hamilton - which we can jump on a call to explain too. But a little more context on where you want this to run would help 🙂

Culver McWhirter

06/13/2023, 5:36 PM

this is my first dive into async with Python, so i might be doing stuff a little wrong. the reason i ended up having to put run_query() and get_results() in new threads is because they're not async functions themselves, so i couldn't await them
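The point about non-async functions can be seen in a small timing comparison. In this sketch `run_query` is a hypothetical blocking stand-in (the real one lives in the attached file, which is not shown here), and `asyncio.to_thread` (Python 3.9+) is used as a shorthand for the `run_in_executor` approach.

```python
import asyncio
import time

def run_query(sql: str) -> str:
    """Hypothetical blocking call, standing in for a synchronous database client."""
    time.sleep(0.2)
    return f"rows for {sql}"

async def blocking_node(sql: str) -> str:
    # Calling a sync function directly holds the event loop until it returns,
    # so other coroutines cannot make progress in the meantime.
    return run_query(sql)

async def threaded_node(sql: str) -> str:
    # The call runs in a worker thread; the event loop is free while it waits.
    return await asyncio.to_thread(run_query, sql)

async def timed(node) -> tuple:
    start = time.perf_counter()
    rows = await asyncio.gather(node("q1"), node("q2"))
    return rows, time.perf_counter() - start

blocking_rows, blocking_elapsed = asyncio.run(timed(blocking_node))
threaded_rows, threaded_elapsed = asyncio.run(timed(threaded_node))
```

Both runs return the same rows, but the first should take roughly the sum of the two sleeps while the second should take roughly one sleep, which is why an `async def` alone gives no concurrency when the work inside it is a blocking call.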

Culver McWhirter

06/13/2023, 5:38 PM

sorry i left out that important part. I am using AsyncDriver, the code snippet I posted was the funcs i pass to the driver

Culver McWhirter

06/13/2023, 5:43 PM

so this would be the other script that actually imports those functions and runs the driver

async_hamilton_driver_py.py

Stefan Krawczyk

06/13/2023, 5:55 PM

@Culver McWhirter do you have time now to jump on a quick call?

👍 1

Stefan Krawczyk

06/13/2023, 6:10 PM

@Culver McWhirter https://meet.google.com/ciz-naxw-hhg?authuser=0

Stefan Krawczyk

06/13/2023, 6:42 PM

Thanks for your time @Culver McWhirter here’s the gist of code we walked through https://gist.github.com/skrawcz/677daa5e72cba8b9c26d91728468f9e0
