Hello community, how are you? I wanted to ask a question about using BentoML for my service. My project currently has a prediction pipeline that uses four PyTorch models. The models classify chat texts, and a given chat may or may not pass through all four models, so I built the service to decide which model to run next.

Since PyTorch models handle batching natively, I chose to use that native batching instead of the adaptive batching BentoML provides. My reasoning was that, to use BentoML's batching, I would need to send requests in the exact shape each model consumes. To get around that, my plan is to add a pre-processing step for the requests sent to BentoML: with that step I could split, for example, a single chat into several entries (one per message) and let BentoML decide how to batch those entries across requests.

Since there are four models, I also wondered whether they could run in parallel, each on a different task, assuming I had the hardware for it. Below is a rough sketch of what I have in mind, in two parts: how I would save the models, and then the service itself.
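First, my understanding is that to opt into BentoML's adaptive batching I would mark each model as batchable when saving it. Something like this (untested; `classifier_a` and the checkpoint path are just placeholders for one of my four models):

```python
import bentoml
import torch

# Placeholder: stands in for one of the four chat-classification models,
# loaded however the trained model is actually built.
model = torch.load("classifier_a.pt")

# Marking __call__ as batchable lets BentoML's adaptive batching merge
# entries from concurrent requests along batch_dim before inference.
bentoml.pytorch.save_model(
    "classifier_a",
    model,
    signatures={"__call__": {"batchable": True, "batch_dim": 0}},
)
```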
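Then the service would look roughly like this. I'm assuming here, just for illustration, that the first model routes each message to one of the other three, that every model takes a tensor with one row per message, and that `preprocess` is the hypothetical pre-processing step I described above (none of this is tested):

```python
import asyncio

import bentoml
import torch
from bentoml.io import JSON

# Hypothetical model tags; each model was saved with a batchable
# signature as in the previous snippet.
router = bentoml.pytorch.get("router:latest").to_runner()
model_a = bentoml.pytorch.get("model_a:latest").to_runner()
model_b = bentoml.pytorch.get("model_b:latest").to_runner()
model_c = bentoml.pytorch.get("model_c:latest").to_runner()

svc = bentoml.Service(
    "chat_classifier",
    runners=[router, model_a, model_b, model_c],
)


def preprocess(chat: list[str]) -> torch.Tensor:
    """Hypothetical pre-processing step: turn one chat into a tensor
    batch with one row per message."""
    raise NotImplementedError


@svc.api(input=JSON(), output=JSON())
async def classify(payload: dict) -> dict:
    batch = preprocess(payload["chat"])

    # First model decides, per message, which downstream model applies.
    logits = await router.async_run(batch)
    routes = logits.argmax(dim=1)

    # Fan the messages out to the three downstream runners concurrently.
    # Each async_run call goes through that runner's own adaptive-batching
    # queue, so entries from other in-flight requests can be merged in.
    outputs = await asyncio.gather(
        model_a.async_run(batch[routes == 0]),
        model_b.async_run(batch[routes == 1]),
        model_c.async_run(batch[routes == 2]),
    )
    return {"labels": [out.argmax(dim=1).tolist() for out in outputs]}
```

The idea is that `asyncio.gather` would give me the parallelism across models (each runner being its own worker that could get its own hardware), while the batchable signatures would give the cross-request batching.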
As I'm new to this framework, I don't know whether this plan makes sense, and I was hoping you could help light my way.
Thank you very much in advance!