# ask-for-help
Hello community, I have a very strange problem when I make concurrent requests to a service that uses adaptive batching. The behavior shows up when I send 100 concurrent requests to the service: the first 100 requests complete fine, but when I send another 100, the service throws an exception. The first image below shows the outputs before and after the error, and the second image shows the error thrown by Bento. Additional info: the first time I run the test with 100 requests, Bento creates batches of 3 inputs, but the second time I run the test, the requests that don't fail only produce batches of size 1. Here is my Runnable code:
```python
import bentoml
import numpy as np
import torch

# `binary_model` (the BentoML model reference) and `device` are defined
# elsewhere in my project.


class BinaryModelRunnable(bentoml.Runnable):

    SUPPORTED_RESOURCES = ("nvidia.com/gpu", "cpu")
    SUPPORTS_CPU_MULTI_THREADING = True

    def __init__(self):
        """Load the corresponding model and move it to the target device."""
        self.binary_model = bentoml.pytorch.load_model(binary_model)
        self.binary_model.to(device)

    @bentoml.Runnable.method(batchable=True, batch_dim=0)
    def predict(self, ids, mask):
        print(ids)
        print("batch_size", len(ids))
        output = self.binary_model(ids=ids, mask=mask)
        prediction = torch.sigmoid(output).cpu().detach().numpy().tolist()
        return np.array(prediction)


binary_runner = bentoml.Runner(
    BinaryModelRunnable,
    name="binary_runnable",
    models=[binary_model],
    max_batch_size=4,
    max_latency_ms=10000,
)
```
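For context, this is roughly how the runner is wired into the service. This is a minimal sketch, not my exact service file: the service name `binary_service`, the `JSON` input schema, and the single-sample reshaping are assumptions for illustration.

```python
import bentoml
import numpy as np
import torch
from bentoml.io import JSON, NumpyNdarray

# Sketch of a service wrapping the runner above; names are placeholders.
svc = bentoml.Service("binary_service", runners=[binary_runner])


@svc.api(input=JSON(), output=NumpyNdarray())
async def predict(payload: dict) -> np.ndarray:
    # Each request carries one tokenized sample; adaptive batching stacks
    # concurrent samples along batch_dim=0 before calling the runnable.
    ids = torch.tensor(payload["ids"]).unsqueeze(0)    # shape (1, seq_len)
    mask = torch.tensor(payload["mask"]).unsqueeze(0)  # shape (1, seq_len)
    return await binary_runner.predict.async_run(ids, mask)
```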
I have had this problem for some time now and no one has been able to help me. Any pointers would be much appreciated!
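For reference, this is roughly how I fire the 100 concurrent requests, twice in a row, like in the failing test. Again a minimal sketch: the port, endpoint name, and dummy payload shape are placeholders, not my exact load test.

```python
import asyncio

import httpx


async def one_request(client: httpx.AsyncClient, i: int):
    # Dummy token ids and attention mask; the real values come from my
    # tokenizer. seq_len of 16 is a placeholder.
    payload = {"ids": [101] + [i % 1000] * 14 + [102], "mask": [1] * 16}
    r = await client.post("http://localhost:3000/predict", json=payload)
    r.raise_for_status()
    return r.json()


async def main():
    async with httpx.AsyncClient(timeout=30) as client:
        # Two rounds of 100 concurrent requests; the second round is the
        # one that triggers the error on my side.
        for round_no in range(2):
            results = await asyncio.gather(
                *(one_request(client, i) for i in range(100))
            )
            print(f"round {round_no}: {len(results)} responses")


asyncio.run(main())
```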