Hi @James Zhang, thanks for the thoughtful question. The StackOverflow article you shared already provides some insightful answers, and I agree with all of the author's points. There is no right or wrong approach here; as the author puts it, “you just need to be aware of the pros and cons of doing so.”
While it is true that calling an RPC from a Kafka processor couples the availability of Kafka and the RPC service, this architecture decouples the type and scaling of the compute resources. Putting models behind an RPC lets you run model inference on different hardware (e.g. GPUs) and scale it independently of the Kafka processor: you likely won't need as many GPU instances serving models as you have Kafka processors.
Luckily, model inference RPCs are mostly idempotent, so you can safely retry on failure while maintaining Kafka's processing guarantees. Make sure the RPCs are made synchronously so that back pressure propagates to the Kafka consumer when the RPC service backs up, protecting the service from being overwhelmed.
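To make the retry-plus-back-pressure idea concrete, here's a minimal sketch. The consumer loop, the `flaky_rpc` stub, and all names are hypothetical stand-ins (a real setup would use an actual Kafka consumer and inference endpoint); the point is that the call blocks, retries transient failures, and commits the offset only after success, which is safe precisely because inference is idempotent.

```python
import time

class TransientRPCError(Exception):
    """Retryable failure (e.g. timeout, 5xx from the model server)."""

def call_with_retries(rpc, payload, max_attempts=3, backoff_s=0.01):
    """Synchronously call an idempotent inference RPC, retrying on
    transient failures with exponential backoff. Blocking here is what
    propagates back pressure to the consumer loop."""
    for attempt in range(1, max_attempts + 1):
        try:
            return rpc(payload)
        except TransientRPCError:
            if attempt == max_attempts:
                raise  # exhausted retries; let the processor fail loudly
            time.sleep(backoff_s * 2 ** (attempt - 1))

def process_records(records, rpc, commit):
    """Toy stand-in for a Kafka consumer loop: process records in order
    and commit each offset only after the RPC succeeds, preserving
    at-least-once semantics."""
    for offset, payload in records:
        result = call_with_retries(rpc, payload)
        commit(offset)  # safe to commit: retries can't corrupt results
        yield offset, result

# Demo: an RPC stub that fails twice before succeeding.
calls = {"n": 0}
def flaky_rpc(payload):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise TransientRPCError("timeout")
    return {"prediction": payload * 2}

committed = []
out = list(process_records([(0, 1), (1, 2)], flaky_rpc, committed.append))
print(out)        # [(0, {'prediction': 2}), (1, {'prediction': 4})]
print(committed)  # [0, 1]
```

Because the loop never commits past a record that hasn't succeeded, a crash mid-retry just means the record is reprocessed, which the idempotent RPC tolerates.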
While it is entirely possible to serve models in a streaming-native way, you'll end up having to solve all the model-serving challenges (dependencies, containerization, scaling, resource management) within the Kafka setting. Putting models behind an RPC offers a simplicity that goes a long way until advanced customizations and optimizations are needed.