Hi @James Zhang, thanks for the thoughtful question. The StackOverflow article you shared already provides some insightful answers, and I agree with all of the author's points. There is no right or wrong approach here; as the author puts it, “you just need to be aware of the pros and cons of doing so.”
While it is true that calling an RPC from a Kafka processor couples the availability of Kafka and the RPC service, this architecture decouples the type and scaling of the compute resources. Putting models behind an RPC lets you run model inference on different hardware (e.g. GPUs) and scale it independently of the Kafka processor: you likely won't need as many GPU instances serving models as you have Kafka processors.
Luckily, model inference RPCs are mostly idempotent, so you can safely retry on failure while maintaining Kafka's processing guarantees. Make sure the RPCs are made synchronously so that back pressure propagates to the Kafka consumer when the RPC service backs up, protecting the service from being overwhelmed.
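To make the retry-plus-back-pressure idea concrete, here's a minimal sketch. The consumer loop, the `flaky_rpc` stub, and all names are hypothetical stand-ins (a real setup would use an actual Kafka consumer and inference endpoint); the point is that the call blocks, retries transient failures, and commits the offset only after success, which is safe precisely because inference is idempotent.

```python
import time

class TransientRPCError(Exception):
    """Retryable failure (e.g. timeout, 5xx from the model server)."""

def call_with_retries(rpc, payload, max_attempts=3, backoff_s=0.01):
    """Synchronously call an idempotent inference RPC, retrying on
    transient failures with exponential backoff. Blocking here is what
    propagates back pressure to the consumer loop."""
    for attempt in range(1, max_attempts + 1):
        try:
            return rpc(payload)
        except TransientRPCError:
            if attempt == max_attempts:
                raise  # exhausted retries; let the processor fail loudly
            time.sleep(backoff_s * 2 ** (attempt - 1))

def process_records(records, rpc, commit):
    """Toy stand-in for a Kafka consumer loop: process records in order
    and commit each offset only after the RPC succeeds, preserving
    at-least-once semantics."""
    for offset, payload in records:
        result = call_with_retries(rpc, payload)
        commit(offset)  # safe to commit: retries can't corrupt results
        yield offset, result

# Demo: an RPC stub that fails twice before succeeding.
calls = {"n": 0}
def flaky_rpc(payload):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise TransientRPCError("timeout")
    return {"prediction": payload * 2}

committed = []
out = list(process_records([(0, 1), (1, 2)], flaky_rpc, committed.append))
print(out)        # [(0, {'prediction': 2}), (1, {'prediction': 4})]
print(committed)  # [0, 1]
```

Because the loop never commits past a record that hasn't succeeded, a crash mid-retry just means the record is reprocessed, which the idempotent RPC tolerates.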
While it is entirely possible to serve models in a streaming-native way, you'll end up having to solve all the model-serving challenges (dependencies, containerization, scaling, resource management) within the Kafka setting. Putting models behind an RPC offers a simplicity that goes a long way until advanced customizations and optimizations are needed.