🚀 Just dropped from our team at Snowflake AI Research: ArcticInference, a vLLM plugin that supercharges the CodeAct agent with SuffixDecoding, delivering 1.8×–4.5× end-to-end speedups ⚡ with no loss in quality!
💡 Example: running SWE-Bench Verified with openhands-lm-32b (37.2% resolve rate) on 4×H100 GPUs now takes 5.8h instead of 10.9h, saving both time ⏱️ and money 💰.
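A quick back-of-the-envelope check of the SWE-Bench numbers, using only the hours and GPU count quoted in the post. It assumes all 4 GPUs stay busy for the whole run, so GPU-hours saved is only a rough proxy for cost.

```python
# Wall-clock hours quoted in the post for SWE-Bench Verified on 4×H100.
baseline_hours = 10.9  # without ArcticInference
arctic_hours = 5.8     # with ArcticInference (SuffixDecoding)
num_gpus = 4

# End-to-end speedup for this run; it falls inside the quoted 1.8×–4.5× range.
speedup = baseline_hours / arctic_hours

# Assumption: every GPU is occupied for the full run.
gpu_hours_saved = num_gpus * (baseline_hours - arctic_hours)

print(f"speedup: {speedup:.2f}x")             # speedup: 1.88x
print(f"GPU-hours saved: {gpu_hours_saved:.1f}")  # GPU-hours saved: 20.4
```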
🔗 Dive in: https://www.snowflake.com/en/engineering-blog/fast-speculative-decoding-vllm-arctic/