Speculative Decoding for Message-Oriented Systems: Early Exit with Confidence

We study speculative decoding for message-oriented systems: a fast predictor proposes a decision and exits early when its confidence exceeds a threshold τ; otherwise, a slow, more accurate predictor runs (with timeout ∆). We quantify accuracy/latency tradeoffs, throughput gains, and accept/fallback rates, and show how τ and ∆ shape the Pareto frontier.
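
As an illustration only, the accept/fallback rule described above might be sketched in Python as follows. The names `speculative_decide`, `Decision`, `fast`, and `slow` are hypothetical and not taken from this work, and the behavior on timeout (reusing the fast predictor's proposal once ∆ elapses) is an assumption, since the text above does not state what happens when the slow predictor times out.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Decision:
    label: str
    source: str  # "fast", "slow", or "fast_after_timeout"


def speculative_decide(
    message: str,
    fast: Callable[[str], Tuple[str, float]],  # returns (label, confidence)
    slow: Callable[[str], str],                # returns label only
    tau: float,                                # confidence threshold
    delta_s: float,                            # slow-path timeout, in seconds
    executor: ThreadPoolExecutor,
) -> Decision:
    """Accept the fast proposal if it is confident enough, else fall back."""
    label, confidence = fast(message)

    # Early exit: confidence strictly exceeds the threshold tau.
    if confidence > tau:
        return Decision(label, "fast")

    # Fallback: run the slow predictor, waiting at most delta_s seconds.
    future = executor.submit(slow, message)
    try:
        return Decision(future.result(timeout=delta_s), "slow")
    except FutureTimeout:
        # Assumption: on timeout, reuse the fast proposal. cancel() is
        # best effort and cannot interrupt a call that is already running.
        future.cancel()
        return Decision(label, "fast_after_timeout")
```

Counting the `source` field over a stream of messages gives the accept and fallback rates for a given (τ, ∆) pair. Raising τ sends more messages to the slow path, and lowering ∆ bounds how long each of those messages can wait.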