Ablation Study of Transformer Components in Middleware: Queues, Cross-Attention, MoE, Rings


We systematically disable transformer-inspired components in a unified
middleware simulator, namely IO-aware queues, cross-attention routing,
mixture-of-experts (MoE) dispatch, and ring+shortcut topology, and measure
their isolated contributions. The metrics are mean and p95 latency,
throughput, allocation error, and a CPU-cost proxy. Four guidelines fall out:
queues tame tail latency under burst, cross-attention cuts mismatch waste,
MoE lifts throughput via sparse activation, and ring shortcuts reduce
network distance.
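
To make the ablation procedure concrete, here is a minimal sketch of a leave-one-out loop that reports mean and p95 latency for each configuration. It is not the paper's simulator: `toy_simulate`, `TOY_PENALTY`, and every numeric value in it are hypothetical placeholders, and only the latency metrics are modeled (throughput, allocation error, and the CPU-cost proxy are omitted).

```python
import random
import statistics

COMPONENTS = ("queues", "cross_attention", "moe", "ring_shortcuts")

# Invented additive latency penalties (arbitrary units) applied when a
# component is disabled. Placeholders only, not results from the paper.
TOY_PENALTY = {
    "queues": 0.8,
    "cross_attention": 0.3,
    "moe": 0.5,
    "ring_shortcuts": 0.4,
}


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty sequence."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def toy_simulate(enabled, n_requests=1000, seed=0):
    """Hypothetical stand-in for a middleware simulator.

    Returns one latency per request. The fixed seed makes every
    configuration see the same base workload.
    """
    rng = random.Random(seed)
    penalty = sum(TOY_PENALTY[c] for c in COMPONENTS if c not in enabled)
    return [rng.expovariate(1.0) + penalty for _ in range(n_requests)]


def ablate(simulate, components=COMPONENTS):
    """Run the full configuration, then each leave-one-out configuration."""
    configs = {"full": frozenset(components)}
    for c in components:
        configs[f"no_{c}"] = frozenset(components) - {c}
    results = {}
    for name, enabled in configs.items():
        latencies = simulate(enabled)
        results[name] = {
            "mean": statistics.fmean(latencies),
            "p95": p95(latencies),
        }
    return results


if __name__ == "__main__":
    results = ablate(toy_simulate)
    baseline = results["full"]
    for name, metrics in results.items():
        delta = metrics["mean"] - baseline["mean"]
        print(f"{name:20s} mean={metrics['mean']:.3f} "
              f"p95={metrics['p95']:.3f} delta_mean={delta:+.3f}")
```

Holding the seed fixed across configurations means that any difference between a leave-one-out run and the full run is attributable to the disabled component rather than to workload noise, which is the point of measuring isolated contributions.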
