We benchmark normalization and attention backends for RF spectrum models. An AttentionModelAdapter provides a unified interface to baseline multi-head attention (MHA), FlashMHA, grouped-query attention (GQA), and latent attention, while swapping LayerNorm for RMSNorm reduces latency. On streaming FFT power spectra, the best backend (latent attention) achieves 90.6% accuracy with 22.0 ms p50 latency, 480 MB peak KV-cache memory, and 1900 samples/s throughput under a 30 ms latency budget.
Index Terms—RF classification, normalization, attention, RMSNorm, FlashAttention, GQA
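For concreteness, the sketch below shows one way the AttentionModelAdapter and the RMSNorm swap described in the abstract could look in PyTorch. The class name comes from the abstract, but the constructor arguments, the `backend` strings, and the GQA grouping factor are illustrative assumptions, not the paper's actual implementation; `F.scaled_dot_product_attention` stands in for the fused (FlashAttention-style) kernel path.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """RMSNorm: rescale by the root-mean-square of the features.
    Unlike LayerNorm it skips mean subtraction and the bias term,
    which is where the latency saving over LayerNorm comes from."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class AttentionModelAdapter(nn.Module):
    """Hypothetical unified attention front end. backend='mha' runs
    standard multi-head attention; backend='gqa' shares each K/V head
    across a group of query heads, shrinking the KV cache."""
    def __init__(self, dim, n_heads, backend="mha", n_kv_heads=None):
        super().__init__()
        self.n_heads = n_heads
        self.n_kv = n_heads if backend == "mha" else (n_kv_heads or n_heads // 4)
        self.head_dim = dim // n_heads
        self.norm = RMSNorm(dim)  # the LayerNorm -> RMSNorm swap
        self.q = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.kv = nn.Linear(dim, 2 * self.n_kv * self.head_dim, bias=False)
        self.out = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):  # x: (batch, seq, dim)
        b, s, _ = x.shape
        h = self.norm(x)
        q = self.q(h).view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
        k, v = self.kv(h).view(b, s, 2, self.n_kv, self.head_dim) \
                         .permute(2, 0, 3, 1, 4)
        if self.n_kv < self.n_heads:  # GQA: broadcast shared K/V heads
            rep = self.n_heads // self.n_kv
            k = k.repeat_interleave(rep, dim=1)
            v = v.repeat_interleave(rep, dim=1)
        # Dispatches to a fused attention kernel when one is available
        o = F.scaled_dot_product_attention(q, k, v)
        return x + self.out(o.transpose(1, 2).reshape(b, s, -1))
```

A usage example under the same assumptions, with a batch of streaming FFT power-spectrum frames as input:

```python
layer = AttentionModelAdapter(dim=256, n_heads=8, backend="gqa", n_kv_heads=2)
y = layer(torch.randn(4, 128, 256))  # 4 streams x 128 spectrum frames
```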