Long spectrum sequences stress attention memory and bandwidth. We benchmark grouped-query attention
(GQA) against multi-head attention (MHA) and multi-query
attention (MQA) on FFT-token streams. GQA provides substantial
throughput gains and peak-memory reductions while maintaining
accuracy across token groupings.

RF monitoring pipelines often tokenize FFT power spectra
into long sequences for downstream classification or anomaly
scoring. Vanilla multi-head attention (MHA) quickly becomes
memory-bound as sequence length increases. We study groupedquery attention (GQA) as a middle ground between MHA and
MQA: reduce key/value projections while retaining multiple
query groups.
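To make the MHA/GQA/MQA spectrum concrete, the following is a minimal NumPy sketch of grouped-query attention (not the paper's implementation; shapes and names are illustrative). Each of the `n_kv_heads` key/value heads is shared by a contiguous group of `n_q_heads // n_kv_heads` query heads; setting `n_kv_heads = n_q_heads` recovers MHA, and `n_kv_heads = 1` recovers MQA.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v, n_kv_heads):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d).

    Each group of n_q_heads // n_kv_heads query heads attends
    with the same key/value head, so K/V memory shrinks by the
    group factor relative to MHA.
    """
    n_q_heads, seq, d = q.shape
    group = n_q_heads // n_kv_heads
    # Broadcast each K/V head to all query heads in its group.
    k = np.repeat(k, group, axis=0)  # (n_q_heads, seq, d)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores) @ v       # (n_q_heads, seq, d)
```

With `n_kv_heads = n_q_heads` the repeat is a no-op and the computation matches per-head MHA exactly, so accuracy comparisons across group counts only change how many distinct K/V heads are stored.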