RF monitoring stacks must compress and interpret high-rate spectra under tight latency and power budgets. We target the common case where the front-end produces windowed FFT power spectra (magnitude-only) and the back-end must (1) compress each spectrum into a compact token sequence for downstream classifiers and (2) preserve class-relevant detail. We explore multi-head linear attention (MHLA) with FlashAttention-style backends and token-dropout as a simple, hardware-friendly compressor.
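As a concrete reference for the assumed front-end, the following is a minimal sketch of magnitude-only windowed FFT power spectra in PyTorch; the FFT size, hop length, and Hann window are illustrative placeholders, and a deployed front-end may compute this stage in dedicated hardware.

```python
import torch

def windowed_power_spectra(signal, n_fft=1024, hop=512):
    # signal: (batch, samples), real-valued; n_fft and hop are placeholders.
    # Hann-windowed STFT, then magnitude-only power (phase is discarded).
    window = torch.hann_window(n_fft, device=signal.device)
    stft = torch.stft(signal, n_fft=n_fft, hop_length=hop, window=window,
                      return_complex=True)        # (batch, n_fft//2 + 1, frames)
    return (stft.abs() ** 2).transpose(1, 2)      # (batch, frames, bins)
```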
II. BACKGROUND
FlashAttention & Linear Attention. FlashAttention variants
reduce memory traffic for attention kernels; linear attention
further reduces quadratic costs. Rotary Positional Embeddings
(RoPE). RoPE injects relative position via complex rotations,
often improving extrapolation. Token-Dropout. We drop a
proportion r of lowest-energy bins (or a learned saliency proxy)
prior to attention, trading fidelity for speed and energy.
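To make the cost argument concrete, the following is a minimal sketch of non-causal kernelized linear attention using the commonly used elu(x) + 1 feature map; the tensor shapes and feature map are assumptions for illustration, not a description of any specific backend used in this work.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, heads, seq, dim). Kernelized attention:
    # softmax(QK^T)V is approximated by phi(Q) (phi(K)^T V), normalized by
    # phi(Q) sum_n phi(K_n), so cost scales linearly in sequence length N
    # (O(N d^2)) instead of quadratically (O(N^2 d)).
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum("bhnd,bhne->bhde", k, v)            # (batch, heads, d, d)
    z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps)
    return torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)  # (batch, heads, seq, d)
```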
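Similarly, a minimal sketch of energy-scored token-dropout, assuming tokens of shape (batch, seq, dim); the rate r and the L2-energy score are illustrative, and a learned saliency proxy would simply replace the scoring line.

```python
import torch

def energy_token_dropout(tokens, r=0.25):
    # tokens: (batch, seq, dim). Drop the fraction r of tokens with the
    # lowest L2 energy, keeping survivors in their original order.
    b, s, d = tokens.shape
    keep = s - int(r * s)
    energy = tokens.pow(2).sum(dim=-1)                       # (batch, seq)
    idx = energy.topk(keep, dim=-1).indices.sort(dim=-1).values
    kept = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, d))
    return kept, idx                                         # survivors and their indices
```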
III. METHOD
A. SpectrumEncoder
Given an N-bin power spectrum x ∈ R^N, we form tokens by striding and optional pooling. We then apply token-dropout with rate r (by energy or entropy score), followed by
MHLA with a pluggable backend (Flash, grouped, or baseline).
Positional encoding uses RoPE, which we ablate across three settings (none, static, and dynamic-θ).
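A minimal end-to-end sketch of the encoder described above, assuming stride-based tokenization without pooling, energy-scored dropout, static-θ RoPE applied to per-head queries and keys, and PyTorch's scaled_dot_product_attention standing in for the pluggable backend (it can dispatch to a FlashAttention kernel); the linear-attention kernel sketched in Sec. II could be swapped in for the MHLA variant. All sizes are illustrative assumptions, not the exact configuration used here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectrumEncoder(nn.Module):
    """Sketch: tokenize -> energy token-dropout -> RoPE -> attention."""

    def __init__(self, stride=8, dim=128, heads=4, drop_rate=0.25, rope="static"):
        super().__init__()
        assert dim % heads == 0 and (dim // heads) % 2 == 0
        self.stride, self.heads, self.drop_rate, self.rope = stride, heads, drop_rate, rope
        self.embed = nn.Linear(stride, dim)        # each token covers `stride` bins
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    @staticmethod
    def _apply_rope(x, theta=10000.0):
        # x: (batch, heads, seq, head_dim); rotate feature pairs by position.
        *_, s, d = x.shape
        half = d // 2
        freqs = theta ** (-torch.arange(half, device=x.device, dtype=x.dtype) / half)
        ang = torch.arange(s, device=x.device, dtype=x.dtype)[:, None] * freqs
        cos, sin = ang.cos(), ang.sin()
        x1, x2 = x[..., :half], x[..., half:]
        return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

    def forward(self, spectrum):
        # spectrum: (batch, n_bins) power spectrum, n_bins divisible by stride.
        b, n = spectrum.shape
        tokens = spectrum.view(b, n // self.stride, self.stride)
        # Energy-based token-dropout: keep the (1 - r) highest-energy tokens.
        keep = tokens.shape[1] - int(self.drop_rate * tokens.shape[1])
        idx = tokens.pow(2).sum(-1).topk(keep, dim=-1).indices.sort(dim=-1).values
        tokens = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, self.stride))

        h = self.embed(tokens)                                    # (b, keep, dim)
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        shape = (b, keep, self.heads, -1)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        if self.rope != "none":
            q, k = self._apply_rope(q), self._apply_rope(k)       # static-theta RoPE
        out = F.scaled_dot_product_attention(q, k, v)             # pluggable backend
        out = out.transpose(1, 2).reshape(b, keep, -1)
        return self.proj(out)
```

For example, with the defaults above, SpectrumEncoder()(torch.rand(2, 1024)) returns a (2, 96, 128) token sequence: 1024 bins become 128 stride-8 tokens, of which 96 survive dropout at r = 0.25.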