We adapt DINO-style self-supervised learning to
Wi-Fi channel state information (CSI) time-series data. By
treating the subcarrier–time grid as a patchable signal and
training a Vision Transformer (ViT) with a student–teacher architecture, we learn RF embeddings that substantially improve performance on
downstream decoding tasks relative to hand-crafted features. Our
method achieves higher linear-probe accuracy than hand-crafted baselines, produces well-clustered embedding geometries, and remains
data-efficient across a range of label fractions. We provide complete code and a
reproducible build pipeline for RF self-supervised learning.
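The sketch below illustrates, under stated assumptions, the two ingredients named above: cutting a subcarrier–time CSI grid into ViT-style patches, and the DINO-style student–teacher objective (centered, sharpened teacher targets plus an exponential-moving-average teacher update). It is written in plain NumPy and is not the authors' implementation. The function names, the patch sizes, and the single-view-pair loss are hypothetical simplifications. The temperatures and momenta are the defaults commonly used for DINO on images, not values taken from this work.

```python
import numpy as np

def patchify_csi(csi, patch_sub, patch_time):
    """Split a (subcarriers, time) CSI grid into flattened, non-overlapping patches.

    Assumption: trailing rows/columns that do not fill a whole patch are dropped
    (padding would be an equally valid choice).
    """
    n_sub, n_time = csi.shape
    n_sub_p, n_time_p = n_sub // patch_sub, n_time // patch_time
    grid = csi[: n_sub_p * patch_sub, : n_time_p * patch_time]
    # (n_sub_p, patch_sub, n_time_p, patch_time) -> (n_sub_p, n_time_p, patch_sub, patch_time)
    patches = grid.reshape(n_sub_p, patch_sub, n_time_p, patch_time).transpose(0, 2, 1, 3)
    # One row per patch token, ready for a linear patch-embedding layer.
    return patches.reshape(n_sub_p * n_time_p, patch_sub * patch_time)

def log_softmax(x, temp):
    """Numerically stable temperature-scaled log-softmax over the last axis."""
    z = x / temp
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def dino_loss(student_logits, teacher_logits, center, t_student=0.1, t_teacher=0.04):
    """Cross-entropy from centered, sharpened teacher targets to student predictions.

    Simplification: a single (teacher view, student view) pair; full DINO sums
    this over pairs of different augmented views.
    """
    p_teacher = np.exp(log_softmax(teacher_logits - center, t_teacher))  # no gradient in practice
    log_p_student = log_softmax(student_logits, t_student)
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())

def update_center(center, teacher_logits, m=0.9):
    """EMA of the mean teacher output, used to discourage collapse."""
    return m * center + (1.0 - m) * teacher_logits.mean(axis=0)

def ema_update(teacher_params, student_params, momentum=0.996):
    """Teacher weights track the student as an exponential moving average."""
    return {k: momentum * teacher_params[k] + (1.0 - momentum) * student_params[k]
            for k in teacher_params}
```

For example, a 64-subcarrier by 100-sample grid with hypothetical 16 by 10 patches yields 40 tokens of length 160. The teacher receives no gradient and is updated only through `ema_update`. Centering combined with a low teacher temperature keeps the targets from collapsing to a uniform or one-hot distribution.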