
AM/FM Handcrafted Features vs. Learned Features in RF Modulation Classification

Classical RF features encode domain priors that are stable
and interpretable. Learned features capture non-linear cues
but are harder to audit. We evaluate both in a controlled,
reproducible setting.
Modern RF modulation classification systems rely heavily
on deep learning architectures that extract features automatically from raw I/Q samples. While these learned representations achieve high performance, they often lack the interpretability and physical grounding of handcrafted features.
Classical signal processing features, such as amplitude modulation index and frequency deviation, directly encode known
properties of communication signals and provide transparent
decision rationale.
This work provides a systematic comparison between handcrafted AM/FM features and learned representations using
identical datasets and evaluation protocols. We focus on features that capture fundamental signal characteristics: amplitude
modulation depth, frequency deviation patterns, and higher-order spectral moments. Our analysis uses SHAP (SHapley
Additive exPlanations) to provide feature-level attributions for
the classical stack while maintaining comparability to learned
baselines through controlled experimental design.
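
For concreteness, the short sketch below computes two of the handcrafted features named above, amplitude modulation depth and frequency deviation, from a complex I/Q burst with NumPy. It is a minimal illustration under simple assumptions (envelope-based depth, spread of the instantaneous frequency); the paper's exact feature definitions, and the higher-order spectral moments, may differ.

    import numpy as np

    def am_fm_features(iq, fs):
        """Illustrative handcrafted features from a complex I/Q burst.
        Simplified stand-ins for the paper's feature definitions."""
        envelope = np.abs(iq)                            # instantaneous amplitude
        am_depth = (envelope.max() - envelope.min()) / (envelope.max() + envelope.min())
        phase = np.unwrap(np.angle(iq))                  # instantaneous phase
        inst_freq = np.diff(phase) * fs / (2.0 * np.pi)  # instantaneous frequency (Hz)
        freq_dev = np.std(inst_freq)                     # spread about the carrier
        return am_depth, freq_dev

Features of this kind have direct physical meaning, which is what makes SHAP attributions over them interpretable.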

V. REPRODUCIBILITY
All results are fully reproducible via the provided Makefile pipeline. Run make dev-quick for a small-scale validation or make press for complete results. Dataset selection and classifier configuration are passed through the DATASET_FUNC and CLASSIFIER_SPEC environment variables, which ensures consistent sampling between the classical and learned approaches.
The pipeline supports:

  • Configurable dataset sources via DATASET_FUNC
  • Learned model comparison via CLASSIFIER_SPEC
  • Reproducible random seeds for all experiments
  • Automated figure and table generation
Source code and experimental configurations are available to enable replication and extension of these results.
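
As a rough illustration of the environment-variable wiring described above, the snippet below shows how a pipeline stage might resolve its configuration. The default strings are placeholders, and in practice the variables would be set on the make command line (for example, DATASET_FUNC=... CLASSIFIER_SPEC=... make press).

    import os

    # Hypothetical sketch: resolve the run configuration from the environment.
    # The default values are illustrative placeholders, not the project's defaults.
    dataset_func = os.environ.get("DATASET_FUNC", "synthetic_awgn")
    classifier_spec = os.environ.get("CLASSIFIER_SPEC", "classical_features")
    print(f"dataset={dataset_func}  classifier={classifier_spec}")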

Importance of Validating on Real RF Datasets

Your paper’s experiments rely on synthetic RF data (generated with PSK/QAM/AM/FM over AWGN and mild fading). That is a good starting point for controlled testing, but it may not capture real-world artifacts such as hardware impairments (e.g., I/Q imbalance, phase noise), over-the-air (OTA) propagation effects (e.g., multipath fading, Doppler shifts), or variable sampling conditions from actual receivers. Validating on real RF datasets checks whether the normalization policies (evenly spaced downsampling, windowed pooling, strided cropping) generalize beyond ideal simulations; it could reveal whether strided cropping is more sensitive to real burst misalignment, or whether windowed pooling helps mitigate hardware-induced jitter.

Real datasets often provide raw I/Q samples, making them compatible with your _create_temporal_input builder. Expect somewhat lower accuracy due to unmodeled effects; reporting those numbers still strengthens your contribution by demonstrating robustness outside simulation.
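
To make the three normalization policies concrete, here is a minimal NumPy sketch of each, assuming a complex burst of length N >> L. The paper's implementations inside _create_temporal_input (in particular the energy-centering heuristic for strided cropping) may differ in detail.

    import numpy as np

    def evenly_spaced_downsample(iq, L):
        # keep L evenly spaced samples across the whole burst
        idx = np.linspace(0, len(iq) - 1, L).astype(int)
        return iq[idx]

    def windowed_pool(iq, L):
        # average consecutive windows so the burst collapses to L values
        win = len(iq) // L
        return iq[: win * L].reshape(L, win).mean(axis=1)

    def strided_crop(iq, L):
        # contiguous crop of length L centered on the highest-energy region
        # (the centering heuristic is an assumption, not the paper's exact rule)
        energy = np.convolve(np.abs(iq) ** 2, np.ones(L), mode="same")
        start = int(np.clip(np.argmax(energy) - L // 2, 0, len(iq) - L))
        return iq[start : start + L]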

Recommended Real-World RF Datasets for Modulation Classification

Based on current sources (as of November 2025), here are suitable real-world (OTA-collected) datasets. I prioritized those with raw I/Q data, multiple modulations, and public availability. These were collected using software-defined radios (SDRs) like USRP, capturing authentic wireless environments.

1. RadioML 2018.01A (DeepSig)

  • Description: A large-scale dataset with over-the-air recordings (mixed with some synthetic elements for channel effects). It includes 24 digital and analog modulation types (e.g., variants of ASK, PSK, QAM, AM, FM). Total ~2.5 million signals, suitable for testing your temporal models across SNR levels and real propagation.
  • Size: ~2.5M examples (large, ~106 GB uncompressed).
  • Modulations: 24 (e.g., BPSK, QPSK, 8PSK, 16QAM, 64QAM, 256QAM, AM-SSB, FM, etc.).
  • SNR Range: -20 dB to +30 dB.
  • Collection Method: Over-the-air using USRP SDRs in real environments, capturing hardware and channel impairments.
  • Availability: Free download via Kaggle or DeepSig’s site. Direct link: https://www.kaggle.com/datasets/pinxau1000/radioml2018.
  • Why Suitable: Matches your synthetic setup (includes PSK/QAM/AM/FM), allowing direct comparison. Test your policies on OTA subsets to quantify drop-offs (e.g., aliasing from real noise).

2. DeepRFSoC Dataset

  • Description: Real-world I/Q samples for automatic modulation classification, captured to include hardware effects from RF-ADC/DAC.
  • Size: Not specified (moderate, based on similar datasets; check repo for details).
  • Modulations: 8 (QPSK, BPSK, 16QAM, 64QAM, 8PSK, PAM4, GFSK, CPFSK).
  • SNR Range: Not explicitly stated; varies with real conditions.
  • Collection Method: Transmitted and received OTA on AMD RFSoC platform (FPGA-based SDR), simulating edge-device scenarios.
  • Availability: Documentation and likely download via GitHub repo: https://github.com/axdy/rfsoc_quant_amc. (Contact authors if data isn’t directly hosted.)
  • Why Suitable: Focuses on quantization and real hardware, aligning with your aliasing proxy. Sweep lengths on this to see if windowed pooling reduces distortion from ADC noise.

3. USRP-Collected RF Database (from “Radio Frequency Database Construction and Modulation Recognition” Paper)

  • Description: A real-world wireless signal dataset for modulation recognition in sensor networks, with raw samples under Rayleigh fading and AWGN.
  • Size: ~14,000 samples total (2,000 per modulation for training; each sample ~800,000 points at 800 kHz sampling).
  • Modulations: 7 (2FSK, 16QAM, 64QAM, BPSK, MSK, QPSK, 2ASK).
  • SNR Range: Not explicitly detailed; includes variable noise from real channels.
  • Collection Method: OTA using NI USRP 2920 SDRs (transmitter/receiver setup at 400 MHz center frequency, 70 MHz IF, 400 kbps symbol rate).
  • Availability: Not publicly linked in the paper; contact authors (e.g., via PMC or affiliations) for access. Paper provides setup details for replication.
  • Why Suitable: Emphasizes real multipath fading, testing if your strided cropping (energy-centered) handles burst localization better in noisy OTA data.

Additional Options

  • ORACLE Dataset (from related papers): ~250,000 OTA transmissions for dynamic modulation classification in real networks. Collected with SDRs; check IEEE Xplore for access (may require institutional login).
  • If needed, explore GitHub repos like kwyoke/RF_modulation_classification for OTA extensions of RadioML, or search IEEE DataPort for “over-the-air modulation dataset” for newer entries.

How to Perform Validation

  1. Data Integration: Download the I/Q samples (often distributed in HDF5, MAT, or CSV). Use your synthetic generator’s format as a template for preprocessing (e.g., load with NumPy/SciPy in Python).
  2. Apply Policies: Modify _create_temporal_input to handle real bursts (variable N >> L). Run sweeps for L={32,64,128,256,512} as in your paper (see the sketch after this list).
  3. Metrics and Comparison: Reuse your harness for accuracy, per-modulation breakdowns, and aliasing proxy (PSD divergence). Compare to synthetic baselines—e.g., expect 5-15% accuracy drops on real data due to impairments. Slice by SNR if available.
  4. Potential Findings: Downsampling might excel in clean OTA but degrade with multipath; pooling could smooth hardware noise. If accuracy tanks, consider augmenting with real effects (e.g., via TorchSig for hybrid testing).
  5. Reproducibility: Log results in your JSON harness and update figures/tables. Share on your GitHub (implied from email) for community feedback.
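
A rough end-to-end sketch of steps 1-3 is below. The file name and the 'X' key follow the public RadioML 2018.01A release on Kaggle (verify against your copy); the downsampling helper, the PSD-divergence proxy, and the point where the batch is handed to _create_temporal_input are all assumptions for illustration.

    import h5py
    import numpy as np
    from scipy.signal import welch

    def downsample(iq, L):
        # evenly spaced downsampling (one of the three policies sketched earlier)
        return iq[np.linspace(0, len(iq) - 1, L).astype(int)]

    def psd_divergence(a, b, fs=1.0, nperseg=32, eps=1e-12):
        # symmetric KL divergence between normalized Welch PSDs, used here as a
        # stand-in for the paper's aliasing proxy (the exact metric may differ)
        _, pa = welch(a, fs=fs, nperseg=nperseg, return_onesided=False)
        _, pb = welch(b, fs=fs, nperseg=nperseg, return_onesided=False)
        pa = (pa + eps) / (pa + eps).sum()
        pb = (pb + eps) / (pb + eps).sum()
        return 0.5 * float(np.sum(pa * np.log(pa / pb) + pb * np.log(pb / pa)))

    # Load a slice of RadioML 2018.01A-style I/Q frames: shape (frames, 1024, 2).
    with h5py.File("GOLD_XYZ_OSC.0001_1024.hdf5", "r") as f:
        X = f["X"][:1000]
    iq = X[..., 0] + 1j * X[..., 1]

    for L in (32, 64, 128, 256, 512):
        batch = np.stack([downsample(burst, L) for burst in iq])
        proxy = np.mean([psd_divergence(b, full) for b, full in zip(batch[:50], iq[:50])])
        # hand `batch` to the existing harness / _create_temporal_input here
        print(f"L={L}: batch {batch.shape}, mean PSD divergence {proxy:.3f}")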

This validation could form a strong “Future Work” extension or follow-up paper. If you provide more details (e.g., specific modulations to match), these recommendations can be narrowed down further.

https://grok.com/share/bGVnYWN5_aa1651c6-e47c-4576-8df3-530240d5616a
