
Majority vs Weighted vs Stacked Voting in RF Modulation Ensembles



🖼️ RF Modulation Ensembles: Voting Strategy Comparison

This image summarizes the core results from the paper “Majority vs Weighted vs Stacked Voting in RF Modulation Ensembles.”

1. Key Findings

  • Accuracy: All three methods (majority, weighted, and stacked) achieved 1.000 accuracy at $K=3$ models.
  • Calibration: Stacked voting demonstrated the best calibration, with the lowest Expected Calibration Error (ECE = 0.333) at $K=3$, compared with 0.654 for both majority and weighted voting.
  • General Performance: Weighted voting is generally expected to outperform majority voting when model confidences are calibrated, while stacking can exceed both given diverse base-model errors and enough meta-training data.
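The three combination rules above can be sketched concretely. This is an illustrative sketch (not the paper's code): the function names, the linear meta-learner, and its pre-fitted weights are all assumptions for the example; each function combines per-sample softmax outputs of shape $(K, C)$ for $K$ base models and $C$ modulation classes.

```python
import numpy as np

def majority_vote(probs):
    # Hard voting: each model casts one vote for its argmax class,
    # and the most-voted class wins (ties break toward the lower index).
    votes = np.argmax(probs, axis=1)
    return int(np.bincount(votes, minlength=probs.shape[1]).argmax())

def weighted_vote(probs, weights):
    # Confidence-weighted soft voting: average the softmax outputs,
    # scaled by per-model weights (e.g. validation accuracies).
    avg = np.average(probs, axis=0, weights=weights)
    return int(np.argmax(avg))

def stacked_vote(probs, meta_coef, meta_intercept):
    # Stacked generalization: a meta-learner (here a linear model whose
    # weights are assumed to have been fitted offline on held-out
    # base-model outputs) maps the concatenated probabilities to scores.
    z = probs.ravel() @ meta_coef + meta_intercept
    return int(np.argmax(z))
```

For example, with $K=3$ models and $C=2$ classes, `majority_vote` on softmax outputs `[[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]` returns class 0, since two of the three models prefer it.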

2. Accuracy vs. Model Count ($K$)

| # Models ($K$) | Majority | Weighted | Stacked |
|---|---|---|---|
| 1 | $\approx 0.0$ | $\approx 0.0$ | $\approx 0.0$ |
| 2 | $\approx 0.0$ | $\approx 0.0$ | $\approx 0.35$ |
| 3 | 1.000 | 1.000 | 1.000 |
| 4 | 1.000 | 1.000 | 1.000 |

Observation: Stacked voting showed a noticeable accuracy advantage at $K=2$ models before all three methods converged at $K=3$.


3. Performance Metrics at Max Model Count

| Metric | Majority | Weighted | Stacked |
|---|---|---|---|
| Time-to-First-Byte (TTFB, p50) at $K=4$ | 3.2 ms | 3.2 ms | 3.4 ms |
| Expected Calibration Error (ECE) at $K=3$ | 0.654 | 0.654 | 0.333 |
| Macro-F1 at $K=3$ | 0.400 | 0.400 | 0.400 |

The data suggest that stacked voting is slightly slower in median TTFB at $K=4$ models but provides significantly better calibration.
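For reference, the ECE values in the table measure the gap between predicted confidence and observed accuracy. A minimal sketch of the standard binned ECE estimator follows; the equal-width binning and the bin count of 15 are assumptions here, as the paper's exact binning scheme is not shown in this summary.

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    # ECE = sum over bins of (bin weight) * |accuracy - mean confidence|.
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return ece
```

A perfectly calibrated model (e.g. always confident and always correct) yields an ECE of 0, while a model that predicts with 0.9 confidence but is always wrong yields an ECE of 0.9.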


Would you like to know more about the Stacked meta-learner used in the study or the different input types to the classifier?