
Majority vs Weighted vs Stacked Voting in RF Modulation Ensembles



🖼️ RF Modulation Ensembles: Voting Strategy Comparison

This image summarizes the core results from the paper “Majority vs Weighted vs Stacked Voting in RF Modulation Ensembles.”

1. Key Findings

  • Accuracy: All three methods (majority, weighted, and stacked) achieved 1.000 accuracy at $K=3$ models.
  • Calibration: Stacked voting demonstrated the best calibration, with the lowest Expected Calibration Error (ECE = 0.333) at $K=3$, compared with 0.654 for both majority and weighted voting.
  • General Performance: Weighted voting is generally expected to outperform majority voting when model confidences are calibrated, while stacking can exceed both given diverse base-model errors and enough meta-training data.
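The three combination rules above can be sketched concretely. This is an illustrative sketch (not the paper's code): the function names, the linear meta-learner, and its pre-fitted weights are all assumptions for the example; each function combines per-sample softmax outputs of shape $(K, C)$ for $K$ base models and $C$ modulation classes.

```python
import numpy as np

def majority_vote(probs):
    # Hard voting: each model casts one vote for its argmax class,
    # and the most-voted class wins (ties break toward the lower index).
    votes = np.argmax(probs, axis=1)
    return int(np.bincount(votes, minlength=probs.shape[1]).argmax())

def weighted_vote(probs, weights):
    # Confidence-weighted soft voting: average the softmax outputs,
    # scaled by per-model weights (e.g. validation accuracies).
    avg = np.average(probs, axis=0, weights=weights)
    return int(np.argmax(avg))

def stacked_vote(probs, meta_coef, meta_intercept):
    # Stacked generalization: a meta-learner (here a linear model whose
    # weights are assumed to have been fitted offline on held-out
    # base-model outputs) maps the concatenated probabilities to scores.
    z = probs.ravel() @ meta_coef + meta_intercept
    return int(np.argmax(z))
```

For example, with $K=3$ models and $C=2$ classes, `majority_vote` on softmax outputs `[[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]` returns class 0, since two of the three models prefer it.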

2. Accuracy vs. Model Count ($K$)

| # Models ($K$) | Majority | Weighted | Stacked |
|---|---|---|---|
| 1 | $\approx 0.0$ | $\approx 0.0$ | $\approx 0.0$ |
| 2 | $\approx 0.0$ | $\approx 0.0$ | $\approx 0.35$ |
| 3 | 1.000 | 1.000 | 1.000 |
| 4 | 1.000 | 1.000 | 1.000 |

Observation: Stacked voting showed a noticeable accuracy advantage at $K=2$ models before all three methods converged at $K=3$.


3. Performance Metrics at Max Model Count

| Metric | Majority | Weighted | Stacked |
|---|---|---|---|
| Time-to-First-Byte (TTFB, p50) at $K=4$ | 3.2 ms | 3.2 ms | 3.4 ms |
| Expected Calibration Error (ECE) at $K=3$ | 0.654 | 0.654 | 0.333 |
| Macro-F1 at $K=3$ | 0.400 | 0.400 | 0.400 |

The data suggest that stacked voting is slightly slower in median TTFB at $K=4$ models but provides significantly better calibration.
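For reference, the ECE values in the table measure the gap between predicted confidence and observed accuracy. A minimal sketch of the standard binned ECE estimator follows; the equal-width binning and the bin count of 15 are assumptions here, as the paper's exact binning scheme is not shown in this summary.

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    # ECE = sum over bins of (bin weight) * |accuracy - mean confidence|.
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return ece
```

A perfectly calibrated model (e.g. always confident and always correct) yields an ECE of 0, while a model that predicts with 0.9 confidence but is always wrong yields an ECE of 0.9.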


Would you like to know more about the Stacked meta-learner used in the study or the different input types to the classifier?