
Confidence Calibration for Weighted Voting in RF Ensembles

We investigate post-softmax calibration for weighted ensemble voting in RF signal classification. Neural network confidence scores are often miscalibrated, leading to overconfident predictions that degrade ensemble performance. Using per-model temperature scaling, we reduce Expected Calibration Error (ECE) from 15.4% to 4.2% (a 73% relative reduction) and improve utility (accuracy × coverage) from 65.6% to 71.7% (+9.3% relative) at τ = 0.6, with <0.1 ms of inference overhead. The approach integrates directly into existing ensemble probability paths and supports reproducible evaluation via synthetic or NPZ datasets.

Ensemble methods for RF signal classification combine
predictions from multiple neural networks to achieve superior
accuracy over individual models. However, modern neural networks often exhibit poor calibration—their confidence scores
do not reflect actual prediction accuracy [1]. This miscalibration becomes particularly problematic in weighted ensemble
voting, where model probabilities directly influence the final
decision.
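To make the calibration problem concrete, the gap between confidence and accuracy can be quantified with ECE: predictions are binned by confidence, and the gaps |accuracy − confidence| are averaged with bin-size weights. The sketch below is a minimal reference implementation under our own assumptions (the function name and the 15-bin choice are illustrative, not taken from the paper):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """ECE: bin predictions by confidence, then take the
    bin-size-weighted average of |accuracy - confidence| per bin.

    confidences: (n,) max softmax probability per sample
    correct:     (n,) 1 if the prediction was right, else 0
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)  # half-open bins
        if mask.any():
            acc = correct[mask].mean()
            conf = confidences[mask].mean()
            ece += (mask.sum() / n) * abs(acc - conf)
    return ece
```

An overconfident model makes the gap visible directly: four predictions at 0.9 confidence of which only one is correct give ECE = |0.25 − 0.9| = 0.65. MCE is the same per-bin gap with `max` in place of the weighted sum.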
We address confidence calibration in RF ensemble classifiers through temperature scaling applied to individual model
logits before weighted aggregation. Our contributions include:
(1) systematic measurement of calibration quality using ECE
and MCE metrics, (2) analysis of how miscalibration affects utility under confidence-based abstention, (3) temperature
scaling optimization for ensemble probability paths, and (4)
integration hooks for production RF classification systems.
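A minimal sketch of the pipeline described above: per-model temperature scaling of logits, weighted aggregation of the calibrated probabilities, and the utility metric (accuracy × coverage) under confidence-based abstention at threshold τ. Function names, the equal-weight default, and the toy inputs are illustrative assumptions, not the paper's implementation; in practice each model's temperature T would be fit by minimizing NLL on a held-out set.

```python
import numpy as np

def temperature_scale(logits, T):
    """Softmax over logits / T. T > 1 softens (lowers) confidence;
    T = 1 leaves the model's original probabilities unchanged."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def weighted_ensemble(logit_list, temps, weights):
    """Calibrate each model's logits with its own temperature,
    then take the weighted average of the probabilities."""
    acc = sum(w * temperature_scale(l, T)
              for l, T, w in zip(logit_list, temps, weights))
    return acc / np.sum(weights)

def utility(probs, labels, tau=0.6):
    """Accuracy × coverage: abstain on predictions below confidence tau."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    keep = conf >= tau
    coverage = keep.mean()
    if coverage == 0.0:
        return 0.0
    accuracy = (pred[keep] == labels[keep]).mean()
    return accuracy * coverage
```

The utility metric captures the trade-off directly: overconfident models abstain too rarely and pay in accuracy, while a well-calibrated ensemble abstains exactly on the predictions it is likely to get wrong.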
