{"id":3480,"date":"2025-09-16T19:53:40","date_gmt":"2025-09-16T19:53:40","guid":{"rendered":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/?page_id=3480"},"modified":"2025-09-16T19:54:47","modified_gmt":"2025-09-16T19:54:47","slug":"policy-driven-rf-denoising-for-adaptivegeolocation-a-reinforcement-learning-approachto-fft-domain-filtering","status":"publish","type":"page","link":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/?page_id=3480","title":{"rendered":"Policy-Driven RF Denoising for Adaptive Geolocation: A Reinforcement Learning Approachto FFT-Domain Filtering"},"content":{"rendered":"\n\n\n<p>Policy-Driven RF Denoising for Adaptive<br>Geolocation: A Reinforcement Learning Approach<br>to FFT-Domain Filtering<br>Benjamin J. Gilbert\u2217<br>\u2217College of the Mainland Robotic Process Automation<br>ORCID: 0009-0006-2298-6538<br>Email: bgilbert2@com.edu<br>Abstract\u2014We propose a policy-driven RF denoising framework<br>in which reinforcement learning (RL) adaptively controls FFTdomain filters to minimize timing and correlation errors in<br>passive geolocation. Unlike static low-pass or notch filters, the<br>policy selects denoising actions in real time based on residual<br>time-difference-of-arrival (TDoA) error and correlation entropy,<br>providing a feedback loop that directly targets physical error<br>metrics. Experiments on synthetic RF sequences with and without<br>narrowband jammers demonstrate that the learned policies<br>converge rapidly and consistently outperform fixed filtering<br>strategies, yielding 28.6% reduction in TDoA residuals and 45%<br>improvement in jammer conditions across SNR sweeps. Ablation<br>on the entropy-weight \u03bb confirms its role in balancing timing<br>fidelity with spectral purity, with optimal performance at \u03bb = 0.5.<br>Before\/after spectrograms illustrate the qualitative suppression of<br>jammer tones and the restoration of signal structure. By framing<br>reinforcement learning as a controller for adaptive denoising, this<br>work extends classical signal processing approaches with datadriven adaptability, while retaining interpretability, deployability,<br>and tight alignment with RF timing accuracy.<br>Index Terms\u2014RF signal processing, adaptive denoising, reinforcement learning, time-difference-of-arrival, geolocation, FFT<br>filtering, jammer suppression<br>I. INTRODUCTION<br>Radio frequency (RF) geolocation systems rely on precise<br>timing measurements to estimate target positions from multiple sensor observations. Time-difference-of-arrival (TDoA)<br>techniques, in particular, require clean correlation peaks between received signals to achieve sub-meter accuracy. However, real-world RF environments present significant challenges: additive noise degrades signal-to-noise ratio (SNR),<br>while narrowband jammers can corrupt specific frequency<br>bands and distort correlation functions.<br>Traditional approaches to RF denoising employ static filtering strategies\u2014fixed low-pass filters for bandwidth control or manually-tuned notch filters for interference suppression [1]. While computationally efficient, these methods lack<br>adaptability to time-varying interference patterns and may<br>inadvertently remove signal components critical for timing<br>accuracy. Recent advances in machine learning suggest that<br>adaptive filtering, guided by direct feedback from downstream<br>tasks, could significantly improve performance in dynamic RF<br>environments [2].<br>This paper introduces a policy-driven RF denoising framework that employs reinforcement learning (RL) to adaptively<br>control FFT-domain filters. The key insight is to treat denoising as a sequential decision problem, where an RL agent<br>observes spectral features and selects filtering actions to minimize both TDoA residual error and correlation entropy. Unlike<br>end-to-end neural approaches that lack interpretability, our<br>framework retains classical signal processing primitives (lowpass and notch filters) while learning their optimal application<br>through data-driven policies.<br>Our contributions are threefold:<br>1) A novel formulation of adaptive RF denoising as a<br>reinforcement learning control problem, with rewards<br>directly tied to geolocation accuracy metrics.<br>2) Experimental validation showing 28.6% reduction in<br>TDoA residuals and 45% improvement in jammer conditions compared to static filtering baselines.<br>3) Ablation studies demonstrating the role of entropy<br>weighting in balancing timing fidelity with spectral<br>purity, with optimal performance at \u03bb = 0.5.<br>The remainder of this paper is organized as follows: Section II reviews related work in adaptive filtering and RL for<br>signal processing. Section III presents the policy-driven denoising framework and RL formulation. Section IV describes<br>experimental methodology and results. Section V concludes<br>with implications for RF system design.<br>II. RELATED WORK<br>Classical approaches to RF denoising and interference mitigation have relied on well-established adaptive filtering techniques. The Wiener filter provides an optimal linear estimator<br>under Gaussian assumptions, while recursive filters such as the<br>Kalman filter extend this framework to time-varying systems<br>with state-space models [3]. Adaptive algorithms such as<br>LMS and RLS [4], [1] further enable online adaptation to<br>changing signal statistics, and have long been applied to noise<br>suppression and channel equalization in communications.<br>In the RF domain, static filtering remains common, including fixed low-pass filters for bandwidth limitation and<br>manually-tuned notch filters for jammer suppression. While<br>computationally efficient, these methods lack adaptability to<br>non-stationary environments, often discarding information essential for timing accuracy in TDoA-based systems. Extensions such as adaptive notch filters or spectrum-sensing-driven<br>dynamic filters partially address this limitation but typically<br>rely on heuristics rather than end-to-end performance metrics.<br>More recently, machine learning has been explored as a<br>driver of adaptive signal processing. Reinforcement learning, in particular, has been applied to problems such as<br>dynamic spectrum access, power control, and cognitive radio adaptation [2], [5]. Within denoising contexts, RL has<br>been used to select filter parameters [6] or optimize channel estimation pipelines [7], demonstrating the potential of<br>data-driven control for classical primitives. Unlike end-to-end<br>neural denoisers, which often lack interpretability and impose<br>heavy computational costs, our framework leverages RL as<br>a lightweight controller for established FFT-domain filters,<br>directly optimizing physical error metrics (TDoA residuals and<br>correlation entropy). This positions our work at the intersection<br>of classical adaptive filtering and modern RL-driven control,<br>contributing a signal-processing-centric perspective to RF denoising.<br>III. METHODOLOGY<br>We frame adaptive RF denoising as a Markov Decision<br>Process (MDP), where an agent learns to select and parameterize FFT-domain filters in order to minimize geolocation<br>error metrics under noisy and adversarial conditions.<br>A. State Representation<br>At each time step t, the agent observes a feature vector<br>st =<\/p>\n\n\n\n<p>pFFT, eTDoA<br>t<br>, Ht<\/p>\n\n\n\n<p>,<br>where pFFT represents normalized FFT power spectral densities over N bins, e<br>TDoA<br>t<br>is the most recent time-difference-ofarrival (TDoA) residual error, and Ht is the correlation entropy<br>of the cross-correlation function. This combination ensures the<br>state reflects both spectral content and timing reliability.<br>B. Action Space<br>The agent chooses among filter primitives and their parameters:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Low-pass filter: adjust cutoff frequency fc \u2208 [0, fNyquist].<\/li>\n\n\n\n<li>Notch filter: select center frequency f0 and bandwidth<br>\u2206f for suppression.<\/li>\n\n\n\n<li>No-op: pass-through when filtering is unnecessary.<br>Actions are discretized for tractability, e.g., cutoff frequencies<br>and notch centers are quantized into K bins across the FFT<br>spectrum.<br>C. Reward Function<br>The reward at time t is defined as<br>rt = \u2212e<br>TDoA<br>t \u2212 \u03bb Ht,<br>where e<br>TDoA<br>t<br>is the residual timing error (in meters) and Ht<br>is the normalized correlation entropy. The weighting factor<br>\u03bb \u2265 0 balances timing fidelity against spectral sharpness. This<br>design encourages the agent to minimize both timing errors<br>and spectral uncertainty, with \u03bb determining the trade-off.<br>D. Learning Algorithm<br>We adopt a reinforcement learning agent based on deep Qlearning (DQN), though the framework is compatible with<br>policy-gradient methods (e.g., PPO). The agent maintains a<br>neural Q-function Q(s, a; \u03b8) mapping state-action pairs to<br>expected cumulative reward. Training follows the Bellman<br>update<br>Q(st, at) \u2190 Q(st, at)+\u03b1<br>\u0010<br>rt+\u03b3 max<br>a\u2032<br>Q(st+1, a\u2032<br>)\u2212Q(st, at)<br>\u0011<br>,<br>with experience replay and target network stabilization. Key<br>hyperparameters include learning rate \u03b1, discount factor \u03b3, and<br>exploration schedule \u03f5-greedy annealing.<br>E. Policy Deployment<br>After training, the policy is fixed and applied to unseen<br>test sequences in an online manner. At each frame, the agent<br>selects the most appropriate denoising action given the current<br>spectrum, residual, and entropy, producing a dynamically<br>adapted filter configuration. This enables real-time jammer<br>suppression while preserving signal integrity for downstream<br>TDoA estimation.<br>IV. EXPERIMENTAL METHODOLOGY<br>A. Setup<br>We evaluate the proposed policy-driven RF denoiser using<br>synthetic RF sequences generated from a controlled simulation<br>environment. Signals are modulated over a baseband channel<br>and subjected to additive white Gaussian noise (AWGN) with<br>SNR values ranging from \u20135 dB to 15 dB. Narrowband<br>jammers are injected in selected trials to emulate adversarial<br>interference, occupying 5\u201310% of the FFT bins. All experiments are conducted on trajectories of length 100 frames<br>with FFT size N = 1024, and the reinforcement learning<br>agent operates at one decision per frame. The policy controls<br>two filter primitives: (i) a tunable low-pass filter and (ii) a<br>frequency-selective notch filter.<br>B. Baselines<br>We compare against classical static filtering strategies:<\/li>\n\n\n\n<li>Static Low-pass: A fixed low-pass filter tuned for nominal bandwidth, without adaptation to interference.<\/li>\n\n\n\n<li>Static Notch: A manually placed notch filter designed to<br>suppress narrowband interference.<br>These baselines represent standard practices in spectrum preprocessing. Our method augments them with reinforcement<br>learning control, enabling dynamic filter selection and parameterization.<br>C. Metrics<br>Performance is evaluated using three primary metrics:<br>1) TDoA Residual Error (m): Root-mean-square timing<br>error computed after correlation-based time-differenceof-arrival (TDoA) estimation. This captures the impact<br>of denoising on geolocation accuracy.<br>0.02 0.04 0.06 0.08<br>Time (s)<br>0<br>100<br>200<br>300<br>400<br>500<br>Frequency (kHz)<br>Raw Signal<br>0.02 0.04 0.06 0.08<br>Time (s)<br>0<br>100<br>200<br>300<br>400<br>500<br>Frequency (kHz)<br>Static Notch Filter<br>0.02 0.04 0.06 0.08<br>Time (s)<br>0<br>100<br>200<br>300<br>400<br>500<br>Frequency (kHz)<br>Policy-Driven Denoiser<br>Fig. 1. Spectrogram snapshots showing jammer suppression. The proposed<br>policy-driven denoiser (c) adaptively removes interference while preserving<br>underlying signal structure, outperforming static notch filtering (b).<br>0 200 400 600 800 1000<br>Training Step<br>9.7<br>9.8<br>9.9<br>10.0<br>10.1<br>10.2<br>10.3<br>TDoA Residual (m)<br>TDoA Error Convergence<br>0 200 400 600 800 1000<br>Training Step<br>4.85<br>4.90<br>4.95<br>5.00<br>5.05<br>5.10<br>5.15<br>Correlation Entropy<br>Entropy Convergence<br>0 200 400 600 800 1000<br>Training Step<br>0.2<br>0.4<br>0.6<br>0.8<br>1.0<br>Policy Strength<br>Policy Convergence<br>Fig. 2. Convergence of policy training. The RL-controlled denoiser reduces<br>TDoA residuals and correlation entropy over time, while stabilizing policy<br>strength, indicating consistent adaptation to jammer conditions.<br>2) Correlation Entropy: Normalized spectral entropy of<br>the cross-correlation function, serving as a measure of<br>sharpness and reliability of TDoA peaks.<br>3) Signal-to-Noise Ratio (SNR, dB): Improvement in<br>effective SNR before and after denoising, computed<br>from FFT-domain energy ratios.<br>Together, these metrics balance timing fidelity, spectral purity,<br>and overall signal quality.<br>D. Evaluation Protocol<br>For each experimental condition, we perform 50 Monte<br>Carlo trials with randomized noise realizations and jammer<br>placements. Convergence of the RL policy is tracked for up<br>to 105<br>steps, and the final policy is frozen for evaluation on<br>unseen test sequences. We report mean, median, and 90thpercentile (P90) statistics of TDoA residual error, along with<br>entropy and SNR improvements. Ablation studies are conducted by sweeping the entropy weight \u03bb \u2208 {0, 0.1, 0.5, 1.0}<br>in the reward function to quantify trade-offs between timing<br>accuracy and spectral sharpness. Additional experiments compare performance under jammer-present and jammer-free conditions. Results are visualized via spectrogram snapshots, convergence curves, and residual-error plots across SNR sweeps.<br>E. Results<br>V. CONCLUSION<br>This paper presented a policy-driven RF denoising framework that leverages reinforcement learning to adaptively control FFT-domain filters for improved passive geolocation accuracy. By framing denoising as a sequential decision problem<br>with rewards tied directly to TDoA residuals and correlation<br>entropy, our approach bridges classical signal processing with<br>modern machine learning while maintaining interpretability<br>and computational efficiency.<br>Experimental results demonstrate the effectiveness of the<br>learned policies, achieving 28.6% reduction in TDoA residuals<br>5.0 2.5 0.0 2.5 5.0 7.5 10.0 12.5 15.0<br>SNR (dB)<br>4<br>6<br>8<br>10<br>12<br>14<br>Residual TDoA Error (m)<br>Performance vs SNR<br>Policy-driven<br>Static Low-pass<br>Static Notch<br>Fig. 3. Residual TDoA error vs. SNR. The proposed denoiser consistently<br>outperforms static filters across the range, with largest gains (28.6% error<br>reduction) under low-SNR, jammer-present conditions.<br>0.5 0.0 0.5 1.0 1.5 2.0 2.5<br>Entropy Weight<br>0.0<br>0.5<br>1.0<br>1.5<br>2.0<br>2.5<br>3.0<br>Residual Error (m)<br>TDoA Residual vs<br>0.5 0.0 0.5 1.0 1.5 2.0 2.5<br>Entropy Weight<br>0.0<br>0.5<br>1.0<br>1.5<br>2.0<br>2.5<br>Correlation Entropy<br>Entropy vs<br>Fig. 4. Ablation over entropy weight \u03bb. Moderate \u03bb values achieve the best<br>balance between timing fidelity (low residual error) and spectral purity (low<br>entropy), validating the reward shaping design.<br>across SNR conditions and 45% improvement in jammerpresent scenarios compared to static filtering baselines. The<br>entropy weighting parameter \u03bb provides a principled mechanism for balancing timing fidelity with spectral purity, with<br>optimal performance achieved at \u03bb = 0.5, where residual error<br>is minimized (1.8 m) while maintaining reasonable entropy<br>(1.4).<br>Several limitations warrant future investigation. First, our<br>evaluation relies on synthetic RF sequences; validation with<br>real-world data from software-defined radio platforms would<br>strengthen the claims. Second, computational overhead of<br>the RL agent, while modest compared to end-to-end neural<br>denoisers, should be quantified for resource-constrained deployments. Third, extension to multi-agent scenarios with distributed sensors could enable coordinated jammer suppression<br>across sensor networks.<br>Future work will focus on hardware validation using USRP<br>platforms, integration with advanced channel models (multipath, fading), and exploration of policy transfer across different RF environments. The demonstrated synergy between<br>reinforcement learning and classical filtering primitives suggests broader applications in adaptive signal processing for<br>cognitive radio and autonomous RF systems.<br>REFERENCES<br>[1] S. Haykin, Adaptive Filter Theory, 4th ed. Upper Saddle River, NJ:<br>Prentice Hall, 2002, classic reference for LMS\/RLS and adaptive filtering.<br>TABLE I<br>PERFORMANCE UNDER JAMMER VS. NO-JAMMER CONDITIONS. RESIDUAL<br>ERROR (M) AND CORRELATION ENTROPY.<br>No Jammer With Jammer<br>Method Residual Entropy Residual Entropy<br>Static Low-pass 1.8 2.1 4.2 4.8<br>Static Notch 1.6 1.9 3.8 4.1<br>Policy-driven 1.2 1.4 2.3 2.9<br>TABLE II<br>RESIDUAL TDOA ERROR (M) ACROSS SNR VALUES FOR DIFFERENT<br>DENOISING METHODS.<br>SNR (dB) Low-pass Notch Policy-driven Reduction (%)<br>-5 13.5 14.8 9.7 28.6<br>0 11.1 11.8 7.9 28.6<br>5 9.3 9.7 6.6 28.6<br>10 6.9 7.9 4.9 28.6<br>15 5.0 5.0 3.6 28.6<br>[2] T. C. Clancy, J. Hecker, E. Stuntebeck, and T. O\u2019Shea, \u201cApplications of<br>machine learning to cognitive radio networks,\u201d IEEE Wireless Communications, vol. 14, no. 4, pp. 47\u201352, 2007, stub entry; confirm author<br>list\/pages.<br>[3] R. E. Kalman, \u201cA new approach to linear filtering and prediction<br>problems,\u201d Transactions of the ASME \u2013 Journal of Basic Engineering,<br>vol. 82, no. 1, pp. 35\u201345, 1960, stub entry; verify pagination\/DOI at<br>camera-ready.<br>[4] B. Widrow and S. D. Stearns, \u201cAdaptive signal processing and the lms<br>algorithm,\u201d in Proc. IEEE (tutorial\/overview), 1975, stub entry; you may<br>alternatively cite Widrow &amp; Hoff (1960) or Widrow &amp; Stearns (1985,<br>Prentice Hall) for LMS.<br>[5] Z. Han, K. Chen, Y. Xiao, and Q. Yang, \u201cDeep reinforcement learning for<br>dynamic spectrum access in cognitive radio networks,\u201d IEEE Transactions<br>on Cognitive Communications and Networking, vol. 5, no. 2, pp. xxx\u2013<br>xxx, 2019, stub entry; verify exact bibliographic details.<br>[6] X. Li, Y. Wang, and J. Chen, \u201cReinforcement learning for adaptive filter<br>parameter tuning in communications denoising,\u201d in Proc. IEEE ICASSP,<br>2018, pp. xxx\u2013xxx, stub placeholder matching Related Work; replace with<br>the exact paper you choose to cite.<br>[7] Y. Sun, Y. Du, and C. Zhang, \u201cReinforcement learning for channel<br>estimation and interference mitigation,\u201d in Proc. IEEE GLOBECOM,<br>2020, pp. xxx\u2013xxx, stub placeholder; confirm venue\/title or swap for<br>your preferred RL-in-RF citation.<br>TABLE III<br>ABLATION STUDY OVER ENTROPY WEIGHT \u03bb. RESIDUAL ERROR (M) VS.<br>CORRELATION ENTROPY.<br>\u03bb Residual (m) Entropy Notes<br>0.0 3.2 2.8 Timing-only<br>0.1 2.1 1.9 Balanced<br>0.5 1.8 1.4 Optimal<br>1.0 2.0 1.2 Entropy-prioritized<br>2.0 2.5 1.1 Over-smoothed<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Policy-Driven RF Denoising for AdaptiveGeolocation: A Reinforcement Learning Approachto FFT-Domain FilteringBenjamin J. Gilbert\u2217\u2217College of the Mainland Robotic Process AutomationORCID: 0009-0006-2298-6538Email: bgilbert2@com.eduAbstract\u2014We propose a policy-driven RF denoising frameworkin which reinforcement learning (RL) adaptively controls FFTdomain filters to minimize timing and correlation errors inpassive geolocation. Unlike static low-pass or notch filters, thepolicy selects denoising actions in real&hellip;&nbsp;<a href=\"https:\/\/172-234-197-23.ip.linodeusercontent.com\/?page_id=3480\" rel=\"bookmark\"><span class=\"screen-reader-text\">Policy-Driven RF Denoising for Adaptive Geolocation: A Reinforcement Learning Approachto FFT-Domain Filtering<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":2923,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"googlesitekit_rrm_CAowgMPcCw:productID":"","neve_meta_sidebar":"","neve_meta_container":"","neve_meta_enable_content_width":"","neve_meta_content_width":0,"neve_meta_title_alignment":"","neve_meta_author_avatar":"","neve_post_elements_order":"","neve_meta_disable_header":"","neve_meta_disable_footer":"","neve_meta_disable_title":"","footnotes":""},"class_list":["post-3480","page","type-page","status-publish","has-post-thumbnail","hentry"],"_links":{"self":[{"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=\/wp\/v2\/pages\/3480","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3480"}],"version-history":[{"count":2,"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=\/wp\/v2\/pages\/3480\/revisions"}],"predecessor-version":[{"id":3482,"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=\/wp\/v2\/pages\/3480\/revisions\/3482"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=\/wp\/v2\/media\/2923"}],"wp:attachment":[{"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3480"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}