RL-Driven RF Neuromodulation

We train a DQN over power, frequency, phase, and angle to maximize a target-state proxy while penalizing specific absorption rate (SAR). Compared to a hand-tuned schedule baseline, our agent improves evaluation return by 25%, with a median episode return of 100, and reduces state reconstruction error to 0.05. Plots and captions auto-sync from logs.
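
A minimal sketch of how such a setup might be expressed, under stated assumptions. A DQN selects among discrete actions, so the four control parameters are discretized here into a joint grid. The grid values, the SAR limit, the penalty weight, and the names `ACTIONS` and `shaped_reward` are all hypothetical and are not taken from our logs or configuration. The penalty form (linear in the SAR excess above a limit) is likewise one plausible choice, not necessarily the one used.

```python
from itertools import product

# Hypothetical discretization of the four control parameters.
# These grids are illustrative only.
POWER_W = (0.5, 1.0, 2.0)
FREQ_MHZ = (100.0, 200.0)
PHASE_DEG = (0.0, 90.0, 180.0)
ANGLE_DEG = (-30.0, 0.0, 30.0)

# Joint discrete action set: one DQN output head per tuple.
ACTIONS = list(product(POWER_W, FREQ_MHZ, PHASE_DEG, ANGLE_DEG))


def shaped_reward(proxy, sar, sar_limit=2.0, penalty_weight=1.0):
    """Target-state proxy minus a penalty on SAR above a limit.

    Assumed form: reward = proxy - penalty_weight * max(0, sar - sar_limit).
    `sar_limit` and `penalty_weight` are placeholder values.
    """
    excess = max(0.0, sar - sar_limit)
    return proxy - penalty_weight * excess


if __name__ == "__main__":
    print(len(ACTIONS))             # size of the joint action set
    print(shaped_reward(1.0, 1.5))  # SAR below the limit: no penalty
    print(shaped_reward(1.0, 3.0))  # SAR above the limit: penalized
```

With this shaping, the agent receives the raw proxy whenever SAR stays at or below the limit, and the penalty weight sets how strongly overshoot is traded against the proxy.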