Radio-frequency (RF) neuromodulation systems require a careful balance between therapeutic efficacy and patient
safety, particularly regarding specific absorption rate (SAR)
exposure limits. This paper presents a constrained reinforcement
learning approach for closed-loop beamforming that minimizes
SAR while maintaining neuromodulation utility. Using primal-dual optimization, our method learns policies that respect safety
budgets through adaptive Lagrange multipliers. Experimental
results demonstrate that beamforming with learned constraints
reduces SAR by up to 40% compared to omnidirectional transmission while preserving 85% of maximum ratio transmission
(MRT) utility. We derive safety envelopes showing the fundamental tradeoff between SAR constraints and achievable performance, providing guidance for clinical safety protocol design. The
approach generalizes to multi-beam arrays and complex tissue
models, enabling practical deployment in therapeutic RF systems.
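
The primal-dual idea summarized above can be illustrated with a minimal sketch. It is a deterministic toy stand-in, not the paper's reinforcement-learning method: it assumes a real-valued channel vector `h`, a hypothetical quadratic SAR proxy `w^T Q w`, a unit transmit-power constraint, and a fixed SAR budget. It also assumes the objective is to maximize utility subject to that budget. All names and numerical values (`primal_dual_beamformer`, `Q`, `sar_budget`, the step sizes) are illustrative assumptions, not quantities taken from the paper. The weights take gradient-ascent steps on the Lagrangian, while the multiplier grows when the SAR proxy exceeds the budget and shrinks toward zero otherwise.

```python
import numpy as np


def primal_dual_beamformer(h, Q, sar_budget, lr_w=0.05, lr_lam=0.02, n_iters=3000):
    """Toy primal-dual loop (illustrative assumption, not the paper's algorithm).

    Maximizes the utility (h^T w)^2 subject to w^T Q w <= sar_budget
    and ||w|| = 1, using the Lagrangian
        L(w, lam) = (h^T w)^2 - lam * (w^T Q w - sar_budget).
    """
    w = h / np.linalg.norm(h)  # start from the MRT direction
    lam = 0.0                  # Lagrange multiplier for the SAR budget
    for _ in range(n_iters):
        # Primal step: gradient ascent on the Lagrangian with respect to w.
        grad_w = 2.0 * h * (h @ w) - 2.0 * lam * (Q @ w)
        w = w + lr_w * grad_w
        w = w / np.linalg.norm(w)  # project back onto unit transmit power
        # Dual step: raise lam if the SAR proxy exceeds the budget,
        # lower it otherwise, and keep it non-negative.
        sar = w @ Q @ w
        lam = max(0.0, lam + lr_lam * (sar - sar_budget))
    return w, lam


# Hypothetical 4-element array: element 0 couples most strongly into tissue.
h = np.array([0.5, 0.5, 0.5, 0.5])
Q = np.diag([2.0, 0.5, 0.5, 0.5])

w_mrt = h / np.linalg.norm(h)
w_safe, lam_final = primal_dual_beamformer(h, Q, sar_budget=0.7)

print("MRT:         utility =", (h @ w_mrt) ** 2, " SAR proxy =", w_mrt @ Q @ w_mrt)
print("Constrained: utility =", (h @ w_safe) ** 2, " SAR proxy =", w_safe @ Q @ w_safe)
print("Final multiplier:", lam_final)
```

In this toy setting the multiplier acts as an adaptive price on SAR: the learned weights shift power away from the high-SAR element only as far as the budget requires, which is the tradeoff the safety envelopes describe.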