1. What Is Viterbi Decoding?
Viterbi is a dynamic programming algorithm that finds the most likely sequence of hidden states in a Hidden Markov Model (HMM) given an observation sequence.
In your RF-to-speech pipeline:
- Hidden states = words (sierra, charlie, bravo)
- Observations = RF-inferred neural features $ x_t $
- Goal = reconstruct inner speech from noisy RF surrogates
2. Why Not Brute Force?
- States: $ N = 50 $ words (NATO lexicon)
- Sequence length: $ T = 30 $ frames
- Total sequences: $ N^T = 50^{30} \approx 10^{51} $
Viterbi: $ O(T N^2) $ → roughly $ 30 \times 50^2 = 75{,}000 $ operations → polynomial, feasible (quick check below)
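A two-line check of this gap, plugging in the $ N $ and $ T $ above:

```python
# Quick check on the counts above (N = 50 words, T = 30 frames).
N, T = 50, 30
print(f"Brute force: {float(N ** T):.2e} candidate sequences")  # ~9.3e+50
print(f"Viterbi:     {T * N ** 2:,} max/argmax updates")        # 75,000
```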
3. HMM Components (Your Paper)
| Component | Formula | Meaning |
|---|---|---|
| Emission | $ p(x_t \mid w_t) = \mathcal{N}(x_t; \mu_{w_t}, \Sigma) $ | How likely is $ x_t $ given word $ w_t $? |
| Transition | $ p(w_t \mid w_{t-1}) = \pi_{w_{t-1}, w_t} $ | Language prior (bigram or GPT-style) |
| Joint | $ P(w_{1:T}, x_{1:T}) = p(w_1)\, p(x_1 \mid w_1) \prod_{t=2}^T p(w_t \mid w_{t-1}) \, p(x_t \mid w_t) $ | Full probability |
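For concreteness, here is a minimal NumPy sketch of these three components. The feature dimension, random means, and random transition matrix are placeholders, not the paper's fitted parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
N_WORDS, FEAT_DIM = 50, 16                 # NATO lexicon size; 16-dim features are a placeholder

mu = rng.normal(size=(N_WORDS, FEAT_DIM))  # emission means mu_w, one row per word
Sigma = np.eye(FEAT_DIM)                   # shared emission covariance
trans = rng.random((N_WORDS, N_WORDS))     # stand-in for the bigram / GPT-style prior
trans /= trans.sum(axis=1, keepdims=True)  # rows sum to 1: trans[w_prev, w_next] = pi
prior = np.full(N_WORDS, 1.0 / N_WORDS)    # p(w_1), uniform here for simplicity
```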
4. Viterbi Step-by-Step (With Example)
Input:
- $ x_{1:30} $ = RF features
- True utterance: sierra charlie bravo
Step 1: Initialization ($ t = 1 $)
For each word $ w $:
$$
V_1(w) = p(w) \cdot p(x_1 | w)
$$
$$
\psi_1(w) = 0
$$
$ V_t(w) $ = max probability of being in $ w $ at time $ t $
$ \psi_t(w) $ = best previous word
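Continuing the Section 3 sketch (it reuses rng, mu, Sigma, prior, FEAT_DIM, and N_WORDS from there), the initialization is two lines; x1 is a made-up stand-in for the first RF feature frame:

```python
import numpy as np
from scipy.stats import multivariate_normal

x1 = rng.normal(size=FEAT_DIM)   # stand-in for the first RF feature frame x_1

# V_1(w) = p(w) * p(x_1 | w); psi_1(w) = 0 (no predecessor at t = 1)
V1 = prior * np.array([multivariate_normal.pdf(x1, mu[w], Sigma) for w in range(N_WORDS)])
psi1 = np.zeros(N_WORDS, dtype=int)
```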
Step 2: Recursion ($ t = 2 $ to $ T $)
For each word $ w $ at time $ t $:
$$
V_t(w) = \max_{w'} \left[ V_{t-1}(w') \cdot \pi_{w',w} \cdot p(x_t \mid w) \right]
$$
$$
\psi_t(w) = \arg\max_{w'} \left[ V_{t-1}(w') \cdot \pi_{w',w} \right]
$$
Key Insight:
At each time step, keep only the best path into each state: the trellis stores $ T N $ entries instead of enumerating all $ N^T $ sequences (one recursion step is sketched below).
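One recursion step can be written as a small vectorized helper; viterbi_step is my name for it, and it mirrors the full loop shown in Section 7:

```python
import numpy as np
from scipy.stats import multivariate_normal

def viterbi_step(V_prev, x_t, mu, Sigma, trans):
    """One recursion step: V_t(w) and psi_t(w) for every word w.

    V_prev : (N,) best path probabilities at t-1
    x_t    : (d,) feature frame; mu is (N, d), Sigma is (d, d), trans is (N, N)
    """
    emit = np.array([multivariate_normal.pdf(x_t, mu[w], Sigma) for w in range(len(mu))])
    scores = V_prev[:, None] * trans   # scores[w_prev, w] = V_{t-1}(w') * pi_{w', w}
    psi_t = scores.argmax(axis=0)      # best predecessor for each word w
    V_t = scores.max(axis=0) * emit    # multiply in the emission term p(x_t | w)
    return V_t, psi_t
```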
Step 3: Termination
Best path probability:
$$
P^* = \max_w V_T(w)
$$
Final state:
$$
q_T^* = \arg\max_w V_T(w)
$$
Step 4: Backtracking
$$
q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, \dots, 1
$$
→ Output: sierra → charlie → bravo
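Termination and backtracking together are just an argmax followed by a reverse walk through $ \psi $; a minimal sketch, assuming V and psi are stored as (T, N) arrays:

```python
import numpy as np

def backtrack(V, psi):
    """Recover the best word-index path from the V and psi tables (both T x N)."""
    T = V.shape[0]
    path = np.empty(T, dtype=int)
    path[-1] = V[-1].argmax()              # q_T* = argmax_w V_T(w)
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]  # q_t* = psi_{t+1}(q_{t+1}*)
    return path
```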
5. Visual Example (Your Fig. 1)
graph TD
subgraph "Frame 1–5"
A[sierra] --> B[charlie]
A --> C[delta]
end
subgraph "Frame 6–10"
B --> D[charlie]
C --> E[papa]
end
subgraph "Frame 11–15"
D --> F[bravo]
E --> G[quebec]
end
style A fill:#90EE90
style B fill:#90EE90
style D fill:#90EE90
style F fill:#90EE90
style C fill:#FFB6C1
style E fill:#FFB6C1
style G fill:#FFB6C1
- Green path = Viterbi path (highest $ V_t $)
- Red paths = pruned (lower probability)
6. Why Language Priors Matter (Your Fig. 2)
| SNR | No Prior WER | Bigram WER | GPT-style WER |
|---|---|---|---|
| 10 dB | 2.8% | 1.6% | 1.1% |
A GPT-style prior gives $ p(\text{charlie} \mid \text{sierra}) $ a high value, which boosts $ V_t(\text{charlie}) $ along the correct path.
With no prior, all transitions are equally likely, so the decoder falls back on per-frame emission scores and framewise confusions go uncorrected (contrast sketched below).
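A toy contrast with hypothetical numbers makes the point: the same transition lookup is uninformative under a flat prior and decisive under a bigram prior:

```python
import numpy as np

words = ["sierra", "charlie", "bravo"]
idx = {w: i for i, w in enumerate(words)}

uniform = np.full((3, 3), 1 / 3)        # no prior: every transition equally likely
bigram = np.array([[0.10, 0.80, 0.10],  # "charlie" strongly follows "sierra"
                   [0.10, 0.10, 0.80],  # "bravo"   strongly follows "charlie"
                   [0.80, 0.10, 0.10]])

print(uniform[idx["sierra"], idx["charlie"]])  # 0.33 -> context gives no help
print(bigram[idx["sierra"], idx["charlie"]])   # 0.80 -> boosts V_t(charlie) on the true path
```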
7. Viterbi in Your Code
import numpy as np
from scipy.stats import multivariate_normal

def viterbi(obs, mu, Sigma, trans_matrix):
    """Most likely word-index sequence for observations obs of shape (T, d).

    Probabilities are multiplied directly for clarity; for long sequences,
    work in log-space to avoid numerical underflow.
    """
    T, _ = obs.shape
    N = len(mu)
    V = np.zeros((T, N))               # V[t, j]   = best path probability ending in word j at t
    psi = np.zeros((T, N), dtype=int)  # psi[t, j] = best predecessor of word j at time t

    # Initialization: V_1(w) ∝ p(x_1 | w) (a uniform word prior is just a constant factor)
    V[0] = [multivariate_normal.pdf(obs[0], mu[i], Sigma) for i in range(N)]
    psi[0] = 0

    # Recursion: V_t(w) = max_w' V_{t-1}(w') * pi_{w', w} * p(x_t | w)
    for t in range(1, T):
        for j in range(N):
            probs = V[t - 1] * trans_matrix[:, j]
            emit_prob = multivariate_normal.pdf(obs[t], mu[j], Sigma)
            V[t, j] = np.max(probs) * emit_prob
            psi[t, j] = np.argmax(probs)

    # Termination and backtracking: start from argmax_w V_T(w), follow psi backwards
    path = [np.argmax(V[-1])]
    for t in range(T - 1, 0, -1):
        path.append(psi[t, path[-1]])
    return path[::-1]
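A quick synthetic smoke test of the function above, reusing the illustrative mu, Sigma, trans, and rng from the Section 3 sketch (the 0.1 noise scale is arbitrary):

```python
# Synthetic check: 10 frames each of words 0, 1, 2 (e.g. sierra, charlie, bravo).
true_words = [0, 1, 2]
obs = np.vstack([rng.normal(mu[w], 0.1) for w in true_words for _ in range(10)])

decoded = viterbi(obs, mu, Sigma, trans)
print(decoded[:5], decoded[-5:])  # expected to stay on word 0, then settle on word 2
```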
8. Connection to FFT Triage
graph LR
A[IQ] --> B[FFT Triage]
B --> C[Confidence c]
C --> D[\hat{q}]
D --> E[SNR_est]
E --> F[Set σ² in HMM]
F --> G[Viterbi Decoder]
G --> H[Word Sequence]
- Low $ \hat{q} $ → high $ \sigma^2 $ → rely more on transition prior
- High $ \hat{q} $ → low $ \sigma^2 $ → trust emission likelihood (one possible mapping is sketched below)
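As one possible realization of these bullets, the function below is a purely hypothetical linear interpolation between a small and a large variance; only the direction (low $ \hat{q} $ → large $ \sigma^2 $) comes from the pipeline above:

```python
import numpy as np

def sigma2_from_quality(q_hat, sigma2_floor=0.05, sigma2_ceil=5.0):
    """Hypothetical mapping from FFT-triage confidence q_hat in [0, 1] to the
    HMM emission variance sigma^2 (not the system's actual rule)."""
    q_hat = np.clip(q_hat, 0.0, 1.0)
    # Low confidence -> inflate sigma^2 so the decoder leans on the transition prior;
    # high confidence -> shrink sigma^2 so the emission likelihood dominates.
    return sigma2_ceil + q_hat * (sigma2_floor - sigma2_ceil)

print(sigma2_from_quality(0.1))  # noisy frame: large variance, trust the language prior
print(sigma2_from_quality(0.9))  # clean frame: small variance, trust the RF features
```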
9. Key Takeaways
| Concept | Your Paper |
|---|---|
| Dynamic Programming | Avoids $ N^T $ search |
| Transition Prior | GPT-style → 60% WER reduction |
| Emission Model | Gaussian on RF features |
| Output | Word sequence + posterior traces |
10. LaTeX Figure for Your Paper
\begin{figure}[t]
\centering
\begin{mermaid}
graph LR
subgraph t=1
A[sierra] -->|V=0.8| B[sierra]
end
subgraph t=2
B -->|"π(charlie ∣ sierra)"| C[charlie]
B -->|"π(delta ∣ sierra)"| D[delta]
end
style C fill:#90EE90
style D fill:#FFB6C1
\end{mermaid}
\caption{Viterbi path (green) vs pruned path (red). GPT prior favors "charlie" after "sierra".}
\label{fig:viterbi}
\end{figure}
Bottom Line
Viterbi = “best path” in a trellis of word hypotheses
Language prior = GPS for noisy RF neural data
Your system: 1.5 ms FFT → $ \hat{q} $ → SNR → Viterbi → 1.1% WER