1. What Is Viterbi Decoding?
Viterbi is a dynamic programming algorithm that finds the most likely sequence of hidden states in a Hidden Markov Model (HMM) given an observation sequence.
In your RF-to-speech pipeline:
- Hidden states = words (sierra, charlie, bravo)
- Observations = RF-inferred neural features $ x_t $
- Goal = reconstruct inner speech from noisy RF surrogates
2. Why Not Brute Force?
- States: $ N = 50 $ words (NATO lexicon)
- Sequence length: $ T = 30 $ frames
- Total sequences: $ N^T = 50^{30} \approx 10^{51} $
Viterbi: $ O(T N^2) $ → roughly $ 30 \times 50^2 = 75{,}000 $ operations → polynomial, feasible (quick check below)
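A two-line check of this gap, plugging in the $ N $ and $ T $ above:

```python
# Quick check on the counts above (N = 50 words, T = 30 frames).
N, T = 50, 30
print(f"Brute force: {float(N ** T):.2e} candidate sequences")  # ~9.3e+50
print(f"Viterbi:     {T * N ** 2:,} max/argmax updates")        # 75,000
```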
3. HMM Components (Your Paper)
| Component | Formula | Meaning |
|---|---|---|
| Emission | $ p(x_t \mid w_t) = \mathcal{N}(x_t; \mu_{w_t}, \Sigma) $ | How likely is $ x_t $ given word $ w_t $? |
| Transition | $ p(w_t \mid w_{t-1}) = \pi_{w_{t-1}, w_t} $ | Language prior (bigram or GPT-style) |
| Joint | $ P(w_{1:T}, x_{1:T}) = p(w_1)\, p(x_1 \mid w_1) \prod_{t=2}^T p(w_t \mid w_{t-1}) \, p(x_t \mid w_t) $ | Full probability |
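For concreteness, here is a minimal NumPy sketch of these three components. The feature dimension, random means, and random transition matrix are placeholders, not the paper's fitted parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
N_WORDS, FEAT_DIM = 50, 16                 # NATO lexicon size; 16-dim features are a placeholder

mu = rng.normal(size=(N_WORDS, FEAT_DIM))  # emission means mu_w, one row per word
Sigma = np.eye(FEAT_DIM)                   # shared emission covariance
trans = rng.random((N_WORDS, N_WORDS))     # stand-in for the bigram / GPT-style prior
trans /= trans.sum(axis=1, keepdims=True)  # rows sum to 1: trans[w_prev, w_next] = pi
prior = np.full(N_WORDS, 1.0 / N_WORDS)    # p(w_1), uniform here for simplicity
```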
4. Viterbi Step-by-Step (With Example)
Input:
- $ x_{1:30} $ = RF features
- True utterance: sierra charlie bravo
Step 1: Initialization ($ t = 1 $)
For each word $ w $:
$$
V_1(w) = p(w) \cdot p(x_1 | w)
$$
$$
\psi_1(w) = 0
$$
$ V_t(w) $ = max probability of being in $ w $ at time $ t $
$ \psi_t(w) $ = best previous word
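Continuing the Section 3 sketch (it reuses rng, mu, Sigma, prior, FEAT_DIM, and N_WORDS from there), the initialization is two lines; x1 is a made-up stand-in for the first RF feature frame:

```python
import numpy as np
from scipy.stats import multivariate_normal

x1 = rng.normal(size=FEAT_DIM)   # stand-in for the first RF feature frame x_1

# V_1(w) = p(w) * p(x_1 | w); psi_1(w) = 0 (no predecessor at t = 1)
V1 = prior * np.array([multivariate_normal.pdf(x1, mu[w], Sigma) for w in range(N_WORDS)])
psi1 = np.zeros(N_WORDS, dtype=int)
```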
Step 2: Recursion ($ t = 2 $ to $ T $)
For each word $ w $ at time $ t $:
$$
V_t(w) = \max_{w'} \left[ V_{t-1}(w') \cdot \pi_{w',w} \cdot p(x_t \mid w) \right]
$$
$$
\psi_t(w) = \arg\max_{w'} \left[ V_{t-1}(w') \cdot \pi_{w',w} \right]
$$
Key Insight:
At each time step, keep only the best path into each state: the trellis stores $ T N $ entries instead of enumerating all $ N^T $ sequences (one recursion step is sketched below).
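One recursion step can be written as a small vectorized helper; viterbi_step is my name for it, and it mirrors the full loop shown in Section 7:

```python
import numpy as np
from scipy.stats import multivariate_normal

def viterbi_step(V_prev, x_t, mu, Sigma, trans):
    """One recursion step: V_t(w) and psi_t(w) for every word w.

    V_prev : (N,) best path probabilities at t-1
    x_t    : (d,) feature frame; mu is (N, d), Sigma is (d, d), trans is (N, N)
    """
    emit = np.array([multivariate_normal.pdf(x_t, mu[w], Sigma) for w in range(len(mu))])
    scores = V_prev[:, None] * trans   # scores[w_prev, w] = V_{t-1}(w') * pi_{w', w}
    psi_t = scores.argmax(axis=0)      # best predecessor for each word w
    V_t = scores.max(axis=0) * emit    # multiply in the emission term p(x_t | w)
    return V_t, psi_t
```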
Step 3: Termination
Best path probability:
$$
P^* = \max_w V_T(w)
$$
Final state:
$$
q_T^* = \arg\max_w V_T(w)
$$
Step 4: Backtracking
$$
q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, \dots, 1
$$
→ Output: sierra → charlie → bravo
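Termination and backtracking together are just an argmax followed by a reverse walk through $ \psi $; a minimal sketch, assuming V and psi are stored as (T, N) arrays:

```python
import numpy as np

def backtrack(V, psi):
    """Recover the best word-index path from the V and psi tables (both T x N)."""
    T = V.shape[0]
    path = np.empty(T, dtype=int)
    path[-1] = V[-1].argmax()              # q_T* = argmax_w V_T(w)
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]  # q_t* = psi_{t+1}(q_{t+1}*)
    return path
```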
5. Visual Example (Your Fig. 1)
graph TD
subgraph "Frame 1–5"
A[sierra] --> B[charlie]
A --> C[delta]
end
subgraph "Frame 6–10"
B --> D[charlie]
C --> E[papa]
end
subgraph "Frame 11–15"
D --> F[bravo]
E --> G[quebec]
end
style A fill:#90EE90
style B fill:#90EE90
style D fill:#90EE90
style F fill:#90EE90
style C fill:#FFB6C1
style E fill:#FFB6C1
style G fill:#FFB6C1
- Green path = Viterbi path (highest $ V_t $)
- Red paths = pruned (lower probability)
6. Why Language Priors Matter (Your Fig. 2)
| SNR | No Prior WER | Bigram WER | GPT-style WER |
|---|---|---|---|
| 10 dB | 2.8% | 1.6% | 1.1% |
A GPT-style prior gives $ p(\text{charlie} \mid \text{sierra}) $ a high value, which boosts $ V_t(\text{charlie}) $ along the correct path.
With no prior, all transitions are equally likely, so the decoder falls back on per-frame emission scores and framewise confusions go uncorrected (contrast sketched below).
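A toy contrast with hypothetical numbers makes the point: the same transition lookup is uninformative under a flat prior and decisive under a bigram prior:

```python
import numpy as np

words = ["sierra", "charlie", "bravo"]
idx = {w: i for i, w in enumerate(words)}

uniform = np.full((3, 3), 1 / 3)        # no prior: every transition equally likely
bigram = np.array([[0.10, 0.80, 0.10],  # "charlie" strongly follows "sierra"
                   [0.10, 0.10, 0.80],  # "bravo"   strongly follows "charlie"
                   [0.80, 0.10, 0.10]])

print(uniform[idx["sierra"], idx["charlie"]])  # 0.33 -> context gives no help
print(bigram[idx["sierra"], idx["charlie"]])   # 0.80 -> boosts V_t(charlie) on the true path
```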
7. Viterbi in Your Code
import numpy as np
from scipy.stats import multivariate_normal

def viterbi(obs, mu, Sigma, trans_matrix):
    """Most likely word-index sequence for observations obs of shape (T, d).

    Probabilities are multiplied directly for clarity; for long sequences,
    work in log-space to avoid numerical underflow.
    """
    T, _ = obs.shape
    N = len(mu)
    V = np.zeros((T, N))               # V[t, j]   = best path probability ending in word j at t
    psi = np.zeros((T, N), dtype=int)  # psi[t, j] = best predecessor of word j at time t

    # Initialization: V_1(w) ∝ p(x_1 | w) (a uniform word prior is just a constant factor)
    V[0] = [multivariate_normal.pdf(obs[0], mu[i], Sigma) for i in range(N)]
    psi[0] = 0

    # Recursion: V_t(w) = max_w' V_{t-1}(w') * pi_{w', w} * p(x_t | w)
    for t in range(1, T):
        for j in range(N):
            probs = V[t - 1] * trans_matrix[:, j]
            emit_prob = multivariate_normal.pdf(obs[t], mu[j], Sigma)
            V[t, j] = np.max(probs) * emit_prob
            psi[t, j] = np.argmax(probs)

    # Termination and backtracking: start from argmax_w V_T(w), follow psi backwards
    path = [np.argmax(V[-1])]
    for t in range(T - 1, 0, -1):
        path.append(psi[t, path[-1]])
    return path[::-1]
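A quick synthetic smoke test of the function above, reusing the illustrative mu, Sigma, trans, and rng from the Section 3 sketch (the 0.1 noise scale is arbitrary):

```python
# Synthetic check: 10 frames each of words 0, 1, 2 (e.g. sierra, charlie, bravo).
true_words = [0, 1, 2]
obs = np.vstack([rng.normal(mu[w], 0.1) for w in true_words for _ in range(10)])

decoded = viterbi(obs, mu, Sigma, trans)
print(decoded[:5], decoded[-5:])  # expected to stay on word 0, then settle on word 2
```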
8. Connection to FFT Triage
graph LR
A[IQ] --> B[FFT Triage]
B --> C[Confidence c]
C --> D[\hat{q}]
D --> E[SNR_est]
E --> F[Set σ² in HMM]
F --> G[Viterbi Decoder]
G --> H[Word Sequence]
- Low $ \hat{q} $ → high $ \sigma^2 $ → rely more on transition prior
- High $ \hat{q} $ → low $ \sigma^2 $ → trust emission likelihood (one possible mapping is sketched below)
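As one possible realization of these bullets, the function below is a purely hypothetical linear interpolation between a small and a large variance; only the direction (low $ \hat{q} $ → large $ \sigma^2 $) comes from the pipeline above:

```python
import numpy as np

def sigma2_from_quality(q_hat, sigma2_floor=0.05, sigma2_ceil=5.0):
    """Hypothetical mapping from FFT-triage confidence q_hat in [0, 1] to the
    HMM emission variance sigma^2 (not the system's actual rule)."""
    q_hat = np.clip(q_hat, 0.0, 1.0)
    # Low confidence -> inflate sigma^2 so the decoder leans on the transition prior;
    # high confidence -> shrink sigma^2 so the emission likelihood dominates.
    return sigma2_ceil + q_hat * (sigma2_floor - sigma2_ceil)

print(sigma2_from_quality(0.1))  # noisy frame: large variance, trust the language prior
print(sigma2_from_quality(0.9))  # clean frame: small variance, trust the RF features
```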
9. Key Takeaways
| Concept | Your Paper |
|---|---|
| Dynamic Programming | Avoids $ N^T $ search |
| Transition Prior | GPT-style → 60% WER reduction |
| Emission Model | Gaussian on RF features |
| Output | Word sequence + posterior traces |
10. LaTeX Figure for Your Paper
\begin{figure}[t]
\centering
\begin{mermaid}
graph LR
subgraph t=1
A[sierra] -->|V=0.8| B[sierra]
end
subgraph t=2
B -->|"π(charlie ∣ sierra)"| C[charlie]
B -->|"π(delta ∣ sierra)"| D[delta]
end
style C fill:#90EE90
style D fill:#FFB6C1
\end{mermaid}
\caption{Viterbi path (green) vs pruned path (red). GPT prior favors "charlie" after "sierra".}
\label{fig:viterbi}
\end{figure}
Bottom Line
Viterbi = “best path” in a trellis of word hypotheses
Language prior = GPS for noisy RF neural data
Your system: 1.5 ms FFT → $ \hat{q} $ → SNR → Viterbi → 1.1% WER