Deriving the Kalman Filter in four steps
The Kalman filter
Suppose that, for all $t=1,\ldots,T$, ${\bf x}_{t}\in\mathbb{R}^n$ and ${\bf y}_t\in\mathbb{R}^m$ follow the state-space model
$$ \begin{aligned} p({\bf x}_t \vert {\bf x}_{t-1}) &= {\cal N}({\bf x}_t \vert {\bf A}_{t-1}{\bf x}_{t-1}, {\bf Q}_{t-1})\\ p({\bf y}_t \vert {\bf x}_t) &= {\cal N}({\bf y}_t \vert {\bf H}_t{\bf x}_t, {\bf R}_t) \end{aligned} $$then, the Kalman filter equations are given by
$$ \begin{aligned} p({\bf x}_t \vert {\bf y}_{1:t-1}) &= {\cal N}({\bf x}_t \vert \bar{\bf m}_t,\bar{\bf P}_t) &\quad \text{(Predict)}\\ p({\bf x}_t \vert {\bf y}_{1:t}) &= {\cal N}({\bf x}_t \vert {\bf m}_t,{\bf P}_t) &\quad \text{(Update)} \end{aligned} $$where the prediction-step equations are given by
$$ \begin{aligned} \bar{\bf m}_t &= {\bf A}_{t-1}{\bf m}_{t-1}\\ \bar{\bf P}_{t} &= {\bf A}_{t-1}{\bf P}_{t-1}{\bf A}_{t-1}^\intercal + {\bf Q}_{t-1} \end{aligned} $$and the update-step equations are given by
$$ \begin{aligned} {\bf e}_t &= {\bf y}_t - {\bf H}_t\bar{\bf m}_t\\ {\bf S}_t &= {\bf H}_t\bar{\bf P}_t{\bf H}_{t}^\intercal + {\bf R}_t\\ {\bf K}_t &= \bar{\bf P}_t{\bf H}_{t}^\intercal{\bf S}_t^{-1}\\ \\ {\bf m}_t &= \bar{\bf m}_t + {\bf K}_t{\bf e}_t\\ {\bf P}_t &= \bar{\bf P}_t - {\bf K}_t{\bf S}_t {\bf K}_t^\intercal \end{aligned} $$
Sketch of Proof
The proof proceeds in four steps:
- Estimate $p({\bf x}_t, {\bf x}_{t-1} \vert {\bf y}_{1:t-1})$ — Join
- Estimate $p({\bf x}_t \vert {\bf y}_{1:t-1}) = {\cal N}({\bf x}_{t} \vert \bar{\bf m}_t, \bar{\bf P}_t)$ — Marginalise
- Estimate $p({\bf x}_t, {\bf y}_t \vert {\bf y}_{1:t-1})$ using step 2 — Join
- Estimate $p({\bf x}_t \vert {\bf y}_{1:t}) = {\cal N}({\bf x}_t \vert {\bf m}_t, {\bf P}_t)$ — Condition
Steps 1 through 3 follow from Lemma A1. Step 4 follows from Lemma A2.
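Before going through the proof, the filter equations can be sanity-checked numerically. The sketch below is a minimal NumPy implementation of one predict/update cycle, assuming time-invariant matrices for brevity; the function name `kalman_step` and the toy constant-velocity model are illustrative choices, not part of the derivation.

```python
import numpy as np


def kalman_step(m_prev, P_prev, y, A, Q, H, R):
    """One predict/update cycle of the Kalman filter equations above."""
    # Predict: p(x_t | y_{1:t-1}) = N(m_bar, P_bar)
    m_bar = A @ m_prev
    P_bar = A @ P_prev @ A.T + Q

    # Update: p(x_t | y_{1:t}) = N(m, P)
    e = y - H @ m_bar                    # innovation e_t
    S = H @ P_bar @ H.T + R              # innovation covariance S_t
    K = np.linalg.solve(S, H @ P_bar).T  # gain K_t = P_bar H^T S^{-1} (uses symmetry of P_bar, S)
    m = m_bar + K @ e
    P = P_bar - K @ S @ K.T
    return m, P


# Toy constant-velocity model with 1-D position measurements (illustrative values).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[0.5]])

m, P = np.zeros(2), np.eye(2)
for y in [np.array([1.1]), np.array([2.0]), np.array([2.9])]:
    m, P = kalman_step(m, P, y, A, Q, H, R)
print(m)
print(P)
```

Solving a linear system with ${\bf S}_t$ rather than forming ${\bf S}_t^{-1}$ explicitly is the usual, slightly more stable way to compute the gain.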
Proof
Step 1 — $p({\bf x}_t, {\bf x}_{t-1} \vert {\bf y}_{1:t-1})$
Using Lemma A1, we write
$$ \begin{aligned} p({\bf x}_t, {\bf x}_{t-1} \vert {\bf y}_{1:t-1}) &= p({\bf x}_{t-1} \vert {\bf y}_{1:t-1})p({\bf x}_t \vert {\bf x}_{t-1})\\ &={\cal N}({\bf x}_{t-1} \vert {\bf m}_{t-1}, {\bf P}_{t-1})\,{\cal N}({\bf x}_{t}\vert{\bf A}_{t-1}{\bf x}_{t-1}, {\bf Q}_{t-1})\\ &= {\cal N}\left( \begin{bmatrix} {\bf x}_{t-1}\\ {\bf x}_t \end{bmatrix} {\huge\vert} \begin{bmatrix} {\bf m}_{t-1}\\ {\bf A}_{t-1}{\bf m}_{t-1} \end{bmatrix}, \begin{bmatrix} {\bf P}_{t-1} & {\bf P}_{t-1}{\bf A}_{t-1}^\intercal\\ {\bf A}_{t-1}{\bf P}_{t-1} & {\bf A}_{t-1}{\bf P}_{t-1}{\bf A}_{t-1}^\intercal + {\bf Q}_{t-1} \end{bmatrix} \right) \end{aligned} $$
Step 2 — $p({\bf x}_t \vert {\bf y}_{1:t-1})$
Using Lemma A1, we integrate out ${\bf x}_{t-1}$ to obtain
$$ \begin{aligned} p({\bf x}_{t} \vert {\bf y}_{1:t-1}) &= {\cal N}({\bf x}_{t} \vert {\bf A}_{t-1}{\bf m}_{t-1}, {\bf A}_{t-1}{\bf P}_{t-1}{\bf A}_{t-1}^\intercal + {\bf Q}_{t-1})\\ &= {\cal N}({\bf x}_{t} \vert \bar{\bf m}_{t}, \bar{\bf P}_{t}) \end{aligned} $$where
- $\bar{\bf m}_{t} = {\bf A}_{t-1}{\bf m}_{t-1}$
- $\bar{\bf P}_{t} = {\bf A}_{t-1}{\bf P}_{t-1}{\bf A}_{t-1}^\intercal + {\bf Q}_{t-1}$
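As a sanity check on steps 1 and 2, one can sample from the joint above and verify that the empirical moments of ${\bf x}_t$ match $\bar{\bf m}_t$ and $\bar{\bf P}_t$. The values below are illustrative; this is a minimal Monte Carlo sketch rather than part of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_samples = 2, 200_000

# Illustrative values for m_{t-1}, P_{t-1}, A_{t-1} and Q_{t-1}.
m_prev = np.array([1.0, -0.5])
P_prev = np.array([[1.0, 0.3], [0.3, 0.5]])
A = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = 0.1 * np.eye(n)

# Sample x_{t-1} ~ N(m_{t-1}, P_{t-1}) and push it through x_t = A x_{t-1} + q, q ~ N(0, Q).
x_prev = rng.multivariate_normal(m_prev, P_prev, size=n_samples)
x_t = x_prev @ A.T + rng.multivariate_normal(np.zeros(n), Q, size=n_samples)

# The empirical moments of x_t should match the closed-form prediction step.
m_bar = A @ m_prev
P_bar = A @ P_prev @ A.T + Q
assert np.allclose(x_t.mean(axis=0), m_bar, atol=5e-2)
assert np.allclose(np.cov(x_t, rowvar=False), P_bar, atol=5e-2)
```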
Step 3 — $p({\bf x}_t, {\bf y}_t \vert {\bf y}_{1:t-1})$
Having $p({\bf x}_t \vert {\bf y}_{1:t-1})$ and using Lemma A1, we obtain
$$ \begin{aligned} p({\bf x}_{t}, {\bf y}_t \vert {\bf y}_{1:t-1}) &= p({\bf x}_t \vert {\bf y}_{1:t-1})p({\bf y}_t \vert {\bf x}_t)\\ &= {\cal N}({\bf x}_t \vert \bar{\bf m}_t, \bar{\bf P}_t)\,{\cal N}({\bf y}_t \vert {\bf H}_t{\bf x}_t, {\bf R}_t)\\ &= {\cal N}\left( \begin{bmatrix} {\bf x}_t\\ {\bf y}_t \end{bmatrix} {\huge\vert} \begin{bmatrix} \bar{\bf m}_t\\ {\bf H}_t \bar{\bf m}_t \end{bmatrix}, \begin{bmatrix} \bar{\bf P}_t & \bar{\bf P}_t{\bf H}_t^\intercal\\ {\bf H}_t\bar{\bf P}_t & {\bf H}_t\bar{\bf P}_t{\bf H}_t^\intercal + {\bf R}_t \end{bmatrix} \right) \end{aligned} $$
Step 4 — $p({\bf x}_t \vert {\bf y}_{1:t})$
Having $p({\bf x}_t, {\bf y}_t \vert {\bf y}_{1:t-1})$ and using Lemma A2, we obtain
$$ \begin{aligned} p({\bf x}_t \vert {\bf y}_{1:t}) &= {\cal N}({\bf x}_t \vert \bar{\bf m}_t + {\bf K}_t[{\bf y}_t - {\bf H}_t\bar{\bf m}_t], \bar{\bf P}_t - {\bf K}_t{\bf S}_t{\bf K}_t^\intercal)\\ &= {\cal N}({\bf x}_t \vert {\bf m}_t, {\bf P}_t) \end{aligned} $$where
- ${\bf S}_t = {\bf H}_t\bar{\bf P}_t{\bf H}_t^\intercal + {\bf R}_t$
- ${\bf K}_t = \bar{\bf P}_t{\bf H}_t^\intercal{\bf S}_t^{-1}$
- ${\bf m}_t = \bar{\bf m}_t + {\bf K}_t[{\bf y}_t - {\bf H}_t\bar{\bf m}_t]$
- ${\bf P}_t = \bar{\bf P}_t - {\bf K}_t{\bf S}_t{\bf K}_t^\intercal$
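To double-check step 4 numerically, one can condition the step-3 joint directly with Lemma A2 (here ${\bf a}=\bar{\bf m}_t$, ${\bf b}={\bf H}_t\bar{\bf m}_t$, ${\bf A}=\bar{\bf P}_t$, ${\bf C}=\bar{\bf P}_t{\bf H}_t^\intercal$, ${\bf B}={\bf S}_t$) and compare with the update equations. The matrices below are illustrative values, not taken from the text.

```python
import numpy as np

# Illustrative predictive moments and measurement model.
m_bar = np.array([0.5, -1.0])
P_bar = np.array([[1.0, 0.2], [0.2, 0.4]])
H = np.array([[1.0, 0.0], [0.5, 1.0]])
R = 0.3 * np.eye(2)
y = np.array([0.7, -0.2])

# Update equations of Step 4.
S = H @ P_bar @ H.T + R
K = P_bar @ H.T @ np.linalg.inv(S)
m_kf = m_bar + K @ (y - H @ m_bar)
P_kf = P_bar - K @ S @ K.T

# Generic Gaussian conditioning (Lemma A2) applied to the Step-3 joint,
# with a = m_bar, b = H m_bar, A = P_bar, C = P_bar H^T, B = S.
C = P_bar @ H.T
m_cond = m_bar + C @ np.linalg.solve(S, y - H @ m_bar)
P_cond = P_bar - C @ np.linalg.solve(S, C.T)

assert np.allclose(m_kf, m_cond) and np.allclose(P_kf, P_cond)
```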
Lemmas
Lemma A1
Suppose ${\bf x}\in\mathbb{R}^n$ and ${\bf y}\in\mathbb{R}^m$ are random variables such that
$$ \begin{aligned} {\bf x}&\sim{\cal N}({\bf m}, {\bf P})\\ {\bf y}\vert{\bf x} &\sim{\cal N}({\bf Hx} + {\bf u}, {\bf R}) \end{aligned} $$then, the joint distribution for $({\bf x}, {\bf y})$ is given by
$$ \begin{pmatrix} {\bf x}\\ {\bf y} \end{pmatrix} \sim {\cal N}\left( \begin{bmatrix} {\bf m}\\ {\bf Hm} + {\bf u} \end{bmatrix}, \begin{bmatrix} {\bf P} & {\bf PH}^\intercal\\ {\bf HP} & {\bf HPH}^\intercal + {\bf R} \end{bmatrix} \right) $$and the marginal distribution for ${\bf y}$ is given by
$$ {\bf y}\sim{\cal N}\left({\bf Hm} + {\bf u}, {\bf HPH}^\intercal + {\bf R}\right) $$
Lemma A2
Suppose ${\bf x}\in\mathbb{R}^n$ and ${\bf y}\in\mathbb{R}^m$ have a joint Gaussian distribution of the form
$$ \begin{pmatrix} {\bf x}\\ {\bf y} \end{pmatrix} \sim {\cal N}\left( \begin{bmatrix} {\bf a}\\ {\bf b} \end{bmatrix}, \begin{bmatrix} {\bf A} & {\bf C}\\ {\bf C}^\intercal & {\bf B} \end{bmatrix} \right) $$then
$$ \begin{aligned} {\bf x}&\sim{\cal N}({\bf a}, {\bf A})\\ {\bf y}&\sim{\cal N}({\bf b}, {\bf B})\\ {\bf x} \vert {\bf y} &\sim {\cal N}({\bf a} + {\bf CB}^{-1}({\bf y} - {\bf b}), {\bf A} - {\bf CB}^{-1}{\bf C}^\intercal)\\ {\bf y}\vert {\bf x} &\sim {\cal N}({\bf b} + {\bf C}^\intercal{\bf A}^{-1}({\bf x} - {\bf a}), {\bf B} - {\bf C}^\intercal{\bf A}^{-1}{\bf C}) \end{aligned} $$
References
- Särkkä, S. (2013). Bayesian Filtering and Smoothing (Institute of Mathematical Statistics Textbooks). Cambridge: Cambridge University Press. doi:10.1017/CBO9781139344203