The weighted-observation likelihood filter (WoLF) is a provably robust variant of the Kalman filter (KF)
in the presence of outliers and misspecified measurement models.
We show its capabilities in
tracking problems (via the KF),
online learning of neural networks (via the extended KF),
and data assimilation (via the ensemble KF).
We consider the state-space model (SSM)
$$
\begin{aligned}
\bm\theta_t &= {\bf F}_t\,\bm\theta_{t-1} + \bm\phi_t,\\
{\bm y}_t &= {\bf H}_t\,\bm\theta_t + \bm\varphi_t,
\end{aligned}
$$
with \(\bm\theta_t\in\R^p\) the (latent) state vector, \({\bm y}_t\in\R^d\) the (observed) measurement vector, \({\bf F}_t\in\R^{p\times p}\) the state transition matrix, \({\bf H}_t\in\R^{d\times p}\) the measurement matrix, \(\bm\phi_t\) a zero-mean Gaussian-distributed random vector with known covariance matrix \({\bf Q}_t\), and \(\bm\varphi_t\) any zero-mean random vector representing the measurement noise.
We determine either \(\mathbb{E}[\bm\theta_t \vert \bm y_{1:t}]\) or \(\mathbb{E}[\bm y_{t+1} \vert \bm y_{1:t}]\) recursively
by applying the predict and modified update equations.
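As a concrete illustration, here is a minimal NumPy sketch of one WoLF predict/update cycle. It assumes the modified update amounts to replacing \({\bf R}_t\) by \({\bf R}_t / w_t^2\), where \(w_t = W(\bm y_t, \hat{\bm y}_t)\) is a weighting function; the IMQ-style weight and its soft threshold `c` are illustrative choices, not prescribed values:

```python
import numpy as np

def imq_weight(y, y_hat, c=4.0):
    """Inverse multi-quadratic (IMQ) weight: decays toward 0 as the residual
    grows, so any single measurement has bounded influence. `c` is a soft
    threshold (a hyperparameter)."""
    return (1.0 + np.sum((y - y_hat) ** 2) / c**2) ** -0.5

def wolf_step(mu, Sigma, y, F, H, Q, R, weight_fn=imq_weight):
    """One WoLF predict/update cycle for the linear SSM above."""
    # Predict step: identical to the Kalman filter.
    mu_pred = F @ mu
    Sigma_pred = F @ Sigma @ F.T + Q
    # Modified update: down-weight the measurement via R -> R / w^2.
    y_hat = H @ mu_pred
    w2 = max(weight_fn(y, y_hat) ** 2, 1e-12)     # guard w = 0 (measurement ignored)
    S = H @ Sigma_pred @ H.T + R / w2
    K = np.linalg.solve(S.T, H @ Sigma_pred.T).T  # Kalman gain Sigma_pred H^T S^{-1}
    mu_post = mu_pred + K @ (y - y_hat)
    Sigma_post = Sigma_pred - K @ S @ K.T
    return mu_post, Sigma_post
```

With \(w_t \equiv 1\) this is exactly the KF update, which is why the per-step cost stays \(O(p^3)\).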
Definition: Posterior influence function
The posterior influence function (PIF) of a contaminated measurement \(\bm y_t^c\) on the filtering posterior \(q\) is
$$
\text{PIF}(\bm y_t^c, \bm y_{1:t-1}) = \text{KL}\left(
q(\bm\theta_t \vert \bm y_t^c, \bm y_{1:t-1})\,\|\,
q(\bm\theta_t \vert \bm y_t, \bm y_{1:t-1})
\right).
$$
Definition: Outlier-robust filter
A filter is outlier-robust if the PIF is bounded, i.e.,
$$
\sup_{\bm y_t^c\in\R^d}|\text{PIF}(\bm y_t^c, \bm y_{1:t-1})| < \infty.
$$
Theorem
If \(\sup_{\bm y_t\in\R^d} W\left(\bm y_t, \hat{\bm y}_t\right) < \infty\)
and \(\sup_{\bm y_t\in\R^d} W\left(\bm y_t, \hat{\bm y}_t\right)^2\,\|\bm y_t\|^2 < \infty\), where \(W\) is the weighting function and \(\hat{\bm y}_t\) the predicted measurement, then the PIF is bounded.
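As a quick check (our own derivation, assuming the inverse multi-quadratic (IMQ) weight used below takes the standard form \(W(\bm y_t, \hat{\bm y}_t) = (1 + \|\bm y_t - \hat{\bm y}_t\|^2/c^2)^{-1/2}\)), both conditions hold for fixed \(\hat{\bm y}_t\):
$$
W(\bm y_t, \hat{\bm y}_t) \le 1
\quad\text{and}\quad
W(\bm y_t, \hat{\bm y}_t)^2\,\|\bm y_t\|^2
= \frac{\|\bm y_t\|^2}{1 + \|\bm y_t - \hat{\bm y}_t\|^2 / c^2}
\xrightarrow{\|\bm y_t\|\to\infty} c^2,
$$
so both suprema are finite.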
Remarks
The Kalman filter is not outlier-robust.
Filters with the IMQ and thresholded-Mahalanobis-distance (TMD) weighting functions are outlier-robust.
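To see the contrast numerically, the following sketch (our illustration: a scalar SSM with \({\bf F}_t = {\bf H}_t = 1\) and an assumed IMQ threshold \(c = 4\)) computes the PIF in closed form via the Gaussian KL divergence; the KF column diverges as \(\bm y_t^c\) grows, while the WoLF-IMQ column stays bounded:

```python
import numpy as np

def gauss_kl(m0, v0, m1, v1):
    """KL( N(m0, v0) || N(m1, v1) ) for scalar Gaussians."""
    return 0.5 * (v0 / v1 + (m0 - m1) ** 2 / v1 - 1.0 + np.log(v1 / v0))

def posterior(y, mu_pred=0.0, v_pred=1.0, r=1.0, w=1.0):
    """Scalar weighted KF update; w = 1 recovers the plain Kalman filter."""
    k = v_pred / (v_pred + r / max(w**2, 1e-12))
    return mu_pred + k * (y - mu_pred), (1.0 - k) * v_pred

m_clean, v_clean = posterior(0.5)                 # posterior under the clean y_t
for yc in [1e1, 1e2, 1e4, 1e8]:                   # increasingly extreme y_t^c
    m_kf, v_kf = posterior(yc)                    # Kalman filter
    w = (1.0 + yc**2 / 4.0**2) ** -0.5            # IMQ weight, c = 4, y_hat = 0
    m_w, v_w = posterior(yc, w=w)                 # WoLF-IMQ
    print(f"y^c={yc:9.0e}  PIF_KF={gauss_kl(m_kf, v_kf, m_clean, v_clean):10.3e}"
          f"  PIF_WoLF-IMQ={gauss_kl(m_w, v_w, m_clean, v_clean):10.3e}")
```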
Computational results
Compared to variational-Bayes (VB) methods, which require multiple inner iterations to converge,
WoLF has the same computational cost as the standard Kalman filter.
| Method | Time | #HP | Ref |
| --- | --- | --- | --- |
| KF | \(O(p^3)\) | 0 | Kalman1960 |
| KF-B | \(O(I\,p^3)\) | 3 | Wang2018 |
| KF-IW | \(O(I\,p^3)\) | 2 | Agamennoni2012 |
| OGD | \(O(I\,p^2)\) | 2 | Bencomo2023 |
| WoLF-IMQ | \(O(p^3)\) | 1 | (Ours) |
| WoLF-TMD | \(O(p^3)\) | 1 | (Ours) |
In the table, \(I\) is the number of inner iterations, \(p\) is the dimension of the state vector, and #HP is the number of hyperparameters.
Experiment: Kalman filter (KF)
2D tracking
Linear SSM with \({\bf Q}_t = q\,{\bf I}_4\) and \({\bf R}_t = r\,{\bf I}_2\),
where \(\Delta = 0.1\) is the sampling rate, \(q = 0.10\) is the system-noise scale, \(r = 10\) is the measurement-noise scale, and \({\bf I}_K\) is the \(K\times K\) identity matrix.
We measure the RMSE of the posterior mean estimate of the state components.
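A runnable sketch of this experiment, reusing `wolf_step` from above; the constant-velocity \({\bf F}_t\) and \({\bf H}_t\) and the 5% contamination rate are our assumptions, since the text fixes only \(\Delta\), \(q\), and \(r\):

```python
import numpy as np

rng = np.random.default_rng(0)
dt, q, r, T = 0.1, 0.10, 10.0, 500

# Constant-velocity model (assumed): state (x, y, vx, vy), measurement (x, y).
F = np.array([[1., 0., dt, 0.],
              [0., 1., 0., dt],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])
H = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.]])
Q, R = q * np.eye(4), r * np.eye(2)

theta, mu, Sigma = np.zeros(4), np.zeros(4), np.eye(4)
sq_errs = []
for _ in range(T):
    theta = F @ theta + rng.multivariate_normal(np.zeros(4), Q)
    y = H @ theta + rng.multivariate_normal(np.zeros(2), R)
    if rng.random() < 0.05:                       # illustrative outlier contamination
        y = y + rng.normal(0.0, 50.0, size=2)
    mu, Sigma = wolf_step(mu, Sigma, y, F, H, Q, R)   # sketch from above
    sq_errs.append(np.mean((theta - mu) ** 2))
print("RMSE of the posterior mean:", np.sqrt(np.mean(sq_errs)))
```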
Experiment: extended Kalman filter (EKF)
Online training of neural networks on corrupted versions of tabular UCI datasets.
We consider a multilayer perceptron (MLP) with two hidden layers of twenty units each and a real-valued output unit.
We evaluate the root median squared error (RMedSE) between the true and predicted outputs.
The SSM is
$$
\begin{aligned}
{\bm\theta_t} &= \bm\theta_{t-1} + \bm\phi_t\\
{\bm y}_t &= h_t(\bm\theta_t) + \bm\varphi_t,
\end{aligned}
$$
with
\(h_t(\bm\theta_t) = h(\bm\theta_t, {\bf x}_t) \) the MLP and
\({\bf x}_t\in\mathbb{R}^m\) the input vector.
We estimate \( \mathbb{E}[\bm\theta_t \vert \bm y_{1:t}] \) via the extended Kalman filter (EKF) — one measurement, one parameter update.
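A sketch of the resulting online loop (our illustration: the parameter packing, the input width, and the finite-difference Jacobian are implementation choices; in practice the Jacobian would come from autodiff, e.g. JAX):

```python
import numpy as np

def mlp(params, x, sizes=(8, 20, 20, 1)):
    """Forward pass of an MLP whose weights are packed into one flat vector.
    `sizes` = (inputs, hidden, hidden, outputs); the input width 8 is illustrative."""
    h, i = x, 0
    for layer, (a, b) in enumerate(zip(sizes[:-1], sizes[1:])):
        W = params[i:i + a * b].reshape(b, a); i += a * b
        c = params[i:i + b]; i += b
        h = W @ h + c
        if layer < len(sizes) - 2:    # tanh on hidden layers, linear output
            h = np.tanh(h)
    return h

def ekf_step(mu, Sigma, x, y, q=1e-4, r=1.0, eps=1e-6):
    """One online EKF step: predict with theta_t = theta_{t-1} + phi_t,
    then update on a single observation (x, y)."""
    p = len(mu)
    Sigma_pred = Sigma + q * np.eye(p)            # random-walk predict
    y_hat = mlp(mu, x)
    # Jacobian of h_t w.r.t. the parameters by finite differences (sketch only).
    J = np.empty((len(y_hat), p))
    for j in range(p):
        d = np.zeros(p); d[j] = eps
        J[:, j] = (mlp(mu + d, x) - y_hat) / eps
    S = J @ Sigma_pred @ J.T + r * np.eye(len(y_hat))
    K = np.linalg.solve(S.T, J @ Sigma_pred.T).T
    mu_new = mu + K @ (np.atleast_1d(y) - y_hat)
    Sigma_new = Sigma_pred - K @ S @ K.T
    return mu_new, Sigma_new
```

Looping `mu, Sigma = ekf_step(mu, Sigma, x_t, y_t)` over the data stream gives the one-measurement-one-update regime; the WoLF variant would scale \(r\) by \(1/w_t^2\) exactly as in the linear sketch, and RMedSE is then the square root of the median of the per-example squared errors.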
Experiment: ensemble Kalman filter (EnKF)
Data assimilation with a stochastic Lorenz 96 model. The SSM is
$$
\begin{aligned}
\frac{{\rm d}\bm\theta_{s,k}}{{\rm d}s} &= \left(\bm\theta_{s,k+1} - \bm\theta_{s,k-2}\right)\bm\theta_{s,k-1} - \bm\theta_{s,k} + \bm\phi_{s,k},\\
{\bm y}_{s,i} &= \bm\theta_{s,i} + \bm\varphi_{s,i},
\end{aligned}
$$
and we modify the measurement mean with probability \(p_\epsilon\).
Here, \(\bm\theta_{s,k}\) is the value of state component \(k\) at step \(s\),
\(\bm\phi_{s,i} \sim {\cal N}(8, 1)\), \(\bm\varphi_{s,i} \sim {\cal N}(0, 1)\), \(p_\epsilon = 0.001\),
\(i = 1, \ldots, d\), \(s = 1, \ldots, S\), with \(S \gg 1\) the number of steps, and
cyclic boundary conditions \(\bm\theta_{s, d + k} = \bm\theta_{s, k}\) and \(\bm\theta_{s, -k} = \bm\theta_{s, d - k}\).
We consider the per-step metric
\(L_t = \sqrt{\frac{1}{d}(\bm\theta_t - \bm\mu_t)^\intercal (\bm\theta_t - \bm\mu_t)}\),
with \(\bm\mu_t\) the posterior mean estimate of the state.
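For concreteness, a short sketch of the Lorenz 96 tendency implied by the cyclic boundary conditions above, together with the metric \(L_t\); the state dimension \(d = 40\) and the Euler discretization are our choices, not taken from the text:

```python
import numpy as np

def lorenz96_drift(theta, forcing):
    """Lorenz 96 tendency; np.roll implements the cyclic boundary conditions
    theta[k + d] = theta[k] and theta[-k] = theta[d - k]."""
    return (np.roll(theta, -1) - np.roll(theta, 2)) * np.roll(theta, 1) - theta + forcing

def simulate(d=40, S=1000, dt=0.01, seed=0):
    """Simulate the latent states with stochastic forcing phi ~ N(8, 1)
    (Euler steps; the integrator choice is illustrative)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=d)
    states = []
    for _ in range(S):
        phi = rng.normal(8.0, 1.0, size=d)
        theta = theta + dt * lorenz96_drift(theta, phi)
        states.append(theta)
    return np.array(states)

def L_metric(theta_t, mu_t):
    """Per-step error L_t between the true state and the posterior mean."""
    return np.sqrt(np.mean((theta_t - mu_t) ** 2))
```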