For a state-space system with state \(x_k\), measurement \(z_k\), process noise \(w_k\) and measurement noise \(v_k\), \[ x_k = F x_{k-1} + w_k, \qquad z_k = H x_k + v_k, \] the Kalman filter predicts \(\hat{x}_{k|k-1} = F\hat{x}_{k-1}\) and corrects the prediction with the innovation (measurement residual) \[ r_k = z_k - H\hat{x}_{k|k-1}, \qquad \hat{x}_k = \hat{x}_{k|k-1} + K_k r_k. \] The gain \(K_k\) is statistically optimal only when the measurement noise \(v_k\) is white. When the noise is colored, i.e. temporally correlated with its power concentrated in a frequency band, that assumption breaks and the estimate degrades; learned filters such as KalmanNet help but still treat the innovation as broadband.
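The predict/correct recursion above can be sketched in a few lines of NumPy. This is a minimal illustration of the classical step only, under assumptions that do not appear in the text: the textbook covariance recursion, known noise covariances `Q` and `R`, and the hypothetical function name `kf_step`. In a learned filter such as KalmanNet the gain would come from a network rather than from these formulas.

```python
import numpy as np

def kf_step(x_prev, P_prev, z, F, H, Q, R):
    """One classical Kalman predict/correct step (illustrative sketch).

    Q, R and the covariance P are assumptions of this sketch; the text
    above only specifies the state and innovation equations.
    """
    # Predict: x_{k|k-1} = F x_{k-1}
    x_pred = F @ x_prev
    P_pred = F @ P_prev @ F.T + Q

    # Innovation: r_k = z_k - H x_{k|k-1}
    r = z - H @ x_pred

    # Textbook gain, optimal under white noise with known covariances
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)

    # Correct: x_k = x_{k|k-1} + K_k r_k
    x_new = x_pred + K @ r
    P_new = (np.eye(len(x_prev)) - K @ H) @ P_pred
    return x_new, P_new, r
```

With identity `F`, `H`, `P_prev` and `R` and zero `Q`, the gain is one half, so the corrected state lands midway between the prediction and the measurement.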
FW-NKF keeps the recursion but passes the innovation through a learnable IIR filter before the correction: \[ \tilde{r}_k = \sum_{m=0}^{M} b_m\, r_{k-m} \;-\; \sum_{n=1}^{N} a_n\, \tilde{r}_{k-n}, \qquad \hat{x}_k = \hat{x}_{k|k-1} + K_k \tilde{r}_k. \] An IIR filter (rather than an FIR one) achieves a sharp frequency response with few learnable coefficients \(\{a_n, b_m\}\), so it can attenuate the noisy band of the innovation while keeping the informative one.
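The difference equation for \(\tilde{r}_k\) can be written out directly. The sketch below applies it to a recorded innovation sequence with fixed coefficients; in FW-NKF the coefficients are learned, which this example does not attempt. The function name `iir_filter_innovation`, the zero initial conditions, and the array layout (time along the first axis) are assumptions made for illustration.

```python
import numpy as np

def iir_filter_innovation(r, b, a):
    """Filter innovations r[0..T-1] with the recursion
        r_tilde[k] = sum_m b[m] * r[k-m] - sum_n a[n] * r_tilde[k-n].

    b holds b_0..b_M; a holds a_1..a_N (so a[0] here is a_1).
    Samples before k = 0 are taken as zero (an assumption of this sketch).
    """
    r = np.asarray(r, dtype=float)
    out = np.zeros_like(r)
    for k in range(len(r)):
        acc = 0.0
        # Feed-forward part over current and past raw innovations
        for m in range(len(b)):
            if k - m >= 0:
                acc = acc + b[m] * r[k - m]
        # Feedback part over past filtered innovations
        for n in range(1, len(a) + 1):
            if k - n >= 0:
                acc = acc - a[n - 1] * out[k - n]
        out[k] = acc
    return out

# One-pole low-pass: r_tilde[k] = 0.5 * r[k] + 0.5 * r_tilde[k-1]
impulse_response = iir_filter_innovation([1.0, 0.0, 0.0, 0.0], b=[0.5], a=[-0.5])
```

Setting `b=[1.0]` and `a=[]` returns the innovation unchanged, which recovers the unfiltered correction. Because of the feedback term, an IIR filter is stable only if its poles lie inside the unit circle; how the coefficients are constrained during learning is not stated above.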
Training uses more than state error: the observation model maps both the true and the estimated state back to measurement space, and a frequency-domain term pulls the filtered signal toward the clean spectrum, \[ \mathcal{L} = \mathcal{L}_{\text{state}} + \lambda\, \mathcal{L}_{\text{spec}}, \qquad \mathcal{L}_{\text{spec}} = \big\lVert\, |\mathcal{F}(H\hat{x})| - |\mathcal{F}(Hx)|\, \big\rVert, \] where \(\mathcal{F}\) is the DFT and \(\lambda\) weights the spectral loss.
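As a rough illustration of the combined objective, the sketch below evaluates both terms on a trajectory stored as an array of shape `(T, n)`. Several choices are assumptions rather than details given above: mean squared error for \(\mathcal{L}_{\text{state}}\), the Frobenius norm for the spectral term, a real-input DFT taken along the time axis, and the function names. NumPy is used only to show the computation; actual training would need an autodiff framework.

```python
import numpy as np

def spectral_loss(x_hat, x_true, H):
    """|| |DFT(H x_hat)| - |DFT(H x)| || along the time axis (sketch)."""
    # Map both trajectories to measurement space: (T, n) -> (T, m)
    y_hat = x_hat @ H.T
    y_true = x_true @ H.T
    # Magnitude spectra per measurement channel
    mag_hat = np.abs(np.fft.rfft(y_hat, axis=0))
    mag_true = np.abs(np.fft.rfft(y_true, axis=0))
    # Frobenius norm of the difference (norm choice is an assumption)
    return np.linalg.norm(mag_hat - mag_true)

def total_loss(x_hat, x_true, H, lam):
    """L = L_state + lam * L_spec, with MSE as the assumed state loss."""
    state = np.mean((x_hat - x_true) ** 2)
    return state + lam * spectral_loss(x_hat, x_true, H)
```

Since the spectral term compares magnitudes only, it ignores phase: a circularly time-shifted estimate has the same magnitude spectrum as the truth and incurs essentially zero spectral loss. The state term is what penalizes such misalignment, which is one reason to combine the two.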
FW-NKF is tested on synthetic systems (the Lorenz attractor, a nonlinear pendulum) and on real tracking data (EuRoC MAV IMU odometry, UWB-IMU human pose). The baselines are the classical KF, KalmanNet and its Bayesian/recursive variants, the Recurrent Kalman Network, and an autoregressive KF; the metrics are MSE, NRMSE and \(R^2\) across several seeds. It is consistently the strongest:
| System | FW-NKF MSE | FW-NKF \(R^2\) | KalmanNet MSE |
|---|---|---|---|
| Lorenz (3-state) | 0.276 | 0.999 | 19.53 |
| Pendulum (2-state) | 0.278 | 0.947 | 0.767 |
| EuRoC MAV (10-state) | 0.035 | 0.989 | - |
The spectral term is the active ingredient: sweeping its weight \(\lambda\) from \(0\) (a plain neural Kalman filter) upward improves accuracy, with the best setting between \(\lambda \approx 0.01\) and \(0.1\) depending on the system, confirming that frequency-selective denoising of the innovation, not just added capacity, drives the gain.
Work with ETH Zürich's Sensing, Interaction & Perception Lab; accepted at IEEE ICRA 2026.