Vanilla Kalman Filter

BS-free introduction to the math of Kalman filter

Graphical model

Hidden states are denoted as "x" and observables as "z"

Kindergarten physics

r(t+δt)=r(t)+r˙(t)δt+12r¨(t)δt2+h.o.t{\bf r}(t+\delta t) = {\bf r}(t) + \dot {\bf r}(t) \delta t + \frac12 \ddot {\bf r}(t) \delta t^2 + {\rm h.o.t}

If assume r¨=0\langle \ddot {\bf r} \rangle = 0, then the state xx is characterized by (r,r˙)({\bf r}, \dot {\bf r}), i.e.

xi(t+δt)=xi(t)+vi(t)δt+12εi(t)δt2vi(t+δt)=vi(t)+εi(t)δtx_{i}(t+\delta t) = x_i (t) + v_i (t) \delta t +\frac12 \varepsilon_i(t) \delta t^2\\ v_{i}(t+\delta t) = v_i(t) + \varepsilon_i (t) \delta t \qquad\qquad\qquad

Time evolution

xa(t+δt)=Fabxb(t)+ηax_a(t+\delta t) = F_{ab} x_b(t) + \eta_a

Observation or measurement

Inference

  • Given observations z<tz_{<t} up to t1t-1, what is the distribution of state xtx_t?

p(xtz<t)=dxt1  p(xtxt1)p(xt1zt1)p(x_t|z_{<t}) = \int {\rm d}x_{t-1}\; p(x_t | x_{t-1}) p (x_{t-1} | z_{\le t-1})
  • If there is one more observation ztz_t, what is the distribution of state xtx_t?

p(xtzt)=p(xtzt,z<t)p(ztxt)p(xtz<t)p(x_t|z_{\le t}) = p(x_t | z_t, z_{<t}) \propto p(z_t | x_t) p(x_t | z_{<t})

Expand the recursion, we have Feynman-Kac formula or path integral

p(xtz<t)[τ=1t1dxτ]τ=1tp(zτxτ)p(xτxτ1)p(z0x0)p(x0)p(x_t | z_{< t}) \propto \int \left [ \prod_{\tau = 1}^{t-1} {\rm d}x_\tau \right]\, \prod _{\tau=1}^{t}p(z_\tau | x_\tau) p(x_\tau | x_{\tau - 1}) p(z_0 | x_0) p(x_0)

If we assume the priors, transition probabilities and emission probabilities are all normal, then the posterior of state xtx_t is also normal.

Thus, we assume

p(xtz<t)=N(x^tt1,Ptt1)p(xtzt)=N(x^tt,Ptt)p(x_t | z_{< t}) = \mathcal N (\hat x_{t|t-1}, {P}_{t|t-1})\\ p(x_t | z_{\le t}) = \mathcal N (\hat x_{t|t}, {P}_{t|t})\qquad

State means

Use the recursion again,

p(xtzt)p(ztxt)p(xtz<t)p(x_t | z_{\le t}) \propto p(z_t | x_t) p(x_t | z_{<t})
N(xt;x^tt,Ptt)N(zt;Hxt,R)N(xt;x^tt1,Ptt1)\mathcal N(x_t; \hat x_{t|t}, P_{t|t}) \propto \mathcal N(z_t; Hx_t, R) \mathcal N(x_t; \hat x_{t|t-1}, P_{t|t-1})

Maximize xtx_t __ on both sides, the LHS __ xt=x^ttx_t = \hat x _{t|t}, the RHS

rhs=(ztHxtR1ztHxt)+(xtx^tt1Ptt11xtx^tt1){\rm rhs}=(z_t - Hx_t|R^{-1}|z_t - Hx_t) + (x_t - \hat x_{t|t-1}|P_{t|t-1}^{-1}|x_t - \hat x_{t|t-1})
0=xtrhs=HR1(ztHxt)+Ptt11(xtx^tt1)0=\partial_{x_t^\top}{\rm rhs}= - H^\top R^{-1}(z_t - Hx_t) + P^{-1}_{t|t-1} (x_t - \hat x_{t|t-1})

We must have

(x^ttx^tt1)=Ptt1HR1(ztHx^tt)=Ptt1HR1(ztHx^tt1H(x^ttx^tt1))\begin{align} (\hat x_{t|t} - \hat x_{t|t-1}) &= P_{t|t-1} H^\top R^{-1} (z_t - H \hat x_{t|t})\\ &= P_{t|t-1} H^\top R^{-1} (z_t - H \hat x_{t|t-1} - H(\hat x_{t|t} - \hat x_{t|t-1}) ) \end{align}

Therefore,

x^ttx^tt1=(I+Ptt1HR1H)1Ptt1HR1(ztHx^tt)K(ztHx^tt)\hat x_{t|t} - \hat x_{t|t-1} = (I + P_{t|t-1} H^\top R^{-1} H)^{-1} P_{t|t-1} H^\top R^{-1} (z_t - H \hat x_{t|t}) \equiv K (z_t - H \hat x_{t|t})

where K=(I+Ptt1HR1H)1Ptt1HR1K = (I + P_{t|t-1} H^\top R^{-1} H)^{-1} P_{t|t-1} H^\top R^{-1} is the Kalman gain.

State covariances

The covariances

(Ptt1)ab(xtx^tt1)a(xtx^tt1)b(P_{t|t-1})_{ab} \equiv \langle (x_t - \hat x_{t|t-1})_a (x_t - \hat x_{t|t-1})_b \rangle
(Ptt)ab(xtx^tt)a(xtx^tt)b(P_{t|t})_{ab} \equiv \langle (x_t - \hat x_{t|t})_a (x_t - \hat x_{t|t})_b \rangle

where

(Ptt)ab=[xt(x^ttx^tt1)x^tt1]a[]b=[xtx^tt1+(x^ttx^tt1)]a[]b=[xtx^tt1K(ztHx^tt1)]a[]b=[xtx^tt1K(H(xtx^tt1)+ξ)]a[]b=[(IKH)ac(xtx^tt1)cKacξc][(IKH)bd(xtx^tt1)dKbdξd]=(IKH)Ptt1(IKH)+KRK\begin{align} (P_{t|t})_{ab} &= \langle [x_t - (\hat x_{t|t} - \hat x_{t|t-1}) - \hat x_{t|t-1}]_a [\cdots]_b \rangle \\ &= \langle [x_t - \hat x_{t|t-1} + (\hat x_{t|t} - \hat x_{t|t-1})]_a [\cdots]_b \rangle \\ &= \langle [x_t - \hat x_{t|t-1} - K (z_t - H\hat x_{t|t-1})]_a [\cdots]_b \rangle \\ &= \langle [x_t - \hat x_{t|t-1} - K (H(x_t-\hat x_{t|t-1}) + \xi)]_a [\cdots]_b \rangle \\ &= \langle [(I-KH)_{ac}(x_t - \hat x_{t|t-1})_c - K_{ac}\xi_c][(I-KH)_{bd}(x_{t} - \hat x_{t|t-1})_d - K_{bd}\xi_d] \rangle\\ &= (I - KH) P_{t|t-1} (I - KH)^\top + KRK^\top \end{align}

Given the prior covariance, minimize the posterior MSE minK(Ptt)aa\min_K (P_{t|t})_{aa}

trPttK=(HPtt1(IKH)+RK)=KR(IKH)Ptt1H=K(R+HPtt1H)Ptt1H\begin{align} \frac{\partial\, {\rm tr} P_{t|t}}{\partial K} &= (-HP_{t|t-1}(I-KH)^\top + RK^\top)^\top \\ &= KR - (I-KH)P_{t|t-1}^\top H^\top \\ &= K (R + HP_{t|t-1}^\top H^\top) - P_{t|t-1}^\top H^\top \end{align}

Therefore,

K=Ptt1H(HPtt1H+R)1K^* = P_{t|t-1}^\top H^\top (HP_{t|t-1}^\top H^\top + R)^{-1}

But previously we got K=(I+Ptt1HR1H)1Ptt1HR1K = (I + P_{t|t-1} H^\top R^{-1} H)^{-1} P_{t|t-1} H^\top R^{-1}.

Are they consistent? To show that

K1=R(Ptt1H)1(I+Ptt1HR1H)=R(Ptt1H)1+H=(R+HPtt1H)(Ptt1H)1\begin{align} K^{-1} &= R (P_{t|t-1}H^\top)^{-1}(I + P_{t|t-1} H^\top R^{-1} H) \\ &= R(P_{t|t-1}H^\top)^{-1} + H \\ &= (R + HP_{t|t-1}H^\top)(P_{t|t-1}H^\top)^{-1} \end{align}

Thus, K=(Ptt1H)(R+HPtt1H)1=KK = (P_{t|t-1}H^\top) (R + HP_{t|t-1}H^\top)^{-1} = K^*

Given the posterior, the prior covariance for next step

(Ptt1)ab(xtx^tt1)a(xtx^tt1)b=(xtFx^t1t1)a(xtFx^t1t1)b=(xtFxt1t1+F(xt1t1x^t1t1))a(xtFxt1t1+F(xt1t1x^t1t1))b\begin{align} (P_{t|t-1})_{ab} &\equiv \langle (x_t - \hat x_{t|t-1})_a (x_t - \hat x_{t|t-1})_b \rangle \\ &= \langle (x_t - F\hat x_{t-1|t-1})_a (x_t - F\hat x_{t-1|t-1})_b \rangle \\ &= \langle (x_t - F x_{t-1|t-1} + F(x_{t-1|t-1} - \hat x_{t-1|t-1}) )_a (x_t - F x_{t-1|t-1} + F(x_{t-1|t-1} - \hat x_{t-1|t-1}))_b \rangle \end{align}

Thus, the forecasted covariance the composed of measurement noise and evolution noise

Ptt1=Q+FPt1t1FP_{t|t-1} = Q + F P_{t-1|t-1} F^\top

Algorithm

  • Forecast state mean using posterior as prior

x^tt1=Fx^t1t1Ptt1=Q+FPt1t1F\hat x_{t|t-1} = F\hat x_{t-1 | t-1}\\ P_{t|t-1} = Q + F P_{t-1|t-1} F^\top
  • Compute Kalman gain K=(Ptt1H)(R+HPtt1H)1K=(P_{t|t-1}H^\top)(R + HP_{t|t-1}H^\top)^{-1}

  • Make a measurement ztz_tand a prediction using forecast state __ Hx^tt1H\hat x_{t|t-1}__

  • Compute posterior mean using new observation x^tt=x^tt1+K(ztHx^tt1)\hat x_{t|t} = \hat x_{t|t-1} + K(z_t - H\hat x_{t|t-1})

  • Compute posterior covariance Ptt=(IKH)Ptt1(IKH)+KRKP_{t|t} = (I -KH)P_{t|t-1}(I-KH)^\top + K R K^\top

Last updated

Was this helpful?