Graphical model
Kindergarten physics
$$ r(t+\delta t) = r(t) + \dot r(t)\,\delta t + \tfrac{1}{2}\ddot r(t)\,\delta t^2 + \text{h.o.t.} $$
If we assume $\langle \ddot r \rangle = 0$, then the state $x$ is characterized by $(r, \dot r)$, i.e.
$$ x_i(t+\delta t) = x_i(t) + v_i(t)\,\delta t + \tfrac{1}{2}\varepsilon_i(t)\,\delta t^2, \qquad v_i(t+\delta t) = v_i(t) + \varepsilon_i(t)\,\delta t $$

Time evolution
$$ x_a(t+\delta t) = F_{ab}\, x_b(t) + \eta_a, \qquad F_{ab} = \pmatrix{ 1 & \delta t \\ 0 & 1 }, \quad \eta_a \sim {\cal N}(0, Q) $$
Observation or measurement
$$ z_a(t) = H_{ab}\, x_b(t) + \xi_a, \qquad H_{ab} = \pmatrix{ 1 & 0 }, \quad \xi_a \sim {\cal N}(0, R) $$
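As a concrete sketch of this model (assuming numpy; $\delta t$ and the noise scales are illustrative values, with the process noise $\eta$ inherited from the $\tfrac{1}{2}\varepsilon\,\delta t^2$ and $\varepsilon\,\delta t$ terms above):

```python
import numpy as np

dt = 0.1          # time step (illustrative)
sigma_eps = 0.5   # std of the acceleration noise epsilon (assumed)
sigma_xi = 0.2    # std of the position measurement noise xi (assumed)

# Time evolution: x_a(t + dt) = F_ab x_b(t) + eta_a
F = np.array([[1.0, dt],
              [0.0, 1.0]])

# eta = (eps*dt^2/2, eps*dt), so Q = sigma_eps^2 * G G^T
G = np.array([[0.5 * dt**2],
              [dt]])
Q = sigma_eps**2 * (G @ G.T)

# Observation: z_a(t) = H_ab x_b(t) + xi_a (only position is measured)
H = np.array([[1.0, 0.0]])
R = np.array([[sigma_xi**2]])
```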
Inference
Given observations $z_{<t}$ up to time $t-1$, what is the distribution of the state $x_t$?
$$ p(x_t \mid z_{<t}) = \int dx_{t-1}\, p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{\le t-1}) $$
Given one more observation $z_t$, what is the distribution of the state $x_t$?
$$ p(x_t \mid z_{\le t}) = p(x_t \mid z_t, z_{<t}) \propto p(z_t \mid x_t)\, p(x_t \mid z_{<t}) $$
Expanding the recursion, we obtain a Feynman-Kac formula, or path integral:
$$ p(x_t \mid z_{\le t}) \propto \int \Big[ \prod_{\tau=0}^{t-1} dx_\tau \Big] \prod_{\tau=1}^{t} p(z_\tau \mid x_\tau)\, p(x_\tau \mid x_{\tau-1})\; p(z_0 \mid x_0)\, p(x_0) $$
If the prior, the transition probabilities, and the emission probabilities are all normal, then the posterior of the state $x_t$ is also normal.
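For the linear-Gaussian model, the one-step prediction integral collapses to propagating a mean and a covariance; a quick Monte Carlo sketch (numpy; the posterior at $t-1$ is an assumed illustrative value) confirms this:

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])
G = np.array([[0.5 * dt**2], [dt]])
Q = 0.5**2 * (G @ G.T)                    # sigma_eps = 0.5, as above

# Assumed posterior p(x_{t-1} | z_{<=t-1}) (illustrative values)
x_post = np.array([0.0, 1.0])
P_post = np.array([[0.5, 0.1],
                   [0.1, 0.3]])

# Monte Carlo version of p(x_t | z_{<t}) = int dx_{t-1} p(x_t | x_{t-1}) p(x_{t-1} | z_{<=t-1})
n = 200_000
samples = rng.multivariate_normal(x_post, P_post, size=n)
propagated = samples @ F.T + rng.multivariate_normal(np.zeros(2), Q, size=n)

print(propagated.mean(axis=0), F @ x_post)          # empirical mean vs F x_post
print(np.cov(propagated.T), F @ P_post @ F.T + Q)   # empirical cov vs F P F^T + Q
```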
Thus, we assume
$$ p(x_t \mid z_{<t}) = {\cal N}(\hat x_{t\mid t-1}, P_{t\mid t-1}), \qquad p(x_t \mid z_{\le t}) = {\cal N}(\hat x_{t\mid t}, P_{t\mid t}) $$

State means
Using the recursion again,
$$ p(x_t \mid z_{\le t}) \propto p(z_t \mid x_t)\, p(x_t \mid z_{<t}) $$
$$ {\cal N}(x_t;\, \hat x_{t\mid t}, P_{t\mid t}) \propto {\cal N}(z_t;\, H x_t, R)\; {\cal N}(x_t;\, \hat x_{t\mid t-1}, P_{t\mid t-1}) $$
Maximize over $x_t$ on both sides. The LHS is maximized at $x_t = \hat x_{t\mid t}$; for the RHS, minimize twice its negative log-density (up to constants),
$$ \mathrm{rhs} = ( z_t - H x_t \mid R^{-1} \mid z_t - H x_t ) + ( x_t - \hat x_{t\mid t-1} \mid P_{t\mid t-1}^{-1} \mid x_t - \hat x_{t\mid t-1} ) $$
$$ 0 = \partial_{x_t^\top}\, \mathrm{rhs} = -H^\top R^{-1} ( z_t - H x_t ) + P_{t\mid t-1}^{-1} ( x_t - \hat x_{t\mid t-1} ) $$
(dropping an overall factor of 2). Evaluating at $x_t = \hat x_{t\mid t}$, we must have
$$ \hat x_{t\mid t} - \hat x_{t\mid t-1} = P_{t\mid t-1} H^\top R^{-1} ( z_t - H \hat x_{t\mid t} ) = P_{t\mid t-1} H^\top R^{-1} \big( z_t - H \hat x_{t\mid t-1} - H ( \hat x_{t\mid t} - \hat x_{t\mid t-1} ) \big) $$
Therefore,
$$ \hat x_{t\mid t} - \hat x_{t\mid t-1} = ( I + P_{t\mid t-1} H^\top R^{-1} H )^{-1} P_{t\mid t-1} H^\top R^{-1} ( z_t - H \hat x_{t\mid t-1} ) \equiv K ( z_t - H \hat x_{t\mid t-1} ) $$
where $K = ( I + P_{t\mid t-1} H^\top R^{-1} H )^{-1} P_{t\mid t-1} H^\top R^{-1}$ is the Kalman gain.
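As a sanity check on the mean update (numpy; all numeric values are illustrative assumptions): the posterior mean produced by this gain should zero the gradient in the stationarity condition above.

```python
import numpy as np

# Illustrative values (assumptions, not from the derivation)
P = np.array([[0.5, 0.1], [0.1, 0.3]])    # P_{t|t-1}
H = np.array([[1.0, 0.0]])
R = np.array([[0.04]])
x_prior = np.array([0.0, 1.0])            # xhat_{t|t-1}
z = np.array([0.15])

# Information-form gain: K = (I + P H^T R^{-1} H)^{-1} P H^T R^{-1}
Rinv = np.linalg.inv(R)
K = np.linalg.solve(np.eye(2) + P @ H.T @ Rinv @ H, P @ H.T @ Rinv)
x_post = x_prior + K @ (z - H @ x_prior)

# Gradient of the quadratic form vanishes at the posterior mean
grad = -H.T @ Rinv @ (z - H @ x_post) + np.linalg.inv(P) @ (x_post - x_prior)
print(grad)   # ~ [0, 0]
```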
State covariances
The covariances are defined as
$$ ( P_{t\mid t-1} )_{ab} \equiv \big\langle ( x_t - \hat x_{t\mid t-1} )_a ( x_t - \hat x_{t\mid t-1} )_b \big\rangle, \qquad ( P_{t\mid t} )_{ab} \equiv \big\langle ( x_t - \hat x_{t\mid t} )_a ( x_t - \hat x_{t\mid t} )_b \big\rangle $$
Expanding the posterior covariance,
$$
\begin{aligned}
( P_{t\mid t} )_{ab} &= \big\langle [ x_t - ( \hat x_{t\mid t} - \hat x_{t\mid t-1} ) - \hat x_{t\mid t-1} ]_a\, [\cdots]_b \big\rangle \\
&= \big\langle [ x_t - \hat x_{t\mid t-1} - ( \hat x_{t\mid t} - \hat x_{t\mid t-1} ) ]_a\, [\cdots]_b \big\rangle \\
&= \big\langle [ x_t - \hat x_{t\mid t-1} - K ( z_t - H \hat x_{t\mid t-1} ) ]_a\, [\cdots]_b \big\rangle \\
&= \big\langle [ x_t - \hat x_{t\mid t-1} - K ( H ( x_t - \hat x_{t\mid t-1} ) + \xi ) ]_a\, [\cdots]_b \big\rangle \\
&= \big\langle [ ( I - KH )_{ac} ( x_t - \hat x_{t\mid t-1} )_c - K_{ac} \xi_c ]\, [ ( I - KH )_{bd} ( x_t - \hat x_{t\mid t-1} )_d - K_{bd} \xi_d ] \big\rangle \\
&= \big[ ( I - KH )\, P_{t\mid t-1}\, ( I - KH )^\top + K R K^\top \big]_{ab}
\end{aligned}
$$
Given the prior covariance, choose the gain to minimize the posterior MSE, $\min_K ( P_{t\mid t} )_{aa}$ (summation over $a$ implied, i.e. the trace):
$$ \frac{1}{2} \frac{\partial\, \mathrm{tr}\, P_{t\mid t}}{\partial K} = \big( -H P_{t\mid t-1} ( I - KH )^\top + R K^\top \big)^\top = K R - ( I - KH ) P_{t\mid t-1}^\top H^\top = K ( R + H P_{t\mid t-1}^\top H^\top ) - P_{t\mid t-1}^\top H^\top $$
Setting this to zero, we find
$$ K^* = P_{t\mid t-1}^\top H^\top ( H P_{t\mid t-1}^\top H^\top + R )^{-1} $$
But previously we obtained $K = ( I + P_{t\mid t-1} H^\top R^{-1} H )^{-1} P_{t\mid t-1} H^\top R^{-1}$.
Are they consistent? To check, compute $K^{-1}$ (formally; $P_{t\mid t-1}$ is symmetric, so $P_{t\mid t-1}^\top = P_{t\mid t-1}$):
$$
\begin{aligned}
K^{-1} &= R ( P_{t\mid t-1} H^\top )^{-1} ( I + P_{t\mid t-1} H^\top R^{-1} H ) \\
&= R ( P_{t\mid t-1} H^\top )^{-1} + H \\
&= ( R + H P_{t\mid t-1} H^\top ) ( P_{t\mid t-1} H^\top )^{-1}
\end{aligned}
$$
Thus $K = ( P_{t\mid t-1} H^\top ) ( R + H P_{t\mid t-1} H^\top )^{-1} = K^*$.
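A numeric spot check of this identity (numpy, illustrative values; note $K$ is not square in this example, so the $K^{-1}$ manipulation above is formal, yet the two gain expressions agree):

```python
import numpy as np

P = np.array([[0.5, 0.1], [0.1, 0.3]])    # P_{t|t-1} (illustrative)
H = np.array([[1.0, 0.0]])
R = np.array([[0.04]])
Rinv = np.linalg.inv(R)

# Information form: K = (I + P H^T R^{-1} H)^{-1} P H^T R^{-1}
K_info = np.linalg.solve(np.eye(2) + P @ H.T @ Rinv @ H, P @ H.T @ Rinv)

# Innovation form: K* = P H^T (H P H^T + R)^{-1}
K_star = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)

print(np.allclose(K_info, K_star))        # True
```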
Given the posterior, the prior covariance for the next step is
$$
\begin{aligned}
( P_{t\mid t-1} )_{ab} &\equiv \big\langle ( x_t - \hat x_{t\mid t-1} )_a ( x_t - \hat x_{t\mid t-1} )_b \big\rangle = \big\langle ( x_t - F \hat x_{t-1\mid t-1} )_a ( x_t - F \hat x_{t-1\mid t-1} )_b \big\rangle \\
&= \big\langle \big( x_t - F x_{t-1} + F ( x_{t-1} - \hat x_{t-1\mid t-1} ) \big)_a \big( x_t - F x_{t-1} + F ( x_{t-1} - \hat x_{t-1\mid t-1} ) \big)_b \big\rangle
\end{aligned}
$$
Since $x_t - F x_{t-1} = \eta$ is independent of the posterior error $x_{t-1} - \hat x_{t-1\mid t-1}$, the forecast covariance is composed of the evolution noise and the propagated posterior covariance:
$$ P_{t\mid t-1} = Q + F P_{t-1\mid t-1} F^\top $$

Algorithm
1. Forecast the state mean and covariance, using the posterior as the prior:
   $$ \hat x_{t\mid t-1} = F \hat x_{t-1\mid t-1}, \qquad P_{t\mid t-1} = Q + F P_{t-1\mid t-1} F^\top $$
2. Compute the Kalman gain $K = ( P_{t\mid t-1} H^\top ) ( R + H P_{t\mid t-1} H^\top )^{-1}$.
3. Make a measurement $z_t$ and a prediction using the forecast state, $H \hat x_{t\mid t-1}$.
4. Compute the posterior mean using the new observation: $\hat x_{t\mid t} = \hat x_{t\mid t-1} + K ( z_t - H \hat x_{t\mid t-1} )$.
5. Compute the posterior covariance: $P_{t\mid t} = ( I - KH ) P_{t\mid t-1} ( I - KH )^\top + K R K^\top$ (implemented in the sketch below).
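Putting the five steps together, a minimal sketch of the filter on the constant-velocity model above (numpy; the trajectory, noise scales, and initial guesses are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
dt, sigma_eps, sigma_xi = 0.1, 0.5, 0.2   # illustrative values

F = np.array([[1.0, dt], [0.0, 1.0]])
G = np.array([[0.5 * dt**2], [dt]])
Q = sigma_eps**2 * (G @ G.T)
H = np.array([[1.0, 0.0]])
R = np.array([[sigma_xi**2]])

# Simulate a true trajectory and noisy position measurements
T = 200
x_true = np.zeros((T, 2))
x_true[0] = [0.0, 1.0]
for t in range(1, T):
    x_true[t] = F @ x_true[t - 1] + G[:, 0] * sigma_eps * rng.standard_normal()
z = x_true[:, 0] + sigma_xi * rng.standard_normal(T)

# Filter loop following steps 1-5
x_hat = np.array([0.0, 0.0])   # initial mean (assumed)
P = np.eye(2)                  # initial covariance (assumed)
estimates = []
for t in range(T):
    # 1. Forecast state mean and covariance
    x_hat = F @ x_hat
    P = Q + F @ P @ F.T
    # 2. Kalman gain
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    # 3. Measurement z[t] vs prediction H x_hat
    innovation = z[t] - H @ x_hat
    # 4. Posterior mean
    x_hat = x_hat + K @ innovation
    # 5. Posterior covariance (Joseph form)
    A = np.eye(2) - K @ H
    P = A @ P @ A.T + K @ R @ K.T
    estimates.append(x_hat.copy())

estimates = np.array(estimates)
print("position RMSE:", np.sqrt(np.mean((estimates[:, 0] - x_true[:, 0]) ** 2)))
```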