递推最小二乘法(Recursive least square, RLS)详细推导

假设有数据( X , Y ) (X,Y)(X ,Y ),其中X ∈ R m × d X \in {\mathbb{R}^{m \times d}}X ∈R m ×d,Y ∈ R m × 1 Y \in {\mathbb{R}^{m \times 1}}Y ∈R m ×1,m m m为样本数,d d d为特征数,考虑最小二乘解
θ 0 = ( X T X ) − 1 X T Y = Σ 0 − 1 X T Y (1) \begin{aligned}{\theta_0} = {\left( {{X^{\rm{T}}}X} \right)^{ – 1}}{X^{\rm{T}}}Y = {\Sigma_0}^{-1}{X^{\rm{T}}}Y \tag{1}\end{aligned}θ0 ​=(X T X )−1 X T Y =Σ0 ​−1 X T Y ​(1 )
⇒ Σ 0 θ 0 = X T Y (2) \Rightarrow {\Sigma_0}{\theta_0} = {X^{\rm{T}}}Y \tag{2}⇒Σ0 ​θ0 ​=X T Y (2 )
当新数据( X 1 , Y 1 ) \left( {{X_1},{Y_1}} \right)(X 1 ​,Y 1 ​)到来时,更新模型,得到新的回归系数
θ 1 = ( [ X X 1 ] T [ X X 1 ] ) − 1 [ X X 1 ] T [ Y Y 1 ] = Σ 1 − 1 [ X X 1 ] T [ Y Y 1 ] (3) \begin{aligned} {\theta_1} &= {\left( {{{\left[ {\begin{array}{cc} X\ {{X_1}} \end{array}} \right]}^{\rm{T}}}\left[ {\begin{array}{cc} X\ {{X_1}} \end{array}} \right]} \right)^{ – 1}}{\left[ {\begin{array}{cc} X\ {{X_1}} \end{array}} \right]^{\rm{T}}}\left[ {\begin{array}{cc} Y\ {{Y_1}} \end{array}} \right] \ &= {\Sigma 1}^{ – 1}{\left[ {\begin{array}{cc} X\ {{X_1}} \end{array}} \right]^{\rm{T}}}\left[ {\begin{array}{cc} Y\ {{Y_1}} \end{array}} \right]\tag{3} \end{aligned}θ1 ​​=([X X 1 ​​]T [X X 1 ​​])−1 [X X 1 ​​]T [Y Y 1 ​​]=Σ1 ​−1 [X X 1 ​​]T [Y Y 1 ​​]​(3 )
其中
Σ 1 = [ X X 1 ] T [ X X 1 ] = X T X + X 1 T X 1 = Σ 0 + X 1 T X 1 (4) \begin{aligned} {\Sigma _1} &= {\left[ {\begin{array}{cc} X\ {{X_1}} \end{array}} \right]^{\rm{T}}}\left[ {\begin{array}{cc} X\ {{X_1}} \end{array}} \right] \ &= {X^{\rm{T}}}X + X_1^{\rm{T}}{X_1} \ &= {\Sigma _0} + X_1^{\rm{T}}{X_1} \end{aligned} \tag{4}Σ1 ​​=[X X 1 ​​]T [X X 1 ​​]=X T X +X 1 T ​X 1 ​=Σ0 ​+X 1 T ​X 1 ​​(4 )
⇒ Σ 0 = Σ 1 − X 1 T X 1 (5) \begin{aligned} \Rightarrow {\Sigma _0} = {\Sigma _1} – X_1^{\rm{T}}{X_1} \end{aligned} \tag{5}⇒Σ0 ​=Σ1 ​−X 1 T ​X 1 ​​(5 )
根据公式(4)的结果,通过归纳可得
Σ k = Σ k − 1 + X k T X k (6) \begin{aligned} {\Sigma _k} = {\Sigma
{k – 1}} + X_k^{\rm{T}}{X_k} \end{aligned} \tag{6}Σk ​=Σk −1 ​+X k T ​X k ​​(6 )
[ X X 1 ] T [ Y Y 1 ] = X T Y + X 1 T Y 1 = Σ 0 θ 0 + X 1 T Y 1 / / 公 式 ( 2 ) 结 果 替 换 得 到 = ( Σ 1 − X 1 T X 1 ) θ 0 + X 1 T Y 1 / / 公 式 ( 5 ) 结 果 替 换 得 到 = Σ 1 θ 0 + X 1 T ( Y 1 − X 1 θ 0 ) (7) \begin{aligned} {\left[ {\begin{array}{cc} X\ {{X_1}} \end{array}} \right]^{\rm{T}}}\left[ {\begin{array}{cc} Y\ {{Y_1}} \end{array}} \right] &= {X^{\rm{T}}}Y + X_1^{\rm{T}}{Y_1}\ &= {\Sigma 0}{\theta_0} + X_1^{\rm{T}}{Y_1} \quad //公式(2)结果替换得到\ &= \left( {{\Sigma _1} – X_1^{\rm{T}}{X_1}} \right){\theta_0} + X_1^{\rm{T}}{Y_1} \quad //公式(5)结果替换得到\ &= {\Sigma _1}{\theta_0} + X_1^{\rm{T}}\left( {{Y_1} – {X_1}{\theta_0}} \right) \end{aligned} \tag{7}[X X 1 ​​]T [Y Y 1 ​​]​=X T Y +X 1 T ​Y 1 ​=Σ0 ​θ0 ​+X 1 T ​Y 1 ​//公式(2 )结果替换得到=(Σ1 ​−X 1 T ​X 1 ​)θ0 ​+X 1 T ​Y 1 ​//公式(5 )结果替换得到=Σ1 ​θ0 ​+X 1 T ​(Y 1 ​−X 1 ​θ0 ​)​(7 )
将公式(7)回带到公式(3):
θ 1 = Σ 1 − 1 ( Σ 1 θ 0 + X 1 T ( Y 1 − X 1 θ 0 ) ) = θ 0 + Σ 1 − 1 X 1 T ( Y 1 − X 1 θ 0 ) (8) \begin{aligned} {\theta_1} &= {\Sigma _1}^{ – 1}\left( {{\Sigma _1}{\theta_0} + X_1^{\rm{T}}\left( {{Y_1} – {X_1}{\theta_0}} \right)} \right) \ &= {\theta_0} + {\Sigma _1}^{ – 1}X_1^{\rm{T}}\left( {{Y_1} – {X_1}{\theta_0}} \right) \end{aligned} \tag{8}θ1 ​​=Σ1 ​−1 (Σ1 ​θ0 ​+X 1 T ​(Y 1 ​−X 1 ​θ0 ​))=θ0 ​+Σ1 ​−1 X 1 T ​(Y 1 ​−X 1 ​θ0 ​)​(8 )
根据公式(8)的结果,通过归纳可得
θ k = θ k − 1 + Σ k − 1 X k T ( Y k − X k θ k − 1 ) (9) \begin{aligned} {\theta_k} = {\theta
{k – 1}} + {\Sigma k}^{ – 1}X_k^{\rm{T}}\left( {{Y_k} – {X_k}{\theta{k – 1}}} \right) \end{aligned} \tag{9}θk ​=θk −1 ​+Σk ​−1 X k T ​(Y k ​−X k ​θk −1 ​)​(9 )

到这里,已经能够实现对最小二乘的递推,其过程可概括如下,我们称为算法1:

  1. 根据公式(5)更新Σ k = Σ k − 1 + X k T X k {\Sigma k} = {\Sigma {k – 1}} + X_k^{\rm{T}}{X_k}Σk ​=Σk −1 ​+X k T ​X k ​;
  2. 根据公式(9)更新θ k = θ k − 1 + Σ k − 1 X k T ( Y k − X k θ k − 1 ) {\theta_k} = {\theta_{k – 1}} + {\Sigma k}^{ – 1}X_k^{\rm{T}}\left( {{Y_k} – {X_k}{\theta{k – 1}}} \right)θk ​=θk −1 ​+Σk ​−1 X k T ​(Y k ​−X k ​θk −1 ​)。

但以上过程存在两个问题:

  1. 对矩阵Σ k \Sigma_k Σk ​的求逆计算复杂度比较高,我们能否在递推过程中避免对Σ k \Sigma_k Σk ​的求逆计算,而直接更新它的逆矩阵;
  2. 矩阵Σ k \Sigma_k Σk ​中的元素会随着数据量的增加不断增大,可能会发生数值溢出的问题。

针对以上问题,我们要对公式进一步改造,根据Sherman-Morrison-Woodbury公式:
( A + U V T ) − 1 = A − 1 − A − 1 U ( I + V T A − 1 U ) − 1 V T A − 1 {\left( {A + U{V^{\rm{T}}}} \right)^{ – 1}} = {A^{ – 1}} – {A^{ – 1}}U{\left( {I + {V^{\rm{T}}}{A^{ – 1}}U} \right)^{ – 1}}{V^{\rm{T}}}{A^{ – 1}}(A +U V T )−1 =A −1 −A −1 U (I +V T A −1 U )−1 V T A −1
公式(6)的逆可写成如下形式
Σ k − 1 = ( Σ k − 1 + X k T X k ) − 1 = Σ k − 1 − 1 − Σ k − 1 − 1 X k T ( I + X k Σ k − 1 − 1 X k T ) − 1 X k Σ k − 1 − 1 (10) \begin{aligned} {\Sigma k}^{ – 1} &= {\left( {{\Sigma {k – 1}} + X_k^{\rm{T}}{X_k}} \right)^{ – 1}} \ &= \Sigma {k – 1}^{ – 1} – \Sigma {k – 1}^{ – 1}X_k^{\rm{T}}{\left( {I + {X_k}\Sigma {k – 1}^{ – 1}X_k^{\rm{T}}} \right)^{ – 1}}{X_k}\Sigma {k – 1}^{ – 1} \end{aligned} \tag{10}Σk ​−1 ​=(Σk −1 ​+X k T ​X k ​)−1 =Σk −1 −1 ​−Σk −1 −1 ​X k T ​(I +X k ​Σk −1 −1 ​X k T ​)−1 X k ​Σk −1 −1 ​​(1 0 )
令P k = ∑ k − 1 {P_k} = {\sum k}^{ – 1}P k ​=∑k ​−1,公式(10)变为:
P k = P k − 1 − P k − 1 X k T ( I + X k P k − 1 X k T ) − 1 X k P k − 1 (11) \begin{aligned} {P_k} = {P
{k – 1}} – {P_{k – 1}}X_k^{\rm{T}}{\left( {I + {X_k}{P_{k – 1}}X_k^{\rm{T}}} \right)^{ – 1}}{X_k}{P_{k – 1}} \end{aligned} \tag{11}P k ​=P k −1 ​−P k −1 ​X k T ​(I +X k ​P k −1 ​X k T ​)−1 X k ​P k −1 ​​(1 1 )
公式(9)变为:
θ k = θ k − 1 + P k X k T ( Y k − X k θ k − 1 ) (12) \begin{aligned} {\theta_k} = {\theta_{k – 1}} + {P_k}X_k^{\rm{T}}\left( {{Y_k} – {X_k}{\theta_{k – 1}}} \right) \end{aligned} \tag{12}θk ​=θk −1 ​+P k ​X k T ​(Y k ​−X k ​θk −1 ​)​(1 2 )
注意到,公式(11)依然存在对I + X k P k − 1 X k T {I + {X_k}{P_{k – 1}}X_k^{\rm{T}}}I +X k ​P k −1 ​X k T ​的求逆运算,这似乎依然没有解决上述问题1,我们避免了对Σ k \Sigma_k Σk ​的求逆,但却又引入了一个新的逆。事实上,如果数据是逐个到达的,则X k X_k X k ​为一个行向量(在本文中,一个样本我们用行向量表示,这主要是因为本文规定数据矩阵中每一行代表一个样本),因此I + X k P k − 1 X k T {I + {X_k}{P_{k – 1}}X_k^{\rm{T}}}I +X k ​P k −1 ​X k T ​最终得到结果为一个数值,我们无需矩阵求逆计算,只需要取它的倒数就好了,即
P k = P k − 1 − P k − 1 X k T X k P k − 1 1 + X k P k − 1 X k T (13) \begin{aligned} {P_k} = {P_{k – 1}} – \frac{{{P_{k – 1}}X_k^{\rm{T}}{X_k}{P_{k – 1}}}}{{1 + {X_k}{P_{k – 1}}X_k^{\rm{T}}}} \end{aligned} \tag{13}P k ​=P k −1 ​−1 +X k ​P k −1 ​X k T ​P k −1 ​X k T ​X k ​P k −1 ​​​(1 3 )
于是我们得到了新的递推算法如下,我们称为算法2:

  1. 根据公式(13)更新P k = P k − 1 − P k − 1 X k T X k P k − 1 1 + X k P k − 1 X k T ; {P_k} = {P_{k – 1}} – \frac{{{P_{k – 1}}X_k^{\rm{T}}{X_k}{P_{k – 1}}}}{{1 + {X_k}{P_{k – 1}}X_k^{\rm{T}}}};P k ​=P k −1 ​−1 +X k ​P k −1 ​X k T ​P k −1 ​X k T ​X k ​P k −1 ​​;
  2. 根据公式(12)更新θ k = θ k − 1 + P k X k T ( Y k − X k θ k − 1 ) {\theta_k} = {\theta_{k – 1}} + {P_k}X_k^{\rm{T}}\left( {{Y_k} – {X_k}{\theta_{k – 1}}} \right)θk ​=θk −1 ​+P k ​X k T ​(Y k ​−X k ​θk −1 ​)。

一些书上的递推算法可能并非这样的形式,我们可以进一步对上述过程进行一些整理。在一些书中,K k = P k X k T {K_k} = {P_k}X_k^{\rm{T}}K k ​=P k ​X k T ​也被称为增益,Y k − X k θ k − 1 {Y_k} – {X_k}{\theta_{k – 1}}Y k ​−X k ​θk −1 ​被称为新息,顾名思义,就是引入的新信息。
K k = P k X k T = ( P k − 1 − P k − 1 X k T ( I + X k P k − 1 X k T ) − 1 X k P k − 1 ) X k T / / 公 式 ( 11 ) 结 果 替 换 得 到 = P k − 1 X k T ( I − ( I + X k P k − 1 X k T ) − 1 X k P k − 1 X k T ) = P k − 1 X k T ( I + X k P k − 1 X k T ) − 1 ( ( I + X k P k − 1 X k T ) − X k P k − 1 X k T ) = P k − 1 X k T ( I + X k P k − 1 X k T ) − 1 (14) \begin{aligned} {K_k} &= {P_k}X_k^{\rm{T}}\ &= \left( {{P_{k – 1}} – {P_{k – 1}}X_k^{\rm{T}}{{\left( {I + {X_k}{P_{k – 1}}X_k^{\rm{T}}} \right)}^{ – 1}}{X_k}{P_{k – 1}}} \right)X_k^{\rm{T}} \quad //公式(11)结果替换得到\ &= {P_{k – 1}}X_k^{\rm{T}}\left( {I – {{\left( {I + {X_k}{P_{k – 1}}X_k^{\rm{T}}} \right)}^{ – 1}}{X_k}{P_{k – 1}}X_k^{\rm{T}}} \right)\ &= {P_{k – 1}}X_k^{\rm{T}}{\left( {I + {X_k}{P_{k – 1}}X_k^{\rm{T}}} \right)^{ – 1}}\left( {\left( {I + {X_k}{P_{k – 1}}X_k^{\rm{T}}} \right) – {X_k}{P_{k – 1}}X_k^{\rm{T}}} \right)\ &= {P_{k – 1}}X_k^{\rm{T}}{\left( {I + {X_k}{P_{k – 1}}X_k^{\rm{T}}} \right)^{ – 1}} \end{aligned} \tag{14}K k ​​=P k ​X k T ​=(P k −1 ​−P k −1 ​X k T ​(I +X k ​P k −1 ​X k T ​)−1 X k ​P k −1 ​)X k T ​//公式(1 1 )结果替换得到=P k −1 ​X k T ​(I −(I +X k ​P k −1 ​X k T ​)−1 X k ​P k −1 ​X k T ​)=P k −1 ​X k T ​(I +X k ​P k −1 ​X k T ​)−1 ((I +X k ​P k −1 ​X k T ​)−X k ​P k −1 ​X k T ​)=P k −1 ​X k T ​(I +X k ​P k −1 ​X k T ​)−1 ​(1 4 )
将公式(14)的结果代入到公式(11)可得
P k = P k − 1 − K k X k P k − 1 = ( I − K k X k ) P k − 1 (15) \begin{aligned} {P_k} = {P_{k – 1}} – {K_k}{X_k}{P_{k – 1}} = \left( {I – {K_k}{X_k}} \right){P_{k – 1}} \end{aligned} \tag{15}P k ​=P k −1 ​−K k ​X k ​P k −1 ​=(I −K k ​X k ​)P k −1 ​​(1 5 )
于是,算法2可进一步的写为如下形式,我们称为算法3:

  1. 根据公式(14)更新模型增益K k = P k − 1 X k T ( I + X k P k − 1 X k T ) − 1 {K_k} = {P_{k – 1}}X_k^{\rm{T}}{\left( {I + {X_k}{P_{k – 1}}X_k^{\rm{T}}} \right)^{ – 1}}K k ​=P k −1 ​X k T ​(I +X k ​P k −1 ​X k T ​)−1;
  2. 根据公式(15)更新P k = ( I − K k X k ) P k − 1 {P_k} = \left( {I – {K_k}{X_k}} \right){P_{k – 1}}P k ​=(I −K k ​X k ​)P k −1 ​;
  3. 更新回归系数θ k = θ k − 1 + K k ( Y k − X k θ k − 1 ) {\theta_k} = {\theta_{k – 1}} + {K_k}\left( {{Y_k} – {X_k}{\theta_{k – 1}}} \right)θk ​=θk −1 ​+K k ​(Y k ​−X k ​θk −1 ​)

Original: https://blog.csdn.net/qq_39645262/article/details/125691638
Author: tianmingemmm
Title: 递推最小二乘法(Recursive least square, RLS)详细推导

原创文章受到原创版权保护。转载请注明出处:https://www.johngo689.com/613711/

转载文章受原作者版权保护。转载请注明原作者出处!

(0)

大家都在看

亲爱的 Coder【最近整理,可免费获取】👉 最新必读书单  | 👏 面试题下载  | 🌎 免费的AI知识星球