Given a dataset $(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$, assume the simple linear regression model $\hat y = bx + a$. We now solve for $a$ and $b$ by least squares.
The loss function is
$$\mathcal{L}(a, b) = \sum_{i=1}^N (\hat y_i - y_i)^2 = \sum_{i=1}^N (b x_i + a - y_i)^2$$
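As a quick sanity check, the loss above translates directly into code. A minimal sketch (the function name `sse_loss` and the toy data are my own, for illustration only):

```python
import numpy as np

def sse_loss(a, b, x, y):
    """Sum of squared errors L(a, b) = sum_i (b*x_i + a - y_i)^2."""
    residuals = b * x + a - y
    return np.sum(residuals ** 2)

# Toy data lying exactly on y = 2x + 1, so the loss at (a=1, b=2) is 0.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
```

Since the data lie exactly on the line, the loss vanishes at $(a, b) = (1, 2)$ and grows as the parameters move away from it.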
$$\frac{\partial \mathcal{L}}{\partial a} = \sum_{i=1}^N 2(b x_i + a - y_i) = 2b\sum_{i=1}^N x_i + 2aN - 2\sum_{i=1}^N y_i = 2bN\overline x + 2aN - 2N\overline y = 2N(b\overline x + a - \overline y)$$
Setting $\displaystyle \frac{\partial \mathcal{L}}{\partial a} = 0$ gives $a = \overline y - b\overline x$. Substituting back into $\mathcal{L}(a, b)$:
$$\mathcal{L}(a, b) = \sum_{i=1}^N (b x_i + \overline y - b\overline x - y_i)^2 = \sum_{i=1}^N \left[b(x_i - \overline x) - (y_i - \overline y)\right]^2$$
$$\frac{\partial \mathcal{L}}{\partial b} = \sum_{i=1}^N 2(x_i - \overline x)\left[b(x_i - \overline x) - (y_i - \overline y)\right] = 2b\sum_{i=1}^N (x_i - \overline x)^2 - 2\sum_{i=1}^N (x_i - \overline x)(y_i - \overline y) = 2b\,\mathrm{Var}(x) - 2\,\mathrm{Cov}(x, y)$$

Here $\mathrm{Var}(x)$ and $\mathrm{Cov}(x, y)$ denote the unnormalized sums $\sum_{i=1}^N (x_i - \overline x)^2$ and $\sum_{i=1}^N (x_i - \overline x)(y_i - \overline y)$; any common normalization factor cancels in the ratio below.
Setting $\displaystyle \frac{\partial \mathcal{L}}{\partial b} = 0$ gives $\displaystyle b = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}$.
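The two closed-form expressions above can be sketched in a few lines of NumPy (the function name `fit_simple` and the sample data are my own):

```python
import numpy as np

def fit_simple(x, y):
    """Least-squares fit of y ≈ b*x + a via the closed forms:
    b = Cov(x, y) / Var(x)  (unnormalized sums), a = ȳ - b*x̄."""
    x_bar, y_bar = x.mean(), y.mean()
    b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    a = y_bar - b * x_bar
    return a, b

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
a, b = fit_simple(x, y)
```

The result should agree with `np.polyfit(x, y, 1)`, which minimizes the same sum of squared errors.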
The derivation above handles the case $x_i, y_i \in \mathbb{R}$; we now turn to multivariate linear regression. Assume $\boldsymbol x_i \in \mathbb{R}^{1 \times D}$ (a row vector), $y_i \in \mathbb{R}$, $\boldsymbol x \in \mathbb{R}^{N \times D}$, $\boldsymbol y \in \mathbb{R}^N$, where $N$ is the number of samples and $D$ is the feature dimension.
Assume the linear regression model $\hat{\boldsymbol y} = \boldsymbol x \boldsymbol\theta$; we solve for $\boldsymbol\theta \in \mathbb{R}^D$ by least squares.
The loss function is
$$\mathcal{L}(\boldsymbol\theta) = \|\boldsymbol x \boldsymbol\theta - \boldsymbol y\|^2 = \|\boldsymbol e\|^2 = \boldsymbol e^\mathrm{T} \boldsymbol e, \qquad \boldsymbol e = \boldsymbol x \boldsymbol\theta - \boldsymbol y$$
By the chain rule,
$$\frac{\partial \mathcal{L}}{\partial \boldsymbol\theta} = \frac{\partial \mathcal{L}}{\partial \boldsymbol e} \frac{\partial \boldsymbol e}{\partial \boldsymbol\theta} = 2\boldsymbol e^\mathrm{T} \boldsymbol x = 2(\boldsymbol x \boldsymbol\theta - \boldsymbol y)^\mathrm{T} \boldsymbol x = 2\boldsymbol\theta^\mathrm{T} \boldsymbol x^\mathrm{T} \boldsymbol x - 2\boldsymbol y^\mathrm{T} \boldsymbol x$$
Setting $\displaystyle \frac{\partial \mathcal{L}}{\partial \boldsymbol\theta} = 0$ gives $\boldsymbol\theta^\mathrm{T} \boldsymbol x^\mathrm{T} \boldsymbol x = \boldsymbol y^\mathrm{T} \boldsymbol x$; transposing both sides yields $\boldsymbol x^\mathrm{T} \boldsymbol x \boldsymbol\theta = \boldsymbol x^\mathrm{T} \boldsymbol y$.
Note that $\boldsymbol x^\mathrm{T} \boldsymbol x \in \mathbb{R}^{D \times D}$ is a symmetric positive-semidefinite matrix; it is invertible (in fact positive definite) whenever $\boldsymbol x$ has full column rank $D$. In that case, the final solution is
$$\boldsymbol\theta = (\boldsymbol x^\mathrm{T} \boldsymbol x)^{-1} \boldsymbol x^\mathrm{T} \boldsymbol y$$
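The closed-form solution above can be sketched in NumPy. In practice, solving the normal equations with `np.linalg.solve` is preferred over forming the inverse explicitly; the function name `fit_multivariate` and the synthetic data are my own, and this sketch assumes $\boldsymbol x$ already contains a constant column if an intercept is wanted:

```python
import numpy as np

def fit_multivariate(x, y):
    """Solve the normal equations xᵀx θ = xᵀy for θ,
    avoiding an explicit matrix inverse."""
    return np.linalg.solve(x.T @ x, x.T @ y)

# Synthetic full-column-rank design matrix and noise-free targets,
# so the solver should recover theta_true exactly (up to rounding).
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = x @ theta_true
theta = fit_multivariate(x, y)
```

Because the targets here are generated without noise and the design matrix has full column rank, the recovered `theta` matches `theta_true` to machine precision.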
References:
- Mathematics for Machine Learning
- http://detexify.kirelabs.org/symbols.html
- https://www.jianshu.com/p/6de552393933

Original: https://blog.csdn.net/u011450367/article/details/121844479
Author: 小志8554
Title: Solving the Linear Regression Equation