Logistic Regression


1 Notation

| Symbol | Meaning |
| :--- | :--- |
| $m$ | number of samples in the training set |
| $n$ | number of features |
| $(x^{(i)}, y^{(i)})$ | the $i$-th sample in the training set, with $y^{(i)} \in \{1, 0\}$ |
| $x^{(i)}$ | the $i$-th input value |
| $\hat{y}^{(i)}$ | the $i$-th output value |
| $x_j^{(i)}$ | the $j$-th feature of the $i$-th input value |
| $h$ | hypothesis function |

2 The Logistic Regression Model

Figure 1. The logistic regression model

In Figure 1, $h$ stands for *hypothesis* (📑 that is the conventional name in machine learning, though Andrew Ng finds it a bit unfortunate~). It is generally a function of the model input; in the binary logistic regression model, $h$ takes the form:

$$
\begin{cases}
h_{\theta}(x^{(i)}) = g(z) = \dfrac{1}{1 + e^{-z}} \\
z = \theta^T x^{(i)}
\end{cases}
\tag{1}
$$

where
$$
\begin{aligned}
\theta &= [\theta_0, \theta_1, \cdots, \theta_n]^T \\
x^{(i)} &= [x_0^{(i)}, x_1^{(i)}, \cdots, x_n^{(i)}]^T
\end{aligned}
$$

Here $\theta_0, \theta_1, \cdots, \theta_n$ are the model parameters, and $x_0^{(i)}$ is fixed to 1. The function $g(z)$ is shown in Figure 2:

Figure 2. The sigmoid function $g(z)$

In fact, for a given input $x^{(i)}$, $h_\theta(x^{(i)})$ computes, under the chosen parameters, the estimated probability that the output is 1, i.e.:

$$h_\theta\left(x^{(i)}\right) = P\left(y^{(i)} = 1 \mid x^{(i)}; \theta\right) \tag{2}$$

With a decision threshold of $0.5$, the model output is:

$$
\hat{y}^{(i)} =
\begin{cases}
1, & h_{\theta}(x^{(i)}) \ge 0.5 \quad (\theta^T x^{(i)} \ge 0) \\
0, & h_{\theta}(x^{(i)}) < 0.5 \quad (\theta^T x^{(i)} < 0)
\end{cases}
\tag{3}
$$
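As a concrete illustration of Equations $(1)$–$(3)$, here is a minimal NumPy sketch (not from the original post; the names `sigmoid`, `hypothesis`, and `predict` are my own):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)) from Equation (1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) for every row of X; X carries a leading column of ones (x_0 = 1)."""
    return sigmoid(X @ theta)

def predict(theta, X, threshold=0.5):
    """Hard 0/1 output of Equation (3): 1 exactly when h_theta(x) >= threshold."""
    return (hypothesis(theta, X) >= threshold).astype(int)

# Toy usage: 3 samples, 2 features, plus the x_0 = 1 column.
X = np.array([[1.0,  2.0,  3.0],
              [1.0, -1.0,  0.5],
              [1.0,  0.0, -2.0]])
theta = np.array([0.1, 0.5, -0.3])
print(hypothesis(theta, X))  # estimated probabilities P(y = 1 | x; theta)
print(predict(theta, X))     # hard labels from the 0.5 threshold
```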

3 Cost Function

The modeling error is the gap between the model's output (the predicted or estimated value) and the actual value in the training set, i.e.:

$$error = \hat{y}^{(i)} - y^{(i)} \tag{4}$$

To account for the gaps between all of the model's outputs and the actual values, a cost function is typically used to evaluate the modeling error in aggregate. For the logistic regression model, the cost function generally takes the form:

$$
\begin{cases}
J(\theta) = \frac{1}{m}\sum\limits_{i=1}^{m} cost\left(\hat{y}^{(i)}, y^{(i)}\right) \\
cost\left(\hat{y}^{(i)}, y^{(i)}\right) = -y^{(i)} \log\left(h_\theta\left(x^{(i)}\right)\right) - \left(1 - y^{(i)}\right) \log\left(1 - h_\theta\left(x^{(i)}\right)\right)
\end{cases}
\tag{5}
$$
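A matching sketch of Equation $(5)$, reusing `hypothesis` from above (the small `eps` guard against $\log 0$ is my own addition, not part of the original formula):

```python
def cost_function(theta, X, y, eps=1e-12):
    """Cross-entropy cost J(theta) of Equation (5), averaged over the m samples."""
    m = len(y)
    h = hypothesis(theta, X)  # from the earlier sketch
    return -(1.0 / m) * np.sum(y * np.log(h + eps) + (1.0 - y) * np.log(1.0 - h + eps))
```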

How $cost(\hat{y}^{(i)}, y^{(i)})$ varies with $h_\theta(x^{(i)})$ is shown in Figure 3:

Figure 3. $cost(\hat{y}^{(i)}, y^{(i)})$ as a function of $h_\theta(x^{(i)})$

From Figure 3 we can see:
When the actual $y^{(i)} = 1$ and $h_\theta(x^{(i)})$ is also 1, the error is 0; when $y^{(i)} = 1$ but $h_\theta(x^{(i)})$ is not 1, the error grows as $h_\theta(x^{(i)})$ decreases.
When the actual $y^{(i)} = 0$ and $h_\theta(x^{(i)})$ is also 0, the cost is 0; when $y^{(i)} = 0$ but $h_\theta(x^{(i)})$ is not 0, the error grows as $h_\theta(x^{(i)})$ increases.

When $n = 1$, $J(\theta_0, \theta_1)$ looks as sketched in Figure 4:

Figure 4. Sketch of the cost function when $n = 1$

4 Solving for the Model Parameters

Solving for the model parameters can be cast as an unconstrained optimization problem:

$$\mathop{\min}\limits_{\theta \in \mathbb{R}^{n+1}} J(\theta) \tag{6}$$

Hence $\theta = \arg\mathop{\min}\limits_{\theta \in \mathbb{R}^{n+1}} J(\theta)$.

4.1 Gradient Descent

Gradient descent is a method for finding the minimum of a function: it iteratively updates the model parameters so that the objective function (here, the cost function $J$) reaches a local minimum.

The gradient descent update rule is:

$$
\begin{cases}
\theta_0 := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta) \\
\theta_1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta) \\
\qquad \vdots \\
\theta_n := \theta_n - \alpha \frac{\partial}{\partial \theta_n} J(\theta)
\end{cases}
\tag{7}
$$

where $\alpha$ is called the learning rate and $\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m}\sum\limits_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$.
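For completeness, a derivation sketch (not in the original post) of this gradient, using the chain rule and the sigmoid identity $g'(z) = g(z)\left(1 - g(z)\right)$:

$$
\frac{\partial\, cost}{\partial \theta_j}
= \left( -\frac{y^{(i)}}{h_\theta(x^{(i)})} + \frac{1 - y^{(i)}}{1 - h_\theta(x^{(i)})} \right)
h_\theta(x^{(i)})\left(1 - h_\theta(x^{(i)})\right) x_j^{(i)}
= \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
$$

Averaging the per-sample term over the $m$ samples gives the expression above.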
In matrix form, Equation $(7)$ is:

$$\theta := \theta - \alpha \nabla J(\theta) \tag{8}$$

where $\nabla J(\theta) = \left[\frac{\partial}{\partial \theta_0} J(\theta), \frac{\partial}{\partial \theta_1} J(\theta), \cdots, \frac{\partial}{\partial \theta_n} J(\theta)\right]^T$.
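A minimal vectorized sketch of Equations $(7)$/$(8)$ (again my own naming; the cost history is recorded so that convergence can be checked as described below):

```python
def gradient(theta, X, y):
    """Vectorized gradient of J(theta): (1/m) * X^T (h_theta(X) - y)."""
    m = len(y)
    return (1.0 / m) * (X.T @ (hypothesis(theta, X) - y))

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Equation (8): theta := theta - alpha * grad J(theta), repeated n_iters times."""
    theta = np.zeros(X.shape[1])  # random initialization also works (see the table below)
    J_history = []
    for _ in range(n_iters):
        theta = theta - alpha * gradient(theta, X, y)
        J_history.append(cost_function(theta, X, y))  # one Eq. (5) pass per Eq. (7) update
    return theta, J_history
```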

| Parameter | Common choice |
| :--- | :--- |
| Initial model parameters | chosen at random |
| Learning rate $\alpha$ | try $0.01 \rightarrow 0.03 \rightarrow 0.1 \rightarrow \cdots$ |
| Stopping criterion | convergence of the cost function |

A common way to judge whether the cost function $J(\theta)$ has converged is to plot the cost value against the number of iterations, as in Figure 5: after each update via Equation $(7)$, evaluate Equation $(5)$ once.

Figure 5. Sketch of the cost value versus the number of iterations
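A small sketch of this convergence check, assuming matplotlib is available and reusing `gradient_descent` and the toy `X` from the earlier sketches (the labels `y_train` are made up for illustration):

```python
import matplotlib.pyplot as plt

y_train = np.array([1, 0, 0])            # toy labels for the X defined earlier
theta, J_history = gradient_descent(X, y_train)
plt.plot(J_history)                      # cost value vs. iteration count
plt.xlabel("iteration")
plt.ylabel(r"$J(\theta)$")
plt.show()
```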

4.2 Other Algorithms

Conjugate gradient, BFGS (the variable metric method), L-BFGS (limited-memory BFGS), and so on.
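In practice these algorithms are usually taken from an optimization library rather than hand-coded. A hedged sketch using SciPy's `scipy.optimize.minimize` with the `cost_function` and `gradient` sketches from above:

```python
from scipy.optimize import minimize

res = minimize(cost_function, np.zeros(X.shape[1]),
               args=(X, y_train), jac=gradient, method="L-BFGS-B")
theta_opt = res.x  # fitted parameters
```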

5 One-vs-All

When the labels extend from $y^{(i)} \in \{1, 0\}$ to $y^{(i)} \in \{1, 2, 3, \cdots, k, \cdots\}$:
Mark $y^{(i)} = 1$ as the positive class and $y^{(i)} \ne 1$ as the negative class, and build a binary logistic regression model $h_\theta^{(1)}(x^{(i)})$;
Mark $y^{(i)} = 2$ as the positive class and $y^{(i)} \ne 2$ as the negative class, and build a binary logistic regression model $h_\theta^{(2)}(x^{(i)})$;
$\cdots$
Mark $y^{(i)} = k$ as the positive class and $y^{(i)} \ne k$ as the negative class, and build a binary logistic regression model $h_\theta^{(k)}(x^{(i)})$;
$\cdots$
Equation $(2)$ then becomes:

$$h_\theta^{(k)}\left(x^{(i)}\right) = P\left(y^{(i)} = k \mid x^{(i)}; \theta\right) \tag{9}$$

Equation $(3)$ becomes:

$$
\begin{cases}
\hat{y}^{(i)} = k \\
h_\theta^{(k)}\left(x^{(i)}\right) = \max\limits_{j}\, h_\theta^{(j)}\left(x^{(i)}\right)
\end{cases}
\tag{10}
$$

Equation $(10)$ is the output expression of the one-vs-all logistic regression model.
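A sketch of the one-vs-all scheme built from the earlier pieces (my own naming; classes are assumed to be labeled $1..K$):

```python
def train_one_vs_all(X, y, n_classes, alpha=0.1, n_iters=1000):
    """Fit one binary classifier h^(k) per class k = 1..K, as in Equation (9)."""
    thetas = []
    for k in range(1, n_classes + 1):
        y_k = (y == k).astype(int)  # class k is positive, every other class negative
        theta_k, _ = gradient_descent(X, y_k, alpha, n_iters)
        thetas.append(theta_k)
    return np.array(thetas)         # shape (K, n + 1)

def predict_one_vs_all(thetas, X):
    """Equation (10): pick the class whose classifier reports the highest probability."""
    probs = sigmoid(X @ thetas.T)   # shape (m, K); column k-1 holds h^(k)(x)
    return np.argmax(probs, axis=1) + 1  # +1 because classes are labeled from 1
```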

Additional notes 📝:

Background on optimization may be added later, time permitting. (ps: I lost my class notes, sadness ×(n+1) 😭)
• *Batch* gradient descent means that all samples in the training set are used when computing the cost function.
• What machine learning calls "gradient descent" is really the steepest descent algorithm: the search direction is the negative gradient direction, and the learning rate is the step size.
• In the logistic regression model, the cost function $J(\theta)$ is convex, so any local minimum found is also the global minimum.
• An ill-chosen learning rate can cause the cost function to diverge, fail to converge, or converge slowly.

This post is a record of my learning and review; everyone is welcome to discuss it together. If anything is wrong, or you have suggestions, please point it out in the comments. The post will be revised and updated from time to time.

Original: https://blog.csdn.net/qq_42800926/article/details/121966987
Author: 西红柿爱喝水
Title: 逻辑回归(Logistic Regression)
