This section records several predictive-modeling methods derived from Bayes' theorem, mainly linear and quadratic discriminant analysis and naive Bayes (with a brief mention of Bayesian networks).
1 Discriminant Analysis

Discriminant analysis applies when the predictors are continuous and the response is categorical.
- Suppose the response $Y$ has $K$ classes. Let $\epsilon_l=P(Y=l)$ denote the prior probability of class $l$, with $\sum^K_{l=1}\epsilon_l=1$, and let $f_l(\bold{x})=f(\bold{x}|Y=l)$ denote the probability density of the predictors $\bold{X}=(X_1,\cdots,X_p)$ for observations in class $Y=l$.
By Bayes' theorem,
$$P(Y=l|\bold{X}=\bold{x})=\frac{P(Y=l)f(\bold{x}|Y=l)}{\sum^K_{i=1}P(Y=i)f(\bold{x}|Y=i)}=\frac{\epsilon_l f_l(\bold{x})}{\sum^K_{i=1}\epsilon_i f_i(\bold{x})}$$
The predicted class $l^*$ for $\bold{x}$ is then
$$l^* = \underset{l}{\arg\max}\ P(Y=l|\bold{X}=\bold{x}) = \underset{l}{\arg\max}\ \epsilon_l f_l(\bold{x})$$
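As a quick numeric sketch of this rule (everything here is hypothetical: two classes with priors 0.7 / 0.3 and univariate normal densities $N(0,1)$ and $N(2,1)$):

```python
import math

# Toy illustration of the rule above; all numbers are hypothetical.
priors = [0.7, 0.3]
means = [0.0, 2.0]

def density(x, mu, sigma=1.0):
    # Univariate normal density f_l(x)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def predict(x):
    # l* = argmax_l eps_l * f_l(x); the denominator of Bayes' formula is
    # the same for every class, so it can be dropped.
    scores = [p * density(x, m) for p, m in zip(priors, means)]
    return max(range(len(scores)), key=lambda l: scores[l])

print(predict(0.1))  # -> 0 (closer to the class-0 mean)
print(predict(2.5))  # -> 1
```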
- Discriminant analysis assumes the predictors follow a multivariate normal distribution within each class: $\bold{x}|Y=l \sim MVN(\bold{\mu}_l,\bold{\Sigma}_l)$.
1.1 Linear Discriminant Analysis
- On top of the assumption above, linear discriminant analysis further assumes all classes share the same covariance matrix: $\Sigma_l=\Sigma,\ l=1,\cdots,K$.
We have already reduced class prediction to solving $\underset{l}{\arg\max}\ \epsilon_l f_l(\bold{x})$; let us compute a little further:
$$\begin{aligned} \log(\epsilon_l f_l(\bold{x})) &= \log\left[\epsilon_l\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(\bold{x}-\bold{\mu}_l)^T\Sigma^{-1}(\bold{x}-\bold{\mu}_l)\right\}\right]\\ &= \delta_l(\bold{x})+A \end{aligned}$$
where $\delta_l(\bold{x})=\log(\epsilon_l)-\frac{1}{2}\bold{\mu}_l^T\Sigma^{-1}\bold{\mu}_l+\bold{\mu}_l^T\Sigma^{-1}\bold{x}$ and $A=-\frac{p}{2}\log(2\pi)-\frac{1}{2}\log(|\Sigma|)-\frac{1}{2}\bold{x}^T\Sigma^{-1}\bold{x}$.
Note that the value of $A$ does not depend on the class $l$ (it is the same for every class, since $\Sigma$ is shared), so the problem becomes
$$\begin{aligned} l^* &= \underset{l}{\arg\max}\ \epsilon_l f_l(\bold{x})\\ &= \underset{l}{\arg\max}\ \log(\epsilon_l f_l(\bold{x}))\\ &= \underset{l}{\arg\max}\ \delta_l(\bold{x}) \end{aligned}$$
Since $\delta_l(\bold{x})$ is a linear function of $\bold{x}$, this is presumably where the name "linear discriminant analysis" comes from.
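The rule can be sketched as follows; the parameters here are hypothetical (in practice $\epsilon_l$, $\bold{\mu}_l$, $\Sigma$ are estimated from training data, see Section 1.3):

```python
import numpy as np

# Minimal sketch of the LDA decision rule with hypothetical parameters
# for K=2 classes and p=2 predictors.
eps = np.array([0.5, 0.5])
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])  # common covariance across classes
Sigma_inv = np.linalg.inv(Sigma)

def delta(x, l):
    # delta_l(x) = log(eps_l) - 1/2 mu_l' Sigma^-1 mu_l + mu_l' Sigma^-1 x  (linear in x)
    mu = mus[l]
    return np.log(eps[l]) - 0.5 * mu @ Sigma_inv @ mu + mu @ Sigma_inv @ x

def classify(x):
    return max(range(len(mus)), key=lambda l: delta(x, l))

print(classify(np.array([0.2, -0.1])))  # -> 0 (close to mu_0)
print(classify(np.array([2.8, 3.1])))   # -> 1 (close to mu_1)
```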
1.2 Quadratic Discriminant Analysis
Continuing the line of thought above: if we drop the equal-covariance assumption, does the problem turn into one involving a quadratic, rather than linear, function of $\bold{x}$?
Bingo.
- Quadratic discriminant analysis no longer makes the assumption linear discriminant analysis does: the class covariance matrices $\Sigma_l$ are not assumed equal.
- So let us compute $\log(\epsilon_l f_l(\bold{x}))$ once more:
$$\begin{aligned} \log(\epsilon_l f_l(\bold{x})) &= \log\left[\epsilon_l\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma_l|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(\bold{x}-\bold{\mu}_l)^T\Sigma_l^{-1}(\bold{x}-\bold{\mu}_l)\right\}\right]\\ &= \psi_l(\bold{x})+B \end{aligned}$$
where $\psi_l(\bold{x})=\log(\epsilon_l)-\frac{1}{2}\log(|\Sigma_l|)-\frac{1}{2}(\bold{x}-\bold{\mu}_l)^T\Sigma_l^{-1}(\bold{x}-\bold{\mu}_l)$ and $B=-\frac{p}{2}\log(2\pi)$.
By the same argument as for linear discriminant analysis, the problem becomes
$$\begin{aligned} l^* &= \underset{l}{\arg\max}\ \epsilon_l f_l(\bold{x})\\ &= \underset{l}{\arg\max}\ \log(\epsilon_l f_l(\bold{x}))\\ &= \underset{l}{\arg\max}\ \psi_l(\bold{x}) \end{aligned}$$
and $\psi_l(\bold{x})$ is a quadratic function of $\bold{x}$.
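A matching sketch of the QDA rule, again with hypothetical parameters; the only change from LDA is that each class keeps its own covariance matrix $\Sigma_l$:

```python
import numpy as np

# Minimal QDA sketch with hypothetical parameters for K=2 classes, p=2 predictors.
eps = np.array([0.5, 0.5])
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), np.array([[2.0, 0.5], [0.5, 2.0]])]  # one Sigma_l per class

def psi(x, l):
    # psi_l(x) = log(eps_l) - 1/2 log|Sigma_l| - 1/2 (x-mu_l)' Sigma_l^-1 (x-mu_l)
    d = x - mus[l]
    _, logdet = np.linalg.slogdet(Sigmas[l])
    return np.log(eps[l]) - 0.5 * logdet - 0.5 * d @ np.linalg.inv(Sigmas[l]) @ d

def classify(x):
    return max(range(len(mus)), key=lambda l: psi(x, l))

print(classify(np.array([0.1, 0.2])))  # -> 0
print(classify(np.array([3.2, 2.9])))  # -> 1
```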
1.3 Parameter Estimation
Three sets of parameters need estimating: $\epsilon_l,\bold{\mu}_l,\Sigma_l$. Let $N_l=|C_l|$ be the number of training observations in class $l$ and $N=\sum^K_{l=1}N_l$. The priors are estimated by the class frequencies $\hat{\epsilon}_l=\frac{N_l}{N}$, and the means by the class sample means $\hat{\bold{\mu}}_l=\bar{\bold{x}}_l=\frac{1}{N_l}\sum_{i\in C_l}\bold{x}_i$. For the covariance, quadratic discriminant analysis uses the per-class sample covariance $\bold{S}_l$, while linear discriminant analysis uses the pooled estimate $\bold{S}$:
$$\bold{S}_l = \frac{1}{N_l-1}\sum_{i\in C_l}(\bold{x}_i-\bar{\bold{x}}_l)(\bold{x}_i-\bar{\bold{x}}_l)^T,\qquad \bold{S} = \sum^K_{l=1}\frac{N_l-1}{N-K}\bold{S}_l=\frac{1}{N-K}\sum^K_{l=1}\sum_{i\in C_l}(\bold{x}_i-\bar{\bold{x}}_l)(\bold{x}_i-\bar{\bold{x}}_l)^T$$
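These estimates can be sketched in NumPy on made-up training data (the data and labels below are purely illustrative):

```python
import numpy as np

# Toy training data (hypothetical): N=10 observations, p=2 predictors, K=2 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
K = 2
N, p = X.shape

N_l = np.array([(y == l).sum() for l in range(K)])
eps_hat = N_l / N                                           # priors: N_l / N
mu_hat = [X[y == l].mean(axis=0) for l in range(K)]         # class sample means
S_l = [np.cov(X[y == l], rowvar=False) for l in range(K)]   # per-class covariance (QDA)
S = sum((N_l[l] - 1) * S_l[l] for l in range(K)) / (N - K)  # pooled covariance (LDA)

print(eps_hat)  # [0.5 0.5]
```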
2 Naive Bayes Classifier

The naive Bayes classifier applies when the response is categorical (no restriction is placed on the predictors).
Return to the Bayes formula from Section 1:
$$P(Y=l|\bold{X}=\bold{x})=\frac{P(Y=l)f(\bold{x}|Y=l)}{\sum^K_{i=1}P(Y=i)f(\bold{x}|Y=i)}=\frac{\epsilon_l f_l(\bold{x})}{\sum^K_{i=1}\epsilon_i f_i(\bold{x})}$$
The naive Bayes assumption again concerns $f_l(\bold{x})$: given the value of $Y$, the predictors $X_1,\cdots,X_p$ are conditionally independent. Writing $f_l(x_r)$ for the marginal distribution of $X_r$ within class $l$,
$$f_l(\bold{x})=\prod^p_{r=1}f_l(x_r)$$
$$\begin{aligned} l^* &= \underset{l}{\arg\max}\ \epsilon_l f_l(\bold{x})\\ &= \underset{l}{\arg\max}\ \log(\epsilon_l f_l(\bold{x}))\\ &= \underset{l}{\arg\max}\ \log\left(\epsilon_l\prod^p_{r=1}f_l(x_r)\right) \end{aligned}$$
2.1 Parameter Estimation
Section 1.3 already covered the estimation of $\epsilon_l$; what remains is estimating $f_l(\bold{x})$. As noted at the start of this section, naive Bayes places no particular requirement on the predictors, so the estimation splits into two cases. For a categorical predictor $X_r$ with level $\gamma_v$, the within-class relative frequency gives
$$\hat{f}_l(x_r=\gamma_v)=\frac{|\{i \mid i\in C_l,\ x_{ir}=\gamma_v\}|}{N_l}$$
where $x_{ir}$ denotes the value of $X_r$ for the $i$-th observation. A smoothing adjustment is applied (if the training set contains no observation satisfying the condition, the corresponding maximum-likelihood estimate is 0):
$$\tilde{f}_l(x_r=\gamma_v)=\frac{|\{i \mid i\in C_l,\ x_{ir}=\gamma_v\}|+n_0}{N_l+n_0}$$
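The whole categorical case can be sketched end to end with the smoothed estimate above (here with $n_0=1$; the toy data set and feature values are entirely hypothetical):

```python
import math
from collections import Counter

# Toy data: each row is (x_1, x_2) with categorical values, labels y in {0, 1}.
X = [("a", "s"), ("a", "t"), ("b", "s"), ("b", "t"), ("b", "s")]
y = [0, 0, 1, 1, 1]
n0 = 1

classes = sorted(set(y))
N_l = {l: y.count(l) for l in classes}
eps = {l: N_l[l] / len(y) for l in classes}            # priors N_l / N
# counts[(l, r)][v] = |{i : i in C_l, x_ir = v}|
counts = {(l, r): Counter(xi[r] for xi, yi in zip(X, y) if yi == l)
          for l in classes for r in range(2)}

def f_tilde(l, r, v):
    # Smoothed within-class marginal: (count + n0) / (N_l + n0)
    return (counts[(l, r)][v] + n0) / (N_l[l] + n0)

def predict(x):
    # l* = argmax_l  log(eps_l) + sum_r log f_l(x_r)
    return max(classes, key=lambda l: math.log(eps[l]) +
               sum(math.log(f_tilde(l, r, v)) for r, v in enumerate(x)))

print(predict(("a", "t")))  # -> 0
print(predict(("b", "s")))  # -> 1
```

Working in log scores avoids underflow when the product runs over many predictors.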
3 Bayesian Networks

Continuing from the assumption in Section 2: here the predictors are not all conditionally independent but carry some dependence structure (the joint distribution is expanded and simplified via the law of total probability); to be filled in later.
In the end, every method uses Bayes' theorem to reduce prediction to solving $\underset{l}{\arg\max}\ \epsilon_l f_l(\bold{x})$; the methods differ only in the assumption placed on $f_l(\bold{x})$.
| Method | Assumption on $f_l(\bold{x})$ |
| --- | --- |
| Linear discriminant analysis | multivariate normal, common covariance matrix $\Sigma$ |
| Quadratic discriminant analysis | multivariate normal, class-specific covariance matrices $\Sigma_l$ |
| Naive Bayes | $X_1,\cdots,X_p$ conditionally independent given $Y$ |
Seeing it all flow from a single thread is genuinely satisfying!
Reference:
- Original: https://blog.csdn.net/weixin_43521859/article/details/124316449
- Author: 已患月岛萤综合症
- Title: 数据挖掘 | 判别分析 + 朴素贝叶斯分类算法