Data Mining | Discriminant Analysis + Naive Bayes Classification

This post records several predictive-modeling methods derived from Bayes' theorem, mainly discriminant analysis (linear and quadratic) and the naive Bayes classifier (with a brief mention of Bayesian networks).

1 Discriminant Analysis

Discriminant analysis applies when the predictors are continuous and the response is categorical.

  • Suppose the response $Y$ has $K$ classes. Let $\epsilon_l = P(Y=l)$ denote the prior probability of class $l$, satisfying $\sum_{l=1}^K \epsilon_l = 1$, and let $f_l(\mathbf{x}) = f(\mathbf{x} \mid Y=l)$ denote the density of the predictors $\mathbf{X} = (X_1, \cdots, X_p)$ among the observations of class $Y = l$.

By Bayes' rule,
$$P(Y=l \mid \mathbf{X}=\mathbf{x}) = \frac{P(Y=l)\,f(\mathbf{x} \mid Y=l)}{\sum_{i=1}^K P(Y=i)\,f(\mathbf{x} \mid Y=i)} = \frac{\epsilon_l f_l(\mathbf{x})}{\sum_{i=1}^K \epsilon_i f_i(\mathbf{x})}$$

The predicted class $l^*$ for $\mathbf{x}$ is then
$$l^* = \underset{l}{\operatorname{argmax}}\ P(Y=l \mid \mathbf{X}=\mathbf{x}) = \underset{l}{\operatorname{argmax}}\ \epsilon_l f_l(\mathbf{x})$$
since the denominator in Bayes' rule is the same for every class.
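The decision rule above can be sketched in a few lines of Python; the function names and toy numbers here are my own, purely for illustration. The posterior is $\epsilon_l f_l(\mathbf{x})$ normalized over classes, and the prediction drops the shared denominator:

```python
def posterior(priors, densities):
    """priors[l] = eps_l; densities[l] = f_l(x) evaluated at a fixed x."""
    joint = [eps * f for eps, f in zip(priors, densities)]
    total = sum(joint)  # the Bayes-rule denominator
    return [j / total for j in joint]

def predict(priors, densities):
    # l* = argmax_l eps_l * f_l(x); the denominator can be ignored
    return max(range(len(priors)), key=lambda l: priors[l] * densities[l])
```

For example, with equal priors and class-1 density four times larger at $\mathbf{x}$, the posterior is proportional to the densities and class 1 wins.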

  • Discriminant analysis assumes that within each class the predictors follow a multivariate normal distribution: $f_l(\mathbf{x}) \sim MVN(\boldsymbol{\mu}_l, \boldsymbol{\Sigma}_l)$.

1.1 Linear Discriminant Analysis

  • On top of the assumption above, linear discriminant analysis further assumes that all classes share the same covariance matrix: $\Sigma_l = \Sigma$ for $l = 1, \cdots, K$.

The prediction problem has already been reduced to computing $\underset{l}{\operatorname{argmax}}\ \epsilon_l f_l(\mathbf{x})$. A short calculation gives
$$\log(\epsilon_l f_l(\mathbf{x})) = \log\!\left[\epsilon_l \frac{1}{(2\pi)^{\frac{p}{2}} |\Sigma|^{\frac{1}{2}}} \exp\!\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_l)^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}_l)\right\}\right] = \delta_l(\mathbf{x}) + A$$
where
$$\delta_l(\mathbf{x}) = \log(\epsilon_l) - \frac{1}{2}\boldsymbol{\mu}_l^T \Sigma^{-1} \boldsymbol{\mu}_l + \boldsymbol{\mu}_l^T \Sigma^{-1} \mathbf{x}, \qquad A = -\frac{p}{2}\log(2\pi) - \frac{1}{2}\log(|\Sigma|) - \frac{1}{2}\mathbf{x}^T \Sigma^{-1} \mathbf{x}$$
Since $A$ does not depend on the class $l$ (it is the same for every class), the problem becomes
$$l^* = \underset{l}{\operatorname{argmax}}\ \epsilon_l f_l(\mathbf{x}) = \underset{l}{\operatorname{argmax}}\ \log(\epsilon_l f_l(\mathbf{x})) = \underset{l}{\operatorname{argmax}}\ \delta_l(\mathbf{x})$$

And $\delta_l(\mathbf{x})$ is a linear function of $\mathbf{x}$, which is presumably where the name "linear discriminant analysis" comes from.
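The discriminant $\delta_l(\mathbf{x})$ above is easy to evaluate directly; a minimal sketch (function names are mine, not a standard API) assuming the priors, class means, and shared covariance are already known:

```python
import numpy as np

def lda_discriminants(x, priors, means, cov):
    """delta_l(x) = log(eps_l) - (1/2) mu_l' S^{-1} mu_l + mu_l' S^{-1} x
    for every class l, under the shared covariance matrix `cov`."""
    cov_inv = np.linalg.inv(cov)  # inverted once, reused for all classes
    return np.array([
        np.log(eps) - 0.5 * mu @ cov_inv @ mu + mu @ cov_inv @ x
        for eps, mu in zip(priors, means)
    ])

def lda_predict(x, priors, means, cov):
    # l* = argmax_l delta_l(x)
    return int(np.argmax(lda_discriminants(x, priors, means, cov)))
```

In practice the parameters would be the estimates from Section 1.3, and for larger problems solving a linear system is numerically safer than forming the explicit inverse.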

1.2 Quadratic Discriminant Analysis

Following the same line of reasoning, does the problem now reduce to a quadratic, rather than linear, function of $\mathbf{x}$?
Bingo.

  • Quadratic discriminant analysis drops the extra assumption of linear discriminant analysis: the class covariance matrices $\Sigma_l$ are no longer assumed equal.
  • Computing $\log(\epsilon_l f_l(\mathbf{x}))$ once more:

$$\log(\epsilon_l f_l(\mathbf{x})) = \log\!\left[\epsilon_l \frac{1}{(2\pi)^{\frac{p}{2}} |\Sigma_l|^{\frac{1}{2}}} \exp\!\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_l)^T \Sigma_l^{-1} (\mathbf{x}-\boldsymbol{\mu}_l)\right\}\right] = \psi_l(\mathbf{x}) + B$$

where $\psi_l(\mathbf{x}) = \log(\epsilon_l) - \frac{1}{2}\log(|\Sigma_l|) - \frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_l)^T \Sigma_l^{-1} (\mathbf{x}-\boldsymbol{\mu}_l)$ and $B = -\frac{p}{2}\log(2\pi)$.
By the same argument as in linear discriminant analysis, the problem becomes
$$l^* = \underset{l}{\operatorname{argmax}}\ \epsilon_l f_l(\mathbf{x}) = \underset{l}{\operatorname{argmax}}\ \log(\epsilon_l f_l(\mathbf{x})) = \underset{l}{\operatorname{argmax}}\ \psi_l(\mathbf{x})$$
and $\psi_l(\mathbf{x})$ is a quadratic function of $\mathbf{x}$.
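The QDA discriminant $\psi_l(\mathbf{x})$ can be sketched the same way as the LDA one, now with a per-class covariance matrix (again, the function name is illustrative):

```python
import numpy as np

def qda_discriminants(x, priors, means, covs):
    """psi_l(x) = log(eps_l) - (1/2) log|Sigma_l|
                  - (1/2) (x - mu_l)' Sigma_l^{-1} (x - mu_l)"""
    psis = []
    for eps, mu, cov in zip(priors, means, covs):
        d = x - mu
        _, logdet = np.linalg.slogdet(cov)  # stable log-determinant
        psis.append(np.log(eps) - 0.5 * logdet
                    - 0.5 * d @ np.linalg.solve(cov, d))
    return np.array(psis)
```

Note the extra $-\frac{1}{2}\log|\Sigma_l|$ term, which no longer cancels across classes since the covariances differ.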

1.3 Parameter Estimation

Three sets of parameters need to be estimated: $\epsilon_l$, $\boldsymbol{\mu}_l$, and $\Sigma_l$. The natural estimates of the first two are the class proportion $\hat{\epsilon}_l = N_l / N$ and the class mean $\bar{\mathbf{x}}_l = \frac{1}{N_l}\sum_{i \in C_l} \mathbf{x}_i$, where $C_l$ is the set of class-$l$ observations and $N_l = |C_l|$. The covariance is estimated per class by
$$\mathbf{S}_l = \frac{1}{N_l - 1}\sum_{i \in C_l} (\mathbf{x}_i - \bar{\mathbf{x}}_l)(\mathbf{x}_i - \bar{\mathbf{x}}_l)^T$$
and, for linear discriminant analysis, by the pooled covariance
$$\mathbf{S} = \sum_{l=1}^K \frac{N_l - 1}{N - K}\,\mathbf{S}_l = \frac{1}{N - K}\sum_{l=1}^K \sum_{i \in C_l} (\mathbf{x}_i - \bar{\mathbf{x}}_l)(\mathbf{x}_i - \bar{\mathbf{x}}_l)^T$$
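The estimators above translate almost line for line into code; a sketch (the function name is mine) that returns $\hat{\epsilon}_l$, $\bar{\mathbf{x}}_l$, and the pooled $\mathbf{S}$ from a data matrix and label vector:

```python
import numpy as np

def lda_estimates(X, y):
    """eps_hat_l = N_l / N, mu_hat_l = class mean, and the pooled
    covariance S = (1 / (N - K)) * sum_l (N_l - 1) * S_l."""
    classes = np.unique(y)
    N, K, p = len(y), len(classes), X.shape[1]
    priors, means = [], []
    pooled = np.zeros((p, p))
    for c in classes:
        Xc = X[y == c]
        priors.append(len(Xc) / N)
        means.append(Xc.mean(axis=0))
        # np.cov with the default ddof=1 is exactly S_l
        pooled += (len(Xc) - 1) * np.cov(Xc, rowvar=False)
    return np.array(priors), np.array(means), pooled / (N - K)
```

Each class needs at least two observations for $\mathbf{S}_l$ to be defined.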

2 Naive Bayes Classification

The naive Bayes classifier applies whenever the response is categorical (no requirement is placed on the predictors).

Return once more to the Bayes formula of Section 1:
$$P(Y=l \mid \mathbf{X}=\mathbf{x}) = \frac{P(Y=l)\,f(\mathbf{x} \mid Y=l)}{\sum_{i=1}^K P(Y=i)\,f(\mathbf{x} \mid Y=i)} = \frac{\epsilon_l f_l(\mathbf{x})}{\sum_{i=1}^K \epsilon_i f_i(\mathbf{x})}$$
The naive Bayes assumption is again placed on $f_l(\mathbf{x})$: given the value of $Y$, the predictors $X_1, \cdots, X_p$ are conditionally independent. Writing $f_l(x_r)$ for the marginal distribution of $X_r$ within class $l$,
$$f_l(\mathbf{x}) = \prod_{r=1}^p f_l(x_r)$$
$$l^* = \underset{l}{\operatorname{argmax}}\ \epsilon_l f_l(\mathbf{x}) = \underset{l}{\operatorname{argmax}}\ \log(\epsilon_l f_l(\mathbf{x})) = \underset{l}{\operatorname{argmax}}\ \log\!\left(\epsilon_l \prod_{r=1}^p f_l(x_r)\right)$$
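Taking the log turns the product into a sum, which is also how one avoids floating-point underflow in practice; a minimal sketch (names are illustrative):

```python
import math

def nb_predict(priors, marginals):
    """marginals[l][r] = f_l(x_r) evaluated at the observed value of X_r.
    Score for class l: log(eps_l) + sum_r log f_l(x_r)."""
    scores = [math.log(eps) + sum(math.log(f) for f in fs)
              for eps, fs in zip(priors, marginals)]
    return max(range(len(scores)), key=scores.__getitem__)
```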

2.1 Parameter Estimation

Section 1.3 already covered the estimation of $\epsilon_l$, so what remains is the estimation of $f_l(\mathbf{x})$. As noted at the start of this section, naive Bayes places no particular requirement on the predictors, so the estimation splits into two cases by predictor type:

For a categorical predictor $X_r$ with levels $\gamma_v$, the natural estimate is the within-class frequency
$$\hat{f}_l(x_r = \gamma_v) = \frac{|\{i \mid i \in C_l,\ x_{ir} = \gamma_v\}|}{N_l}$$
where $x_{ir}$ is the value of $X_r$ for the $i$-th observation. To guard against zero estimates (if the training set happens to contain no observation satisfying the condition, the MLE is 0), apply smoothing:
$$\tilde{f}_l(x_r = \gamma_v) = \frac{|\{i \mid i \in C_l,\ x_{ir} = \gamma_v\}| + n_0}{N_l + V_r\, n_0}$$
where $V_r$ is the number of levels of $X_r$, so that the smoothed estimates still sum to one over $v$.
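A quick sketch of the smoothed frequency estimate (the function name is mine; this is the convention of adding $n_0$ pseudo-counts to every level):

```python
from collections import Counter

def smoothed_freq(values, level, n_levels, n0=1):
    """values: observed X_r values for the class-l training observations.
    Returns (count + n0) / (N_l + n0 * V_r), which sums to 1 over levels."""
    count = Counter(values)[level]
    return (count + n0) / (len(values) + n0 * n_levels)
```

An unseen level now gets a small positive probability instead of zero, so a single unseen value can no longer annihilate the whole product $\prod_r f_l(x_r)$.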

Continuing from the assumption in Section 2: in a Bayesian network the predictors are not all conditionally independent; some dependence among them is allowed (the joint density is expanded via the law of total probability and then simplified). Details to be filled in later.

In the end, all three methods use Bayes' rule to turn prediction into solving $\underset{l}{\operatorname{argmax}}\ \epsilon_l f_l(\mathbf{x})$; they differ only in the assumption placed on $f_l(\mathbf{x})$:

  • Linear discriminant analysis: $f_l(\mathbf{x}) \sim MVN(\boldsymbol{\mu}_l, \Sigma)$ with a covariance matrix shared by all classes
  • Quadratic discriminant analysis: $f_l(\mathbf{x}) \sim MVN(\boldsymbol{\mu}_l, \Sigma_l)$ with class-specific covariance matrices
  • Naive Bayes: $f_l(\mathbf{x}) = \prod_{r=1}^p f_l(x_r)$, i.e. the predictors are conditionally independent given $Y$
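For completeness, all three classifiers are available in scikit-learn. A quick sanity check on synthetic two-class Gaussian data (assuming scikit-learn is installed; the data here are made up) might look like:

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.naive_bayes import GaussianNB

# Two well-separated unit-variance Gaussian classes in 2-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, size=(100, 2)),
               rng.normal([3, 3], 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

for model in (LinearDiscriminantAnalysis(),
              QuadraticDiscriminantAnalysis(),
              GaussianNB()):
    model.fit(X, y)
    print(type(model).__name__, model.score(X, y))
```

On data like this, where both classes really are spherical Gaussians, all three assumptions hold and the three models should score similarly.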

Seeing it all come together in one thread feels genuinely satisfying!

Reference:

Original: https://blog.csdn.net/weixin_43521859/article/details/124316449
Author: 已患月岛萤综合症
Title: 数据挖掘 | 判别分析 +朴素贝叶斯分类算法

