# Loss Functions in Machine Learning

$(C_1, C_2)$ are the two classes to be distinguished. A sample's class is decided by comparing the value of the classification function against a threshold, e.g.:

$$f(x) = g(w^T x + b)$$

where $g$ is a threshold function such as the sign function.

## Functional and Geometric Margin

The functional margin of an example $(x_i, y_i)$ is:

$$\gamma_i = y_i (w^T x_i + b)$$

Normalizing by $\|w\|$ gives:

$$\hat{\gamma} = \frac{w^T x_i + b}{\|w\|}$$

$\hat{\gamma}$ is exactly the point-to-line distance in the 2-D plane; in higher dimensions it is the point-to-hyperplane distance. Taking the label sign into account for positive and negative examples:

$$\hat{\gamma} = y_i \cdot \frac{w^T x_i + b}{\|w\|}$$
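The two margins above can be computed directly; here is a minimal NumPy sketch with an arbitrarily chosen hyperplane $(w, b)$ and two toy examples (all values are illustrative):

```python
import numpy as np

# Illustrative hyperplane w^T x + b = 0
w = np.array([2.0, 1.0])
b = -1.0

X = np.array([[1.0, 2.0],    # a positive example
              [0.0, -1.0]])  # a negative example
y = np.array([1, -1])

functional = y * (X @ w + b)                 # gamma_i = y_i (w^T x_i + b)
geometric = functional / np.linalg.norm(w)   # signed distance to the hyperplane

print(functional)  # [3. 2.]
print(geometric)
```

Note that scaling $(w, b)$ by a constant changes the functional margin but leaves the geometric margin unchanged, which is why the optimization below normalizes $\|w\|$.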

## Optimal Margin Classifier

Maximizing the margin subject to every example lying outside it:

$$\begin{aligned} &\max\ \gamma \\ &\text{s.t.}\quad y_i(w^T x_i + b) \geq \gamma,\ i = 1, \dots, m \\ &\qquad\ \|w\| = 1 \end{aligned}$$

Fixing the functional margin to 1 and dropping the norm constraint, this is equivalent to:

$$\min\ \frac{1}{2}\|w\|^2 \quad \text{s.t.}\quad y_i(w^T x_i + b) \geq 1,\ i = 1, \dots, m$$

# Loss Function

A learning problem is typically cast as minimizing a regularized empirical risk:

$$\theta^* = \arg \min_\theta \frac{1}{N}\sum_{i=1}^{N} L(y_i, f(x_i; \theta)) + \lambda\, \Phi(\theta)$$

where $L$ is the loss function and $\Phi$ is a regularization term weighted by $\lambda$.
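As a concrete sketch of the objective above, the following computes the regularized empirical risk for a linear model with squared loss and an L2 penalty $\Phi(\theta) = \|\theta\|^2$; the loss and regularizer choices here are illustrative, not prescribed by the formula:

```python
import numpy as np

def empirical_risk(theta, X, y, lam):
    """(1/N) sum_i L(y_i, f(x_i; theta)) + lambda * Phi(theta)."""
    preds = X @ theta
    data_term = np.mean((y - preds) ** 2)   # squared loss as an example of L
    reg_term = lam * np.sum(theta ** 2)     # L2 penalty as an example of Phi
    return data_term + reg_term

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
theta = np.array([1.0, 1.0])
print(empirical_risk(theta, X, y, lam=0.1))  # 0.5 + 0.2 = 0.7
```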

# Commonly Used Loss Functions

• Hinge loss: mainly used in support vector machines (SVMs);
• Cross-entropy loss (a.k.a. softmax loss): used for logistic-regression-style classification;
• Square loss: used in least-squares problems;
• Other losses that work remarkably well in specific scenarios.

## Hinge Loss

With the margin $m_i(w) = y_i w^T x_i$, the hinge loss is:

$$L(m_i) = \max(0,\ 1 - m_i(w))$$

$$\underset{w,\,\zeta}{\arg\min}\ \frac{1}{2}\|w\|^2 + C\sum_i \zeta_i \quad \text{s.t.}\quad \forall i,\ y_i w^T x_i \geq 1 - \zeta_i,\ \zeta_i \geq 0$$

$$\begin{aligned} J(w) &= \frac{1}{2}\|w\|^2 + C\sum_i \max(0,\ 1 - y_i w^T x_i) \\ &= \frac{1}{2}\|w\|^2 + C\sum_i \max(0,\ 1 - m_i(w)) \\ &= \frac{1}{2}\|w\|^2 + C\sum_i L_{\text{Hinge}}(m_i) \end{aligned}$$

The SVM objective can thus be seen as the sum of an L2-norm penalty and the hinge loss.
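A minimal sketch of this objective $J(w)$, with the bias folded into $w$ for brevity and toy data chosen for illustration:

```python
import numpy as np

def hinge(margins):
    """Element-wise hinge loss max(0, 1 - m)."""
    return np.maximum(0.0, 1.0 - margins)

def svm_objective(w, X, y, C):
    """J(w) = 0.5 ||w||^2 + C * sum_i hinge(y_i w^T x_i)."""
    margins = y * (X @ w)
    return 0.5 * np.dot(w, w) + C * np.sum(hinge(margins))

X = np.array([[2.0, 0.0], [0.0, 2.0]])
y = np.array([1, -1])
w = np.array([1.0, -1.0])
# Both margins equal 2, so the hinge term vanishes and only 0.5||w||^2 remains.
print(svm_objective(w, X, y, C=1.0))  # 1.0
```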

## Softmax Loss

Cross-entropy loss is the negative log-likelihood of the model:

$$L = -\log P(Y|X)$$

For binary logistic regression, with $h(x)$ the predicted probability of the positive class:

$$\begin{aligned} P(y=1|x;\theta) &= h(x) \\ P(y=0|x;\theta) &= 1 - h(x) \end{aligned}$$

which combine into:

$$p(y|x;\theta) = h(x)^y (1 - h(x))^{1-y}$$

The likelihood over $n$ independent samples is:

$$L(\theta) = \prod_{i=1}^n h(x_i)^{y_i} (1 - h(x_i))^{1-y_i}$$

$$\begin{aligned} \ell(\theta) &= \log L(\theta) \\ &= \sum_{i=1}^n \left[\, y_i \log h(x_i) + (1 - y_i)\log(1 - h(x_i)) \,\right] \end{aligned}$$

The cross-entropy loss is the negative of this log-likelihood.
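The negative log-likelihood above can be sketched as follows, taking $h(x) = \sigma(\theta^T x)$ as in logistic regression (data and parameter values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(theta, X, y):
    """-(1/n) sum_i [ y_i log h(x_i) + (1 - y_i) log(1 - h(x_i)) ]."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

X = np.array([[1.0], [-1.0]])
y = np.array([1.0, 0.0])
theta = np.array([0.0])
# At theta = 0 the model predicts h = 0.5 everywhere, giving loss log 2.
print(cross_entropy(theta, X, y))  # 0.6931...
```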

## Square Loss

$$L(Y, f(X)) = \sum_{i=1}^n (y_i - f(x_i))^2$$
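For a linear model, minimizing this squared loss is the classic least-squares problem, which has a closed-form solution; a small sketch on toy data that lies exactly on the line $y = 1 + x$:

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])   # bias column + one feature
y = np.array([2.0, 3.0, 4.0])  # exactly y = 1 + x

# Solve min_theta sum_i (y_i - x_i^T theta)^2 by least squares.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)  # close to [1., 1.]

residual = np.sum((y - X @ theta) ** 2)
print(residual)  # essentially 0 for this noiseless data
```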

## Exponential Loss

$$L(Y, f(X)) = \frac{1}{2}\sum_{i=1}^n \exp[-y_i f(x_i)]$$

The exponential loss appears in AdaBoost. After $m-1$ rounds, the strong classifier is an additive combination of weak learners $k_j$:

$$C_{(m-1)}(x_i) = \alpha_1 k_1(x_i) + \cdots + \alpha_{m-1} k_{m-1}(x_i)$$

Adding the $m$-th weak learner:

$$C_{m}(x_i) = C_{(m-1)}(x_i) + \alpha_m k_m(x_i)$$

The exponential loss of $C_m$ over the training set is:

$$E = \sum_{i=1}^N e^{-y_i C_m(x_i)}$$

Writing $w_i^{(m)} = e^{-y_i C_{(m-1)}(x_i)}$, a weight that depends only on previous rounds, this becomes:

$$E = \sum_{i=1}^N w_i^{(m)} e^{-y_i \alpha_m k_m(x_i)}$$

Splitting the sum into correctly and incorrectly classified examples:

$$E = \sum_{y_i = k_m(x_i)} w_i^{(m)} e^{-\alpha_m} + \sum_{y_i \neq k_m(x_i)} w_i^{(m)} e^{\alpha_m}$$

$$\phantom{E} = \sum_{i=1}^N w_i^{(m)} e^{-\alpha_m} + \sum_{y_i \neq k_m(x_i)} w_i^{(m)} \left(e^{\alpha_m} - e^{-\alpha_m}\right)$$

Setting the derivative with respect to $\alpha_m$ to zero:

$$\frac{d E}{d \alpha_m} = \frac{d \left(\sum_{y_i = k_m(x_i)} w_i^{(m)} e^{-\alpha_m} + \sum_{y_i \neq k_m(x_i)} w_i^{(m)} e^{\alpha_m}\right)}{d \alpha_m} = 0$$

and solving for $\alpha_m$:

$$\alpha_m = \frac{1}{2}\ln\left(\frac{\sum_{y_i = k_m(x_i)} w_i^{(m)}}{\sum_{y_i \neq k_m(x_i)} w_i^{(m)}}\right)$$

With the weighted error rate $\epsilon_m = \sum_{y_i \neq k_m(x_i)} w_i^{(m)} \big/ \sum_{i} w_i^{(m)}$, this is:

$$\alpha_m = \frac{1}{2}\ln\left(\frac{1 - \epsilon_m}{\epsilon_m}\right)$$
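One round of this weight computation can be sketched as follows, on a toy dataset where the weak learner misclassifies exactly one of four points:

```python
import numpy as np

y = np.array([1, 1, -1, -1])      # true labels
pred = np.array([1, -1, -1, -1])  # weak learner k_m: one mistake
w = np.full(4, 0.25)              # uniform weights w_i^{(m)}

eps = np.sum(w[pred != y])        # weighted error eps_m = 0.25
alpha = 0.5 * np.log((1 - eps) / eps)  # alpha_m = 0.5 * ln 3

# AdaBoost re-weighting: misclassified points gain weight, then normalize.
w_new = w * np.exp(-alpha * y * pred)
w_new /= w_new.sum()
print(alpha)   # 0.5493...
print(w_new)   # the misclassified point now carries weight 0.5
```

After normalization, the single misclassified example carries half the total weight, so the next weak learner is pushed to get it right.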

Original: https://www.cnblogs.com/houkai/p/10160673.html
Author: 侯凯
Title: 机器学习中的损失函数 (Loss Functions in Machine Learning)
