The Origins and Evolution of ArcFace

September 30, 2023 • Python

A survey of ArcFace

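1. Introduction
1.1 Publication
1.2 Advantages
1.3 Performance
2. The origins and evolution of ArcFace
2.1 softmax
2.2 center loss
2.2 L-softmax
- The concept of a decision boundary
2.3 A-softmax
2.4 CosFace
2.5 ArcFace
3. References
Papers
Blog posts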

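1. Introduction

1.1 Publication

ArcFace/InsightFace ("arc" as in arc length, i.e. angle) was published in January 2018 by Jiankang Deng et al. of Imperial College London. Building on SphereFace, it adds feature-vector normalization and an additive angular margin, which improves inter-class separability while strengthening both intra-class compactness and inter-class discrepancy.
Paper: ArcFace: Additive Angular Margin Loss for Deep Face Recognition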

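1.2 Advantages

ArcFace loss: Additive Angular Margin Loss. It normalizes both the feature vectors and the weights and adds an angular margin m to θ. An angular margin acts on the angle more directly than a cosine margin does, and geometrically it corresponds to a constant, linear angular margin.
ArcFace maximizes the classification margin directly in angle space (θ), whereas CosFace maximizes it in cosine space (cos(θ)).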

1.3 Performance

99.83% on LFW and 98.02% on YTF.

2. The origins and evolution of ArcFace

As a loss function derived from softmax, the ArcFace loss did not appear overnight. A good deal of earlier work came before it:

center loss, L-softmax loss, A-softmax loss, CosFace loss, and others.

2.1 softmax

Formula:
$$L_{1} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{W_{y_{i}}^{T} x_{i} + b_{y_{i}}}}{\sum_{j=1}^{n} e^{W_{j}^{T} x_{i} + b_{j}}}$$
where N is the number of samples, n is the number of classes, and $y_{i}$ is the class label of sample $x_{i}$.
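
As a minimal sketch of the formula above, the softmax loss can be written in a few lines of NumPy. The function name `softmax_cross_entropy`, the array shapes, and the toy data are assumptions made for illustration; they do not come from any of the papers discussed here.

```python
import numpy as np

def softmax_cross_entropy(x, W, b, y):
    """Mean softmax cross-entropy loss.

    x: (N, d) features; W: (d, n) class weights, one column per class;
    b: (n,) biases; y: (N,) integer labels in [0, n).
    """
    logits = x @ W + b                                   # entry (i, j) is W_j^T x_i + b_j
    logits = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(y)), y].mean()

# Toy data (made up for illustration): 4 samples, 3 features, 5 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
y = np.array([0, 1, 2, 3])
W = np.zeros((3, 5))   # all-zero weights give a uniform prediction over 5 classes
b = np.zeros(5)
loss = softmax_cross_entropy(x, W, b, y)
print(loss, np.log(5))  # the two values agree, since -log(1/5) = log(5)
```

The all-zero weights are chosen only because the expected loss is then known in closed form, which makes the sketch easy to check.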

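2.2 center loss

Paper: [ECCV 2016] A Discriminative Feature Learning Approach for Deep Face Recognition

This ECCV 2016 paper proposes a new loss, Center Loss, to assist the softmax loss when training face models. The softmax loss separates different classes, the center loss compresses each class, and together they yield discriminative features.

It works by adding a penalty term that pulls the samples of each class toward that class's center.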

Formula:
$$L_{1} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{W_{y_{i}}^{T} x_{i} + b_{y_{i}}}}{\sum_{j=1}^{n} e^{W_{j}^{T} x_{i} + b_{j}}} + \frac{\lambda}{2} \sum_{i=1}^{N} \lVert x_{i} - c_{y_{i}} \rVert^{2}$$

where n is the number of classes and $c_{y_{i}}$ is the center of class $y_{i}$, the class that sample i belongs to. Computing $c_{y_{i}}$ is relatively involved.
At the start of training, the centers are Xavier-initialized, so they do not yet point at the true class centers. They are then updated once per iteration.
The update resembles backpropagation.

Differentiating the center term with respect to the centers gives the following expression:
$$\Delta c_{j} = \frac{\sum_{i=1}^{N} \delta(y_{i} = j)\,(c_{j} - x_{i})}{1 + \sum_{i=1}^{N} \delta(y_{i} = j)}$$
where $\delta(\cdot)$ is 1 when the condition holds and 0 otherwise. Updating each center by $\Delta c_{j}$ moves it gradually toward the true class center as training proceeds.
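
The center update can be sketched as follows. This is an illustrative NumPy version with fixed features, not the paper's training code: the step size `alpha`, the function names, the zero initialization (used instead of Xavier so the result is easy to check), and the toy data are all assumptions. The update applied is $c_{j} \leftarrow c_{j} - \alpha \Delta c_{j}$.

```python
import numpy as np

def center_deltas(x, y, centers):
    """Delta c_j = sum_i [y_i == j] (c_j - x_i) / (1 + sum_i [y_i == j])."""
    deltas = np.zeros_like(centers)
    for j in range(len(centers)):
        mask = (y == j)                      # samples that belong to class j
        deltas[j] = (centers[j] - x[mask]).sum(axis=0) / (1 + mask.sum())
    return deltas

def center_loss(x, y, centers, lam):
    """(lambda / 2) * sum_i ||x_i - c_{y_i}||^2."""
    return 0.5 * lam * np.sum((x - centers[y]) ** 2)

# Toy data (made up): two samples of class 0, one sample of class 1.
x = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
y = np.array([0, 0, 1])
centers = np.zeros((2, 2))   # assumed zero init, for a checkable example
alpha = 0.5                  # assumed step size for the centers

loss_before = center_loss(x, y, centers, lam=1.0)
for _ in range(200):
    centers -= alpha * center_deltas(x, y, centers)
loss_after = center_loss(x, y, centers, lam=1.0)
print(centers)  # each row approaches the mean of its class
```

In real training the features x change every iteration as the network learns, so the centers chase a moving target rather than converging to a fixed mean as they do here.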

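Contributions:
It distinguishes fine details better: when two input samples are very similar, it models the differences between them more accurately.
It increases inter-class distance and reduces intra-class distance to some extent. It also identified the small inter-class margin of softmax as a problem, opening the way for later work on improving softmax.
Drawbacks:
Training is unstable, convergence is slow, and it overfits easily.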

[Figure: comparison of center loss and softmax]

2.2 L-softmax

The concept of a decision boundary

Paper: Large-Margin Softmax Loss for Convolutional Neural Networks

Assume there are only two classes (this simplifies the analysis, and the argument extends to multiple classes).

softmax (with n classes):
$$L_{1} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{W_{y_{i}}^{T} x_{i} + b_{y_{i}}}}{\sum_{j=1}^{n} e^{W_{j}^{T} x_{i} + b_{j}}}$$
From the softmax formula, in the two-class case the scores of class 1 and class 2 are $p_{1} = e^{W_{1}^{T} x + b_{1}}$ and $p_{2} = e^{W_{2}^{T} x + b_{2}}$ (the shared denominator is omitted because it does not affect the comparison). The sample is assigned to class 1 when $p_{1} > p_{2}$ and to class 2 when $p_{1} < p_{2}$. The decision boundary is therefore $p_{1} = p_{2}$, that is, $W_{1}^{T} x + b_{1} = W_{2}^{T} x + b_{2}$, so it depends on both the weights W and the biases b.

For simplicity, the L-softmax analysis here normalizes the weights W and sets the biases b to zero. The decision boundary $W_{1}^{T} x + b_{1} = W_{2}^{T} x + b_{2}$ then becomes $\lVert x \rVert \cos(\theta_{1}) = \lVert x \rVert \cos(\theta_{2})$. The softmax boundary, $\lVert W_{1} \rVert \lVert x \rVert \cos(\theta_{1}) = \lVert W_{2} \rVert \lVert x \rVert \cos(\theta_{2})$, depends on both the norms of the weight vectors and the cosines of the angles, which causes the decision regions to overlap in cosine space (overlap means margin < 0). L-softmax, whose boundary depends only on the angle, is therefore preferable to the original softmax, whose boundary depends on both W and θ.
Note: $\theta_{j}$ denotes the angle between $W_{j}$ and $x$.
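
The effect of the weight norms on the boundary can be checked with a small example. The sketch below uses made-up two-dimensional weights and a made-up sample, with b = 0 in both cases; `predict` and `angle` are hypothetical helper names. The sample is closer in angle to $W_{1}$, yet plain softmax assigns it to class 2 because $\lVert W_{2} \rVert$ is larger.

```python
import numpy as np

def angle(u, v):
    """Angle in radians between vectors u and v."""
    c = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

def predict(x, W, normalize):
    """Index of the winning class for logits W_j^T x (b = 0)."""
    if normalize:
        W = W / np.linalg.norm(W, axis=1, keepdims=True)  # ||W_j|| = 1
    return int(np.argmax(W @ x))

W = np.array([[1.0, 0.0],    # W_1, norm 1
              [0.0, 3.0]])   # W_2, norm 3
deg30 = np.deg2rad(30.0)
x = np.array([np.cos(deg30), np.sin(deg30)])  # 30 degrees from W_1, 60 degrees from W_2

theta1, theta2 = angle(W[0], x), angle(W[1], x)
plain = predict(x, W, normalize=False)   # index 1 (class 2): 3*sin(30°) > cos(30°)
normed = predict(x, W, normalize=True)   # index 0 (class 1): cos(30°) > cos(60°)
print(np.rad2deg(theta1), np.rad2deg(theta2), plain, normed)
```

With normalized weights the prediction depends only on which angle is smaller, which is the property the margin-based losses below build on.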

L-softmax:
$$L_{1} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{\lVert x_{i} \rVert \cos(m \theta_{y_{i},i})}}{e^{\lVert x_{i} \rVert \cos(m \theta_{y_{i},i})} + \sum_{j \neq y_{i}} e^{\lVert x_{i} \rVert \cos(\theta_{j,i})}}$$
Here $\theta_{j,i}$ is the angle between $W_{j}$ and $x_{i}$. The parameter m in $\cos(m \theta_{y_{i},i})$ acts as an angular margin, a hyperparameter that controls the inter-class separation enforced during training.

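Contributions:
It uses the angle θ to control the decision boundary and introduces the angular margin m, improving on how the inter-class distance is enlarged.

Note: the paper does not actually normalize the weight matrix W. That is done here only to simplify the computation. Weight normalization is first introduced by A-softmax, covered in the next section.

[Figure: L-softmax versus softmax]

The leftmost panel is softmax. The three panels to its right are L-softmax with different values of m.

2.3 A-softmax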

Adding the constraints ||W|| = 1 (weight normalization) and b = 0 to the L-softmax loss (large-margin softmax loss) makes the prediction depend only on the angle θ between W and x. The result is the angular softmax loss, or A-softmax loss for short. The L-softmax derivation above already assumed ||W|| = 1 and b = 0 to simplify the reasoning, but the original paper does not.
The original L-softmax formula:
$$L_{1} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{\lVert W_{y_{i}} \rVert \lVert x_{i} \rVert \cos(m \theta_{y_{i},i}) + b_{y_{i}}}}{e^{\lVert W_{y_{i}} \rVert \lVert x_{i} \rVert \cos(m \theta_{y_{i},i}) + b_{y_{i}}} + \sum_{j \neq y_{i}} e^{\lVert W_{j} \rVert \lVert x_{i} \rVert \cos(\theta_{j,i}) + b_{j}}}$$
The A-softmax formula (L-softmax with W normalized and b = 0):
$$L_{1} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{\lVert x_{i} \rVert \cos(m \theta_{y_{i},i})}}{e^{\lVert x_{i} \rVert \cos(m \theta_{y_{i},i})} + \sum_{j \neq y_{i}} e^{\lVert x_{i} \rVert \cos(\theta_{j,i})}}$$

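2.4 CosFace

Drawbacks of A-softmax:
From the decision boundary $\lVert x \rVert \cos(m\theta_{1}) = \lVert x \rVert \cos(\theta_{2})$, x is assigned to class 1 when $\lVert x \rVert \cos(m \theta_{1}) > \lVert x \rVert \cos(\theta_{2})$.
The A-softmax margin is not uniform over θ: it shrinks as θ decreases and vanishes entirely at θ = 0.

[Figure: decision regions of softmax, NSL, A-softmax, and LMCL for classes C1 and C2] Softmax has overlapping decision regions for C1 and C2. NSL (normalized softmax loss, i.e. with W normalized) has neither overlap nor a margin between C1 and C2. A-softmax has a margin between them, but it shrinks to 0 as the angle θ decreases. LMCL (CosFace) has a clear margin between C1 and C2.
A-softmax tends to cause two problems:
First, for hard classes C1 and C2 that look alike, and whose weight vectors W1 and W2 therefore form a small angle, the margin is small.
Second, an extra trick, an ad hoc piecewise function, is needed to work around the non-monotonicity of the cosine ($\cos(m\theta)$ alternates between decreasing and increasing intervals).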
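
The piecewise function mentioned in the second point can be sketched as below. It follows the form given in the L-softmax and SphereFace papers, $\psi(\theta) = (-1)^{k} \cos(m\theta) - 2k$ for $\theta \in [k\pi/m, (k+1)\pi/m]$ with $k \in \{0, \dots, m-1\}$. The function name `psi` and the sampling grid are choices made here for illustration.

```python
import numpy as np

def psi(theta, m):
    """Monotonically decreasing replacement for cos(m * theta) on [0, pi]."""
    # k is the index of the interval [k*pi/m, (k+1)*pi/m] that contains theta.
    k = np.minimum(np.floor(theta * m / np.pi), m - 1)
    return (-1.0) ** k * np.cos(m * theta) - 2.0 * k

m = 4
theta = np.linspace(0.0, np.pi, 1001)
raw = np.cos(m * theta)     # oscillates: not usable as a target logit
fixed = psi(theta, m)       # decreases steadily from 1 to -(2m - 1)

print(np.any(np.diff(raw) > 0))       # True: cos(m*theta) rises again
print(np.all(np.diff(fixed) <= 1e-9)) # True: psi never increases
```

The need for this construction is one of the practical costs of a multiplicative margin, and it is what CosFace and ArcFace avoid by adding the margin instead.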

CosFace loss therefore improves on A-softmax: the weights W are L2-normalized, the input feature x is rescaled to a fixed norm s, and m is subtracted from cos(θ). For CosFace, s needs to be sufficiently large.
Formula:
$$\cos(m\theta) \Rightarrow \cos(\theta) - m, \qquad x = s \cdot \frac{x^{*}}{\lVert x^{*} \rVert} \Rightarrow \lVert x \rVert = s$$
This gives the new loss function:
$$L_{1} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{s (\cos(\theta_{y_{i},i}) - m)}}{e^{s (\cos(\theta_{y_{i},i}) - m)} + \sum_{j \neq y_{i}} e^{s \cos(\theta_{j,i})}}$$
The decision boundary is $\cos(\theta_{1}) - m = \cos(\theta_{2})$, and the margin is controlled by m.
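
The CosFace loss above can be sketched in NumPy as follows. This is a forward-pass illustration only, not the authors' implementation. The function name, the array layout (one row of W per class), the default values s = 30 and m = 0.35, and the random toy data are assumptions.

```python
import numpy as np

def cosface_loss(x, W, y, s=30.0, m=0.35):
    """CosFace (LMCL) loss. x: (N, d); W: (n, d), one row per class; y: (N,) labels."""
    x_hat = x / np.linalg.norm(x, axis=1, keepdims=True)   # unit-norm features
    W_hat = W / np.linalg.norm(W, axis=1, keepdims=True)   # unit-norm class weights
    cos = x_hat @ W_hat.T                                  # cos(theta_{j,i}), shape (N, n)
    rows = np.arange(len(y))
    logits = s * cos
    logits[rows, y] = s * (cos[rows, y] - m)               # margin on the target class only
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[rows, y].mean()

# Toy data (made up): 8 samples, 16-dimensional features, 5 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
W = rng.normal(size=(5, 16))
y = rng.integers(0, 5, size=8)

loss_margin = cosface_loss(x, W, y)          # with the cosine margin
loss_plain = cosface_loss(x, W, y, m=0.0)    # normalized softmax, no margin
print(loss_margin > loss_plain)              # True: the margin makes the task harder
```

Setting m = 0 recovers the normalized softmax loss, so the difference between the two values isolates the effect of the margin.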

Contributions:
It introduces normalization of the feature x and addresses the θ-dependence of the margin.

2.5 ArcFace

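An angular margin m is added to the angle θ between $x$ and $W_{i}$ (note that it is added to the angle θ itself). Penalizing the angle between a deep feature and its corresponding weight additively in this way enhances intra-class compactness and inter-class discrepancy at the same time.

For example, when training reaches a given loss value, the exponent terms with and without the margin are equal, so $\theta_{i}$ must be correspondingly smaller when the margin is present. Training with a margin therefore shrinks the angle $\theta_{i}$ between the class-i input features and the class-i weight. Plots of the angles show this: the margin squeezes $\theta_{i}$ so that each class is more tightly clustered and more clearly separated from the θ of the other classes.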
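
This argument can be made concrete with a short worked equation. As a sketch, write $\theta_{i}^{\text{plain}}$ and $\theta_{i}^{\text{margin}}$ (names introduced here for illustration) for the target angle reached without and with the margin at the same loss value, and assume that the non-target terms are identical and that $\theta_{i}^{\text{margin}} + m \le \pi$:

$$s \cos(\theta_{i}^{\text{margin}} + m) = s \cos(\theta_{i}^{\text{plain}}) \;\Rightarrow\; \theta_{i}^{\text{margin}} = \theta_{i}^{\text{plain}} - m$$

With m = 0.5 rad, for instance, a target angle of 1.0 rad without the margin corresponds to 0.5 rad with it.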

Formula:
$$L_{1} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{s \cos(\theta_{y_{i},i} + m)}}{e^{s \cos(\theta_{y_{i},i} + m)} + \sum_{j \neq y_{i}} e^{s \cos(\theta_{j,i})}}$$
Decision boundary:
$$\cos(\theta_{1} + m) = \cos(\theta_{2})$$
Margin types compared:
ArcFace: additive angular margin
SphereFace (A-softmax): multiplicative angular margin
CosFace: additive cosine margin
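
The difference between the three margin types can be checked numerically from their two-class decision boundaries. For a sample sitting exactly on the class-1 boundary at angle $\theta_{1}$ from $W_{1}$, the sketch below computes the gap $\theta_{2} - \theta_{1}$ to $W_{2}$ that each loss demands. The margin values (4, 0.35, 0.5) and the function names are assumptions for illustration.

```python
import numpy as np

def gap_sphereface(theta1, m=4):
    """Boundary cos(m*theta1) = cos(theta2), valid for theta1 <= pi/m."""
    return m * theta1 - theta1

def gap_cosface(theta1, m=0.35):
    """Boundary cos(theta1) - m = cos(theta2)."""
    return np.arccos(np.cos(theta1) - m) - theta1

def gap_arcface(theta1, m=0.5):
    """Boundary cos(theta1 + m) = cos(theta2), i.e. theta2 = theta1 + m."""
    return np.full_like(theta1, m)

theta1 = np.deg2rad(np.array([0.0, 10.0, 20.0, 30.0, 40.0]))
sphere = gap_sphereface(theta1)   # grows from 0: no margin at all at theta1 = 0
cosine = gap_cosface(theta1)      # always positive, but varies with theta1
arc = gap_arcface(theta1)         # the same value everywhere

for name, gap in (("SphereFace", sphere), ("CosFace", cosine), ("ArcFace", arc)):
    print(name, np.round(np.rad2deg(gap), 1))
```

Only the ArcFace gap is constant in angle, which is the "constant linear angular margin" property described in section 1.2.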

3. References

Papers

center loss: A Discriminative Feature Learning Approach for Deep Face Recognition
L-softmax: Large-Margin Softmax Loss for Convolutional Neural Networks
A-softmax: SphereFace: Deep Hypersphere Embedding for Face Recognition
CosFace: CosFace: Large Margin Cosine Loss for Deep Face Recognition
ArcFace: ArcFace: Additive Angular Margin Loss for Deep Face Recognition

Blog posts

center loss: Face recognition papers revisited, part 1: Center Loss
L-softmax: Deep learning: Large Marge Softmax Loss
A-softmax: Deep learning: A-softmax principles and code
CosFace: Face recognition collection | 9: CosFace explained
ArcFace: Face recognition collection | 10: ArcFace explained

Original: https://blog.csdn.net/qq_55796594/article/details/125662425
Author: 午夜零时
Title: arcface的前世今生
