1. Motivation
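This paper follows the transfer-learning-based paradigm and improves on Faster R-CNN.
It analyzes the contradictions that arise between the classification and regression tasks: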
- In this paper, we look closely into the conventional Faster R-CNN and analyze its contradictions from two orthogonal perspectives, namely multi-stage (RPN vs. RCNN) and multi-task (classification vs. localization).
Contribution
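The paper makes two architectural contributions: the Gradient Decoupled Layer (GDL) for multi-stage decoupling, and the Prototypical Calibration Block (PCB) for multi-task decoupling.
GDL acts on the backbone and decouples the preceding layers from the subsequent ones; PCB is an offline prototype-based classifier used to boost the original classification layer.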
- We look closely into the conventional Faster R-CNN and propose a simple yet effective architecture for few-shot detection, named Decoupled Faster R-CNN, which can be learned end-to-end via straightforward fine-tuning.
- To deal with the data-scarce scenario, we further present two novel modules, i.e. GDL and PCB, to perform decoupling among multiple components of Faster R-CNN and boost classification performance respectively.
- DeFRCN is remarkably superior to SOTAs on various benchmarks, revealing the effectiveness of our approach.
Method
During fine-tuning, the Backbone, RPN, Box Classifier and Regressor are trainable, while the RCNN is frozen.
Problem of multi-task learning
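The authors argue that in multi-task learning the sub-networks have inconsistent optimization objectives.
RPN decides "where to look", while RCNN decides "what to look at".
The classification head needs translation-invariant features, whereas the localization head needs translation-covariant features.
Optimizing them jointly may therefore end in a suboptimal solution.
The gradients flowing back into the backbone come from both RCNN and RPN, yet the two are partly at odds, so the authors argue that this may degrade FSOD performance. Moreover, in FSOD the RPN in the second (fine-tuning) stage suffers more from foreground-background confusion, so gradients overfitted to the base classes may be propagated to the backbone and RCNN.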
- which means a proposal that belongs to background in the base training phase is likely to be foreground in the novel fine-tuning phase
3.1 Gradient Decoupled Layer
- Perform Decoupling with GDL
- Optimization with GDL
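The two points above can be illustrated with a minimal sketch. It assumes that GDL applies a learnable channel-wise affine transform in the forward pass and, in the backward pass, multiplies the incoming gradient by a constant λ in [0, 1] before handing it to the backbone. The class name `GradientDecoupledLayer`, the field names, and the λ values are hypothetical, chosen only for illustration; the backward pass is written by hand in plain Python instead of relying on an autograd framework.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GradientDecoupledLayer:
    """Toy GDL: affine transform forward, gradient scaled by `lam` backward."""
    weight: List[float]  # per-channel scale of the affine transform (hypothetical name)
    bias: List[float]    # per-channel shift of the affine transform (hypothetical name)
    lam: float           # decoupling coefficient, assumed to lie in [0, 1]

    def forward(self, x: List[float]) -> List[float]:
        # Channel-wise affine transform of the backbone features.
        return [w * v + b for w, v, b in zip(self.weight, x, self.bias)]

    def backward(self, grad_out: List[float]) -> List[float]:
        # Chain rule through the affine map, then scale by lam before
        # passing the gradient on to the preceding (backbone) layer.
        return [self.lam * w * g for w, g in zip(self.weight, grad_out)]


features = [1.0, -2.0, 0.5]  # stand-in for backbone features

# lam = 0 acts as stop-gradient (here for the RPN branch),
# 0 < lam < 1 acts as scale-gradient (here for the RCNN branch).
gdl_rpn = GradientDecoupledLayer([1.0] * 3, [0.0] * 3, lam=0.0)
gdl_rcnn = GradientDecoupledLayer([1.0] * 3, [0.0] * 3, lam=0.5)

grad_from_rpn = [0.2, 0.4, -0.6]   # illustrative upstream gradients
grad_from_rcnn = [1.0, -1.0, 2.0]

# The backbone receives the sum of the two (scaled) gradients.
backbone_grad = [
    a + b
    for a, b in zip(gdl_rpn.backward(grad_from_rpn),
                    gdl_rcnn.backward(grad_from_rcnn))
]
print(backbone_grad)
```

With λ = 0 the RPN loss no longer influences the backbone at all, while the RCNN gradient is only damped, so the backbone is updated more conservatively than in plain Faster R-CNN.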
3.2 Prototypical Calibration Block
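Motivation for PCB:
The paper observes that the few-shot classification branch produces a large share of low-quality scores, which motivates eliminating high-scored false positives and recovering low-scored true positives.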
- We notice that the under-explored few-shot classification branch generates a large amount of low-quality scores, which motivates us to eliminate high-scored false positives and remedy low-scored missing samples by introducing a Prototypical Calibration Block (PCB) for score refinement.
PCB consists of a classifier, RoIAlign, and a prototype bank.
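Given the support set $S$ of an M-way K-shot task, PCB extracts the feature maps of the original images and applies RoIAlign directly on the ground-truth boxes (similar to the operation in the Attention RPN paper), which yields feature representations for the $MK$ instances. From these we build a prototype bank $P = \{p_c\}_{c=1}^{M}$, where the prototype for each class $c$ is given as follows: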
where the subset $S_c$ of $S$ contains all instances of a single class $c$.
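Assuming the prototype is the plain average of the class's instance features (the usual choice for prototype-based classifiers), it can be written as below, where $x$ denotes one RoIAlign feature from the support set; the symbol $x$ is notation assumed for this sketch:

$$
p_c = \frac{1}{|S_c|} \sum_{x \in S_c} x
$$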
Given a proposal $\hat y = (c_i, s_i, b_i)$, i.e. an output of the original Faster R-CNN branch after fine-tuning, where $c$ is the label, $s$ is the score and $b$ is the box, PCB first applies RoIAlign on $b_i$ to obtain the feature $x_i$, and then computes the cosine similarity between $x_i$ and $p_{c_i}$.
The two scores are then combined by weighted aggregation:
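The steps above can be sketched in a few lines of plain Python. The sketch assumes that prototypes are class-wise means of support features and that the calibrated score takes the form $\alpha \cdot s_i + (1 - \alpha) \cdot s_i^{cos}$; the function names, the toy 2-D features, and the default `alpha=0.5` are assumptions made for illustration, and real RoIAlign features are replaced by hand-written vectors.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def build_prototypes(support: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Average the instance features of each class into one prototype."""
    prototypes = {}
    for label, feats in support.items():
        dim = len(feats[0])
        prototypes[label] = [
            sum(f[d] for f in feats) / len(feats) for d in range(dim)
        ]
    return prototypes


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def calibrate_score(score: float, feature: Vector, prototype: Vector,
                    alpha: float = 0.5) -> float:
    """Blend the detector score with the prototype similarity (alpha is assumed)."""
    return alpha * score + (1.0 - alpha) * cosine_similarity(feature, prototype)


# Toy support set: stand-ins for RoIAlign features of ground-truth boxes.
support = {"cat": [[1.0, 0.0], [3.0, 0.0]], "dog": [[0.0, 2.0]]}
prototypes = build_prototypes(support)

# A confident "cat" prediction whose feature does not match the cat prototype.
false_positive = calibrate_score(0.9, [0.0, 1.0], prototypes["cat"])
# A weak "dog" prediction whose feature matches the dog prototype well.
true_positive = calibrate_score(0.3, [0.0, 4.0], prototypes["dog"])
print(false_positive, true_positive)
```

In this toy case the high-scored false positive is pushed down and the low-scored true positive is pulled up, which is the score refinement that PCB is meant to provide.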
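Because PCB is an offline module, it is plug-and-play and adds little overhead to network training. PCB also does not share parameters with the proposal classification branch.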
- Furthermore, since the PCB module is offline without any further training, it can be plug-and-play and easily equipped to any other architectures to build stronger few-shot detectors.
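Overall, I see DeFRCN as a fusion of meta-learning and transfer learning. It keeps the overall transfer-learning framework, but for classification it uses the support set to perform a weighted fusion of scores. It also modifies the backward pass through the backbone features, so as to separate as far as possible the translation invariance needed by classification from the translation covariance needed by regression.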
Experiment
4.1.1 VOC
4.1.2 COCO
4.1.3 COCO to VOC
4.2 Ablation Study
4.2.1 Effectiveness of different modules
4.2.2 Effectiveness of the degree of decoupling
This observation prompts us to perform stop-gradient for RPN and scale-gradient for RCNN in DeFRCN
4.2.3 Can GDL boost conventional detection?
Original: https://blog.csdn.net/weixin_43823854/article/details/120071759
Author: Ah丶Weii
Title: [DeFRCN] Decouple Faster R-CNN for Few-Shot Object Detection(ICCV 2021)