Clustering to Generate Anchor Box Sizes and Aspect Ratios

May 31, 2023, 12:05 PM • Artificial Intelligence • 91 views

Preface:

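An anchor is the heavy piece of iron that holds a ship in place. In object detection, an anchor box is a preset reference box of fixed size. Object detection has to answer which objects appear where in an image. Traditional algorithms answer it with a sliding window: they traverse the whole image and decide at each position whether an object is present, which is slow and inefficient.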

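The anchor box concept first appeared in Faster RCNN. A set of 9 manually preset boxes of fixed size is placed at every point of the feature map produced by the backbone network. For each box, convolutional layers perform classification (whether the box contains an object) and regression (the coordinate offsets and scale factors relative to the anchor). Non-maximum suppression then removes overlapping boxes, which yields the location and class of each object. Boxes with different sizes and ratios let the detector adapt to objects of different scales and shapes.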
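To make the 9 prior boxes concrete, here is a minimal sketch that builds them from 3 scales and 3 aspect ratios and places them at every feature-map location. The scales (128, 256, 512), the ratios (0.5, 1, 2), the stride of 16, and the function names `make_base_anchors` and `tile_anchors` are assumptions for illustration, based on the commonly cited Faster RCNN defaults rather than on anything stated in this article.

```python
import numpy as np

def make_base_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (len(scales) * len(ratios), 4) anchors as [x1, y1, x2, y2], centered at (0, 0)."""
    anchors = []
    for s in scales:
        for r in ratios:
            # r is height / width; the area stays s * s for every ratio
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

def tile_anchors(base, feat_h, feat_w, stride=16):
    """Copy every base anchor to the center of each feature-map cell (image coordinates)."""
    xs = (np.arange(feat_w) + 0.5) * stride
    ys = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(xs, ys)
    centers = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)
    return (centers + base[None, :, :]).reshape(-1, 4)

base = make_base_anchors()
all_anchors = tile_anchors(base, feat_h=2, feat_w=3)
print(base.shape, all_anchors.shape)
```

A 2 x 3 feature map has 6 locations, so tiling 9 base anchors gives 54 boxes. On a real feature map the count grows to tens of thousands, which is why the choice of sizes and ratios matters.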

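The prior boxes are specified by hand, so how well they are chosen affects model training and convergence. In Faster RCNN, an object is localized by computing offsets. Here is the code: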

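import numpy as np

def regression_box_shift(p, g):
    """
    compute t to transform p to g
    :param p: proposal box [x1, y1, x2, y2]
    :param g: ground truth box [x1, y1, x2, y2]
    :return: t = [tx, ty, tw, th]
    """
    w_p = p[2] - p[0]
    h_p = p[3] - p[1]
    w_g = g[2] - g[0]
    h_g = g[3] - g[1]
    # offsets between box centers, normalized by the proposal size
    tx = ((g[0] + g[2]) / 2 - (p[0] + p[2]) / 2) / w_p
    ty = ((g[1] + g[3]) / 2 - (p[1] + p[3]) / 2) / h_p
    # log-scale ratios of width and height
    tw = np.log(w_g / w_p)
    th = np.log(h_g / h_p)
    t = [tx, ty, tw, th]
    return t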

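For example, if the training samples contain only small objects while the prior boxes are large, the bounding box regression stage needs more adjustments, and larger ones, to bring the proposal box close to the ground truth box, which slows down convergence.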
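The effect can be checked numerically. The sketch below is an illustration under assumptions: it uses the center-based form of the offsets, and the box coordinates and the helper name `box_shift` are made up for the example. Both anchors share the center of the ground truth box, so only the scale terms differ.

```python
import numpy as np

def box_shift(p, g):
    """Offsets that map proposal p onto ground truth g; boxes are [x1, y1, x2, y2]."""
    w_p, h_p = p[2] - p[0], p[3] - p[1]
    w_g, h_g = g[2] - g[0], g[3] - g[1]
    tx = ((g[0] + g[2]) / 2 - (p[0] + p[2]) / 2) / w_p
    ty = ((g[1] + g[3]) / 2 - (p[1] + p[3]) / 2) / h_p
    return np.array([tx, ty, np.log(w_g / w_p), np.log(h_g / h_p)])

gt = [100, 100, 130, 140]            # 30 x 40 object centered at (115, 120)
large_anchor = [-13, -8, 243, 248]   # 256 x 256 anchor, same center
matched_anchor = [99, 98, 131, 142]  # 32 x 44 anchor, same center

t_large = box_shift(large_anchor, gt)
t_matched = box_shift(matched_anchor, gt)
print(np.round(t_large, 3))
print(np.round(t_matched, 3))
```

With the oversized anchor the scale targets are roughly -2.1 and -1.9, while the well-matched anchor needs only about -0.06 and -0.10. The network has much less to learn in the second case.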

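The prior boxes should therefore fit the object sizes in the dataset. In other words, for each object in the detection samples, at least one of the 9 anchor boxes should be close to the object's size, rather than every anchor being far from it.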

Generating prior box sizes by clustering:

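YOLOv2 was the first to use K-means clustering to generate anchor sizes automatically, which removes the subjectivity of setting anchors by hand. With only 5 clustered anchors, it reaches roughly the same average IOU as the 9 hand-picked anchors used in Faster RCNN.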

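K-means clustering is built on two concepts: a distance metric and cluster centers. Anchor clustering differs from standard K-means only in the distance metric between samples, which is defined as: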

Distance = 1 - IOU

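Here Distance is the distance between a sample and a cluster center, and IOU is the intersection over union of an anchor and a ground truth box, computed with the centers of the two boxes aligned. The larger the IOU, the smaller the Distance, and the closer the two boxes are in size.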
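As a small worked example of this metric, the sketch below computes the distance for a few (width, height) pairs. Because the centers are aligned, the intersection is simply the smaller width times the smaller height. The helper name `iou_wh` and the sample values are assumptions for illustration.

```python
def iou_wh(box, anchor):
    """IOU of two (w, h) boxes whose centers coincide."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union

box = (0.2, 0.4)
distances = {}
for anchor in [(0.2, 0.4), (0.4, 0.2), (0.1, 0.2)]:
    distances[anchor] = 1 - iou_wh(box, anchor)
    print(anchor, round(distances[anchor], 3))
```

An identical box has distance 0. The same box rotated by 90 degrees has distance of about 0.667, and a box of the same shape at half the scale has distance 0.75, so the metric penalizes both shape and size mismatch without depending on where the boxes sit in the image.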

The algorithm, in code:

def kmeans(boxes, k, dist=np.median):
    # compute_iou is defined in the complete code below
    # number of boxes
    box_num = len(boxes)
    # cluster id of each box; -1 means "not assigned yet"
    nearest_id = np.full(box_num, -1)
    np.random.seed(42)
    # initialize the clusters with k randomly chosen boxes
    clusters = boxes[np.random.choice(box_num, k, replace=False)]
    while True:
        # store iou distance between each pair of boxes and anchors
        distance = []
        for i in range(box_num):
            ious = compute_iou(boxes[i], clusters)
            dis = [1-iou for iou in ious]
            distance.append(dis)
        distance = np.array(distance)
        # calculate box cluster id
        new_nearest_id = np.argmin(distance, axis=1)
        # break condition
        if (new_nearest_id == nearest_id).all():
            break
        # update clusters using median strategy
        for j in range(k):
            members = boxes[new_nearest_id == j]
            # keep the old center if no box was assigned to this cluster
            if len(members) > 0:
                clusters[j] = dist(members, axis=0)
        nearest_id = new_nearest_id
    return clusters

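Here boxes holds the ground truth boxes; in practice only the width and height of each box need to be passed in. k is the number of cluster centers, that is, the number of anchor boxes required. dist is the strategy used to update the cluster centers; this article uses the median.

The algorithm proceeds as follows. Initialize the cluster id of every box, and randomly pick k boxes as the initial cluster centers. Then iterate: compute the distance between every box and the k cluster centers, which gives a distance array of size m×k (m is the number of boxes); assign each box the id of its nearest center; update each center to the median of the boxes assigned to it. Clustering is finished when the ids no longer change between iterations.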
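The m×k distance array can also be built without Python loops. The following is a sketch under assumptions, not the article's implementation: it uses NumPy broadcasting, and the function name `iou_distance_matrix` and the toy boxes are hypothetical.

```python
import numpy as np

def iou_distance_matrix(boxes, clusters):
    """Return an (m, k) array of 1 - IOU between m (w, h) boxes and k (w, h) centers."""
    boxes = np.asarray(boxes, dtype=np.float64)
    clusters = np.asarray(clusters, dtype=np.float64)
    # broadcast (m, 1) against (1, k) to get every box/center pair at once
    inter_w = np.minimum(boxes[:, None, 0], clusters[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], clusters[None, :, 1])
    inter = inter_w * inter_h
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_c = (clusters[:, 0] * clusters[:, 1])[None, :]
    return 1.0 - inter / (area_b + area_c - inter)

boxes = np.array([[0.10, 0.20], [0.12, 0.22], [0.50, 0.40], [0.55, 0.45]])
clusters = np.array([[0.10, 0.20], [0.50, 0.40]])
distance = iou_distance_matrix(boxes, clusters)
nearest_id = np.argmin(distance, axis=1)
print(distance.shape, nearest_id)
```

Here the two small boxes are assigned to the first center and the two large boxes to the second. For datasets with many thousands of boxes, this form is typically much faster than calling an IOU function once per box.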

Complete code:

import numpy as np
from glob import glob
input_dim = 1024

def compute_iou(box, anchors):
    # IOU between one (w, h) box and each (w, h) anchor, with centers aligned
    ious = []
    for anchor in anchors:
        w_min = np.min([box[0], anchor[0]])
        h_min = np.min([box[1], anchor[1]])
        intersection = w_min*h_min
        # sum of both areas; the intersection is subtracted to get the union
        area_sum = box[0]*box[1] + anchor[0]*anchor[1]
        iou = intersection/(area_sum - intersection)
        ious.append(iou)
    return ious

def kmeans(boxes, k, dist=np.median):
    # number of boxes
    box_num = len(boxes)
    # cluster id of each box; -1 means "not assigned yet"
    nearest_id = np.full(box_num, -1)
    np.random.seed(42)
    # initialize the clusters with k randomly chosen boxes
    clusters = boxes[np.random.choice(box_num, k, replace=False)]
    while True:
        # store iou distance between each pair of boxes and anchors
        distance = []
        for i in range(box_num):
            ious = compute_iou(boxes[i], clusters)
            dis = [1-iou for iou in ious]
            distance.append(dis)
        distance = np.array(distance)
        # calculate box cluster id
        new_nearest_id = np.argmin(distance, axis=1)
        # break condition
        if (new_nearest_id == nearest_id).all():
            break
        # update clusters using median strategy
        for j in range(k):
            members = boxes[new_nearest_id == j]
            # keep the old center if no box was assigned to this cluster
            if len(members) > 0:
                clusters[j] = dist(members, axis=0)
        nearest_id = new_nearest_id
    return clusters

def load_dataset(path):
    # load normalized width and height of boxes from YOLO-format label files
    path = path + '/*.txt'
    txt_list = glob(path)
    data_set = []
    for txt in txt_list:
        with open(txt, 'r') as f:
            lines = f.readlines()
        for line in lines:
            # each line: class x_center y_center width height
            coordinate = line.split()
            if len(coordinate) < 5:
                continue  # skip blank or malformed lines
            w, h = np.array(coordinate[3:5], dtype=np.float64)
            data_set.append([w, h])
    data_set = np.array(data_set)
    return data_set

def main():
    txt_path = 'C:\\Users\\XQ\\Desktop\\labels'
    data = load_dataset(txt_path)
    # number of cluster center
    clusters = kmeans(data, 9)
    print('cluster center:*************')
    print(clusters*input_dim)
    accuracy = np.mean([np.max(compute_iou(box, clusters)) for box in data])*100
    print('Accuracy(Average iou): %.4f%%' % accuracy)
    anchor_ratio = np.around(clusters[:, 0] / clusters[:, 1], decimals=2)
    anchor_ratio = list(anchor_ratio)
    print('Final anchor_ratio: ', anchor_ratio)
    print('Sorted anchor ratio: ', sorted(anchor_ratio))

if __name__ == "__main__":
    main()

This code reads labels in the YOLO label format. Other formats work as well; just convert them first.
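As one example of such a conversion, the sketch below turns corner-style pixel boxes into the normalized (width, height) pairs that the clustering expects. The input layout [xmin, ymin, xmax, ymax], the function name `corners_to_normalized_wh`, and the sample values are assumptions; adapt them to whatever annotation format you actually have.

```python
import numpy as np

def corners_to_normalized_wh(boxes_xyxy, img_w, img_h):
    """Convert [xmin, ymin, xmax, ymax] pixel boxes to normalized (w, h) pairs."""
    boxes_xyxy = np.asarray(boxes_xyxy, dtype=np.float64)
    w = (boxes_xyxy[:, 2] - boxes_xyxy[:, 0]) / img_w
    h = (boxes_xyxy[:, 3] - boxes_xyxy[:, 1]) / img_h
    return np.stack([w, h], axis=1)

pixel_boxes = [[100, 200, 300, 600], [0, 0, 512, 256]]
wh = corners_to_normalized_wh(pixel_boxes, img_w=1024, img_h=1024)
print(wh)
```

The resulting array has the same shape as the output of load_dataset, so it can be passed to kmeans directly. If the images differ in size, normalize each box by the size of its own image.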

Output:

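The output has four items. The first is the sizes of the k cluster-center anchors; note that they are multiplied by input_dim to convert the normalized values to pixels. The second is Accuracy, which is the mean IOU between every box and its best-matching anchor; the larger this value, the better the k anchors fit the labeled boxes. The last two items are the anchor aspect ratios (width divided by height), first in cluster order and then sorted.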
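To show how the Accuracy figure behaves, here is a toy calculation on three boxes and two anchors. It is a sketch with made-up numbers; the function name `avg_best_iou` is hypothetical and the vectorized form is a substitute for the loop in main, not the article's code.

```python
import numpy as np

def avg_best_iou(boxes, anchors):
    """Mean over boxes of the best IOU with any anchor; inputs are (w, h) arrays."""
    boxes = np.asarray(boxes, dtype=np.float64)
    anchors = np.asarray(anchors, dtype=np.float64)
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = ((boxes[:, 0] * boxes[:, 1])[:, None]
             + (anchors[:, 0] * anchors[:, 1])[None, :] - inter)
    return float(np.mean(np.max(inter / union, axis=1)))

boxes = np.array([[0.1, 0.2], [0.2, 0.2], [0.4, 0.8]])
anchors = np.array([[0.1, 0.2], [0.4, 0.8]])
input_dim = 1024

anchors_px = anchors * input_dim   # normalized sizes converted to pixels
score = avg_best_iou(boxes, anchors)
print(anchors_px)
print('Average IOU: %.4f' % score)
```

Two of the boxes match an anchor exactly (IOU 1) and the third reaches a best IOU of 0.5, so the average is about 0.8333. Adding an anchor near (0.2, 0.2) would push the score toward 1, which is the trade-off between the number of anchors and how well they fit.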

My knowledge is limited, so corrections are welcome!

Original: https://blog.csdn.net/joker_xiansen/article/details/120002013
Author: joker_xiansen
Title: Clustering to Generate Anchor Box Sizes and Aspect Ratios
