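Contents
- Preface
- Results
- Implementation details
  - 1. Modifying and extending the YOLOv7 source
  - 2. Hook functions
  - 3. The basic idea behind GradCAM
- Modifications
  - 1. The forward function of the Detect class
  - 2. The _make_grid function of the Detect class
  - 3. The attempt_load function
- Additions
  - 1. gradcam.py
  - 2. yolov7_object_detector.py
  - 3. The main script main_gradcam.py
  - 4. GradCAM++ (work in progress…)
- References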
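Preface
This post uses class activation mapping (CAM) to visualize the features behind the detections of a YOLOv7 model. The heatmaps show which image regions the network attends to, after the backbone and head, when it recognizes and localizes a target. They also make it easier to compare how strongly different YOLOv7 models attend to the same target.
To make reproduction easier, see the YOLOv7 paper and source code (still being updated…), as well as the Gitee source for YOLOv7 combined with GradCAM (based on yolov5-gradcam).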
Results
Implementation details
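1. Modifying and extending the YOLOv7 source
The following files are added to the YOLOv7 source tree:
---models/gradcam.py
---models/yolov7_object_detector.py
---main_gradcam.py
Note: in the official yolov7.pt pretrained weights the last layer is Detect, not IDetect. If you train your own weights with training/yolov7.yaml, the last layer is IDetect instead, and you can apply the same changes to it that are made to Detect here.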
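2. Hook functions
The core of a PyTorch implementation of GradCAM is the hook mechanism. During computation PyTorch frees some intermediate values, such as feature maps and the gradients of non-leaf nodes. Registering hooks on the forward and backward passes lets you capture values that would otherwise be freed but are needed later. Hooks can also be used to modify the gradients of intermediate values.
The YOLOv7 GradCAM implementation uses two hooks: register_forward_hook and register_full_backward_hook.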
class YOLOV7GradCAM:

    def __init__(self, model, layer_name, img_size=(640, 640)):
        self.model = model
        self.gradients = dict()
        self.activations = dict()

        def forward_hook(module, input, output):
            # Keep the feature map produced by the target layer
            self.activations['value'] = output
            return None

        def backward_hook(module, grad_input, grad_output):
            # Keep the gradient of the score with respect to that feature map
            self.gradients['value'] = grad_output[0]
            return None

        target_layer = find_yolo_layer(self.model, layer_name)
        target_layer.register_forward_hook(forward_hook)
        target_layer.register_full_backward_hook(backward_hook)
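To see the two hooks in isolation, here is a minimal sketch on a toy network. The network, the choice of hooked layer and the `captured` dictionary are assumptions made for illustration and are not part of YOLOv7. Only `register_forward_hook` and `register_full_backward_hook` are the real PyTorch calls discussed above.

```python
import torch
import torch.nn as nn

# Toy network standing in for the detector (illustrative only).
net = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),                 # not in-place, so the full backward hook can be attached
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 2),
)

captured = {}

def forward_hook(module, inputs, output):
    # Called after the layer's forward pass: store its feature map
    captured["activations"] = output

def backward_hook(module, grad_input, grad_output):
    # Called during backward: store the gradient w.r.t. the layer's output
    captured["gradients"] = grad_output[0]

target = net[1]
forward_handle = target.register_forward_hook(forward_hook)
backward_handle = target.register_full_backward_hook(backward_hook)

x = torch.randn(1, 3, 8, 8)
score = net(x)[0, 1]           # pick one scalar output, like a class score
score.backward()

print(captured["activations"].shape)
print(captured["gradients"].shape)

# Remove the hooks once they are no longer needed
forward_handle.remove()
backward_handle.remove()
```

Both captured tensors have the shape of the hooked layer's output, which is what lets GradCAM pair every activation with its gradient.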
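3. The basic idea behind GradCAM
- Average the gradients of the target layer over each channel (GAP, global average pooling)
- Multiply each channel of the target layer's output by its pooled gradient value, then sum over the channels
- Because the heatmap only cares about features that have a positive influence on the class score, apply ReLU to remove negative values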
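The three steps above correspond to the standard Grad-CAM formulation. In the notation assumed here, $A^k$ is channel $k$ of the target layer's output, $y^c$ is the score of class $c$, and $Z$ is the number of spatial positions in the feature map:

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A_{ij}^k},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k^c \, A^k \right)
```

The first expression is the global average pooling of the gradients, and the second is the weighted sum over channels followed by ReLU.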
# Method of YOLOV7GradCAM
def forward(self, input_img, class_idx=True):
    """
    Args:
        input_img: input image with shape of (1, 3, H, W)
    Return:
        mask: saliency map of the same spatial dimension with input
        logit: model output
        preds: The object predictions
    """
    saliency_maps = []
    b, c, h, w = input_img.size()
    preds, logits = self.model(input_img)
    # One saliency map per detected object
    for logit, cls, cls_name in zip(logits[0], preds[1][0], preds[2][0]):
        if class_idx:
            score = logit[cls]
        else:
            score = logit.max()
        self.model.zero_grad()
        score.backward(retain_graph=True)
        gradients = self.gradients['value']
        activations = self.activations['value']
        b, k, u, v = gradients.size()
        # Step 1: global average pooling of the gradients, one weight per channel
        alpha = gradients.view(b, k, -1).mean(2)
        weights = alpha.view(b, k, 1, 1)
        # Step 2: weighted sum of the activations over channels
        saliency_map = (weights * activations).sum(1, keepdim=True)
        # Step 3: keep only positive contributions
        saliency_map = F.relu(saliency_map)
        saliency_map = F.interpolate(saliency_map, size=(h, w), mode='bilinear', align_corners=False)
        # Min-max normalize to [0, 1]; the clamp avoids dividing by zero for a flat map
        saliency_map_min, saliency_map_max = saliency_map.min(), saliency_map.max()
        saliency_map = (saliency_map - saliency_map_min).div(
            (saliency_map_max - saliency_map_min).clamp(min=1e-8)).detach()
        saliency_maps.append(saliency_map)
    return saliency_maps, logits, preds
Modifications
1. The forward function of the Detect class
- Modify the forward function of the Detect class in models/yolo.py
- Add the following four statements:
logits_ = []
logits = x[i][..., 5:]
logits_.append(logits.view(bs, -1, self.no - 5))
out = (torch.cat(z, 1), torch.cat(logits_, 1), x)
def forward(self, x):
    z = []
    logits_ = []  # added: raw class logits from every detection layer
    self.training |= self.export
    for i in range(self.nl):
        x[i] = self.m[i](x[i])
        bs, _, ny, nx = x[i].shape
        x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
        if not self.training:
            if self.grid[i].shape[2:4] != x[i].shape[2:4]:
                self.grid[i] = self._make_grid(nx, ny).to(x[i].device)
            logits = x[i][..., 5:]  # added: class logits before the sigmoid
            y = x[i].sigmoid()
            if not torch.onnx.is_in_onnx_export():
                y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]
                y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]
            else:
                xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]
                wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].data
                y = torch.cat((xy, wh, y[..., 4:]), -1)
            z.append(y.view(bs, -1, self.no))
            logits_.append(logits.view(bs, -1, self.no - 5))  # added
    if self.training:
        out = x
    elif self.end2end:
        out = torch.cat(z, 1)
    elif self.include_nms:
        z = self.convert(z)
        out = (z,)
    else:
        out = (torch.cat(z, 1), torch.cat(logits_, 1), x)  # added: also return the logits
    return out
2. The _make_grid function of the Detect class
- Modify the _make_grid function of the Detect class in models/yolo.py
- This adapts the torch.meshgrid call to torch>=1.10.0
@staticmethod
def _make_grid(nx=20, ny=20):
    # indexing='ij' is stated explicitly for torch>=1.10.0
    yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)], indexing='ij')
    return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()
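A quick way to check what the grid contains is to build a small one and look at a single cell. The standalone `make_grid` function below is a copy made for illustration, with the tensors passed as separate arguments. It assumes torch>=1.10.0, where the `indexing` keyword is available.

```python
import torch

def make_grid(nx=20, ny=20):
    # With indexing='ij' the first output varies along rows (y), the second along columns (x)
    yv, xv = torch.meshgrid(torch.arange(ny), torch.arange(nx), indexing='ij')
    # The last dimension stores (x, y) for every cell
    return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()

grid = make_grid(nx=4, ny=3)
print(grid.shape)         # torch.Size([1, 1, 3, 4, 2])
print(grid[0, 0, 2, 1])   # tensor([1., 2.]), i.e. x=1, y=2 for the cell in row 2, column 1
```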
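3. The attempt_load function
- Following YOLOv5, pass inplace in as a parameter
- This avoids activation functions such as SiLU failing during backpropagation when they run in place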
def attempt_load(weights, map_location=None, inplace=True):
    model = Ensemble()
    for w in weights if isinstance(weights, list) else [weights]:
        attempt_download(w)
        ckpt = torch.load(w, map_location=map_location)
        model.append(ckpt['ema' if ckpt.get('ema') else 'model'].float().fuse().eval())
    for m in model.modules():
        if type(m) in [nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6, nn.SiLU]:
            m.inplace = inplace  # changed: controlled by the new argument
    # ... the rest of the function is not shown in this excerpt
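The effect of the new argument can be tried on a small stand-in model. This is a minimal sketch: `set_inplace` and the toy `nn.Sequential` are hypothetical names used only for illustration, and the loop mirrors the one in attempt_load.

```python
import torch.nn as nn

def set_inplace(model, inplace):
    """Set the `inplace` flag on every listed activation module and return how many were touched."""
    activation_types = (nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6, nn.SiLU)
    count = 0
    for m in model.modules():
        if isinstance(m, activation_types):
            m.inplace = inplace
            count += 1
    return count

# Toy model with two in-place activations (illustrative only)
toy = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.SiLU(inplace=True),
    nn.Conv2d(8, 8, 3), nn.ReLU(inplace=True),
)
changed = set_inplace(toy, inplace=False)
print(changed, toy[1].inplace, toy[3].inplace)   # 2 False False
```

Turning the flag off costs some memory, but it leaves each layer's output untouched so that the backward hook sees a valid gradient.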
Additions
1. gradcam.py
- Add a gradcam.py file to the models folder:
import time
import torch
import torch.nn.functional as F


def find_yolo_layer(model, layer_name):
    """Find yolov7 layer to calculate GradCAM and GradCAM++
    Args:
        model: yolov7 model.
        layer_name (str): the name of layer with its hierarchical information.
    Return:
        target_layer: found layer
    """
    hierarchy = layer_name.split('_')
    target_layer = model.model.model._modules[hierarchy[0]]
    for h in hierarchy[1:]:
        target_layer = target_layer._modules[h]
    return target_layer


class YOLOV7GradCAM:

    def __init__(self, model, layer_name, img_size=(640, 640)):
        self.model = model
        self.gradients = dict()
        self.activations = dict()

        def backward_hook(module, grad_input, grad_output):
            self.gradients['value'] = grad_output[0]
            return None

        def forward_hook(module, input, output):
            self.activations['value'] = output
            return None

        target_layer = find_yolo_layer(self.model, layer_name)
        target_layer.register_forward_hook(forward_hook)
        target_layer.register_full_backward_hook(backward_hook)
        # Run one dummy forward pass so the hooks are exercised once
        device = 'cuda' if next(self.model.model.parameters()).is_cuda else 'cpu'
        self.model(torch.zeros(1, 3, *img_size, device=device))

    def forward(self, input_img, class_idx=True):
        """
        Args:
            input_img: input image with shape of (1, 3, H, W)
        Return:
            mask: saliency map of the same spatial dimension with input
            logit: model output
            preds: The object predictions
        """
        saliency_maps = []
        b, c, h, w = input_img.size()
        preds, logits = self.model(input_img)
        # One saliency map per detected object
        for logit, cls, cls_name in zip(logits[0], preds[1][0], preds[2][0]):
            if class_idx:
                score = logit[cls]
            else:
                score = logit.max()
            self.model.zero_grad()
            score.backward(retain_graph=True)
            gradients = self.gradients['value']
            activations = self.activations['value']
            b, k, u, v = gradients.size()
            # Channel weights: global average pooling of the gradients
            alpha = gradients.view(b, k, -1).mean(2)
            weights = alpha.view(b, k, 1, 1)
            saliency_map = (weights * activations).sum(1, keepdim=True)
            saliency_map = F.relu(saliency_map)
            saliency_map = F.interpolate(saliency_map, size=(h, w), mode='bilinear', align_corners=False)
            # Min-max normalize to [0, 1]; the clamp avoids dividing by zero for a flat map
            saliency_map_min, saliency_map_max = saliency_map.min(), saliency_map.max()
            saliency_map = (saliency_map - saliency_map_min).div(
                (saliency_map_max - saliency_map_min).clamp(min=1e-8)).detach()
            saliency_maps.append(saliency_map)
        return saliency_maps, logits, preds

    def __call__(self, input_img):
        return self.forward(input_img)
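Layer names such as '102_act' encode a path through nested modules: the part before the underscore is the index in the outer nn.Sequential, and the part after it is the name of a child module. The sketch below shows the same lookup on a small stand-in model. `find_layer`, `block` and `backbone` are hypothetical names, and the real detector adds two more `.model` levels on top of this.

```python
from collections import OrderedDict

import torch.nn as nn

def find_layer(root, layer_name):
    """Walk `root` by following the '_'-separated keys in `layer_name`."""
    layer = root
    for key in layer_name.split('_'):
        layer = layer._modules[key]
    return layer

# Stand-in for the detector's layer list: entry "1" is a block with a child named "act"
block = nn.Sequential(OrderedDict([('conv', nn.Conv2d(3, 8, 1)), ('act', nn.SiLU())]))
backbone = nn.Sequential(nn.Identity(), block)

layer = find_layer(backbone, '1_act')
print(type(layer).__name__)   # SiLU
```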
2. yolov7_object_detector.py
- Add a yolov7_object_detector.py file to the models folder:
import numpy as np
import torch
from models.experimental import attempt_load
from utils.general import xywh2xyxy
from utils.datasets import letterbox
import cv2
import time
import torchvision
import torch.nn as nn
from utils.general import box_iou


class YOLOV7TorchObjectDetector(nn.Module):
    def __init__(self,
                 model_weight,
                 device,
                 img_size,
                 names=None,
                 mode='eval',
                 confidence=0.45,
                 iou_thresh=0.45,
                 agnostic_nms=False):
        super(YOLOV7TorchObjectDetector, self).__init__()
        self.device = device
        self.model = None
        self.img_size = img_size
        self.mode = mode
        self.confidence = confidence
        self.iou_thresh = iou_thresh
        self.agnostic = agnostic_nms
        # inplace=False keeps activations compatible with the backward hook
        self.model = attempt_load(model_weight, map_location=device, inplace=False)
        self.model.requires_grad_(True)
        self.model.to(device)
        if self.mode == 'train':
            self.model.train()
        else:
            self.model.eval()
        if names is None:
            self.names = ['your dataset classnames']
        else:
            self.names = names
        # Warm-up forward pass
        img = torch.zeros((1, 3, *self.img_size), device=device)
        self.model(img)

    @staticmethod
    def non_max_suppression(prediction, logits, conf_thres=0.3, iou_thres=0.45, classes=None, agnostic=False,
                            multi_label=False, labels=(), max_det=300):
        """Runs Non-Maximum Suppression (NMS) on inference and logits results
        Returns:
            list of detections, on (n,6) tensor per image [xyxy, conf, cls] and pruned input logits (n, number-classes)
        """
        nc = prediction.shape[2] - 5
        xc = prediction[..., 4] > conf_thres
        assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
        assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'
        min_wh, max_wh = 2, 4096
        max_nms = 30000
        time_limit = 10.0
        redundant = True
        multi_label &= nc > 1
        merge = False
        output = [torch.zeros((0, 6), device=prediction.device)] * prediction.shape[0]
        logits_output = [torch.zeros((0, nc), device=logits.device)] * logits.shape[0]
        for xi, (x, log_) in enumerate(zip(prediction, logits)):
            # Filter predictions and logits with the same objectness mask
            x = x[xc[xi]]
            log_ = log_[xc[xi]]
            if labels and len(labels[xi]):
                l = labels[xi]
                v = torch.zeros((len(l), nc + 5), device=x.device)
                v[:, :4] = l[:, 1:5]
                v[:, 4] = 1.0
                v[range(len(l)), l[:, 0].long() + 5] = 1.0
                x = torch.cat((x, v), 0)
            if not x.shape[0]:
                continue
            # conf = obj_conf * cls_conf
            x[:, 5:] *= x[:, 4:5]
            box = xywh2xyxy(x[:, :4])
            if multi_label:
                i, j = (x[:, 5:] > conf_thres).nonzero(as_tuple=False).T
                x = torch.cat((box[i], x[i, j + 5, None], j[:, None].float()), 1)
            else:
                conf, j = x[:, 5:].max(1, keepdim=True)
                x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]
                log_ = log_[conf.view(-1) > conf_thres]
            if classes is not None:
                x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]
            n = x.shape[0]
            if not n:
                continue
            elif n > max_nms:
                x = x[x[:, 4].argsort(descending=True)[:max_nms]]
            # Offset boxes by class so NMS runs per class unless agnostic
            c = x[:, 5:6] * (0 if agnostic else max_wh)
            boxes, scores = x[:, :4] + c, x[:, 4]
            i = torchvision.ops.nms(boxes, scores, iou_thres)
            if i.shape[0] > max_det:
                i = i[:max_det]
            if merge and (1 < n < 3E3):
                iou = box_iou(boxes[i], boxes) > iou_thres
                weights = iou * scores[None]
                x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)
                if redundant:
                    i = i[iou.sum(1) > 1]
            output[xi] = x[i]
            logits_output[xi] = log_[i]
            assert log_[i].shape[0] == x[i].shape[0]
        return output, logits_output

    @staticmethod
    def yolo_resize(img, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True):
        return letterbox(img, new_shape=new_shape, color=color, auto=auto, scaleFill=scaleFill, scaleup=scaleup)

    def forward(self, img):
        # The modified Detect.forward returns (predictions, class logits, raw outputs)
        prediction, logits, _ = self.model(img, augment=False)
        prediction, logits = self.non_max_suppression(prediction, logits, self.confidence, self.iou_thresh,
                                                      classes=None,
                                                      agnostic=self.agnostic)
        self.boxes, self.class_names, self.classes, self.confidences = [[[] for _ in range(img.shape[0])] for _ in
                                                                        range(4)]
        for i, det in enumerate(prediction):
            if len(det):
                for *xyxy, conf, cls in det:
                    bbox = [int(b) for b in xyxy]
                    self.boxes[i].append(bbox)
                    self.confidences[i].append(round(conf.item(), 2))
                    cls = int(cls.item())
                    self.classes[i].append(cls)
                    if self.names is not None:
                        self.class_names[i].append(self.names[cls])
                    else:
                        self.class_names[i].append(cls)
        return [self.boxes, self.classes, self.class_names, self.confidences], logits

    def preprocessing(self, img):
        # Accept a single HWC image or a batch of them
        if len(img.shape) != 4:
            img = np.expand_dims(img, axis=0)
        im0 = img.astype(np.uint8)
        img = np.array([self.yolo_resize(im, new_shape=self.img_size)[0] for im in im0])
        img = img.transpose((0, 3, 1, 2))  # NHWC -> NCHW
        img = np.ascontiguousarray(img)
        img = torch.from_numpy(img).to(self.device)
        img = img / 255.0
        return img
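What makes this detector usable for GradCAM is that every filter applied to the predictions is also applied to the raw class logits, so row i of the logits always belongs to detection i. The sketch below shows that bookkeeping on random tensors. The sizes and the threshold are made up, and real NMS is replaced by a plain sort to keep the example short.

```python
import torch

torch.manual_seed(0)
num_boxes, num_classes = 6, 3
# Hypothetical raw outputs for one image: [x, y, w, h, objectness, class scores...]
prediction = torch.rand(num_boxes, 5 + num_classes)
logits = torch.randn(num_boxes, num_classes)   # raw class logits, one row per box

conf_thres = 0.5
keep = prediction[:, 4] > conf_thres           # objectness filter
prediction, logits = prediction[keep], logits[keep]

# Best class per box, then sort everything with one shared index
conf, cls = (prediction[:, 5:] * prediction[:, 4:5]).max(1)
order = conf.argsort(descending=True)
prediction, logits, cls = prediction[order], logits[order], cls[order]

# Rows still line up, so logits[i][cls[i]] is the score to backpropagate for detection i
scores = logits[torch.arange(len(cls)), cls]
print(prediction.shape[0], logits.shape[0], scores.shape[0])
```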
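3. The main script main_gradcam.py
- Create a main_gradcam.py file in the repository root
- There are three detect layers, and each one outputs a heatmap visualization for every detected object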
import os
import random
import time
import argparse
import numpy as np
from models.gradcam import YOLOV7GradCAM
from models.yolov7_object_detector import YOLOV7TorchObjectDetector
import cv2
names = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush']
target_layers = ['102_act', '103_act', '104_act']
parser = argparse.ArgumentParser()
parser.add_argument('--model-path', type=str, default="weights/yolov7.pt", help='Path to the model')
parser.add_argument('--img-path', type=str, default='figure/cam', help='input image path')
parser.add_argument('--output-dir', type=str, default='outputs/', help='output dir')
parser.add_argument('--img-size', type=int, default=640, help="input image size")
parser.add_argument('--target-layer', type=str, default='76_act',
help='The layer hierarchical address to which gradcam will applied,'
' the names should be separated by underline')
parser.add_argument('--method', type=str, default='gradcam', help='gradcam method: gradcam, gradcampp')
parser.add_argument('--device', type=str, default='cpu', help='cuda or cpu')
parser.add_argument('--names', type=str, default=None,
help='The name of the classes. The default is set to None and is set to coco classes. Provide your custom names as follow: object1,object2,object3')
parser.add_argument('--no_text_box', action='store_true',
help='do not show label and box on the heatmap')
args = parser.parse_args()
def get_res_img(bbox, mask, res_img):
    # Convert the (1, 1, H, W) saliency map in [0, 1] to an (H, W, 1) uint8 image
    mask = mask.squeeze(0).mul(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).detach().cpu().numpy().astype(
        np.uint8)
    heatmap = cv2.applyColorMap(mask, cv2.COLORMAP_JET)
    n_heatmap = (heatmap / 255).astype(np.float32)
    # Overlay the heatmap on the image and rescale to [0, 1]
    res_img = res_img / 255
    res_img = cv2.add(res_img, n_heatmap)
    res_img = (res_img / res_img.max())
    return res_img, n_heatmap


def plot_one_box(x, img, color=None, label=None, line_thickness=3):
    # Convert the float image in [0, 1] to uint8 in memory instead of going through a temp file
    img = np.ascontiguousarray((img * 255).astype(np.uint8))
    tl = line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1
    color = color or [random.randint(0, 255) for _ in range(3)]
    c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
    cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
    if label:
        tf = max(tl - 1, 1)
        t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
        # Put the label above the box if it fits, otherwise below the top edge
        outside = c1[1] - t_size[1] - 3 >= 0
        c2 = c1[0] + t_size[0], (c1[1] - t_size[1] - 3 if outside else c1[1] + t_size[1] + 3)
        # Shift the label left if it would run past the right image border
        outsize_right = c2[0] - img.shape[:2][1] > 0
        c1 = (c1[0] - (c2[0] - img.shape[:2][1]) if outsize_right else c1[0]), c1[1]
        c2 = (c2[0] - (c2[0] - img.shape[:2][1]) if outsize_right else c2[0]), c2[1]
        cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA)
        cv2.putText(img, label, (c1[0], c1[1] - 2 if outside else c2[1] - 2), 0, tl / 3, [225, 255, 255], thickness=tf,
                    lineType=cv2.LINE_AA)
    return img


def main(img_path):
    colors = [[random.randint(0, 255) for _ in range(3)] for _ in names]
    device = args.device
    input_size = (args.img_size, args.img_size)
    img = cv2.imread(img_path)
    print('[INFO] Loading the model')
    model = YOLOV7TorchObjectDetector(args.model_path, device, img_size=input_size, names=names)
    torch_img = model.preprocessing(img[..., ::-1])  # BGR -> RGB
    tic = time.time()
    for target_layer in target_layers:
        if args.method == 'gradcam':
            saliency_method = YOLOV7GradCAM(model=model, layer_name=target_layer, img_size=input_size)
        elif args.method == 'gradcampp':
            # Imported here because the class only exists once section 4 has been added to gradcam.py
            from models.gradcam import YOLOV7GradCAMPP
            saliency_method = YOLOV7GradCAMPP(model=model, layer_name=target_layer, img_size=input_size)
        else:
            raise ValueError(f'Unknown method: {args.method}')
        masks, logits, [boxes, _, class_names, conf] = saliency_method(torch_img)
        result = torch_img.squeeze(0).mul(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).detach().cpu().numpy()
        result = result[..., ::-1]  # RGB -> BGR for OpenCV
        image_name = os.path.splitext(os.path.basename(img_path))[0]
        save_path = f'{args.output_dir}{image_name}/{args.method}'
        os.makedirs(save_path, exist_ok=True)
        print(f'[INFO] Saving the final image at {save_path}')
        for i, mask in enumerate(masks):
            res_img = result.copy()
            bbox, cls_name = boxes[0][i], class_names[0][i]
            label = f'{cls_name} {conf[0][i]}'
            res_img, heat_map = get_res_img(bbox, mask, res_img)
            res_img = plot_one_box(bbox, res_img, label=label, color=colors[int(names.index(cls_name))],
                                   line_thickness=3)
            res_img = cv2.resize(res_img, dsize=(img.shape[:-1][::-1]))
            # target_layer[:-4] strips the trailing '_act'
            output_path = f'{save_path}/{target_layer[:-4]}_{i}.jpg'
            cv2.imwrite(output_path, res_img)
            print(f'{image_name}_{target_layer[:-4]}_{i}.jpg done!!')
    print(f'Total time : {round(time.time() - tic, 4)} s')


if __name__ == '__main__':
    if os.path.isdir(args.img_path):
        img_list = os.listdir(args.img_path)
        print(img_list)
        for item in img_list:
            main(os.path.join(args.img_path, item))
    else:
        main(args.img_path)
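The script writes one image per detection and per target layer, grouped by input image and method. As a minimal sketch of that naming scheme, the hypothetical helper below builds the same kind of path with pathlib. `build_output_path` and the sample file name are assumptions for illustration and are not part of the script.

```python
from pathlib import Path

def build_output_path(output_dir, img_path, method, target_layer, index):
    """Return <output_dir>/<image stem>/<method>/<layer id>_<index>.jpg (hypothetical helper)."""
    layer_id = target_layer.split('_')[0]                  # '102_act' -> '102'
    save_dir = Path(output_dir) / Path(img_path).stem / method
    return save_dir / f'{layer_id}_{index}.jpg'

path = build_output_path('outputs/', 'figure/cam/horses.jpeg', 'gradcam', '102_act', 0)
print(path.as_posix())   # outputs/horses/gradcam/102_0.jpg
```

Using the file stem rather than a fixed-length slice keeps the folder name correct for extensions of any length, such as .jpeg.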
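4. GradCAM++ (work in progress…)
- Following gradcam_plus_plus-pytorch, a first GradCAM++ visualization has been implemented for YOLOv7, but the results are not good. They are noticeably worse than GradCAM and still need to be improved
- The current approach is to add the following class to models/gradcam.py: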
class YOLOV7GradCAMPP(YOLOV7GradCAM):
    def __init__(self, model, layer_name, img_size=(640, 640)):
        super(YOLOV7GradCAMPP, self).__init__(model, layer_name, img_size)

    def forward(self, input_img, class_idx=True):
        saliency_maps = []
        b, c, h, w = input_img.size()
        preds, logits = self.model(input_img)
        for logit, cls, cls_name in zip(logits[0], preds[1][0], preds[2][0]):
            if class_idx:
                score = logit[cls]
            else:
                score = logit.max()
            self.model.zero_grad()
            score.backward(retain_graph=True)
            gradients = self.gradients['value']
            activations = self.activations['value']
            b, k, u, v = gradients.size()
            # Per-pixel weighting coefficients built from powers of the gradient
            alpha_num = gradients.pow(2)
            alpha_denom = gradients.pow(2).mul(2) + \
                          activations.mul(gradients.pow(3)).view(b, k, u * v).sum(-1, keepdim=True).view(b, k, 1, 1)
            # Replace zero denominators by one to avoid dividing by zero
            alpha_denom = torch.where(alpha_denom != 0.0, alpha_denom, torch.ones_like(alpha_denom))
            alpha = alpha_num.div(alpha_denom + 1e-7)
            positive_gradients = F.relu(score.exp() * gradients)
            # Channel weights: weighted sum of the positive gradients over space
            weights = (alpha * positive_gradients).view(b, k, u * v).sum(-1).view(b, k, 1, 1)
            saliency_map = (weights * activations).sum(1, keepdim=True)
            saliency_map = F.relu(saliency_map)
            saliency_map = F.interpolate(saliency_map, size=(h, w), mode='bilinear', align_corners=False)
            # Min-max normalize to [0, 1]; the clamp avoids dividing by zero for a flat map
            saliency_map_min, saliency_map_max = saliency_map.min(), saliency_map.max()
            saliency_map = (saliency_map - saliency_map_min).div(
                (saliency_map_max - saliency_map_min).clamp(min=1e-8)).detach()
            saliency_maps.append(saliency_map)
        return saliency_maps, logits, preds
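For reference, the weighting computed by the code above can be written out as follows. This is a transcription of the code rather than a restatement of the GradCAM++ paper, and the notation is assumed here: $S^c$ is the selected class logit (`score`), $A^k$ is channel $k$ of the target layer's output, and $g_{ij}^k = \partial S^c / \partial A_{ij}^k$ is the captured gradient.

```latex
\alpha_{ij}^{kc} = \frac{\left(g_{ij}^k\right)^2}
                        {2\left(g_{ij}^k\right)^2 + \sum_{a}\sum_{b} A_{ab}^k \left(g_{ab}^k\right)^3},
\qquad
w_k^c = \sum_{i}\sum_{j} \alpha_{ij}^{kc}\, \mathrm{ReLU}\!\left(e^{S^c} g_{ij}^k\right),
\qquad
L^c = \mathrm{ReLU}\!\left(\sum_k w_k^c A^k\right)
```

Compared with GradCAM, the uniform average over spatial positions is replaced by the per-pixel coefficients $\alpha_{ij}^{kc}$, and only positive gradients contribute to the channel weights.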
References
[1] yolov5-gradcam
[2] gradcam_plus_plus-pytorch
[3] PyTorch study notes 15: hook functions and CAM visualization
[4] An explanation of GAP, CAM, Grad-CAM and Grad-CAM++
Original: https://blog.csdn.net/weixin_43799388/article/details/126190981
Author: 嗜睡的篠龙
Title: [YOLOv7] Heatmap visualization with GradCAM