A Step-by-Step Walkthrough of the Grad-CAM Source Code (PyTorch)

July 27, 2023, 5:59 PM • Artificial Intelligence • 88 views

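The code from this post has been uploaded to: https://github.com/974938429/Grad-CAM

Grad-CAM is a paper published in IJCV in 2019. Its goal is to give a visual explanation of a neural network without changing the network architecture. Based on my own understanding, I explain some of the key parts of the source code below.

We wrap the Grad-CAM calls in a single Python file (cam_utils.py); the main script builds the model, loads the pretrained weights, and so on: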

1) Build the model and load the pretrained weights
model = Net()
checkpoint = torch.load(args.resume, map_location='cpu')  # args.resume is the preset checkpoint path
model.load_state_dict(checkpoint['state_dict'])

2) Load the image and preprocess it
src = Image.open(args.img_src).convert('RGB')  # args.img_src is the preset image path
data_transform = transforms.Compose([transforms.ToTensor(),
                 transforms.Normalize((0.47, 0.43, 0.39), (0.27, 0.26, 0.27))])
src_tensor = data_transform(src)
src_tensor = torch.unsqueeze(src_tensor, dim=0)
# The model expects input of shape [B, C, H, W]; with a single image we have to add the batch dimension

3) Specify the layer(s) for which the CAM is computed
target_layers = [model.down4]  # down4 is self.down4, defined in Net.__init__()

4) Instantiate the GradCAM class
cam = GradCAM(model=model, target_layers=target_layers, use_cuda=False)
grayscale_cam = cam(input_tensor=src_tensor, target=gt_tensor)  # invokes its __call__() method

5) Visualize the result
grayscale_cam = grayscale_cam[0, :]
# src is a PIL image, so convert it to a float array in [0, 1] first
visualization = show_cam_on_image(np.array(src, dtype=np.float32) / 255.,
                                      grayscale_cam,
                                      use_rgb=True)
plt.imshow(visualization)
plt.show()
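
As a rough illustration of step 2, the sketch below reproduces in plain NumPy what `transforms.ToTensor()`, `transforms.Normalize()` and `torch.unsqueeze()` do to the image. The random 4×6 image and the helper name `preprocess` are made up for this example; the mean and standard deviation are the ones used above.

```python
import numpy as np

MEAN = np.array([0.47, 0.43, 0.39], dtype=np.float32)
STD = np.array([0.27, 0.26, 0.27], dtype=np.float32)


def preprocess(img_hwc_uint8):
    """Hypothetical NumPy equivalent of ToTensor -> Normalize -> unsqueeze(0)."""
    # ToTensor: HWC uint8 in [0, 255] -> CHW float32 in [0, 1]
    chw = img_hwc_uint8.astype(np.float32).transpose(2, 0, 1) / 255.0
    # Normalize: per-channel (x - mean) / std
    chw = (chw - MEAN[:, None, None]) / STD[:, None, None]
    # unsqueeze(dim=0): add the batch dimension -> [B, C, H, W] with B = 1
    return chw[None, ...]


rng = np.random.default_rng(0)
fake_img = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)  # H=4, W=6
batch = preprocess(fake_img)
print(batch.shape)  # (1, 3, 4, 6)
```

The channel axis moves to the front and a batch axis of size 1 is added, which is why the model receives a [B, C, H, W] tensor even for a single picture.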

Next, let us go through the contents of cam_utils.py:

class GradCAM:
    def __init__(self,
                 model,
                 target_layers,
                 reshape_transform=None,
                 use_cuda=False):
        self.model = model.eval()
        self.target_layers = target_layers
        self.reshape_transform = reshape_transform
        self.use_cuda = use_cuda
        if self.use_cuda:
            self.model = self.model.cuda()
        # Instantiate ActivationsAndGradients, which registers the hooks
        self.activations_and_grads = ActivationsAndGradients(self.model,
                                     target_layers, reshape_transform)

Let us first look at what ActivationsAndGradients contains (the complete cam_utils.py is available on GitHub):

class ActivationsAndGradients:
    # Calling an instance runs __call__(); the hooks collect the forward
    # feature maps A and the back-propagated gradients A'
    def __init__(self, model, target_layers, reshape_transform):

        # Store the model and allocate storage for the feature maps
        # (self.activations) and the back-propagated gradients (self.gradients)
        self.model = model
        self.gradients = []
        self.activations = []
        self.reshape_transform = reshape_transform
        self.handles = []

        # Note that the target layers are passed as a list (target_layers = [model.down4]):
        # the source is designed so that CAMs can be computed for several layers.
        for target_layer in target_layers:
            # register_forward_hook() captures the output of a layer, i.e. the
            # forward feature map, without changing the network structure
            self.handles.append(
                target_layer.register_forward_hook(
                    self.save_activation
                )
            )

            # hasattr(object, name) returns True if the object has that attribute.
            # Here it checks whether the installed PyTorch provides the newer API
            # (to cope with version differences). This block sits inside the loop
            # so that every target layer gets a backward hook, not only the last one.
            if hasattr(target_layer, 'register_full_backward_hook'):
                self.handles.append(
                    target_layer.register_full_backward_hook(self.save_gradient))
            else:
                # Fall back to register_backward_hook() to store the gradients
                # produced during back-propagation
                self.handles.append(
                    target_layer.register_backward_hook(self.save_gradient))

    # The official API documentation shows a similar use of register_forward_hook();
    # self.activations stores the feature maps seen during the forward pass
    def save_activation(self, module, input, output):
        activation = output
        if self.reshape_transform is not None:
            activation = self.reshape_transform(activation)
        self.activations.append(activation.cpu().detach())

    # Same idea, except that save_gradient() stores gradients;
    # note the order in which self.gradients is filled
    def save_gradient(self, module, grad_input, grad_output):
        grad = grad_output[0]
        if self.reshape_transform is not None:
            grad = self.reshape_transform(grad)
        # Gradients arrive in reverse layer order, so prepend each one to keep
        # self.gradients in the same order as self.activations
        self.gradients = [grad.cpu().detach()] + self.gradients

    def __call__(self, x):
        # self.model(x) starts the forward pass; note that no backward pass happens here
        self.gradients = []
        self.activations = []
        return self.model(x)

    def release(self):
        for handle in self.handles:
            # Remove the hooks once they are no longer needed; otherwise they
            # stay registered and keep holding memory
            handle.remove()

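As you can see, the main job of the ActivationsAndGradients class is to capture the forward feature maps and the backward gradients through hook functions, using register_forward_hook(hook) and register_backward_hook(hook) (or register_full_backward_hook(hook) where it is available). Hooks of these two kinds let us grab intermediate values automatically, which is needed because PyTorch does not keep the intermediate results of the computation graph. For example, with an input x, an intermediate variable y and a result z, reading y.grad after back-propagation gives None, because by default PyTorch only retains gradients for leaf tensors. Registering a hook lets us capture such intermediate results.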
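
The x, y, z situation described above can be reproduced in a few lines. This is only a minimal sketch with made-up scalar values; it uses a tensor-level hook (`Tensor.register_hook`) rather than the module-level hooks used in cam_utils.py, but the idea of capturing a value that PyTorch would otherwise discard is the same.

```python
import warnings

import torch

x = torch.tensor(2.0, requires_grad=True)  # leaf tensor
y = x ** 2                                 # intermediate (non-leaf) tensor
z = 3 * y                                  # result

captured = []
# A tensor hook runs when the gradient w.r.t. y is computed during backward.
y.register_hook(lambda grad: captured.append(grad.clone()))

z.backward()

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # reading .grad of a non-leaf tensor warns
    y_grad = y.grad                  # None: not retained for non-leaf tensors

print(y_grad)       # None
print(captured[0])  # dz/dy = 3
print(x.grad)       # dz/dx = 3 * 2x = 12
```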

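register_forward_hook(hook): called as layer.register_forward_hook(hook). Each time that layer finishes its forward pass, PyTorch calls the user-defined hook(module, input, output), where output is the layer's output feature map, so the hook can store it.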

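register_backward_hook(hook): likewise, hook(module, grad_input, grad_output) is called on the specified layer during back-propagation (triggered by .backward()), once the gradients for that layer have been computed. module is the layer itself, grad_input holds the gradients with respect to the layer's inputs, and grad_output holds the gradients with respect to the layer's outputs. With the older register_backward_hook, grad_input may instead contain the gradients of the layer's last operation (for a linear layer, for example, those of the bias, the input x and the weight), which is one reason the code prefers register_full_backward_hook when it exists.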
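
To see both module hooks in action, here is a small self-contained sketch. The toy `nn.Sequential` network, the choice of `model[2]` as the target layer and the output index 3 are all invented for the example (they are not the Net model from this post), and it assumes a PyTorch version that provides `register_full_backward_hook`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy network, invented for this example; model[2] plays the role of the target layer.
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(4, 8, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
).eval()
target_layer = model[2]

activations, gradients = [], []


def save_activation(module, inputs, output):
    # Called right after target_layer's forward pass; output is the feature map.
    activations.append(output.detach())


def save_gradient(module, grad_input, grad_output):
    # Called during backward; grad_output[0] is d(score)/d(layer output).
    gradients.append(grad_output[0].detach())


handles = [
    target_layer.register_forward_hook(save_activation),
    target_layer.register_full_backward_hook(save_gradient),
]

x = torch.randn(1, 3, 5, 5)
score = model(x)[0, 3]  # pick one output unit as the "class score"
score.backward()

for h in handles:       # remove the hooks once we are done
    h.remove()

print(activations[0].shape, gradients[0].shape)
```

The feature map and its gradient have the same shape, which is what makes the channel-wise weighting in Grad-CAM possible.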

Now back to the GradCAM class. Besides __init__(), it defines the following methods:

class GradCAM:
    def __init__(self, model, target_layers, reshape_transform=None, use_cuda=False):
        ...  # shown above, not repeated here

    @staticmethod
    def get_loss(output, target):
        # Use the prediction itself as the loss to back-propagate;
        # this post shows results for semantic segmentation
        loss = output
        return loss

    @staticmethod
    def get_cam_weights(grads):
        # GAP (global average pooling), giving shape [B, C, 1, 1]
        # We pass in a single image, so B=1; C is the number of channels of the feature map
        return np.mean(grads, axis=(2, 3), keepdims=True)

    @staticmethod
    def get_target_width_height(input_tensor):
        # Get the width and height of the input image
        width, height = input_tensor.size(-1), input_tensor.size(-2)
        return width, height

    def get_cam_image(self, activations, grads):
        # Global-average-pool the gradients; weights has shape [1, C, 1, 1],
        # i.e. one weight per channel
        weights = self.get_cam_weights(grads)
        weighted_activations = weights * activations  # weight the original feature maps
        cam = weighted_activations.sum(axis=1)  # sum over the C dimension, giving shape (1, h, w)
        return cam

    @staticmethod
    def scale_cam_img(cam, target_size=None):
        # Resize the cam to the size of the original image and scale its values to [0, 1]
        result = []
        for img in cam:  # cam has shape [B, h, w]; handle one image of the batch at a time
            img = img - np.min(img)  # subtract the minimum
            img = img / (1e-7 + np.max(img))
            if target_size is not None:
                img = cv.resize(img, target_size)
                # Note: cv2.resize(src, (width, height)) takes width before height
            result.append(img)
        result = np.float32(result)
        return result

    def compute_cam_per_layer(self, input_tensor):
        activations_list = [a.cpu().data.numpy() for a in
                            self.activations_and_grads.activations]
        grads_list = [a.cpu().data.numpy() for a in
                      self.activations_and_grads.gradients]
        target_size = self.get_target_width_height(input_tensor)
        cam_per_target_layer = []

        for layer_activations, layer_grads in zip(activations_list, grads_list):
            # Process each layer's feature maps together with the matching gradients
            cam = self.get_cam_image(layer_activations, layer_grads)
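            cam[cam < 0] = 0  # ReLU: keep only the positive values
            # ... (the rest of this method is in the full cam_utils.py on GitHub)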

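The comments in the code above explain each step in detail. In short, Grad-CAM back-propagates the output as the loss through the network and uses hooks to record, for each target layer, the forward feature maps and the backward gradients. The gradients are global-average-pooled into one weight per channel, and each weight multiplies its feature map; a large weight means that channel matters more for the prediction. Finally, the CAMs of the individual layers are stacked and fused into a single attention map for the whole network.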
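
The pooling, weighting and summing described above can be checked by hand on tiny arrays. The following is a minimal NumPy sketch for a single layer; the function name `grad_cam` and the 2×2 toy data are made up for illustration, and the final min-max scaling mirrors scale_cam_img without the resize.

```python
import numpy as np


def grad_cam(activations, grads):
    """Grad-CAM map for one layer; both inputs have shape [B, C, H, W]."""
    weights = grads.mean(axis=(2, 3), keepdims=True)  # GAP over H, W -> [B, C, 1, 1]
    cam = (weights * activations).sum(axis=1)         # weighted sum over C -> [B, H, W]
    cam = np.maximum(cam, 0)                          # ReLU: keep positive evidence only
    cam = cam - cam.min(axis=(1, 2), keepdims=True)
    return cam / (cam.max(axis=(1, 2), keepdims=True) + 1e-7)  # scale to [0, 1]


# Hand-made toy data: B=1, C=2, H=W=2.
acts = np.array([[[[1.0, 2.0], [3.0, 4.0]],
                  [[4.0, 3.0], [2.0, 1.0]]]])
grads = np.array([[[[1.0, 1.0], [1.0, 1.0]],        # channel weight = 1.0
                   [[-0.5, -0.5], [-0.5, -0.5]]]])   # channel weight = -0.5
cam = grad_cam(acts, grads)
print(cam[0])
```

Here the weighted sum is [[-1, 0.5], [2, 3.5]]; the ReLU clips the negative entry to 0 and the scaling divides by the maximum 3.5, so the bottom-right position ends up with the highest response.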

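Overall, without changing any layer, the gradients reflect how much the network attends to each part of the target layer's features, which yields an attention map. For classification networks, which are even more of a black box, this helps explain the predictions.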

Finally, the result is visualized in 调用CAM.py:

grayscale_cam = grayscale_cam[0, :]
# src is a PIL image, so convert it to a float array in [0, 1] first
visualization = show_cam_on_image(np.array(src, dtype=np.float32) / 255.,
                                      grayscale_cam,
                                      use_rgb=True)
plt.imshow(visualization)
plt.show()

The function called here is also defined in cam_utils.py:

import cv2 as cv
import numpy as np

def show_cam_on_image(img: np.ndarray,
                      mask: np.ndarray,
                      use_rgb: bool = False,
                      colormap: int = cv.COLORMAP_JET) -> np.ndarray:
    # Turn the CAM result into a pseudo-color image
    heatmap = cv.applyColorMap(np.uint8(255 * mask), colormap)
    if use_rgb:
        # OpenCV works with NumPy ndarrays in BGR order with values in [0, 255],
        # so the heatmap has to be converted to RGB
        heatmap = cv.cvtColor(heatmap, cv.COLOR_BGR2RGB)

    heatmap = np.float32(heatmap) / 255.  # scale to [0, 1]

    if np.max(img) > 1:
        raise Exception(
            "The input image should np.float32 in the range [0, 1]")
    cam = heatmap + img      # overlay the heatmap on the image
    cam = cam / np.max(cam)  # renormalize so that the maximum is 1
    return np.uint8(255*cam)
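
The blending arithmetic at the end of show_cam_on_image can be tried in isolation. In the sketch below, a hypothetical `toy_colormap` (a simple blue-to-red ramp, not the JET colormap) stands in for cv.applyColorMap so that the example runs without OpenCV; `overlay` and the 2×2 inputs are likewise made up for illustration.

```python
import numpy as np


def toy_colormap(mask):
    """Hypothetical stand-in for cv.applyColorMap: blue (low) to red (high), RGB in [0, 1]."""
    heat = np.zeros(mask.shape + (3,), dtype=np.float32)
    heat[..., 0] = mask        # red channel grows with the CAM value
    heat[..., 2] = 1.0 - mask  # blue channel shrinks with it
    return heat


def overlay(img, mask):
    """Blend an RGB image in [0, 1] with a CAM mask in [0, 1], as in show_cam_on_image."""
    if np.max(img) > 1:
        raise ValueError("img must be float in the range [0, 1]")
    cam = toy_colormap(mask) + img  # add heatmap and image
    cam = cam / np.max(cam)         # renormalize so the maximum is 1
    return np.uint8(255 * cam)


img = np.full((2, 2, 3), 0.5, dtype=np.float32)  # flat grey image
mask = np.array([[0.0, 1.0], [0.5, 0.25]], dtype=np.float32)
out = overlay(img, mask)
print(out.shape, out.dtype, out.max())
```

Because the sum of heatmap and image is divided by its own maximum, the output always uses the full [0, 255] range, and the input image must already be scaled to [0, 1] for the two terms to carry comparable weight.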

Original: https://blog.csdn.net/weixin_46262968/article/details/126015530
Author: ALEX的日常
Title: Grad-CAM源码保姆级讲解（pytorch）
