Semantic Segmentation Loss Function Series (1): Cross-Entropy Loss

I have been working on semantic-segmentation projects recently. While looking for loss functions I noticed that the write-ups online each have their own strengths but rarely explain how to actually use them, so here I am recording some of my own experience with loss functions during training. I work with the PyTorch framework, so this whole series will be implemented in PyTorch.

First up is the cross-entropy loss. Semantic segmentation is really just a per-pixel classification problem, and anyone who has done image classification should already be familiar with the cross-entropy loss.

PyTorch ships with a ready-made cross-entropy loss; you only need to instantiate it and call it:

loss_func = nn.CrossEntropyLoss()

The loss functions in the torch.nn module are all implemented as classes: declare an instance up front, and then call it later wherever you need the loss, as sketched below.
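A minimal sketch of that pattern; model, loader, and optimizer are hypothetical placeholders, not objects from the original post:

import torch.nn as nn

criterion = nn.CrossEntropyLoss()   # declared once, up front

# Inside a hypothetical training loop (model, loader, optimizer are placeholders):
# for images, labels in loader:
#     logits = model(images)             # raw per-class scores from the network
#     loss = criterion(logits, labels)   # labels are class indices
#     optimizer.zero_grad()
#     loss.backward()
#     optimizer.step()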

PyTorch's implementation of the cross-entropy loss:

class CrossEntropyLoss(_WeightedLoss):
    r"""This criterion computes the cross entropy loss between input and target.

    It is useful when training a classification problem with C classes.

    If provided, the optional argument :attr:weight should be a 1D Tensor
    assigning weight to each of the classes.

    This is particularly useful when you have an unbalanced training set.

    The input is expected to contain raw, unnormalized scores for each class.

    input has to be a Tensor of size either :math:(minibatch, C) or
    :math:(minibatch, C, d_1, d_2, ..., d_K) with :math:K \geq 1 for the
    K-dimensional case. The latter is useful for higher dimension inputs, such
    as computing cross entropy loss per-pixel for 2D images.

    The target that this criterion expects should contain either:

    - Class indices in the range :math:[0, C-1] where :math:C is the number of classes; if
      ignore_index is specified, this loss also accepts this class index (this index
      may not necessarily be in the class range). The unreduced (i.e. with :attr:reduction
      set to 'none') loss for this case can be described as:

      .. math::
          \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
          l_n = - w_{y_n} \log \frac{\exp(x_{n,y_n})}{\sum_{c=1}^C \exp(x_{n,c})}
          \cdot \mathbb{1}\{y_n \not= \text{ignore\_index}\}

      where :math:x is the input, :math:y is the target, :math:w is the weight,
      :math:C is the number of classes, and :math:N spans the minibatch dimension as well as
      :math:d_1, ..., d_k for the K-dimensional case. If
      :attr:reduction is not 'none' (default 'mean'), then

      .. math::
          \ell(x, y) = \begin{cases}
              \sum_{n=1}^N \frac{1}{\sum_{n=1}^N w_{y_n} \cdot \mathbb{1}\{y_n \not= \text{ignore\_index}\}} l_n, &
               \text{if reduction} = \text{`mean';}\\
                \sum_{n=1}^N l_n,  &
                \text{if reduction} = \text{`sum'.}
            \end{cases}

      Note that this case is equivalent to the combination of :class:~torch.nn.LogSoftmax and
      :class:~torch.nn.NLLLoss.

    - Probabilities for each class; useful when labels beyond a single class per minibatch item
      are required, such as for blended labels, label smoothing, etc. The unreduced (i.e. with
      :attr:reduction set to 'none') loss for this case can be described as:

      .. math::
          \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
          l_n = - \sum_{c=1}^C w_c \log \frac{\exp(x_{n,c})}{\sum_{i=1}^C \exp(x_{n,i})} y_{n,c}

      where :math:x is the input, :math:y is the target, :math:w is the weight,
      :math:C is the number of classes, and :math:N spans the minibatch dimension as well as
      :math:d_1, ..., d_k for the K-dimensional case. If
      :attr:reduction is not 'none' (default 'mean'), then

      .. math::
          \ell(x, y) = \begin{cases}
              \frac{\sum_{n=1}^N l_n}{N}, &
               \text{if reduction} = \text{`mean';}\\
                \sum_{n=1}^N l_n,  &
                \text{if reduction} = \text{`sum'.}
            \end{cases}

    .. note::
        The performance of this criterion is generally better when target contains class
        indices, as this allows for optimized computation. Consider providing target as
        class probabilities only when a single class label per minibatch item is too restrictive.

    Args:
        weight (Tensor, optional): a manual rescaling weight given to each class.

            If given, has to be a Tensor of size C
        size_average (bool, optional): Deprecated (see :attr:reduction). By default,
            the losses are averaged over each loss element in the batch. Note that for
            some losses, there are multiple elements per sample. If the field :attr:size_average
            is set to False, the losses are instead summed for each minibatch. Ignored
            when :attr:reduce is False. Default: True
        ignore_index (int, optional): Specifies a target value that is ignored
            and does not contribute to the input gradient. When :attr:size_average is
            True, the loss is averaged over non-ignored targets. Note that
            :attr:ignore_index is only applicable when the target contains class indices.

        reduce (bool, optional): Deprecated (see :attr:reduction). By default, the
            losses are averaged or summed over observations for each minibatch depending
            on :attr:size_average. When :attr:reduce is False, returns a loss per
            batch element instead and ignores :attr:size_average. Default: True
        reduction (string, optional): Specifies the reduction to apply to the output:
            'none' | 'mean' | 'sum'. 'none': no reduction will
            be applied, 'mean': the weighted mean of the output is taken,
            'sum': the output will be summed. Note: :attr:size_average
            and :attr:reduce are in the process of being deprecated, and in
            the meantime, specifying either of those two args will override
            :attr:reduction. Default: 'mean'
        label_smoothing (float, optional): A float in [0.0, 1.0]. Specifies the amount
            of smoothing when computing the loss, where 0.0 means no smoothing. The targets
            become a mixture of the original ground truth and a uniform distribution as described in
            Rethinking the Inception Architecture for Computer Vision <https://arxiv.org/abs/1512.00567>. Default: :math:0.0.

    Shape:
        - Input: :math:(N, C) where C = number of classes, or
          :math:(N, C, d_1, d_2, ..., d_K) with :math:K \geq 1
          in the case of K-dimensional loss.

        - Target: If containing class indices, shape :math:(N) where each value is
          :math:0 \leq \text{targets}[i] \leq C-1, or :math:(N, d_1, d_2, ..., d_K) with
          :math:K \geq 1 in the case of K-dimensional loss. If containing class probabilities,
          same shape as the input.

        - Output: If :attr:reduction is 'none', shape :math:(N) or
          :math:(N, d_1, d_2, ..., d_K) with :math:K \geq 1 in the case of K-dimensional loss.

          Otherwise, scalar.

    Examples::

        >>> # Example of target with class indices
        >>> loss = nn.CrossEntropyLoss()
        >>> input = torch.randn(3, 5, requires_grad=True)
        >>> target = torch.empty(3, dtype=torch.long).random_(5)
        >>> output = loss(input, target)
        >>> output.backward()
        >>>
        >>> # Example of target with class probabilities
        >>> input = torch.randn(3, 5, requires_grad=True)
        >>> target = torch.randn(3, 5).softmax(dim=1)
        >>> output = loss(input, target)
        >>> output.backward()
"""
    __constants__ = ['ignore_index', 'reduction', 'label_smoothing']
    ignore_index: int
    label_smoothing: float

    def __init__(self, weight: Optional[Tensor] = None, size_average=None, ignore_index: int = -100,
                 reduce=None, reduction: str = 'mean', label_smoothing: float = 0.0) -> None:
        super(CrossEntropyLoss, self).__init__(weight, size_average, reduce, reduction)
        self.ignore_index = ignore_index
        self.label_smoothing = label_smoothing

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        return F.cross_entropy(input, target, weight=self.weight,
                               ignore_index=self.ignore_index, reduction=self.reduction,
                               label_smoothing=self.label_smoothing)
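As the docstring notes, with class-index targets this criterion is equivalent to LogSoftmax followed by NLLLoss. A minimal sketch to check that numerically (shapes are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)              # raw scores for 3 samples and 5 classes
target = torch.randint(0, 5, (3,))      # ground-truth class indices

ce = nn.CrossEntropyLoss()(logits, target)
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)   # LogSoftmax + NLLLoss
print(torch.allclose(ce, nll))          # expected: True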

When declaring the loss function you can pass a few arguments. The most important is weight: Optional[Tensor] = None. weight assigns a rescaling weight to each class in the computation; it is a tensor whose length must equal the number of classes. Another useful one is label_smoothing, which is built into the cross-entropy loss. With label_smoothing = 0.1, the hard one-hot targets are softened: PyTorch mixes the one-hot target with a uniform distribution over the classes, so the true class no longer receives the full weight of 1.0 and the remaining probability mass is spread over the other classes. This makes the loss smoother and training easier to converge, and it keeps a confidently wrong prediction from producing an excessively large loss.
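A minimal sketch of passing these arguments; the class weights and shapes below are made up for illustration (e.g., up-weighting rarer classes), not values from the original post:

import torch
import torch.nn as nn

# Hypothetical per-class weights for a 5-class problem; length must equal the number of classes C.
class_weights = torch.tensor([0.5, 1.0, 1.0, 2.0, 2.0])

loss_func = nn.CrossEntropyLoss(weight=class_weights, label_smoothing=0.1)

logits = torch.randn(3, 5, requires_grad=True)   # (N, C) raw scores from the model
target = torch.tensor([3, 4, 1])                 # ground-truth class indices
loss = loss_func(logits, target)
loss.backward()
print(loss.item())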

When you actually use it, i.e. when computing the loss value, you pass two arguments, input and target, both tensors: input is your model's prediction and target is the ground-truth annotation. For example:

import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()

input = torch.randn(3, 5, requires_grad=True)
print("input:", input)
target = torch.empty(3, dtype=torch.long).random_(5)
print("target:", target)
output = loss_func(input, target)
print("loss:", output)

input: tensor([[ 1.6738,  0.0526,  0.6329, -0.8809,  1.4822],
        [-0.5908,  1.5717,  1.3402,  0.4227, -0.3498],
        [-0.3359, -2.3797, -1.6206, -2.3070,  0.6010]], requires_grad=True)
target: tensor([3, 4, 1])
loss: tensor(3.2306, grad_fn=<NllLossBackward0>)

The above is essentially a five-class classification problem: input is the output of the model's final fully connected layer, or the final output of a fully convolutional network.
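For segmentation specifically, the same criterion is applied per pixel: the network output keeps a class dimension, e.g. (N, C, H, W), and the label map holds one class index per pixel, e.g. (N, H, W). A minimal sketch with made-up shapes:

import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()

N, C, H, W = 2, 5, 8, 8                                 # batch, classes, height, width (illustrative)
logits = torch.randn(N, C, H, W, requires_grad=True)    # per-pixel raw scores from an FCN
labels = torch.randint(0, C, (N, H, W))                 # per-pixel ground-truth class indices

loss = loss_func(logits, labels)   # averaged over all pixels by default (reduction='mean')
loss.backward()
print(loss.item())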

If, however, you first apply argmax to the classification output, so that the class dimension is collapsed into predicted class indices, the computation fails. For example:

import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()

input = torch.randn(3,requires_grad=True)
print("input:",input)
target = torch.empty(3, dtype=torch.long).random_(5)
print("target:",target)
output = loss_func(input, target)
print("loss:",output)

input: tensor([-0.3463,  1.2289,  0.2517], requires_grad=True)
target: tensor([3, 4, 3])
Traceback (most recent call last):
  File "/home/lwf/Project/MRI-Segmentation/tets.py", line 19, in <module>
    output = loss_func(input, target)
  File "/home/lwf/anaconda3/envs/torch3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lwf/anaconda3/envs/torch3.7/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 1152, in forward
    label_smoothing=self.label_smoothing)
  File "/home/lwf/anaconda3/envs/torch3.7/lib/python3.7/site-packages/torch/nn/functional.py", line 2846, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: Expected floating point type for target with class probabilities, got Long

The error occurs because the argmaxed input no longer has a class dimension: it has the same shape as target, so PyTorch assumes target holds class probabilities and expects a floating-point tensor. In other words, input must keep the class dimension and contain a raw, unnormalized score (a logit) for every class; do not collapse it with argmax before computing the loss.
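In practice this means the loss is computed on the raw logits, and argmax is applied only afterwards to obtain the predicted classes for evaluation or visualization. A minimal sketch:

import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()

logits = torch.randn(3, 5, requires_grad=True)        # keep the class dimension: (N, C)
target = torch.empty(3, dtype=torch.long).random_(5)

loss = loss_func(logits, target)   # loss is computed on the raw per-class scores
loss.backward()

pred = logits.argmax(dim=1)        # argmax only afterwards, to get the predicted class indices
print(pred)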

Original: https://blog.csdn.net/lwf1881/article/details/123078571
Author: spectrelwf
Title: Semantic Segmentation Loss Function Series (1): Cross-Entropy Loss
