Semantic Segmentation Loss Function Series (1): Cross-Entropy Loss

I have been working on semantic-segmentation projects recently. While looking for loss functions I noticed that the write-ups online each have their own strengths but rarely explain how to actually use them, so here I am recording some of my own experience with loss functions during training. I work with the PyTorch framework, so this whole series will be implemented in PyTorch.

First up is the cross-entropy loss. Semantic segmentation is really just a per-pixel classification problem, and anyone who has done image classification should already be familiar with the cross-entropy loss.

PyTorch ships with a ready-made cross-entropy loss; you only need to instantiate it and call it:

loss_func = nn.CrossEntropyLoss()

The loss functions in the torch.nn module are all implemented as classes: declare an instance up front, and then call it later wherever you need the loss, as sketched below.
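A minimal sketch of that pattern; model, loader, and optimizer are hypothetical placeholders, not objects from the original post:

import torch.nn as nn

criterion = nn.CrossEntropyLoss()   # declared once, up front

# Inside a hypothetical training loop (model, loader, optimizer are placeholders):
# for images, labels in loader:
#     logits = model(images)             # raw per-class scores from the network
#     loss = criterion(logits, labels)   # labels are class indices
#     optimizer.zero_grad()
#     loss.backward()
#     optimizer.step()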

PyTorch's implementation of the cross-entropy loss:

class CrossEntropyLoss(_WeightedLoss):
    r"""This criterion computes the cross entropy loss between input and target.

    It is useful when training a classification problem with C classes.

    If provided, the optional argument :attr:weight should be a 1D Tensor
    assigning weight to each of the classes.

    This is particularly useful when you have an unbalanced training set.

    The input is expected to contain raw, unnormalized scores for each class.

    input has to be a Tensor of size either :math:(minibatch, C) or
    :math:(minibatch, C, d_1, d_2, ..., d_K) with :math:K \geq 1 for the
    K-dimensional case. The latter is useful for higher dimension inputs, such
    as computing cross entropy loss per-pixel for 2D images.

    The target that this criterion expects should contain either:

    - Class indices in the range :math:[0, C-1] where :math:C is the number of classes; if
      ignore_index is specified, this loss also accepts this class index (this index
      may not necessarily be in the class range). The unreduced (i.e. with :attr:reduction
      set to 'none') loss for this case can be described as:

      .. math::
          \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
          l_n = - w_{y_n} \log \frac{\exp(x_{n,y_n})}{\sum_{c=1}^C \exp(x_{n,c})}
          \cdot \mathbb{1}\{y_n \not= \text{ignore\_index}\}

      where :math:x is the input, :math:y is the target, :math:w is the weight,
      :math:C is the number of classes, and :math:N spans the minibatch dimension as well as
      :math:d_1, ..., d_k for the K-dimensional case. If
      :attr:reduction is not 'none' (default 'mean'), then

      .. math::
          \ell(x, y) = \begin{cases}
              \sum_{n=1}^N \frac{1}{\sum_{n=1}^N w_{y_n} \cdot \mathbb{1}\{y_n \not= \text{ignore\_index}\}} l_n, &
               \text{if reduction} = \text{`mean';}\\
                \sum_{n=1}^N l_n,  &
                \text{if reduction} = \text{`sum'.}
            \end{cases}

      Note that this case is equivalent to the combination of :class:~torch.nn.LogSoftmax and
      :class:~torch.nn.NLLLoss.

    - Probabilities for each class; useful when labels beyond a single class per minibatch item
      are required, such as for blended labels, label smoothing, etc. The unreduced (i.e. with
      :attr:reduction set to 'none') loss for this case can be described as:

      .. math::
          \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
          l_n = - \sum_{c=1}^C w_c \log \frac{\exp(x_{n,c})}{\sum_{i=1}^C \exp(x_{n,i})} y_{n,c}

      where :math:x is the input, :math:y is the target, :math:w is the weight,
      :math:C is the number of classes, and :math:N spans the minibatch dimension as well as
      :math:d_1, ..., d_k for the K-dimensional case. If
      :attr:reduction is not 'none' (default 'mean'), then

      .. math::
          \ell(x, y) = \begin{cases}
              \frac{\sum_{n=1}^N l_n}{N}, &
               \text{if reduction} = \text{`mean';}\\
                \sum_{n=1}^N l_n,  &
                \text{if reduction} = \text{`sum'.}
            \end{cases}

    .. note::
        The performance of this criterion is generally better when target contains class
        indices, as this allows for optimized computation. Consider providing target as
        class probabilities only when a single class label per minibatch item is too restrictive.

    Args:
        weight (Tensor, optional): a manual rescaling weight given to each class.

            If given, has to be a Tensor of size C
        size_average (bool, optional): Deprecated (see :attr:reduction). By default,
            the losses are averaged over each loss element in the batch. Note that for
            some losses, there are multiple elements per sample. If the field :attr:size_average
            is set to False, the losses are instead summed for each minibatch. Ignored
            when :attr:reduce is False. Default: True
        ignore_index (int, optional): Specifies a target value that is ignored
            and does not contribute to the input gradient. When :attr:size_average is
            True, the loss is averaged over non-ignored targets. Note that
            :attr:ignore_index is only applicable when the target contains class indices.

        reduce (bool, optional): Deprecated (see :attr:reduction). By default, the
            losses are averaged or summed over observations for each minibatch depending
            on :attr:size_average. When :attr:reduce is False, returns a loss per
            batch element instead and ignores :attr:size_average. Default: True
        reduction (string, optional): Specifies the reduction to apply to the output:
            'none' | 'mean' | 'sum'. 'none': no reduction will
            be applied, 'mean': the weighted mean of the output is taken,
            'sum': the output will be summed. Note: :attr:size_average
            and :attr:reduce are in the process of being deprecated, and in
            the meantime, specifying either of those two args will override
            :attr:reduction. Default: 'mean'
        label_smoothing (float, optional): A float in [0.0, 1.0]. Specifies the amount
            of smoothing when computing the loss, where 0.0 means no smoothing. The targets
            become a mixture of the original ground truth and a uniform distribution as described in
            Rethinking the Inception Architecture for Computer Vision <https://arxiv.org/abs/1512.00567>. Default: :math:0.0.

    Shape:
        - Input: :math:(N, C) where C = number of classes, or
          :math:(N, C, d_1, d_2, ..., d_K) with :math:K \geq 1
          in the case of K-dimensional loss.

        - Target: If containing class indices, shape :math:(N) where each value is
          :math:0 \leq \text{targets}[i] \leq C-1, or :math:(N, d_1, d_2, ..., d_K) with
          :math:K \geq 1 in the case of K-dimensional loss. If containing class probabilities,
          same shape as the input.

        - Output: If :attr:reduction is 'none', shape :math:(N) or
          :math:(N, d_1, d_2, ..., d_K) with :math:K \geq 1 in the case of K-dimensional loss.

          Otherwise, scalar.

    Examples::

        >>> # Example of target with class indices
        >>> loss = nn.CrossEntropyLoss()
        >>> input = torch.randn(3, 5, requires_grad=True)
        >>> target = torch.empty(3, dtype=torch.long).random_(5)
        >>> output = loss(input, target)
        >>> output.backward()
        >>>
        >>> # Example of target with class probabilities
        >>> input = torch.randn(3, 5, requires_grad=True)
        >>> target = torch.randn(3, 5).softmax(dim=1)
        >>> output = loss(input, target)
        >>> output.backward()
"""
    __constants__ = ['ignore_index', 'reduction', 'label_smoothing']
    ignore_index: int
    label_smoothing: float

    def __init__(self, weight: Optional[Tensor] = None, size_average=None, ignore_index: int = -100,
                 reduce=None, reduction: str = 'mean', label_smoothing: float = 0.0) -> None:
        super(CrossEntropyLoss, self).__init__(weight, size_average, reduce, reduction)
        self.ignore_index = ignore_index
        self.label_smoothing = label_smoothing

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        return F.cross_entropy(input, target, weight=self.weight,
                               ignore_index=self.ignore_index, reduction=self.reduction,
                               label_smoothing=self.label_smoothing)
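As the docstring notes, with class-index targets this criterion is equivalent to LogSoftmax followed by NLLLoss. A minimal sketch to check that numerically (shapes are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)              # raw scores for 3 samples and 5 classes
target = torch.randint(0, 5, (3,))      # ground-truth class indices

ce = nn.CrossEntropyLoss()(logits, target)
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)   # LogSoftmax + NLLLoss
print(torch.allclose(ce, nll))          # expected: True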

When declaring the loss function you can pass a few arguments. The most important is weight: Optional[Tensor] = None. weight assigns a rescaling weight to each class in the computation; it is a tensor whose length must equal the number of classes. Another useful one is label_smoothing, which is built into the cross-entropy loss. With label_smoothing = 0.1, the hard one-hot targets are softened: PyTorch mixes the one-hot target with a uniform distribution over the classes, so the true class no longer receives the full weight of 1.0 and the remaining probability mass is spread over the other classes. This makes the loss smoother and training easier to converge, and it keeps a confidently wrong prediction from producing an excessively large loss.
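A minimal sketch of passing these arguments; the class weights and shapes below are made up for illustration (e.g., up-weighting rarer classes), not values from the original post:

import torch
import torch.nn as nn

# Hypothetical per-class weights for a 5-class problem; length must equal the number of classes C.
class_weights = torch.tensor([0.5, 1.0, 1.0, 2.0, 2.0])

loss_func = nn.CrossEntropyLoss(weight=class_weights, label_smoothing=0.1)

logits = torch.randn(3, 5, requires_grad=True)   # (N, C) raw scores from the model
target = torch.tensor([3, 4, 1])                 # ground-truth class indices
loss = loss_func(logits, target)
loss.backward()
print(loss.item())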

When you actually use it, i.e. when computing the loss value, you pass two arguments, input and target, both tensors: input is your model's prediction and target is the ground-truth annotation. For example:

import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()

input = torch.randn(3, 5, requires_grad=True)
print("input:", input)
target = torch.empty(3, dtype=torch.long).random_(5)
print("target:", target)
output = loss_func(input, target)
print("loss:", output)

input: tensor([[ 1.6738,  0.0526,  0.6329, -0.8809,  1.4822],
        [-0.5908,  1.5717,  1.3402,  0.4227, -0.3498],
        [-0.3359, -2.3797, -1.6206, -2.3070,  0.6010]], requires_grad=True)
target: tensor([3, 4, 1])
loss: tensor(3.2306, grad_fn=<NllLossBackward0>)

The above is essentially a five-class classification problem: input is the output of the model's final fully connected layer, or the final output of a fully convolutional network.
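For segmentation specifically, the same criterion is applied per pixel: the network output keeps a class dimension, e.g. (N, C, H, W), and the label map holds one class index per pixel, e.g. (N, H, W). A minimal sketch with made-up shapes:

import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()

N, C, H, W = 2, 5, 8, 8                                 # batch, classes, height, width (illustrative)
logits = torch.randn(N, C, H, W, requires_grad=True)    # per-pixel raw scores from an FCN
labels = torch.randint(0, C, (N, H, W))                 # per-pixel ground-truth class indices

loss = loss_func(logits, labels)   # averaged over all pixels by default (reduction='mean')
loss.backward()
print(loss.item())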

If, however, you first apply argmax to the classification output, so that the class dimension is collapsed into predicted class indices, the computation fails. For example:

import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()

input = torch.randn(3,requires_grad=True)
print("input:",input)
target = torch.empty(3, dtype=torch.long).random_(5)
print("target:",target)
output = loss_func(input, target)
print("loss:",output)

input: tensor([-0.3463,  1.2289,  0.2517], requires_grad=True)
target: tensor([3, 4, 3])
Traceback (most recent call last):
  File "/home/lwf/Project/MRI-Segmentation/tets.py", line 19, in <module>
    output = loss_func(input, target)
  File "/home/lwf/anaconda3/envs/torch3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lwf/anaconda3/envs/torch3.7/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 1152, in forward
    label_smoothing=self.label_smoothing)
  File "/home/lwf/anaconda3/envs/torch3.7/lib/python3.7/site-packages/torch/nn/functional.py", line 2846, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: Expected floating point type for target with class probabilities, got Long

The error occurs because the argmaxed input no longer has a class dimension: it has the same shape as target, so PyTorch assumes target holds class probabilities and expects a floating-point tensor. In other words, input must keep the class dimension and contain a raw, unnormalized score (a logit) for every class; do not collapse it with argmax before computing the loss.
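In practice this means the loss is computed on the raw logits, and argmax is applied only afterwards to obtain the predicted classes for evaluation or visualization. A minimal sketch:

import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()

logits = torch.randn(3, 5, requires_grad=True)        # keep the class dimension: (N, C)
target = torch.empty(3, dtype=torch.long).random_(5)

loss = loss_func(logits, target)   # loss is computed on the raw per-class scores
loss.backward()

pred = logits.argmax(dim=1)        # argmax only afterwards, to get the predicted class indices
print(pred)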

Original: https://blog.csdn.net/lwf1881/article/details/123078571
Author: spectrelwf
Title: Semantic Segmentation Loss Function Series (1): Cross-Entropy Loss
