mask rcnn 超详细代码解读（一）

2023年5月25日上午2:49 • 人工智能 • 阅读 119

mask r-cnn 代码解读（一）

文章目录

1 代码架构
2 model.py 的结构
3 train过程代码解析
*
3.1 Resnet Graph
3.2 Region Proposal Network (RPN)
3.3 Proposal Layer

本系列将对 mask r-cnn 的代码做非常详细的讲解。
默认教程使用者已经对mask r-cnn的结构基本了解，因此不对原论文做解析

、最好是读者手头有完整的mrcnn代码（没有也没事，会贴），对照着代码和博客来理解。

本文将通过对代码的解析，重新梳理网络结构中的歧义。

[En]

This article will sort out the ambiguities in the network structure again by parsing the code.

1 代码架构

如下图所示，mrcnn 中包含四个主要的python文件：

config,py ：代码中涉及的超参数放在此文件中
model.py ：深度网络的build代码
utils.py ：涉及一些工具方法
visualize.py ：预测(detect)得到结果后，将预测的Bbox和mask结合原图重新绘制
另外还有 parallel_model.py ，是为了方便模型在多GPU上训练。

本文将主要根据 model.py 的内容进行，此过程中调用了别的代码时，也会对被调用的解析。

; 2 model.py 的结构

model.py 特别长，可根据内容划分成如下块：

块作用Utility Functions对日志记录等做一些规定Resnet Graph基础网络，提取特征（feature）Region Proposal Network (RPN)在feature上生成anchors，并给出粗略的前景/背景判断结果、粗略的Bbox坐标Proposal Layer输入rpn网络得到的Bbox，过滤背景得到proposalsROIAlign Layer输入基础网络提取到的特征和 rois 的坐标，剪裁得到特征上坐标对应的部分。并reshape到相同大小Detection Target Layertrain过程中，根据ground truth 和 proposal得到 rois（region of interest）Detection Layerinference过程中，根据proposals和Bbox deltas得到最后的Bbox结果Feature Pyramid Network Heads输入pool后的特征，预测物体class和maskMaskRCNN Class将上述结构串起来，最后返回一个model对象Loss Functions损失函数相关Data Generator管理训练数据等Data Formatting对输入图片的一些处理Miscellenous Graph Functions其他一些函数

接下来将按照 Resnet Graph → RPN → Proposal Layer → ROIAlign Layer → Detection Target Layer → Feature Pyramid Network Heads → MaskRCNN Class 的顺序讲解 train 部分的代码。

然后是 inference 中不同的代码部分。

然后是丢失功能和数据优先操作。

[En]

Then there is the loss function and data first off operation.

整个解析中，会：

贴上增加注释的代码
将代码转换为易于理解的流程图

[En]

convert the code into an easy-to-understand flowchart*
对代码中不易理解/别扭（个人认为）的地方额外解释

3 train过程代码解析

3.1 Resnet Graph

这部分代码比较简单，网络结构也很清晰。有三种方法：

[En]

This part of the code is relatively simple, and the network structure is very clear. There are three methods:

def identity_block
def conv_block
def resnet_graph

其中 identity_block 和 conv_block 是定义的两个卷积块，也就是两种不同的卷积方式：

具体的代码也贴在下方：

def identity_block(input_tensor, kernel_size, filters, stage, block,
                   use_bias=True, train_bn=True):
    """The identity_block is the block that has no conv layer at shortcut
    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the nb_filters of 3 conv layer at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a','b'..., current block label, used for generating layer names
        use_bias: Boolean. To use or not use a bias in conv layers.

        train_bn: Boolean. Train or freeze Batch Norm layers
"""
    '''
    一个1x1卷积 → 一个kxk卷积（kernel size） → 一个1x1卷积 → 卷积结果和input加起来（像素相加）
    '''
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = KL.Conv2D(nb_filter1, (1, 1), name=conv_name_base + '2a',
                  use_bias=use_bias)(input_tensor)
    x = BatchNorm(name=bn_name_base + '2a')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
                  name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2b')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
                  use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2c')(x, training=train_bn)

    x = KL.Add()([x, input_tensor])
    x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x

def conv_block(input_tensor, kernel_size, filters, stage, block,
               strides=(2, 2), use_bias=True, train_bn=True):
    """conv_block is the block that has a conv layer at shortcut
    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the nb_filters of 3 conv layer at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a','b'..., current block label, used for generating layer names
        use_bias: Boolean. To use or not use a bias in conv layers.

        train_bn: Boolean. Train or freeze Batch Norm layers
    Note that from stage 3, the first conv layer at main path is with subsample=(2,2)
    And the shortcut should have subsample=(2,2) as well
"""
    '''
    一个1x1卷积 → 一个kxk卷积（kernel size） → 一个1x1卷积 → 卷积结果和input 1x1卷积后 加起来（像素相加）
    '''
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = KL.Conv2D(nb_filter1, (1, 1), strides=strides,
                  name=conv_name_base + '2a', use_bias=use_bias)(input_tensor)
    x = BatchNorm(name=bn_name_base + '2a')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
                  name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2b')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base +
                  '2c', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2c')(x, training=train_bn)

    shortcut = KL.Conv2D(nb_filter3, (1, 1), strides=strides,
                         name=conv_name_base + '1', use_bias=use_bias)(input_tensor)
    shortcut = BatchNorm(name=bn_name_base + '1')(shortcut, training=train_bn)

    x = KL.Add()([x, shortcut])
    x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x

在 resnet_graph 中，调用这两种卷积块，得到特征图。
首先：

def resnet_graph(input_image, architecture, stage5=False, train_bn=True):
    """Build a ResNet graph.

        architecture: Can be resnet50 or resnet101
        stage5: Boolean. If False, stage5 of the network is not created
        train_bn: Boolean. Train or freeze Batch Norm layers
"""
    assert architecture in ["resnet50", "resnet101"]

assert 是断言，如果输入的architecture不是resnet50或者resnet101中某一个的话，将报错。

然后调用上述两个卷积块进行特征提取。

[En]

Then the above two convolution blocks are called for feature extraction.

def resnet_graph(input_image, architecture, stage5=False, train_bn=True):
    ...

    x = KL.ZeroPadding2D((3, 3))(input_image)
    x = KL.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
    x = BatchNorm(name='bn_conv1')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    C1 = x = KL.MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)

    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), train_bn=train_bn)
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b', train_bn=train_bn)
    C2 = x = identity_block(x, 3, [64, 64, 256], stage=2, block='c', train_bn=train_bn)

    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c', train_bn=train_bn)
    C3 = x = identity_block(x, 3, [128, 128, 512], stage=3, block='d', train_bn=train_bn)

    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', train_bn=train_bn)
    block_count = {"resnet50": 5, "resnet101": 22}[architecture]
    for i in range(block_count):
        x = identity_block(x, 3, [256, 256, 1024], stage=4, block=chr(98 + i), train_bn=train_bn)
    C4 = x

    if stage5:
        x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', train_bn=train_bn)
        x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', train_bn=train_bn)
        C5 = x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', train_bn=train_bn)
    else:
        C5 = None
    return [C1, C2, C3, C4, C5]

这部分代码也很简单，基本上是直截了当的，整理出这部分的网络结构如下：

[En]

This part of the code is also very simple, basically straightforward, sort out this part of the network structure is as follows:

3.2 Region Proposal Network (RPN)

这部分包括两个方法：

def rpn_graph(feature_map, anchors_per_location, anchor_stride)
def build_rpn_model(anchor_stride, anchors_per_location, depth)

在 build_rpn_graph 中调用 rpn_graph ，得到output然后返回一个model对象。（先在 **_graph 方法中写好网络结构，然后在 build_** 方法中调用它，返回一个model对象。这种写代码思路是CNN中常见的写法。后文不再提及。）

其中输入参数说明如下：

feature_map，shape = [batch, height, width, depth]，是上文 resnet 输出的结果。
anchors_per_location ：feature 中每个 pixel 产生的 anchors 的数量
anchor_stride ：产生anchors的间隔，一般是1（也就是每个pixel都产生anchors），或者2（间隔一个）

需要说明的是，之前 resnet 产生了五个不同尺寸的 feature ，而 RPN 的输入中没有体现这一点。

这是因为：在后期 Mask RCNN 调用这两个 layer 的时候，将 resnet 生成的 feature 放入 list 中，然后循环 list ，对每一层的 feature 单独输入 RPN。

rpn_class_logits，shape = [batch, H * W * anchors_per_location, 2] 这个是anchors 分类结果激活前的张量。
rpn_probs，shape = [batch, H * W * anchors_per_location, 2] 这是上面logits激活（softmax）之后的结果，表示 anchor 分类结果的得分。（或者说可能性，probs就是probabilities）shape中最后一维是”2″，代表前景（有对象）/背景的初步判断结果。
rpn_bbox，shape = [batch, H * W * anchors_per_location, 4]，最后一维的4具体是 (dy, dx, log(dh), log(dw))。这是anchors回归的delta。

具体代码如下：

def rpn_graph(feature_map, anchors_per_location, anchor_stride):
    """Builds the computation graph of Region Proposal Network.

    feature_map: backbone features [batch, height, width, depth]
    anchors_per_location: number of anchors per pixel in the feature map
    anchor_stride: Controls the density of anchors. Typically 1 (anchors for
                   every pixel in the feature map), or 2 (every other pixel).

    Returns:
        rpn_class_logits: [batch, H * W * anchors_per_location, 2] Anchor classifier logits (before softmax)
        rpn_probs: [batch, H * W * anchors_per_location, 2] Anchor classifier probabilities.

        rpn_bbox: [batch, H * W * anchors_per_location, (dy, dx, log(dh), log(dw))] Deltas to be
                  applied to anchors.

"""

    shared = KL.Conv2D(512, (3, 3), padding='same', activation='relu',
                       strides=anchor_stride,
                       name='rpn_conv_shared')(feature_map)

    x = KL.Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
                  activation='linear', name='rpn_class_raw')(shared)

    rpn_class_logits = KL.Lambda(
        lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 2]))(x)

    rpn_probs = KL.Activation(
        "softmax", name="rpn_class_xxx")(rpn_class_logits)

    x = KL.Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
                  activation='linear', name='rpn_bbox_pred')(shared)

    rpn_bbox = KL.Lambda(lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 4]))(x)

    return [rpn_class_logits, rpn_probs, rpn_bbox]

选其中稍微别扭的解释：

rpn_class_logits = KL.Lambda(
        lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 2]))(x)

其中 x 是 feature_map 经过一个 3×3 卷积（padding）之后得到 share、再经过 1×1 卷积的结果，feature_map 的 shape=[batch, w, h, channel]，share 的 shape=[batch, w, h, channel=512]。

所以 x.shape = [batch, w, h, 2k]（k指每个pixel产生几个anchor，系数2是因为要划分前景和背景）。
x 是 rpn_class_logits 的前身，shape 需要变成 [batch, hwk, 2]，这个好理解。

这句话的含义就等价于：

def function(x):
    batch = x.shape[0]
    result = tf.reshape(x, [batch, -1, 2])

    return result

rpn_class_logits = function(x)

那么又有一个新问题：为什么不按照我这个写法，而是用 KL.Lambda 呢？
(这并不是因为它太简洁了。你好)

[En]

(it’s not because it’s so concise. Hello)

因为 keras 的数据流可以被 tensorflow 直接处理，但是 tensorflow 出来的东西，keras 就不认了……在这个例子中， tf.reshape(x, [batch, -1, 2]) 就是从 tensorflow 出来的数据流，没办法被 keras 直接处理。

一种简单的解决方案是： 引入Keras的Lamda函数将TensorFlow的操作转化为Keras的数据流。这样就可以将TensorFlow写好的函数输出直接转换为Keras的Module可以接收的类型。

（下文还会提到另一种解决方案：继承Layer，重写一个类）

结构图可以从代码中获得，如下所示：

[En]

The structure diagram can be obtained from the code as follows:

3.3 Proposal Layer

本节定义了两个函数和一个类：

[En]

This section defines two functions and a class:

def apply_box_deltas_graph(boxes, deltas)
def clip_boxes_graph(boxes, window)
class ProposalLayer(KE.Layer)

前两个方法是工具方法， apply_box_deltas_graph 输入 Bbox 坐标和 delta ，输出根据delta 精调后的 Bbox。 clip_boxes_graph 输入 bbox 和 window ，输出这两个区域的重合部分的坐标。

诶……那么这个 window 是什么呢?

在 Data Formatting 的部分中，有两个函数：

def compose_image_meta：对原始图像信息打包
def parse_image_meta(meta)：对原始图像信息解包

这两个代码非常容易理解，并直接发布在下面：

[En]

These two codes are very easy to understand and are posted directly below:

def compose_image_meta(image_id, original_image_shape, image_shape,
                       window, scale, active_class_ids):
    """Takes attributes of an image and puts them in one 1D array.

    image_id: An int ID of the image. Useful for debugging.

    original_image_shape: [H, W, C] before resizing or padding.

    image_shape: [H, W, C] after resizing and padding
    window: (y1, x1, y2, x2) in pixels. The area of the image where the real
            image is (excluding the padding)
    scale: The scaling factor applied to the original image (float32)
    active_class_ids: List of class_ids available in the dataset from which
        the image came. Useful if training on images from multiple datasets
        where not all classes are present in all datasets.

"""
    meta = np.array(
        [image_id] +
        list(original_image_shape) +
        list(image_shape) +
        list(window) +
        [scale] +
        list(active_class_ids)
    )
    return meta

def parse_image_meta(meta):
    """Parses an array that contains image attributes to its components.

    See compose_image_meta() for more details.

    meta: [batch, meta length] where meta length depends on NUM_CLASSES

    Returns a dict of the parsed values.

"""
    image_id = meta[:, 0]
    original_image_shape = meta[:, 1:4]
    image_shape = meta[:, 4:7]
    window = meta[:, 7:11]
    scale = meta[:, 11]
    active_class_ids = meta[:, 12:]
    return {
        "image_id": image_id.astype(np.int32),
        "original_image_shape": original_image_shape.astype(np.int32),
        "image_shape": image_shape.astype(np.int32),
        "window": window.astype(np.int32),
        "scale": scale.astype(np.float32),
        "active_class_ids": active_class_ids.astype(np.int32),
    }

从这里就可以知道， window 是 原始图片 resize + padding 后，处理后的新图像中具有原始图像的部分。

但！是！在 ProposalLayer 的 call 方法调用这个函数的时候，是在归一化的坐标系里， 并没有从image_meta中读数据，而是：

window = np.array([0, 0, 1, 1], dtype=np.float32)

具体的 apply_box_deltas_graph 和 clip_boxes_graph 代码也贴在下面。（虽然跳过这个不会对理解mask rcnn造成影响）

def apply_box_deltas_graph(boxes, deltas):
    """Applies the given deltas to the given boxes.

    boxes: [N, (y1, x1, y2, x2)] boxes to update
    deltas: [N, (dy, dx, log(dh), log(dw))] refinements to apply
"""

    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height
    center_x = boxes[:, 1] + 0.5 * width

    center_y += deltas[:, 0] * height
    center_x += deltas[:, 1] * width
    height *= tf.exp(deltas[:, 2])
    width *= tf.exp(deltas[:, 3])

    y1 = center_y - 0.5 * height
    x1 = center_x - 0.5 * width
    y2 = y1 + height
    x2 = x1 + width
    result = tf.stack([y1, x1, y2, x2], axis=1, name="apply_box_deltas_out")
    return result

def clip_boxes_graph(boxes, window):
"""
    保留交集的函数
    boxes: [N, (y1, x1, y2, x2)]
    window: [4] in the form y1, x1, y2, x2
"""

    wy1, wx1, wy2, wx2 = tf.split(window, 4)
    y1, x1, y2, x2 = tf.split(boxes, 4, axis=1)

    y1 = tf.maximum(tf.minimum(y1, wy2), wy1)
    x1 = tf.maximum(tf.minimum(x1, wx2), wx1)
    y2 = tf.maximum(tf.minimum(y2, wy2), wy1)
    x2 = tf.maximum(tf.minimum(x2, wx2), wx1)
    clipped = tf.concat([y1, x1, y2, x2], axis=1, name="clipped_boxes")
    clipped.set_shape((clipped.shape[0], 4))
    return clipped

然后是 Proposal Layer 的重点： ProposalLayer 类，它继承 keras.engine.Layer ，包括三个方法：

def init
def call
def compute_output_shape

为什么要写一个类继承Layer，而不是直接 def 一个方法解决呢？跟上文提到使用 KL.Lambda 类似，TensorFlow的函数可以操作Keras的Tensor，但是它返回的TensorFlow的Tensor不能被Keras继续处理，因此我们需要：

建立新的Keras层进行转换，将TensorFlow的Tensor作为Keras层的__init__函数进行构建层，然后在__call__方法中使用TensorFlow的函数进行细粒度的数据处理，最后返回Keras层对象。

然后根据 Keras 的 API 文档，实现自己的类需要重写：

build(input_shape) ：这是定义权重的方法，可训练的权应该在这里被加入列表
call(x) ：这是定义层功能的方法，除非你希望你写的层支持masking，否则你只需要关心call的第一个参数：输入张量
compute_output_shape(input_shape) ：如果你的层修改了输入数据的shape，你应该在这里指定shape变化的方法，这个函数使得Keras可以做自动shape推断

诶那为什么只实现了 call 和 compute_output_shape(input_shape)，而不见 build 的踪影呢？

我们点进keras的源代码：

    def build(self, input_shape):
        """Creates the layer weights.

        Must be implemented on all layers that have weights.

        # Arguments
            input_shape: Keras tensor (future input to layer)
                or list/tuple of Keras tensors to reference
                for weight shape computations.

"""
        self.built = True

注意，是所有有权重的层才要重写这个方法。那咱们 ProposalLayer 有权重吗？

没有！

所以不需要重写。

先看 __init__ 方法：

def __init__(self, proposal_count, nms_threshold, config=None, **kwargs):
        super(ProposalLayer, self).__init__(**kwargs)
        self.config = config
        self.proposal_count = proposal_count
        self.nms_threshold = nms_threshold

其中， proposal_count 是一个整数，用于指定生成proposal数目，不足时会生成坐标为[0,0,0,0]的空值进行补全。这个数由config文件中 POST_NMS_ROIS_TRAINING 或者 POST_NMS_ROIS_INFERENCE 决定。

nms_threshold 是非极大值抑制的阈值，由 config.RPN_NMS_THRESHOLD 决定，如果调大这个参数，会产生更多的proposals。

接下来重点看 call 方法的实现。其中输入参数是：

rpn_probs ：[batch, num_anchors, 2]，2具体是(bg prob, fg prob)。其中bg指 background，fg指 foreground
rpn_bbox ：[batch, num_anchors, 4]， 4具体是(dy, dx, log(dh), log(dw))
anchors ：[batch, num_anchors, 4]， 4具体是(y1, x1, y2, x2)。坐标是在 normalized 的坐标系中的值。

在 normalized 的坐标系中的 proposals ，shape=[batch, rois, 4]，4具体是(y1, x1, y2, x2)

然后发布带注释的源代码：

[En]

Then post the annotated source code:

    def call(self, inputs):

        scores = inputs[0][:, :, 1]

        deltas = inputs[1]
        deltas = deltas * np.reshape(self.config.RPN_BBOX_STD_DEV, [1, 1, 4])

        anchors = inputs[2]

        pre_nms_limit = tf.minimum(self.config.PRE_NMS_LIMIT, tf.shape(anchors)[1])

        ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                         name="top_anchors").indices

        scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        pre_nms_anchors = utils.batch_slice([anchors, ix], lambda a, x: tf.gather(a, x),
                                    self.config.IMAGES_PER_GPU,
                                    names=["pre_nms_anchors"])
        '''
        关于batch_slice: 这个函数将只支持batch为1的函数进行了扩展（实际就是不能有batch维度的函数），
        tf.gather函数只能进行一维数组的切片，而scares为2维[batch, num_rois]，
        相对的ix也是二维[batch, top_k]，所以我们需要将两者切片应用函数后将结果拼接。
        '''

        boxes = utils.batch_slice([pre_nms_anchors, deltas],
                                  lambda x, y: apply_box_deltas_graph(x, y),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors"])

        '''
        我们的锚框坐标实际上是位于一个归一化了的图上, 上一步的修正进行之后不再能够保证这一点，
        所以我们需要切除锚框越界的的部分（即只保留锚框和[0,0,1,1]画布的交集），通过调用clip_boxes_graph
        '''
        window = np.array([0, 0, 1, 1], dtype=np.float32)
        boxes = utils.batch_slice(boxes,
                                  lambda x: clip_boxes_graph(x, window),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors_clipped"])

        def nms(boxes, scores):
            '''
            非极大值抑制子函数
            :param boxes: [top_k, (y1, x1, y2, x2)]
            :param scores: [top_k]
            :return:
            '''
            indices = tf.image.non_max_suppression(
                boxes, scores, self.proposal_count,
                self.nms_threshold, name="rpn_non_max_suppression")
            proposals = tf.gather(boxes, indices)

            padding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0)

            proposals = tf.pad(proposals, [(0, padding), (0, 0)])
            return proposals
        proposals = utils.batch_slice([boxes, scores], nms,
                                      self.config.IMAGES_PER_GPU)
        return proposals

另外的一些说明：

1、最后 maskrcnn 连接所有的层的时候，对 ProposalLayer 是 input=[rpn_class, rpn_bbox, anchors] ，其中 rpn_class 和 rpn_bbox 是 RPN的输出，而 anchors 是生成之后没有经过任何处理的锚框。

2、这个 PRE_NMS_LIMIT 是在 tf.nn.top_k 之后、非极大值抑制之前保留的 rois的数量。在config文件中设置。

3、看一句代码：

ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                         name="top_anchors").indices

tf.nn.top_k 的作用是，选出 scores 中最大的 pre_nms_limit 个数，会返回两个东西：values 和 indices。前者是数的具体值，后者是最大值的坐标。在这个例子中，输入参数 scores 的shape为 [Batch, num_rois, 1] ，返回的坐标能够索引到最大值的位置。
后面一段代码：

scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
pre_nms_anchors = utils.batch_slice([anchors, ix], lambda a, x: tf.gather(a, x),
                                    self.config.IMAGES_PER_GPU,
                                    names=["pre_nms_anchors"])

调用了 utils.batch_slice ，这个工具方法的目的是，有很多函数不支持多个batch操作，所以就在这个方法里循环一下，单独输进去。这段源码如下：


def batch_slice(inputs, graph_fn, batch_size, names=None):
    """Splits inputs into slices and feeds each slice to a copy of the given
    computation graph and then combines the results. It allows you to run a
    graph on a batch of inputs even if the graph is written to support one
    instance only.

    inputs: list of tensors. All must have the same first dimension length
    graph_fn: A function that returns a TF tensor that's part of a graph.

    batch_size: number of slices to divide the data into.

    names: If provided, assigns names to the resulting tensors.

"""
    if not isinstance(inputs, list):

        inputs = [inputs]

    outputs = []

    for i in range(batch_size):
        inputs_slice = [x[i] for x in inputs]
        output_slice = graph_fn(*inputs_slice)
        if not isinstance(output_slice, (tuple, list)):
            output_slice = [output_slice]
        outputs.append(output_slice)

    outputs = list(zip(*outputs))

    if names is None:
        names = [None] * len(outputs)

    result = [tf.stack(o, axis=0, name=n)
              for o, n in zip(outputs, names)]
    if len(result) == 1:
        result = result[0]

    return result

其实就是利用索引值，获得前 pre_nms_limit 个anchors 对应的 scores 、deltas、 anchors(记为pre_nms_anchors)

三点说明结束，最后瞄一眼 compute_output_shape ，因为经过这个layer之后，tensor的形状发生了变化，所以要在这个函数中写清楚 shape 的形状，方便keras自动推算。
贴代码如下：

def compute_output_shape(self, input_shape):
        return (None, self.proposal_count, 4)

其中第0维，None代表的是batch，表示任意或不确定长度。最后 proposals 的形状是 [batch, rois, (y1, x1, y2, x2)]。rois的具体值在config中有设置，train模式下是2000，inference模式下是1000。

mask rcnn 超详细代码解读（二）直通车

Original: https://blog.csdn.net/Cleo_Gao/article/details/115122223
Author: Cleo_Gao
Title: mask rcnn 超详细代码解读（一）

原创文章受到原创版权保护。转载请注明出处：https://www.johngo689.com/511660/

转载文章受原作者版权保护。转载请注明原作者出处！

人工智能

【自取】最近整理的，有需要可以领取学习：

Linux核心资料大放送~

全栈面试题汇总（持续更新&可下载）

一个提高学习100%效率的工具！

【超详细】深度学习面试题目！

LeetCode Python刷题答案下载！

LeetCode Java版刷题答案下载！

LeetCode C++ 版本，抓紧保存！

LeetCode GO语言刷题答案下载！

手把手搭建一个【卷积神经网络】

前言本文介绍卷积神经网络的入门案例，通过搭建和训练一个模型，来对10种常见的物体进行识别分类；使用到CIFAR10数据集，它包含10 类，即：”飞机”，&…

人工智能 2023年7月13日
0071
《动手学深度学习》参考答案(第二版)-第三章

最近在学习《动手学深度学习》，结合百度和课后的大家的讨论(侵删)，整理出这一份可能并不完全正确的参考答案(菜鸡的做题记录)，因为个人水平有限，有错误的地方欢迎在公众号联系我，后…

人工智能 2023年6月16日
0076
独家消息：阿里云悄然推出RPA云电脑，已与多家RPA厂商开放合作

独家消息：阿里云悄然推出RPA云电脑，已与多家RPA厂商开放合作 RPA云电脑，让RPA开箱即用算力无限？文/王吉伟这几天，王吉伟频道通过业内人士获得独家消息，阿里云近期推出了…

人工智能 2023年6月4日
0077
记一次调试YOLOv5+DeepSort车辆跟踪项目的经过

摘要：学习别人的开源项目是日常的一项必备技能，本文通过一个车辆跟踪（YOLOv5+DeepSort）的例子介绍如何配置和调试GitHub上的开源代码。以第一人称的视角给出本人调试代…

人工智能 2023年7月27日
0063
【学习记录】基于知识图谱的虚假新闻检测

菜鸟自救学习记录。 “基于知识图谱的虚假新闻检测”，要解决的关键词大致包含了”知识图谱”、”虚假新闻检测”，…

人工智能 2023年6月1日
0094
[总结] VLAD & NetVLAD & NeXtVLAD

参考：https://www.jianshu.com/p/7d48bff4d1c3 NeXtVLAD 是一个特征聚合的网络，可以在向量空间中提取全局描述子特征，减少参数，提升性…

人工智能 2023年5月31日
0061
（12）目标检测_SSD主干网络基于pytorch搭建代码

抵扣说明： 1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。 Original: https://blo…

人工智能 2023年7月23日
0058
pycharm 软件详细使用教程，新手必看篇

pycharm是一款 python IDE工具，具有跨平台性，功能强大，操作方便，下面我就给使用这款软件的初学者做一个简单的使用教程，希望能给你们带来一些帮助！学习pycharm…

人工智能 2023年7月3日
0092
【机器学习入门】(13) 实战：心脏病预测，补充: ROC曲线、精确率–召回率曲线，附python完整代码和数据集

各位同学好，经过前几章python机器学习的探索，想必大家对各种预测方法也有了一定的认识。今天我们来进行一次实战，心脏病病例预测，本文对一些基础方法就不进行详细解释，有疑问的同学可…

人工智能 2023年6月15日
0073
要点初见：开源AI绘画工具Stable Diffusion代码分析（文本转图像）、论文介绍（下）

前文链接如下：要点初见：开源AI绘画工具Stable Diffusion代码分析（文本转图像）、论文介绍（上）_BingLiHanShuang的博客-CSDN博客二、Stabl…

人工智能 2023年7月27日
0060
微信支付APIv3

文章目录微信支付 * 之前我的密钥啥的都是放到配置文件里面以后可以再写一个文件基础支付APIv3介绍获取验签和HttpClient – APIv3证书与密钥使用说…

人工智能 2023年5月30日
00105
《基于条件Wasserstein GAN的表格式数据过采样的不平衡学习》代码运行

标题：Conditional Wasserstein GAN-based Oversampling of Tabular Data for Imbalanced Learning论…

人工智能 2023年6月19日
0099
IDL基础学习资料+监督分类

学习资料 IDL和ENVI的详细文档：查询具体的函数和接口 Documentation Center https://www.l3harrisgeospatial.com/docs…

人工智能 2023年7月3日
00110
【零基础玩转BLDC系列】基于反电动势过零检测法的无刷直流电机控制原理

无刷直流电动机基本转动原理请参考《基于HALL传感器的无刷直流电机控制原理》，基本原理及基础知识本篇不再赘述。目录反电势过零检测法的原理反电势过零检测实现方法位置传感器的存…

人工智能 2023年6月23日
00112
ubuntu简单快速安装搭建pytorch tensorflow环境RTX3060

显卡RTX-3060Ubuntu 18.04pytorch 1.9cuda 11.1时间2021-8 安装步骤安装ubuntu18.04 手动安装nvidia驱动，就在在软件更新…

人工智能 2023年5月26日
00160
YOLOv5 网络组件(含common.py文件注解)（六）

课程链接目录 Conv Focus BCSPn SPP Concat common.py代码解析 Conv Conv是yolov5中的最核心模块，代码如下：为same卷积或sa…

人工智能 2023年7月9日
0081

2024 年 5 月
一	二	三	四	五	六	日
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

mask rcnn 超详细代码解读（一）

文章目录

3.1 Resnet Graph

3.2 Region Proposal Network (RPN)

3.3 Proposal Layer

大家都在看