Mask R-CNN: A Super-Detailed Code Walkthrough (Part 1)

This series gives a very detailed walkthrough of the Mask R-CNN code.
It assumes the reader already has a basic understanding of the Mask R-CNN architecture, so the original paper is not analyzed here.

Ideally, keep the complete mrcnn code at hand (not strictly necessary, since the relevant code is pasted below) and read it alongside this post.

By parsing the code, this article works through the ambiguous points in the network structure.

1 Code Architecture

As shown in the figure below, mrcnn consists of four main Python files:

  • config.py : the hyperparameters used throughout the code live in this file
  • model.py : the code that builds the deep network
  • utils.py : assorted utility functions
  • visualize.py : after prediction (detect), draws the predicted Bboxes and masks back onto the original image

There is also parallel_model.py, which makes it easy to train the model on multiple GPUs.

(Figure: the main files of mrcnn)

This article mainly follows model.py; whenever it calls into other code along the way, the called code is explained as well.

2 The Structure of model.py

model.py is very long; it can be divided into the following blocks:

  • Utility Functions: conventions for logging and the like
  • Resnet Graph: the backbone network; extracts the features
  • Region Proposal Network (RPN): generates anchors over the features and outputs a coarse foreground/background decision plus coarse Bbox coordinates
  • Proposal Layer: takes the Bboxes produced by the RPN and filters out the background to obtain proposals
  • ROIAlign Layer: takes the features extracted by the backbone and the roi coordinates, crops the parts of the features those coordinates cover, and reshapes them to the same size
  • Detection Target Layer: during training, derives the rois (regions of interest) from the ground truth and the proposals
  • Detection Layer: during inference, derives the final Bbox results from the proposals and the Bbox deltas
  • Feature Pyramid Network Heads: takes the pooled features and predicts the object class and mask
  • MaskRCNN Class: strings the structures above together and finally returns a model object
  • Loss Functions: everything related to the losses
  • Data Generator: manages the training data, etc.
  • Data Formatting: some processing of the input images
  • Miscellenous Graph Functions: a few other functions

The train-path code will be covered in the order Resnet Graph → RPN → Proposal Layer → ROIAlign Layer → Detection Target Layer → Feature Pyramid Network Heads → MaskRCNN Class.

After that come the parts of the code that differ in inference, followed by the loss functions and the data-handling operations.

Throughout the walkthrough, I will:

  • paste the code with added comments
  • convert the code into easy-to-understand flowcharts
  • give extra explanations where the code is (in my opinion) hard to follow or awkward

3 Code Walkthrough of the train Process

3.1 Resnet Graph

This part of the code is relatively simple, and the network structure is very clear. There are three functions:

  • def identity_block
  • def conv_block
  • def resnet_graph

identity_block and conv_block are the two convolution blocks being defined, i.e. two different convolution patterns:

(Figure: the structure of identity_block)
(Figure: the structure of conv_block)
The code is pasted below:
def identity_block(input_tensor, kernel_size, filters, stage, block,
                   use_bias=True, train_bn=True):
    """The identity_block is the block that has no conv layer at shortcut
    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the nb_filters of 3 conv layer at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a','b'..., current block label, used for generating layer names
        use_bias: Boolean. To use or not use a bias in conv layers.

        train_bn: Boolean. Train or freeze Batch Norm layers
"""
    '''
    一个1x1卷积 → 一个kxk卷积(kernel size) → 一个1x1卷积 → 卷积结果和input加起来(像素相加)
    '''
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = KL.Conv2D(nb_filter1, (1, 1), name=conv_name_base + '2a',
                  use_bias=use_bias)(input_tensor)
    x = BatchNorm(name=bn_name_base + '2a')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
                  name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2b')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
                  use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2c')(x, training=train_bn)

    x = KL.Add()([x, input_tensor])
    x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x

def conv_block(input_tensor, kernel_size, filters, stage, block,
               strides=(2, 2), use_bias=True, train_bn=True):
    """conv_block is the block that has a conv layer at shortcut
    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the nb_filters of 3 conv layer at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a','b'..., current block label, used for generating layer names
        use_bias: Boolean. To use or not use a bias in conv layers.

        train_bn: Boolean. Train or freeze Batch Norm layers
    Note that from stage 3, the first conv layer at main path is with subsample=(2,2)
    And the shortcut should have subsample=(2,2) as well
"""
    '''
    一个1x1卷积 → 一个kxk卷积(kernel size) → 一个1x1卷积 → 卷积结果和input 1x1卷积后 加起来(像素相加)
    '''
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = KL.Conv2D(nb_filter1, (1, 1), strides=strides,
                  name=conv_name_base + '2a', use_bias=use_bias)(input_tensor)
    x = BatchNorm(name=bn_name_base + '2a')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
                  name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2b')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base +
                  '2c', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2c')(x, training=train_bn)

    shortcut = KL.Conv2D(nb_filter3, (1, 1), strides=strides,
                         name=conv_name_base + '1', use_bias=use_bias)(input_tensor)
    shortcut = BatchNorm(name=bn_name_base + '1')(shortcut, training=train_bn)

    x = KL.Add()([x, shortcut])
    x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x

In resnet_graph, these two convolution blocks are called to build the feature maps.
First:

def resnet_graph(input_image, architecture, stage5=False, train_bn=True):
    """Build a ResNet graph.

        architecture: Can be resnet50 or resnet101
        stage5: Boolean. If False, stage5 of the network is not created
        train_bn: Boolean. Train or freeze Batch Norm layers
"""
    assert architecture in ["resnet50", "resnet101"]

assert is an assertion: if the architecture passed in is neither resnet50 nor resnet101, an error is raised.

Then the two convolution blocks defined above are called to extract the features.

def resnet_graph(input_image, architecture, stage5=False, train_bn=True):
    ...

    x = KL.ZeroPadding2D((3, 3))(input_image)
    x = KL.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
    x = BatchNorm(name='bn_conv1')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    C1 = x = KL.MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)

    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), train_bn=train_bn)
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b', train_bn=train_bn)
    C2 = x = identity_block(x, 3, [64, 64, 256], stage=2, block='c', train_bn=train_bn)

    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c', train_bn=train_bn)
    C3 = x = identity_block(x, 3, [128, 128, 512], stage=3, block='d', train_bn=train_bn)

    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', train_bn=train_bn)
    block_count = {"resnet50": 5, "resnet101": 22}[architecture]
    for i in range(block_count):
        x = identity_block(x, 3, [256, 256, 1024], stage=4, block=chr(98 + i), train_bn=train_bn)
    C4 = x

    if stage5:
        x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', train_bn=train_bn)
        x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', train_bn=train_bn)
        C5 = x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', train_bn=train_bn)
    else:
        C5 = None
    return [C1, C2, C3, C4, C5]
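To make the output concrete, here is a minimal usage sketch (my own example, not from the repo; the spatial sizes follow from the cumulative strides of 4, 4, 8, 16 and 32 at C1..C5):

import keras.layers as KL

# Hypothetical input; any height/width divisible by 64 behaves the same way.
input_image = KL.Input(shape=[1024, 1024, 3])
C1, C2, C3, C4, C5 = resnet_graph(input_image, "resnet101", stage5=True)
# Expected shapes for a 1024x1024 input:
# C1: 256x256x64, C2: 256x256x256, C3: 128x128x512,
# C4: 64x64x1024, C5: 32x32x2048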

This part of the code is also very simple, basically straightforward. The network structure of this part is summarized as follows:

(Figure: the network structure of resnet_graph)

3.2 Region Proposal Network (RPN)

This part contains two functions:

  • def rpn_graph(feature_map, anchors_per_location, anchor_stride)
  • def build_rpn_model(anchor_stride, anchors_per_location, depth)

build_rpn_model calls rpn_graph to get the outputs and then returns a model object. (The pattern: first write the network structure in a **_graph function, then call it from a build_** function that returns a model object. This is a common way of writing CNN code, and it will not be pointed out again.)

The input parameters are:

  • feature_map: shape = [batch, height, width, depth], the output of the resnet above.
  • anchors_per_location: the number of anchors produced per pixel of the feature map
  • anchor_stride: the stride at which anchors are produced, usually 1 (anchors at every pixel) or 2 (every other pixel)

Note that the resnet above produces five feature maps of different scales, yet the RPN's inputs do not reflect this.

The reason: when Mask RCNN later wires these layers together, it puts the features produced by resnet into a list and loops over that list, feeding the feature map of each level into the RPN separately, as sketched below.
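A paraphrased sketch of that loop (condensed from the MaskRCNN build code discussed later; treat the exact variable names, such as rpn_feature_maps, as assumptions):

# One RPN with shared weights, applied to every pyramid level.
rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,
                      len(config.RPN_ANCHOR_RATIOS), 256)  # 256: the FPN depth
layer_outputs = [rpn([p]) for p in rpn_feature_maps]  # one pass per level

# Regroup the per-level outputs and concatenate along the anchors axis, giving
# [batch, anchors_of_all_levels, ...] for each of the three outputs.
output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"]
rpn_class_logits, rpn_class, rpn_bbox = [
    KL.Concatenate(axis=1, name=name)(list(o))
    for o, name in zip(zip(*layer_outputs), output_names)]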

rpn_graph returns:

  • rpn_class_logits: shape = [batch, H * W * anchors_per_location, 2]. The anchor classification tensor before activation.
  • rpn_probs: shape = [batch, H * W * anchors_per_location, 2]. The result of applying softmax to the logits above, i.e. the score of each anchor's classification (or the probability: "probs" as in probabilities). The trailing 2 in the shape holds the preliminary foreground (object present) / background decision.
  • rpn_bbox: shape = [batch, H * W * anchors_per_location, 4], where the 4 is (dy, dx, log(dh), log(dw)). These are the regression deltas for the anchors.

The code:

def rpn_graph(feature_map, anchors_per_location, anchor_stride):
    """Builds the computation graph of Region Proposal Network.

    feature_map: backbone features [batch, height, width, depth]
    anchors_per_location: number of anchors per pixel in the feature map
    anchor_stride: Controls the density of anchors. Typically 1 (anchors for
                   every pixel in the feature map), or 2 (every other pixel).

    Returns:
        rpn_class_logits: [batch, H * W * anchors_per_location, 2] Anchor classifier logits (before softmax)
        rpn_probs: [batch, H * W * anchors_per_location, 2] Anchor classifier probabilities.

        rpn_bbox: [batch, H * W * anchors_per_location, (dy, dx, log(dh), log(dw))] Deltas to be
                  applied to anchors.

"""

    shared = KL.Conv2D(512, (3, 3), padding='same', activation='relu',
                       strides=anchor_stride,
                       name='rpn_conv_shared')(feature_map)

    x = KL.Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
                  activation='linear', name='rpn_class_raw')(shared)

    rpn_class_logits = KL.Lambda(
        lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 2]))(x)

    rpn_probs = KL.Activation(
        "softmax", name="rpn_class_xxx")(rpn_class_logits)

    x = KL.Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
                  activation='linear', name='rpn_bbox_pred')(shared)

    rpn_bbox = KL.Lambda(lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 4]))(x)

    return [rpn_class_logits, rpn_probs, rpn_bbox]

An explanation of the slightly awkward part:

rpn_class_logits = KL.Lambda(
        lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 2]))(x)

Here x is the result of passing feature_map through a 3×3 convolution (with same padding) to get shared, then through a 1×1 convolution. feature_map has shape = [batch, h, w, channel], and shared has shape = [batch, h, w, 512].

So x.shape = [batch, h, w, 2k] (where k is the number of anchors produced per pixel; the factor of 2 is for the foreground/background split).
x is the precursor of rpn_class_logits, and its shape needs to become [batch, h*w*k, 2]; that part is easy to understand.

That line is equivalent to:

def function(x):
    batch = x.shape[0]
    result = tf.reshape(x, [batch, -1, 2])

    return result

rpn_class_logits = function(x)

This raises a new question: why use KL.Lambda instead of writing it the way I just did?
(And no, it is not just because Lambda is more concise.)

Because Keras data flows can be consumed directly by TensorFlow, but whatever comes out of TensorFlow, Keras does not recognize... In this example, tf.reshape(x, [batch, -1, 2]) is a data flow coming out of TensorFlow, and Keras cannot process it directly.

One simple solution: use Keras's Lambda layer to turn the TensorFlow operation into a Keras data flow. This converts the output of a function written in TensorFlow directly into a type that Keras modules can accept.

(Another solution, subclassing Layer and writing a new class, is covered below.)

The structure diagram obtained from the code is as follows:

(Figure: the network structure of the RPN graph)

3.3 Proposal Layer

This section defines two functions and a class:

  • def apply_box_deltas_graph(boxes, deltas)
  • def clip_boxes_graph(boxes, window)
  • class ProposalLayer(KE.Layer)

The first two are utility functions: apply_box_deltas_graph takes Bbox coordinates and deltas, and outputs the Bboxes refined according to the deltas; clip_boxes_graph takes boxes and a window, and outputs the coordinates of the overlap between the two regions.

Hmm... so what exactly is this window?

In the Data Formatting part, there are two functions:

  • def compose_image_meta: packs the metadata of the original image into one array
  • def parse_image_meta(meta): unpacks that metadata

Both are very easy to understand and are pasted directly below:

def compose_image_meta(image_id, original_image_shape, image_shape,
                       window, scale, active_class_ids):
    """Takes attributes of an image and puts them in one 1D array.

    image_id: An int ID of the image. Useful for debugging.

    original_image_shape: [H, W, C] before resizing or padding.

    image_shape: [H, W, C] after resizing and padding
    window: (y1, x1, y2, x2) in pixels. The area of the image where the real
            image is (excluding the padding)
    scale: The scaling factor applied to the original image (float32)
    active_class_ids: List of class_ids available in the dataset from which
        the image came. Useful if training on images from multiple datasets
        where not all classes are present in all datasets.

"""
    meta = np.array(
        [image_id] +
        list(original_image_shape) +
        list(image_shape) +
        list(window) +
        [scale] +
        list(active_class_ids)
    )
    return meta

def parse_image_meta(meta):
    """Parses an array that contains image attributes to its components.

    See compose_image_meta() for more details.

    meta: [batch, meta length] where meta length depends on NUM_CLASSES

    Returns a dict of the parsed values.

"""
    image_id = meta[:, 0]
    original_image_shape = meta[:, 1:4]
    image_shape = meta[:, 4:7]
    window = meta[:, 7:11]
    scale = meta[:, 11]
    active_class_ids = meta[:, 12:]
    return {
        "image_id": image_id.astype(np.int32),
        "original_image_shape": original_image_shape.astype(np.int32),
        "image_shape": image_shape.astype(np.int32),
        "window": window.astype(np.int32),
        "scale": scale.astype(np.float32),
        "active_class_ids": active_class_ids.astype(np.int32),
    }

From this we can tell that window is the part of the processed image (after the original image is resized and padded) that actually contains the original image.
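A quick hedged example (my own numbers, assuming the padding is split evenly): suppose a 4:3 landscape photo is resized onto a 1024×1024 square canvas. The image itself scales to 768×1024 and gets 128 rows of padding on the top and bottom, so:

# (y1, x1, y2, x2) in pixels: rows 128..896 contain the real image
window = (128, 0, 896, 1024)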

But! When the call method of ProposalLayer invokes this function, it is working in the normalized coordinate system, and it does not read the window from image_meta; instead:

window = np.array([0, 0, 1, 1], dtype=np.float32)

The code for apply_box_deltas_graph and clip_boxes_graph is also pasted below. (Skipping it will not hurt your understanding of Mask R-CNN.)

def apply_box_deltas_graph(boxes, deltas):
    """Applies the given deltas to the given boxes.

    boxes: [N, (y1, x1, y2, x2)] boxes to update
    deltas: [N, (dy, dx, log(dh), log(dw))] refinements to apply
"""

    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height
    center_x = boxes[:, 1] + 0.5 * width

    center_y += deltas[:, 0] * height
    center_x += deltas[:, 1] * width
    height *= tf.exp(deltas[:, 2])
    width *= tf.exp(deltas[:, 3])

    y1 = center_y - 0.5 * height
    x1 = center_x - 0.5 * width
    y2 = y1 + height
    x2 = x1 + width
    result = tf.stack([y1, x1, y2, x2], axis=1, name="apply_box_deltas_out")
    return result

def clip_boxes_graph(boxes, window):
    """Keeps only the intersection of the boxes with the window.

    boxes: [N, (y1, x1, y2, x2)]
    window: [4] in the form y1, x1, y2, x2
    """

    wy1, wx1, wy2, wx2 = tf.split(window, 4)
    y1, x1, y2, x2 = tf.split(boxes, 4, axis=1)

    y1 = tf.maximum(tf.minimum(y1, wy2), wy1)
    x1 = tf.maximum(tf.minimum(x1, wx2), wx1)
    y2 = tf.maximum(tf.minimum(y2, wy2), wy1)
    x2 = tf.maximum(tf.minimum(x2, wx2), wx1)
    clipped = tf.concat([y1, x1, y2, x2], axis=1, name="clipped_boxes")
    clipped.set_shape((clipped.shape[0], 4))
    return clipped
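A tiny numeric sanity check for apply_box_deltas_graph (my own example): a delta of dy = 0.5 shifts the box center down by half the box height, while log(dh) = log(dw) = 0 leaves the size unchanged:

import tensorflow as tf

boxes = tf.constant([[0.2, 0.2, 0.6, 0.6]])   # (y1, x1, y2, x2): h = w = 0.4
deltas = tf.constant([[0.5, 0.0, 0.0, 0.0]])  # (dy, dx, log(dh), log(dw))
refined = apply_box_deltas_graph(boxes, deltas)
# center_y moves from 0.4 to 0.4 + 0.5 * 0.4 = 0.6, so
# refined -> [[0.4, 0.2, 0.8, 0.6]]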

Then comes the heart of the Proposal Layer: the ProposalLayer class. It inherits from keras.engine.Layer and contains three methods:

  • def __init__
  • def call
  • def compute_output_shape

Why write a class that inherits from Layer instead of just defining a function? As with KL.Lambda above, TensorFlow functions can operate on Keras tensors, but the TensorFlow tensors they return cannot be processed further by Keras. So we need to:

Build a new Keras layer to do the conversion: configure the layer through its __init__ method, perform the fine-grained TensorFlow data processing inside call, and end up with a Keras layer object whose outputs Keras can keep consuming.

Then, following the Keras API docs, implementing your own layer requires overriding:

  • build(input_shape): this is where weights are defined; trainable weights should be registered here
  • call(x): this is where the layer's logic is defined; unless you want your layer to support masking, you only need to care about the first argument of call: the input tensor
  • compute_output_shape(input_shape): if your layer changes the shape of its input, specify the shape transformation here so that Keras can perform automatic shape inference
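As a minimal sketch (my own illustration, not from the repo), a weight-less custom layer needs only call and compute_output_shape, which mirrors what ProposalLayer does below:

import keras.engine as KE
import tensorflow as tf

class TimesTwo(KE.Layer):
    """A layer with no weights: wraps a TensorFlow op so Keras accepts it."""

    def call(self, inputs):
        return tf.multiply(inputs, 2.0)

    def compute_output_shape(self, input_shape):
        return input_shape  # the shape is unchanged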

Hmm, then why does ProposalLayer implement only call and compute_output_shape(input_shape), with build nowhere to be seen?

Let's look at the Keras source code:

    def build(self, input_shape):
        """Creates the layer weights.

        Must be implemented on all layers that have weights.

        # Arguments
            input_shape: Keras tensor (future input to layer)
                or list/tuple of Keras tensors to reference
                for weight shape computations.

"""
        self.built = True

Note: it is layers that have weights that must override this method. Does our ProposalLayer have weights?

No!

So there is no need to override it.

First, the __init__ method:

def __init__(self, proposal_count, nms_threshold, config=None, **kwargs):
        super(ProposalLayer, self).__init__(**kwargs)
        self.config = config
        self.proposal_count = proposal_count
        self.nms_threshold = nms_threshold

Here proposal_count is an integer specifying how many proposals to generate; when there are not enough, empty boxes with coordinates [0, 0, 0, 0] are padded in to make up the count. The number is set by POST_NMS_ROIS_TRAINING or POST_NMS_ROIS_INFERENCE in the config file.

nms_threshold is the non-maximum-suppression threshold, set by config.RPN_NMS_THRESHOLD; increasing it produces more proposals.
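For reference, these are the defaults as I recall them from the matterport config.py (an assumption; check your own copy):

# In config.py (assumed defaults):
POST_NMS_ROIS_TRAINING = 2000   # proposals kept after NMS during training
POST_NMS_ROIS_INFERENCE = 1000  # proposals kept after NMS during inference
RPN_NMS_THRESHOLD = 0.7         # NMS IoU threshold; raise it for more proposals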

Next, the key part: the implementation of call. Its inputs are:

  • rpn_probs: [batch, num_anchors, 2], where the 2 is (bg prob, fg prob); bg means background, fg means foreground
  • rpn_bbox: [batch, num_anchors, 4], where the 4 is (dy, dx, log(dh), log(dw))
  • anchors: [batch, num_anchors, 4], where the 4 is (y1, x1, y2, x2); the coordinates are values in the normalized coordinate system

It returns:

  • proposals in the normalized coordinate system, shape = [batch, rois, 4], where the 4 is (y1, x1, y2, x2)

Then the annotated source code:

    def call(self, inputs):

        scores = inputs[0][:, :, 1]

        deltas = inputs[1]
        deltas = deltas * np.reshape(self.config.RPN_BBOX_STD_DEV, [1, 1, 4])

        anchors = inputs[2]

        pre_nms_limit = tf.minimum(self.config.PRE_NMS_LIMIT, tf.shape(anchors)[1])

        ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                         name="top_anchors").indices

        scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        pre_nms_anchors = utils.batch_slice([anchors, ix], lambda a, x: tf.gather(a, x),
                                    self.config.IMAGES_PER_GPU,
                                    names=["pre_nms_anchors"])
        '''
        About batch_slice: it extends functions that only support batch size 1
        (really, functions that cannot handle a batch dimension). tf.gather can
        only slice a 1-D array, while scores is 2-D [batch, num_rois] and the
        matching ix is 2-D [batch, top_k], so both are sliced per image, the
        function is applied per slice, and the results are stacked back together.
        '''

        boxes = utils.batch_slice([pre_nms_anchors, deltas],
                                  lambda x, y: apply_box_deltas_graph(x, y),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors"])

        '''
        The anchor coordinates live on a normalized canvas. After the refinement
        above this is no longer guaranteed, so the out-of-bounds parts of the
        boxes are cut off (only the intersection with the [0, 0, 1, 1] canvas is
        kept) by calling clip_boxes_graph.
        '''
        window = np.array([0, 0, 1, 1], dtype=np.float32)
        boxes = utils.batch_slice(boxes,
                                  lambda x: clip_boxes_graph(x, window),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors_clipped"])

        def nms(boxes, scores):
            '''
            Non-maximum-suppression helper.
            :param boxes: [top_k, (y1, x1, y2, x2)]
            :param scores: [top_k]
            :return: proposals, padded with zeros up to proposal_count
            '''
            indices = tf.image.non_max_suppression(
                boxes, scores, self.proposal_count,
                self.nms_threshold, name="rpn_non_max_suppression")
            proposals = tf.gather(boxes, indices)

            padding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0)

            proposals = tf.pad(proposals, [(0, padding), (0, 0)])
            return proposals
        proposals = utils.batch_slice([boxes, scores], nms,
                                      self.config.IMAGES_PER_GPU)
        return proposals

A few additional notes:

1. When maskrcnn finally connects all the layers, the input to ProposalLayer is input=[rpn_class, rpn_bbox, anchors], where rpn_class and rpn_bbox are the outputs of the RPN, and anchors are the generated anchor boxes without any further processing.

2. PRE_NMS_LIMIT is the number of rois kept after tf.nn.top_k and before non-maximum suppression. It is set in the config file.
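As I recall the matterport default (an assumption; verify in your config.py):

# In config.py (assumed default):
PRE_NMS_LIMIT = 6000  # ROIs kept after the top-k filter, before NMS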

3. Consider this line:

ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                         name="top_anchors").indices

tf.nn.top_k selects the pre_nms_limit largest values in scores and returns two things: values and indices, i.e. the values themselves and the positions of those maxima. In this case the input scores has shape [batch, num_anchors], and the returned indices point to the positions of the largest values.
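A tiny standalone tf.nn.top_k example (my own, for illustration):

import tensorflow as tf

scores = tf.constant([[0.1, 0.9, 0.4, 0.7]])  # [batch=1, num_anchors=4]
top = tf.nn.top_k(scores, k=2, sorted=True)
# top.values  -> [[0.9, 0.7]]
# top.indices -> [[1, 3]]  (positions of the two largest scores)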
The next piece of code:

scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
pre_nms_anchors = utils.batch_slice([anchors, ix], lambda a, x: tf.gather(a, x),
                                    self.config.IMAGES_PER_GPU,
                                    names=["pre_nms_anchors"])

calls utils.batch_slice. The point of this utility: many functions do not support batched inputs, so this method loops over the batch and feeds the slices in one at a time. Its source:


def batch_slice(inputs, graph_fn, batch_size, names=None):
    """Splits inputs into slices and feeds each slice to a copy of the given
    computation graph and then combines the results. It allows you to run a
    graph on a batch of inputs even if the graph is written to support one
    instance only.

    inputs: list of tensors. All must have the same first dimension length
    graph_fn: A function that returns a TF tensor that's part of a graph.

    batch_size: number of slices to divide the data into.

    names: If provided, assigns names to the resulting tensors.

"""
    if not isinstance(inputs, list):

        inputs = [inputs]

    outputs = []

    for i in range(batch_size):
        inputs_slice = [x[i] for x in inputs]
        output_slice = graph_fn(*inputs_slice)
        if not isinstance(output_slice, (tuple, list)):
            output_slice = [output_slice]
        outputs.append(output_slice)

    outputs = list(zip(*outputs))

    if names is None:
        names = [None] * len(outputs)

    result = [tf.stack(o, axis=0, name=n)
              for o, n in zip(outputs, names)]
    if len(result) == 1:
        result = result[0]

    return result

In effect, the index values are used to fetch the scores, deltas, and anchors (saved as pre_nms_anchors) corresponding to the top pre_nms_limit anchors.
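A tiny runnable batch_slice example with tf.gather (my own, with made-up numbers):

import tensorflow as tf

scores = tf.constant([[0.1, 0.9, 0.4],
                      [0.8, 0.2, 0.5]])  # [batch=2, num_rois=3]
ix = tf.constant([[1, 2],
                  [0, 2]])               # [batch=2, top_k=2]
out = batch_slice([scores, ix], lambda x, y: tf.gather(x, y), batch_size=2)
# out -> [[0.9, 0.4],
#         [0.8, 0.5]]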

That concludes the three notes. Finally, a quick glance at compute_output_shape: since the tensor shape changes after this layer, the resulting shape must be spelled out in this function so that Keras can infer shapes automatically.
The code:

def compute_output_shape(self, input_shape):
        return (None, self.proposal_count, 4)

In dimension 0, None stands for the batch, i.e. an arbitrary or undetermined length. The final proposals have shape [batch, rois, (y1, x1, y2, x2)]. The value of rois is set in the config: 2000 in train mode, 1000 in inference mode.

Continue to Part 2: mask rcnn 超详细代码解读(二)

Original: https://blog.csdn.net/Cleo_Gao/article/details/115122223
Author: Cleo_Gao
Title: mask rcnn 超详细代码解读(一)
