MAE Code Walkthrough in Detail


if __name__ == "__main__"

  • MAE model selection
# from models_mae.py in the official MAE repo
# (assumes: import torch.nn as nn; from functools import partial)
def mae_vit_base_patch16_dec512d8b(**kwargs):
    # ViT-Base encoder (12 blocks, 768-d) paired with a lightweight decoder (8 blocks, 512-d)
    model = MaskedAutoencoderViT(
        patch_size=16, embed_dim=768, depth=12, num_heads=12,
        decoder_embed_dim=512, decoder_depth=8, decoder_num_heads=16,
        mlp_ratio=4, norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)
    return model
  • Debug entry point
if__name__=="__main__":
    model = mae_vit_base_patch16_dec512d8b()
    input = torch.rand(1,3,224,224)
    output = model(input)
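
With the default mask_ratio=0.75, the forward pass (shown in full below) returns a scalar loss, per-patch pixel predictions, and the binary mask. A rough shape check, assuming the default 224x224 input with patch size 16:

    print(loss.shape)  # torch.Size([])             scalar loss over masked patches
    print(pred.shape)  # torch.Size([1, 196, 768])  per patch: 16*16*3 = 768 pixel values
    print(mask.shape)  # torch.Size([1, 196])       1 = masked patch, 0 = visible patch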

model.forward

    def forward(self, imgs, mask_ratio=0.75):
        # encode only the visible (unmasked) patches
        latent, mask, ids_restore = self.forward_encoder(imgs, mask_ratio)
        # decode the full-length sequence (visible tokens + mask tokens) into pixel predictions
        pred = self.forward_decoder(latent, ids_restore)
        # MSE computed only on the masked patches
        loss = self.forward_loss(imgs, pred, mask)
        return loss, pred, mask

model.forward.encoder

latent, mask, ids_restore = self.forward_encoder(imgs, mask_ratio)

The encoder first turns the image into patch tokens via timm's PatchEmbed:

    PatchEmbed(
      (proj): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16))
      (norm): Identity()
    )

    def forward(self, x):
        B, C, H, W = x.shape
        _assert(H == self.img_size[0], f"Input image height ({H}) doesn't match model ({self.img_size[0]}).")
        _assert(W == self.img_size[1], f"Input image width ({W}) doesn't match model ({self.img_size[1]}).")
        x = self.proj(x)  # (B, 768, 14, 14): each 16x16 patch becomes one 768-d vector
        if self.flatten:
            x = x.flatten(2).transpose(1, 2)  # BCHW -> BNC, i.e. (B, 196, 768)
        x = self.norm(x)  # Identity here (see the module print above)
        return x
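
As a sanity check of the shapes (a standalone snippet; the import path is timm.models.layers in older timm versions, timm.layers in newer ones):

    import torch
    from timm.models.layers import PatchEmbed

    patch_embed = PatchEmbed(img_size=224, patch_size=16, in_chans=3, embed_dim=768)
    x = torch.rand(1, 3, 224, 224)
    tokens = patch_embed(x)
    print(tokens.shape)             # torch.Size([1, 196, 768]): (224/16)**2 = 196 patches
    print(patch_embed.num_patches)  # 196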

Difference between LayerNorm and BatchNorm

LayerNorm normalizes each token over its feature dimension, so it is independent of batch size and of the other tokens, and behaves identically at train and test time; BatchNorm normalizes each channel over the whole batch (and, for sequences, over all positions). Transformers such as ViT therefore use LayerNorm.
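
A minimal standalone demo (not from the post) of the two normalization axes on a ViT-shaped tensor:

    import torch
    import torch.nn as nn

    x = torch.randn(2, 196, 768)  # (batch, tokens, channels), as in ViT

    ln = nn.LayerNorm(768)
    y = ln(x)
    # each token is normalized over its own 768 features
    print(y[0, 0].mean().item(), y[0, 0].std().item())  # ~0.0, ~1.0

    bn = nn.BatchNorm1d(768)
    # BatchNorm1d expects (N, C, L): statistics are per-channel, over batch and positions
    z = bn(x.transpose(1, 2)).transpose(1, 2)
    print(z[:, :, 0].mean().item(), z[:, :, 0].std().item())  # ~0.0, ~1.0 per channel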

During weight initialization, MAE builds a fixed (non-learned) 2-D sin-cos positional embedding:

pos_embed = get_2d_sincos_pos_embed(self.pos_embed.shape[-1], int(self.patch_embed.num_patches**.5), cls_token=True)

def get_2d_sincos_pos_embed(embed_dim, grid_size, cls_token=False):
    """
    grid_size: int of the grid height and width
    return:
    pos_embed: [grid_size*grid_size, embed_dim] or [1+grid_size*grid_size, embed_dim] (w/ or w/o cls_token)
    """
    grid_h = np.arange(grid_size, dtype=np.float32)
    grid_w = np.arange(grid_size, dtype=np.float32)
    grid = np.meshgrid(grid_w, grid_h)  # here w goes first
    grid = np.stack(grid, axis=0)

    grid = grid.reshape([2, 1, grid_size, grid_size])
    pos_embed = get_2d_sincos_pos_embed_from_grid(embed_dim, grid)
    if cls_token:
        # the cls token gets an all-zero positional embedding
        pos_embed = np.concatenate([np.zeros([1, embed_dim]), pos_embed], axis=0)
    return pos_embed

np.meshgrid builds the per-axis coordinate arrays, and np.stack packs them into a single (2, grid_size, grid_size) array (a quick demo below).
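
A tiny demo of what these two calls produce for grid_size=2:

    import numpy as np

    grid_h = np.arange(2, dtype=np.float32)
    grid_w = np.arange(2, dtype=np.float32)
    grid = np.meshgrid(grid_w, grid_h)  # list of two (2, 2) arrays; w varies fastest
    print(grid[0])  # [[0. 1.] [0. 1.]]  (w coordinates)
    print(grid[1])  # [[0. 0.] [1. 1.]]  (h coordinates)

    grid = np.stack(grid, axis=0)
    print(grid.shape)  # (2, 2, 2)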

def get_2d_sincos_pos_embed_from_grid(embed_dim, grid):
    assert embed_dim % 2 == 0

    # use half of the dimensions to encode the h coordinate, the other half for w
    emb_h = get_1d_sincos_pos_embed_from_grid(embed_dim // 2, grid[0])  # (H*W, D/2)
    emb_w = get_1d_sincos_pos_embed_from_grid(embed_dim // 2, grid[1])  # (H*W, D/2)

    emb = np.concatenate([emb_h, emb_w], axis=1)  # (H*W, D)
    return emb

def get_1d_sincos_pos_embed_from_grid(embed_dim, pos):
    """
    embed_dim: output dimension for each position
    pos: a list of positions to be encoded: size (M,)
    out: (M, D)
    """
    assert embed_dim % 2 == 0
    # np.float was removed in NumPy >= 1.24; use np.float64 instead
    omega = np.arange(embed_dim // 2, dtype=np.float64)
    omega /= embed_dim / 2.
    omega = 1. / 10000**omega  # frequencies, shape (D/2,)

    pos = pos.reshape(-1)  # (M,)
    out = np.einsum('m,d->md', pos, omega)  # outer product: (M, D/2)

    emb_sin = np.sin(out)  # (M, D/2)
    emb_cos = np.cos(out)  # (M, D/2)

    emb = np.concatenate([emb_sin, emb_cos], axis=1)  # (M, D)
    return emb
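
In closed form, with $D$ = embed_dim, the code above computes (note that the sin and cos halves are concatenated, not interleaved as in the original Transformer paper):

$$
\omega_i = \frac{1}{10000^{2i/D}},\qquad
\mathrm{emb}[m,\, i] = \sin(\mathrm{pos}_m \cdot \omega_i),\qquad
\mathrm{emb}[m,\, D/2 + i] = \cos(\mathrm{pos}_m \cdot \omega_i),
$$

for $i = 0, \dots, D/2 - 1$.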

Related reading:
Transformer Study Notes 1: Positional Encoding
How to understand and use NumPy.einsum?
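
On the np.einsum('m,d->md', pos, omega) call above: this signature is simply the outer product, pairing every position with every frequency. A quick check:

    import numpy as np

    pos = np.array([0., 1., 2.])    # M = 3 positions
    omega = np.array([1.0, 0.1])    # D/2 = 2 frequencies
    out = np.einsum('m,d->md', pos, omega)
    print(out)
    # [[0.  0. ]
    #  [1.  0.1]
    #  [2.  0.2]]
    print(np.allclose(out, np.outer(pos, omega)))  # True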

model.forward.decoder
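
For reference, forward_decoder re-inserts learnable mask tokens, unshuffles the sequence with ids_restore (via torch.gather, explained below), adds decoder positional embeddings, runs the decoder blocks, and predicts pixels per patch. A sketch following the official MAE repo (check the repo for the exact code):

    def forward_decoder(self, x, ids_restore):
        # project tokens to decoder width (768 -> 512)
        x = self.decoder_embed(x)

        # append mask tokens so the sequence is full length again (+1 for cls)
        mask_tokens = self.mask_token.repeat(x.shape[0], ids_restore.shape[1] + 1 - x.shape[1], 1)
        x_ = torch.cat([x[:, 1:, :], mask_tokens], dim=1)  # drop cls, append masks
        # unshuffle back to the original patch order
        x_ = torch.gather(x_, dim=1, index=ids_restore.unsqueeze(-1).repeat(1, 1, x.shape[2]))
        x = torch.cat([x[:, :1, :], x_], dim=1)  # re-attach cls token

        x = x + self.decoder_pos_embed
        for blk in self.decoder_blocks:
            x = blk(x)
        x = self.decoder_norm(x)

        # predict pixels: (N, L, patch_size**2 * 3)
        x = self.decoder_pred(x)
        return x[:, 1:, :]  # drop cls token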

model.forward.loss
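
For reference, forward_loss in the official MAE repo computes a per-patch MSE and averages it only over the masked patches (a sketch; norm_pix_loss optionally normalizes target pixels within each patch):

    def forward_loss(self, imgs, pred, mask):
        # target: images split into flattened patches, (N, L, patch_size**2 * 3)
        target = self.patchify(imgs)
        if self.norm_pix_loss:
            mean = target.mean(dim=-1, keepdim=True)
            var = target.var(dim=-1, keepdim=True)
            target = (target - mean) / (var + 1.e-6) ** .5

        loss = (pred - target) ** 2
        loss = loss.mean(dim=-1)                 # (N, L): mean loss per patch
        loss = (loss * mask).sum() / mask.sum()  # average only over masked patches
        return loss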

Sorting indices by value: the double-argsort trick (a bit magical)

argsort of the random noise gives a shuffled ordering of the patches; argsort of that ordering gives its inverse permutation, which later lets the decoder restore the original patch order (a small numeric example follows the snippet):

        ids_shuffle = torch.argsort(noise, dim=1)        # random permutation of patch indices
        ids_restore = torch.argsort(ids_shuffle, dim=1)  # inverse of that permutation
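
A tiny standalone example (toy values) showing why the second argsort inverts the first:

    import torch

    noise = torch.tensor([[0.7, 0.1, 0.9, 0.3]])
    ids_shuffle = torch.argsort(noise, dim=1)        # tensor([[1, 3, 0, 2]])
    ids_restore = torch.argsort(ids_shuffle, dim=1)  # tensor([[2, 0, 3, 1]])

    # ids_shuffle[k] answers: which original position comes k-th after shuffling?
    # ids_restore[i] answers: where did original position i end up in the shuffle?
    shuffled = noise[0][ids_shuffle[0]]  # tensor([0.1, 0.3, 0.7, 0.9])
    print(shuffled[ids_restore[0]])      # tensor([0.7, 0.1, 0.9, 0.3]) == noise again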

torch.gather
torch.gather(input, dim, index, out=None) → Tensor

 Gathers values along an axis specified by dim.

 For a 3-D tensor the output is specified by:

 out[i][j][k] = input[index[i][j][k]][j][k] # dim=0
 out[i][j][k] = input[i][index[i][j][k]][k] # dim=1
 out[i][j][k] = input[i][j][index[i][j][k]] # dim=2

 Parameters:

  input (Tensor) – The source tensor
  dim (int) – The axis along which to index
  index (LongTensor) – The indices of elements to gather
  out (Tensor, optional) – Destination tensor

 For a 2-D tensor the output is specified by:

 out[i][j] = input[index[i][j]][j]  # dim=0
 out[i][j] = input[i][index[i][j]]  # dim=1
Example:

 >>> t = torch.tensor([[1., 2.], [3., 4.]])
 >>> torch.gather(t, 1, torch.tensor([[0, 0], [1, 0]]))
 tensor([[1., 1.],
         [4., 3.]])
 # out[i][j] = input[i][index[i][j]]  (dim=1)

 >>> torch.gather(t, 0, torch.tensor([[0, 0], [1, 0]]))
 tensor([[1., 2.],
         [3., 2.]])
 # out[i][j] = input[index[i][j]][j]  (dim=0)
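
To connect this back to MAE: random_masking uses gather both to select the kept tokens and to unshuffle the binary mask. A runnable toy sketch along the lines of the official code (toy sizes; the names len_keep and ids_keep follow the repo). Note that the index must be expanded to the embedding dimension, since gather requires index and input to have the same number of dimensions:

    import torch

    N, L, D = 1, 4, 3  # batch, num patches, embed dim (toy sizes)
    mask_ratio = 0.75
    len_keep = int(L * (1 - mask_ratio))  # = 1

    x = torch.arange(N * L * D, dtype=torch.float32).reshape(N, L, D)
    noise = torch.rand(N, L)
    ids_shuffle = torch.argsort(noise, dim=1)
    ids_restore = torch.argsort(ids_shuffle, dim=1)

    # gather whole D-dim tokens for the kept (lowest-noise) patches
    ids_keep = ids_shuffle[:, :len_keep]
    x_masked = torch.gather(x, dim=1, index=ids_keep.unsqueeze(-1).repeat(1, 1, D))
    print(x_masked.shape)  # torch.Size([1, 1, 3])

    # binary mask in shuffled order (0 = kept, 1 = masked), then unshuffled
    mask = torch.ones([N, L])
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, dim=1, index=ids_restore)
    print(mask)  # e.g. tensor([[1., 0., 1., 1.]])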

Reference 1

Original: https://blog.csdn.net/weixin_49117441/article/details/126095170
Author: @bnu_smile
Title: MAE 代码实战详解
