[Deep Learning] pix2pix GAN: Theory and Code Implementation

Contents

1. What is pix2pix GAN

2. Generator design in pix2pix GAN

3. Discriminator design in pix2pix GAN

4. Loss functions

5. Code implementation

1. What is pix2pix GAN

pix2pix GAN is essentially a cGAN in which an image x serves as the condition and is fed into both G and D. G takes x (the image to be translated) as input and outputs the generated image G(x). D then has to distinguish the generated pair (x, G(x)) from the real pair (x, y), where y is the ground-truth target image.
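As a minimal illustration of how the condition is paired with an image before it reaches D (assuming 256×256 RGB inputs, as in the code later in this post), the two images are simply concatenated along the channel dimension:

```python
import torch

# stand-in tensors for the condition x and an image y (real or generated)
x = torch.rand(1, 3, 256, 256)   # condition image x
y = torch.rand(1, 3, 256, 256)   # real target y, or a generated G(x)
pair = torch.cat([x, y], dim=1)  # D receives the pair as one 6-channel input
print(pair.shape)  # torch.Size([1, 6, 256, 256])
```

This is exactly why the discriminator defined in section 5 starts from 6 input channels.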


pix2pix GAN is mainly used for conversion between images, also known as image translation.

2. Generator design in pix2pix GAN

For image translation tasks, the input and output share a lot of information; the contour structure, for example, is shared. How do we preserve this shared structure? Partly through the design of the loss function, and partly through the generator architecture.

With a plain encoder-decoder convolutional network, every layer has to carry all of the information through a bottleneck, so the network easily makes mistakes (some details get lost).

For this reason, we use a U-Net model as the generator: its skip connections pass encoder features directly to the decoder layers at the same resolution.

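A tiny sketch of what a U-Net skip connection does (shapes chosen to match the generator defined in section 5): the decoder feature map is concatenated with the encoder feature map of the same resolution, which is why the later Upsample blocks take doubled input channels:

```python
import torch

dec = torch.rand(1, 512, 8, 8)         # upsampled decoder feature map
enc = torch.rand(1, 512, 8, 8)         # encoder feature map at the same resolution
merged = torch.cat([dec, enc], dim=1)  # skip connection: channel count doubles
print(merged.shape)  # torch.Size([1, 1024, 8, 8])
```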

3. Discriminator design in pix2pix GAN

D takes pairs of images as input, as in a cGAN: a real pair (x, y) should be judged as 1, while a generated pair (x, G(x)) should be judged as 0.

The generator, in turn, wants the discriminator to judge (x, G(x)) as 1.

The D in pix2pix is implemented in the paper as a patch discriminator (PatchGAN). No matter how large the generated image is, D effectively judges it as a set of fixed-size patches: it is fully convolutional and outputs a grid of scores, one per patch, as shown in the figure above.

The benefit of this design: D's effective input is small, so the computation is cheap and training is fast.
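A minimal fully convolutional sketch of the idea (the layer sizes here are illustrative, not the paper's exact 70×70 PatchGAN): instead of one scalar, the discriminator outputs a grid of scores, each covering one receptive-field patch:

```python
import torch
import torch.nn as nn

# illustrative patch discriminator: conv layers only, no fully connected head
patch_d = nn.Sequential(
    nn.Conv2d(6, 64, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, kernel_size=4, stride=1, padding=1), nn.Sigmoid(),
)
pair = torch.rand(1, 6, 256, 256)  # condition + image, concatenated on channels
scores = patch_d(pair)
print(scores.shape)  # torch.Size([1, 1, 63, 63]) -- one score per patch
```

Because there is no fully connected layer, the same network can be applied to images of any size.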

4. Loss functions

D's loss: a real pair (condition + real image) should be judged as 1; the condition paired with the generated image should be judged as 0.

G's loss: G wants the condition paired with the generated image to be judged as 1.

The adversarial (cGAN) part of the objective is:

L_cGAN(G, D) = E_{x,y}[log D(x, y)] + E_{x,z}[log(1 - D(x, G(x, z)))]

For image translation tasks, G's input and output actually share a great deal of content. To keep the generated image close to the ground truth, an L1 loss is therefore added:

L_L1(G) = E_{x,y,z}[ ||y - G(x, z)||_1 ]

The full objective combines the two, with λ weighting the L1 term:

G* = arg min_G max_D L_cGAN(G, D) + λ L_L1(G)
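A small numeric sketch (all values made up) of how the two terms combine into the generator loss, in the same way as the training loop later in this post:

```python
import torch

bce = torch.nn.BCELoss()
# made-up tensors: a 30x30 patch score map and an 8x8 generated/target pair
d_out = torch.full((1, 1, 30, 30), 0.5)       # D's scores on (x, G(x))
gen_out = torch.zeros(1, 3, 8, 8)
target = torch.ones(1, 3, 8, 8)

adv = bce(d_out, torch.ones_like(d_out))      # G wants D to say "real"
l1 = torch.mean(torch.abs(gen_out - target))  # pixel-wise L1 term
LAMBDA = 100                                  # the paper's weighting
g_loss = adv + LAMBDA * l1
print(round(adv.item(), 4), l1.item())  # 0.6931 1.0
```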

5. Code implementation

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils import data
import torchvision  # image utilities
from torchvision import transforms  # image transforms

import numpy as np
import matplotlib.pyplot as plt  # plotting
import os
import glob
from PIL import Image

imgs_path = sorted(glob.glob('base/*.jpg'))   # sort so each photo lines up with its annotation
annos_path = sorted(glob.glob('base/*.png'))
# preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((256, 256)),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),  # [0,1] -> [-1,1]
])

# define the dataset (CMP Facade: photo .jpg + annotation .png)
class CMP_dataset(data.Dataset):
    def __init__(self,imgs_path,annos_path):
        self.imgs_path =imgs_path
        self.annos_path = annos_path
    def __getitem__(self,index):
        img_path = self.imgs_path[index]
        anno_path = self.annos_path[index]
        pil_img = Image.open(img_path)    # read the photo
        pil_img = transform(pil_img)      # transform the photo
        anno_img = Image.open(anno_path)  # read the annotation
        anno_img = anno_img.convert("RGB")
        pil_anno = transform(anno_img)    # transform the annotation
        return pil_anno,pil_img
    def __len__(self):
        return len(self.imgs_path)

# create the dataset
dataset = CMP_dataset(imgs_path, annos_path)
# wrap it in a DataLoader for convenient batched iteration
BATCHSIZE = 32
dataloader = data.DataLoader(dataset,
                            batch_size = BATCHSIZE,
                            shuffle = True)
annos_batch,imgs_batch = next(iter(dataloader))

# downsampling block: strided conv + LeakyReLU, with optional BatchNorm
class Downsample(nn.Module):
    def __init__(self,in_channels,out_channels):
        super(Downsample,self).__init__()
        self.conv_relu = nn.Sequential(
        nn.Conv2d(in_channels,out_channels,
                 kernel_size=3,
                 stride=2,
                 padding=1),
        nn.LeakyReLU(inplace=True))
        self.bn = nn.BatchNorm2d(out_channels)
    def forward(self,x,is_bn=True):
        x=self.conv_relu(x)
        if is_bn:
            x=self.bn(x)
        return x

# upsampling block: transposed conv + LeakyReLU + BatchNorm, with optional dropout
class Upsample(nn.Module):
    def __init__(self,in_channels,out_channels):
        super(Upsample,self).__init__()
        self.upconv_relu = nn.Sequential(
        nn.ConvTranspose2d(in_channels,out_channels,
                 kernel_size=3,
                 stride=2,
                 padding=1,
                 output_padding=1), # transposed conv: doubles the spatial size
        nn.LeakyReLU(inplace=True))
        self.bn = nn.BatchNorm2d(out_channels)
    def forward(self,x,is_drop=False):
        x=self.upconv_relu(x)
        x=self.bn(x)
        if is_drop:
            x = F.dropout2d(x, p=0.5)  # pix2pix keeps dropout active even at inference
        return x

# define the generator: 6 downsampling blocks, 5 upsampling blocks, one output layer
class Generator(nn.Module):
    def __init__(self):
        super(Generator,self).__init__()
        self.down1 = Downsample(3,64)      #64,128,128
        self.down2 = Downsample(64,128)    #128,64,64
        self.down3 = Downsample(128,256)   #256,32,32
        self.down4 = Downsample(256,512)   #512,16,16
        self.down5 = Downsample(512,512)   #512,8,8
        self.down6 = Downsample(512,512)   #512,4,4

        self.up1 = Upsample(512,512)    #512,8,8
        self.up2 = Upsample(1024,512)   #512,16,16
        self.up3 = Upsample(1024,256)   #256,32,32
        self.up4 = Upsample(512,128)    #128,64,64
        self.up5 = Upsample(256,64)     #64,128,128

        self.last = nn.ConvTranspose2d(128,3,
                                      kernel_size=3,
                                      stride=2,
                                      padding=1,
                                      output_padding=1)  #3,256,256

    def forward(self,x):
        x1 = self.down1(x)
        x2 = self.down2(x1)
        x3 = self.down3(x2)
        x4 = self.down4(x3)
        x5 = self.down5(x4)
        x6 = self.down6(x5)

        x6 = self.up1(x6,is_drop=True)
        x6 = torch.cat([x6,x5],dim=1)

        x6 = self.up2(x6,is_drop=True)
        x6 = torch.cat([x6,x4],dim=1)

        x6 = self.up3(x6,is_drop=True)
        x6 = torch.cat([x6,x3],dim=1)

        x6 = self.up4(x6)
        x6 = torch.cat([x6,x2],dim=1)

        x6 = self.up5(x6)
        x6 = torch.cat([x6,x1],dim=1)

        x6 = torch.tanh(self.last(x6))
        return x6

# define the discriminator: input is anno + img (generated or real), concatenated
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator,self).__init__()
        self.down1 = Downsample(6,64)
        self.down2 = Downsample(64,128)
        self.conv1 = nn.Conv2d(128,256,3)
        self.bn = nn.BatchNorm2d(256)
        self.last = nn.Conv2d(256,1,3)
    def forward(self, anno, img):
        x = torch.cat([anno, img], dim=1)  # 6-channel pair
        x = self.down1(x, is_bn=False)
        x = self.down2(x, is_bn=True)
        x = F.dropout2d(self.bn(F.leaky_relu(self.conv1(x))))
        x = torch.sigmoid(self.last(x))  # batch x 1 x 60 x 60 patch score map
        return x
        return x

device = "cuda" if torch.cuda.is_available() else "cpu"
gen = Generator().to(device)
dis = Discriminator().to(device)
d_optimizer = torch.optim.Adam(dis.parameters(),lr=1e-3,betas=(0.5,0.999))
g_optimizer = torch.optim.Adam(gen.parameters(),lr=1e-3,betas=(0.5,0.999))
# plot input, ground truth, and generated output side by side
def generate_images(model, test_anno, test_real):
    prediction = model(test_anno).permute(0, 2, 3, 1).detach().cpu().numpy()
    test_anno = test_anno.permute(0, 2, 3, 1).cpu().numpy()
    test_real = test_real.permute(0, 2, 3, 1).cpu().numpy()
    plt.figure(figsize=(10, 10))
    display_list = [test_anno[0], test_real[0], prediction[0]]
    title = ['Input', 'Ground Truth', 'Output']
    for i in range(3):
        plt.subplot(1, 3, i + 1)
        plt.title(title[i])
        plt.imshow(display_list[i] * 0.5 + 0.5)  # undo Normalize: [-1,1] -> [0,1]
        plt.axis('off')  # hide the axes
    plt.show()

test_imgs_path = sorted(glob.glob('extended/*.jpg'))
test_annos_path = sorted(glob.glob('extended/*.png'))

test_dataset = CMP_dataset(test_imgs_path,test_annos_path)

test_dataloader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=BATCHSIZE,
)

# define the loss functions
# cGAN adversarial loss
loss_fn = torch.nn.BCELoss()
# the L1 loss is computed inline in the training loop below

annos_batch, imgs_batch = annos_batch.to(device), imgs_batch.to(device)
LAMBDA = 7  # weight of the L1 term (the pix2pix paper uses 100)

D_loss = []  # discriminator loss per epoch
G_loss = []  # generator loss per epoch

# start training
for epoch in range(10):
    D_epoch_loss = 0
    G_epoch_loss = 0
    count = len(dataloader)
    for step,(annos,imgs) in enumerate(dataloader):
        imgs = imgs.to(device)
        annos = annos.to(device)
        # discriminator: loss computation and optimization step
        d_optimizer.zero_grad()
        disc_real_output = dis(annos, imgs)  # feed the real pair
        d_real_loss = loss_fn(disc_real_output,torch.ones_like(disc_real_output,
                                                             device=device))
        d_real_loss.backward()

        gen_output = gen(annos)
        disc_gen_output = dis(annos,gen_output.detach())
        d_fake_loss = loss_fn(disc_gen_output, torch.zeros_like(disc_gen_output,
                                                                device=device))
        d_fake_loss.backward()

        disc_loss = d_real_loss + d_fake_loss  # total discriminator loss
        d_optimizer.step()

        # generator: loss computation and optimization step
        g_optimizer.zero_grad()
        disc_gen_out = dis(annos,gen_output)
        gen_loss_crossentropyloss = loss_fn(disc_gen_out,
                                            torch.ones_like(disc_gen_out,
                                                              device=device))
        gen_l1_loss = torch.mean(torch.abs(gen_output-imgs))
        gen_loss = gen_loss_crossentropyloss +LAMBDA*gen_l1_loss
        gen_loss.backward()  # backpropagation
        g_optimizer.step()   # optimization step

        # accumulate the loss of each batch
        with torch.no_grad():
            D_epoch_loss +=disc_loss.item()
            G_epoch_loss +=gen_loss.item()
    # average the losses over the epoch
    with torch.no_grad():
        D_epoch_loss /= count
        G_epoch_loss /= count
        D_loss.append(D_epoch_loss)
        G_loss.append(G_epoch_loss)
        # after each epoch, print progress and plot generated images
        print("Epoch:", epoch)
        generate_images(gen, annos_batch, imgs_batch)
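The D_loss and G_loss lists collected above are never visualized in the script; a minimal sketch of how the curves could be plotted (the values here are made up for illustration):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

D_loss = [0.9, 0.7, 0.6]   # made-up per-epoch values for illustration
G_loss = [12.0, 9.5, 8.1]
epochs = range(1, len(D_loss) + 1)
plt.plot(epochs, D_loss, label='D loss')
plt.plot(epochs, G_loss, label='G loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.savefig('loss_curves.png')
```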

Original: https://blog.csdn.net/weixin_51781852/article/details/126238330
Author: 无咎.lsy
Title: [Deep Learning] pix2pix GAN: Theory and Code Implementation

Original articles are copyright-protected. Please credit the source when reprinting: https://www.johngo689.com/688279/
