# Generative Models: GAN

### Generative Adversarial Nets (GAN)

Training a GAN is a minimax game with the following objective:

\[ \underset{G}{\min} \: \underset{D}{\max} V(D,G) = \mathbb E_{x\sim p_{data}(x)}[\log D(x)]+\mathbb E_{z\sim p_{z}(z)}[\log(1-D(G(z)))] \]

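Writing the expectations as integrals, with \(p_g\) denoting the distribution of \(G(z)\) for \(z\sim p_z\):

\[\begin{align} V(D,G)&=\int_x p_{data}(x)\log(D(x))dx+\int_z p_z(z)\log(1-D(G(z)))dz \\ &=\int_x [p_{data}(x)\log(D(x))+p_g(x)\log(1-D(x))]dx \end{align}\]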

First fix \(G\) and solve \(\underset{D}{\max} V(D,G)\). Setting the derivative of the integrand with respect to \(D(x)\) to zero gives the optimal discriminator:

\[ D^*_G(x)={p_{data}(x)\over p_{data}(x)+p_g(x)} \]

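Substituting \(D^*_G\) back in gives the objective left for \(G\) to minimize:

\[\begin{align} V(D^*_G,G) &= \int_x \left[p_{data}(x)\log{p_{data}(x)\over p_{data}(x)+p_g(x)}+p_g(x)\log{p_g(x)\over p_{data}(x)+p_g(x)}\right]dx \\ &= \mathbb E_{x\sim p_{data}}\left[\log{p_{data}(x)\over p_{data}(x)+p_g(x)}\right]+\mathbb E_{x\sim p_g}\left[\log{p_g(x)\over p_{data}(x)+p_g(x)}\right] \\ &= -\log 4+KL\left(p_{data}\|{p_{data}+p_g\over 2}\right)+KL\left(p_g\|{p_{data}+p_g\over 2}\right) \\ &= -\log 4+2JS(p_{data}\|p_g) \end{align}\]

Minimizing over \(G\) therefore minimizes the JS divergence between \(p_{data}\) and \(p_g\); the minimum \(-\log 4\) is reached when \(p_g=p_{data}\).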
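The derivation can be checked numerically on a small discrete example. The sketch below is only an illustration: the two three-point distributions and the helper names (`kl`, `js`, `value`) are made up for this check and are not part of the original derivation.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """JS(p || q) = 1/2 KL(p || m) + 1/2 KL(q || m), with m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def value(p_data, p_g, d):
    """Discrete V(D, G): sum_x p_data(x) log D(x) + p_g(x) log(1 - D(x))."""
    return sum(pd * math.log(dx) + pg * math.log(1 - dx)
               for pd, pg, dx in zip(p_data, p_g, d))

# Hypothetical toy distributions (assumption, chosen only for illustration).
p_data = [0.5, 0.3, 0.2]
p_g = [0.2, 0.3, 0.5]

# Optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x)).
d_star = [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]
v_star = value(p_data, p_g, d_star)
closed_form = -math.log(4) + 2 * js(p_data, p_g)

# Any other discriminator should score lower than D*.
d_other = [dx + 0.05 for dx in d_star]
v_other = value(p_data, p_g, d_other)

print(v_star, closed_form, v_other)
```

The first two printed values should agree up to floating-point error, and the third should be smaller, since \(D^*_G\) maximizes \(V\) for fixed \(G\).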

JS divergence: \(JS(P\|Q)={1\over 2}KL\left(P\|{P+Q\over 2}\right)+{1\over 2}KL\left(Q\|{P+Q\over 2}\right)\)

The JS divergence is symmetric, whereas the KL divergence is not.

When P and Q do not overlap, or overlap only negligibly, the JS divergence is the constant \(\log 2\), so under gradient descent the gradient is zero. With an optimal discriminator the generator then receives no gradient information, and even with a near-optimal discriminator it is very likely to run into vanishing gradients.
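A small numeric illustration of this saturation, assuming two point masses on a ten-cell grid (the grid size and the helper names `js` and `point_mass` are arbitrary choices for this sketch):

```python
import math

def js(p, q):
    """JS divergence between two discrete distributions (lists of probabilities)."""
    total = 0.0
    for pi, qi in zip(p, q):
        m = (pi + qi) / 2
        if pi > 0:
            total += 0.5 * pi * math.log(pi / m)
        if qi > 0:
            total += 0.5 * qi * math.log(qi / m)
    return total

def point_mass(index, size):
    """Distribution that puts all probability on one grid cell."""
    return [1.0 if i == index else 0.0 for i in range(size)]

p = point_mass(0, 10)
# Move the second point mass further and further away from the first.
js_values = [js(p, point_mass(k, 10)) for k in range(1, 10)]

print(js_values)
print(math.log(2))
```

Every entry of `js_values` should equal \(\log 2\) regardless of how far apart the two point masses are, which is why the divergence gives the generator no direction to move in.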

## f-GAN

### f-divergence

Let P and Q be two distributions, with \(p(x)\) and \(q(x)\) their probability densities at \(x\).

\[ D_f(P\|Q)=\int_x q(x)f\left({p(x)\over q(x)}\right)dx \]

### Fenchel Conjugate

Every convex function \(f\) has a corresponding conjugate function \(f^*\):

\(f^*(t)=\max_{x\in dom(f)}\{xt-f(x)\}\), and \((f^*)^* = f\).

Hence \(f(x)=\max_{t\in dom(f^*)}\{xt-f^*(t)\}\). Substituting this into \(D_f(P\|Q)\) gives:

\[\begin{align} D_f(P\|Q) &=\int_x q(x)f\left({p(x)\over q(x)}\right)dx \\ &=\int_x q(x)\left(\max_{t\in dom(f^*)}\left\{{p(x)\over q(x)}t-f^*(t)\right\}\right)dx \\ &=\max_D \left\{\int_x p(x)D(x)dx-\int_x q(x)f^*(D(x))dx\right\} \quad \text{(with } t=D(x)\text{)} \end{align}\]

So in a GAN:

\[ D_f(P_{data}\|P_G)=\max_D\left\{\mathbb E_{x\sim P_{data}}[D(x)]-\mathbb E_{x\sim P_G}[f^*(D(x))]\right\} \]
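As a sanity check, the sketch below works through one concrete case, the KL divergence, where \(f(u)=u\log u\) and \(f^*(t)=e^{t-1}\). The toy distributions, the brute-force grid, and all function names are assumptions made for this illustration only.

```python
import math

def f(u):
    """Generator of the KL divergence: f(u) = u log u."""
    return u * math.log(u)

def f_conj(t):
    """Closed-form Fenchel conjugate of f: f*(t) = exp(t - 1)."""
    return math.exp(t - 1)

def f_conj_numeric(t, step=1e-3, upper=20.0):
    """Approximate f*(t) = max_x { x t - f(x) } by brute force on a grid over (0, upper]."""
    n = int(upper / step)
    return max(k * step * t - f(k * step) for k in range(1, n + 1))

def variational_bound(p, q, d):
    """E_{x~P}[D(x)] - E_{x~Q}[f*(D(x))] for discrete P, Q and a table of D values."""
    return (sum(pi * di for pi, di in zip(p, d))
            - sum(qi * f_conj(di) for qi, di in zip(q, d)))

# Hypothetical toy distributions (assumption, for illustration only).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

kl_pq = sum(qi * f(pi / qi) for pi, qi in zip(p, q))        # D_f(P||Q) with f(u) = u log u
d_star = [1 + math.log(pi / qi) for pi, qi in zip(p, q)]    # maximizer D*(x) = f'(p(x)/q(x))
d_shifted = [di + 0.1 for di in d_star]                     # a suboptimal D

print(f_conj(0.5), f_conj_numeric(0.5))
print(kl_pq, variational_bound(p, q, d_star), variational_bound(p, q, d_shifted))
```

The brute-force conjugate should match the closed form closely, the bound evaluated at `d_star` should equal the divergence, and any other choice of `D` (here `d_shifted`) should give a smaller value.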

Any f-divergence can be used, for example JS, Jeffrey, or Pearson.
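To make the choice of \(f\) concrete, the sketch below plugs a few standard generator functions into the definition of \(D_f\) for discrete distributions. The dictionary name `GENERATORS` and the toy distributions are assumptions for this example. Note that with the generator used here, the "JS" entry evaluates to \(2\,JS(P\|Q)\) under the definition of JS given earlier.

```python
import math

# Generator functions f(u); each is convex with f(1) = 0.
GENERATORS = {
    "KL": lambda u: u * math.log(u),
    "Pearson": lambda u: (u - 1) ** 2,
    "Jeffrey": lambda u: (u - 1) * math.log(u),
    # Evaluates to KL(P||M) + KL(Q||M) with M = (P+Q)/2, i.e. twice the JS divergence.
    "JS": lambda u: u * math.log(u) - (u + 1) * math.log((u + 1) / 2),
}

def f_divergence(p, q, f):
    """D_f(P||Q) = sum_x q(x) f(p(x) / q(x)); assumes every p(x), q(x) > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def kl(p, q):
    """Direct KL(p || q), used only to cross-check the generic formula."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical toy distributions (assumption, for illustration only).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

results = {name: f_divergence(p, q, f) for name, f in GENERATORS.items()}
for name, value in results.items():
    print(name, value)
```

Swapping the divergence only means swapping `f` (and, in the variational form, its conjugate \(f^*\)); the rest of the computation stays the same.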

## WGAN

The original version uses weight clipping; the improved version uses a gradient penalty.

Papers:

• Martin Arjovsky, Soumith Chintala, Léon Bottou, "Wasserstein GAN", arXiv preprint, 2017
• Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville, "Improved Training of Wasserstein GANs", arXiv preprint, 2017
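The two techniques from these papers, weight clipping and gradient penalty, can be contrasted on a deliberately tiny example. The sketch below uses a one-dimensional linear critic \(f(x)=wx\), for which the gradient with respect to the input is simply \(w\). Everything here is an assumption made for illustration: the linear critic, the sample means, the learning rate, and the values `c=0.01` and `lam=10.0` (recalled as the defaults reported in the two papers). It is not the papers' training procedure.

```python
import math

def clip_weights(weights, c=0.01):
    """Weight clipping: clamp every critic parameter into [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]

def gradient_penalty(weights, lam=10.0):
    """Penalty lam * (||grad_x f|| - 1)^2 for a linear critic f(x) = w . x,
    whose input gradient equals w at every point."""
    grad_norm = math.sqrt(sum(w * w for w in weights))
    return lam * (grad_norm - 1.0) ** 2

def train_critic(mean_real, mean_fake, mode, steps=200, lr=0.01, c=0.01, lam=10.0):
    """Gradient descent on the critic loss of a 1-D linear critic f(x) = w * x."""
    w = 0.5
    for _ in range(steps):
        # Critic loss: -(E_real[f] - E_fake[f]) = -w * (mean_real - mean_fake).
        grad = -(mean_real - mean_fake)
        if mode == "gp":
            # Derivative of lam * (|w| - 1)^2 with respect to w.
            grad += 2.0 * lam * (abs(w) - 1.0) * math.copysign(1.0, w)
        w -= lr * grad
        if mode == "clip":
            w = clip_weights([w], c)[0]
    return w

w_clip = train_critic(2.0, 0.0, mode="clip")
w_gp = train_critic(2.0, 0.0, mode="gp")

print(w_clip)
print(w_gp, gradient_penalty([w_gp]))
```

In this toy setting clipping pins the weight at the clipping bound, while the penalty lets the slope settle close to 1, which hints at why the penalty is described as the improved variant.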

Main idea: use the Earth Mover's Distance (Wasserstein distance) to measure the distance between two distributions. The Earth Mover's Distance is the minimum transport cost needed to transform one distribution into the other.

The JS divergence used by earlier GANs has a drawback: when the two distributions do not overlap, the divergence is constant (\(\log 2\)) and the gradient is 0, so the network can hardly learn. The Earth Mover's Distance solves this problem because it still changes with how far apart the two distributions are, so the network can keep learning. To prevent exploding gradients, however, measures such as weight clipping are needed.
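The contrast can be seen on a one-dimensional grid, where the Earth Mover's Distance reduces to the area between the two cumulative distribution functions. The sketch below assumes unit grid spacing and two point masses; the helper names `point_mass` and `wasserstein_1d` are hypothetical.

```python
from itertools import accumulate

def point_mass(index, size):
    """Distribution that puts all probability on one grid cell."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def wasserstein_1d(p, q, spacing=1.0):
    """Earth Mover's Distance between two distributions on an evenly spaced 1-D grid,
    computed as the sum of |CDF_p - CDF_q| times the grid spacing."""
    return spacing * sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))

p = point_mass(0, 10)
# Move the second point mass from cell 0 to cell 9.
distances = [wasserstein_1d(p, point_mass(k, 10)) for k in range(10)]

print(distances)  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```

The distance grows in step with the separation between the point masses, so it keeps providing a useful signal even when the supports are disjoint, whereas the JS divergence of two disjoint point masses stays at \(\log 2\).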

Adversarial examples are not directly related to generative adversarial networks. A GAN aims to learn the internal representation of the samples so that it can generate new ones. The existence of adversarial examples, however, suggests to some extent that a model has not learned the internal representation or distribution of the data, and may instead have learned only certain specific patterns that are enough to meet its classification or regression objective.

Can GAN-generated images be used to train a CNN?

For now, probably not. A GAN generates images by sampling from a narrow distribution that covers only a very small part of the real world, so the images are not broadly useful for training. In addition, current GANs have difficulty generating larger images (above 32×32).

Original: https://www.cnblogs.com/makefile/p/GAN.html
Author: 康行天下
Title: 生成式模型之 GAN