The title image is from ProInfoGAN (discussed below): unsupervised discovery of high-quality disentangled codes on CelebA-HQ.

I introduced a very simple “TVGAN” a few days ago. I wonder why no one has explicitly used such a simple thing before.

After a closer look at BigGAN today, I found that Google's SotA model also uses a very similar loss term. Let's see why.

1. TVGAN training

First, review the TVGAN formula:

PENG Bo: TVGAN: A Simple and Effective New GAN (and problems with the WGAN paper), Thinking about the theoretical details of DL (2) (zhuanlan.zhihu.com)
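As a minimal sketch (assuming TVGAN optimizes the dual form of the total-variation distance, with D bounded to [-1, 1]; see the linked article for the exact form):

$$
\max_{D:\,D(x)\in[-1,1]}\;\mathbb{E}_{x \sim p_{\text{data}}}\big[D(x)\big]\;-\;\mathbb{E}_{z \sim p_z}\big[D(G(z))\big],
\qquad
\min_G\;-\,\mathbb{E}_{z \sim p_z}\big[D(G(z))\big].
$$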

However, if you actually use it, you will find a problem:

If the output of D is hard-bounded to [-1, 1] by a tanh, then G's gradient easily vanishes (obviously: the tanh saturates, just like MSE on a squashed output loses its gradient where CE does not).

I guess that’s why people didn’t do it before.

The simple fix, then, is to remove the [-1, 1] constraint and instead add a regularization term that keeps the output of D close to 0, for example:
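One natural way to write that (just a sketch of the "for example", with $\lambda$ as a hypothetical weight):

$$
L_D \;=\; -\Big(\mathbb{E}_{x}\big[D(x)\big] - \mathbb{E}_{z}\big[D(G(z))\big]\Big) \;+\; \lambda\Big(\mathbb{E}_{x}\big[D(x)^2\big] + \mathbb{E}_{z}\big[D(G(z))^2\big]\Big)
$$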

But it feels so ugly.

So I thought about it and came up with a strange trick (a sketch of it appears after the list below):


In the code I tried several things, and the one that was not commented out is the weird idea that came to mind:

  • For a real sample with D(x) > 1, the gradient is simply zero (because it is already good enough).
  • Similarly, for a fake sample with D(x) < -1, the gradient is also zero (because it is already good enough).
  • No extra parameters are needed, which is nice.
  • Another interesting consequence: if D becomes too good, it now automatically stops and waits for G.
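A minimal PyTorch sketch of that kept variant (my reconstruction rather than the original screenshot; `d_real` and `d_fake` stand for D's raw, un-squashed outputs on real and generated batches):

```python
import torch
import torch.nn.functional as F

def d_loss_hinge(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    # Real samples: push D(x) up, but once D(x) > 1 the term (and its gradient) is zero.
    loss_real = F.relu(1.0 - d_real).mean()
    # Fake samples: push D(G(z)) down; once D(G(z)) < -1 the gradient is zero as well.
    loss_fake = F.relu(1.0 + d_fake).mean()
    return loss_real + loss_fake
```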

If I write it as a formula, it is roughly:
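$$
L_D \;=\; \mathbb{E}_{x \sim p_{\text{data}}}\big[\max(0,\; 1 - D(x))\big] \;+\; \mathbb{E}_{z \sim p_z}\big[\max(0,\; 1 + D(G(z)))\big]
$$

(the hinge form that the bullets above describe).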

Also, the first "GGG" in the code is another gradient scheme, which also seems to work. The third "GGG" is plain GAN, apparently.

In fact, the differences between many GANs lie only in their gradients. We could get rid of the nonlinearity at the end of D altogether and just modify the gradient directly.
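As a toy illustration of that idea (my own sketch, not code from the post): keep D's head linear, use a plain linear loss, and shape the gradient with a hook so it vanishes once a real sample already has D(x) > 1:

```python
import torch

def d_loss_real_grad_only(d_real: torch.Tensor) -> torch.Tensor:
    # Linear loss on D's raw output; the hook zeroes the gradient for samples
    # that are already good enough (D(x) > 1), giving hinge-like behaviour
    # purely at the gradient level, with no nonlinearity in the loss itself.
    mask = (d_real.detach() < 1.0).float()
    d_real.register_hook(lambda grad: grad * mask)
    return -d_real.mean()
```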

It should be noted that G cannot be handled in a similar way:


For mysterious reasons, G still has to use the original loss, i.e., a fixed gradient; otherwise G seems too weak relative to D, and no image comes out.
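Concretely, a sketch of what that looks like:

$$
L_G \;=\; -\,\mathbb{E}_{z \sim p_z}\big[D(G(z))\big],
$$

which is linear in D's output, so the gradient flowing back to G never switches off, no matter how far behind G is.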

In fact, the G loss in the original GAN is also different from the D loss.
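For reference, in the original GAN:

$$
L_D \;=\; -\,\mathbb{E}_{x}\big[\log D(x)\big] \;-\; \mathbb{E}_{z}\big[\log\big(1 - D(G(z))\big)\big],
\qquad
L_G \;=\; -\,\mathbb{E}_{z}\big[\log D(G(z))\big],
$$

where the non-saturating $L_G$ is used in practice instead of the symmetric $\mathbb{E}_{z}[\log(1 - D(G(z)))]$, precisely because the latter gives G almost no gradient early in training.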


2. BigGAN's loss: a coincidence

Today I wanted to check out the latest progress, so I perused BigGAN’s paper:

[1809.11096] Large Scale GAN Training for High Fidelity Natural Image Synthesis (arxiv.org)


I found that BigGAN uses the so-called hinge loss (from SA-GAN), which in those papers is written as:
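$$
L_D \;=\; -\,\mathbb{E}_{x \sim p_{\text{data}}}\big[\min\big(0,\; -1 + D(x)\big)\big] \;-\; \mathbb{E}_{z \sim p_z}\big[\min\big(0,\; -1 - D(G(z))\big)\big],
\qquad
L_G \;=\; -\,\mathbb{E}_{z \sim p_z}\big[D(G(z))\big].
$$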


It is easy to see that the D and G losses here are exactly the same as in "TVGAN", which is quite a coincidence, haha.

Of course, as I have said before, the choice of GAN loss is not that important, so there is nothing to boast about; but it does show that this approach is reasonable.

BigGAN also uses spectral normalization, which I personally regard as a general-purpose technique, of the same nature as BN.
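In PyTorch, for instance, it is a one-line wrapper around a layer (a generic illustration, not BigGAN's code):

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Spectral normalization rescales the layer's weight by its largest singular
# value (estimated with power iteration), constraining that value to about 1.
conv = spectral_norm(nn.Conv2d(64, 128, kernel_size=3, padding=1))
```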

BigGAN also makes a number of interesting choices. Google does not care at all whether the model is "neat and elegant" (so they freely give D and G different learning rates, use D-D-G training cycles, etc.); they only care whether it works well.
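For instance, the asymmetric setup looks roughly like this (a toy sketch with illustrative numbers and tiny stand-in networks, not BigGAN's actual hyperparameters):

```python
import torch
import torch.nn as nn

G = nn.Linear(128, 784)   # stand-in generator
D = nn.Linear(784, 1)     # stand-in discriminator

# Different learning rates for G and D (numbers are illustrative only).
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.0, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=4e-4, betas=(0.0, 0.999))

for step in range(100):
    # "D-D-G": two D updates for every G update.
    for _ in range(2):
        z = torch.randn(16, 128)
        real = torch.randn(16, 784)                # placeholder data
        loss_d = (torch.relu(1 - D(real)).mean()
                  + torch.relu(1 + D(G(z).detach())).mean())
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    z = torch.randn(16, 128)
    loss_g = -D(G(z)).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```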

BigGAN works well because attention really is that important, just like in BERT. With attention, almost everything can be solved by brute force.

Google also explicitly states that BigGAN, perhaps because it is so big, eventually collapses late in training, and they do not understand the underlying cause; so what they report are the results from just before the collapse.


I don't have a TPU cluster, so I can only guess. What BigGAN lacks, I suspect, could be supplied by an InfoGAN module. Read on.


3. ProInfoGAN and some thoughts

This is a little-known recent result that I found by searching, and the results are very good. I had been planning to do something similar, but now there seems to be nothing left to do.

jonasz/progressive_infogan (github.com)

Unsupervised discovery of high-quality disentangled codes! This is one of the holy grails of unsupervised learning.

Please enjoy the video (it is on YouTube, so a VPN is needed from China); the results are very good:

https://youtu.be/U2okTa0JGZg

In fact, it is a combination of NVIDIA's Progressive GAN (ProGAN) and InfoGAN:

  • They found that adding an InfoGAN module as a regularizer makes the GAN much more stable: the ordinary GAN loss is enough, and training does not collapse (the term is sketched after this list).
  • They make different codes act at different scales (changing eye color, for example, would certainly not be visible at low resolution), plus other tricks, to achieve very high-quality code separation.
  • For the details, see the draft of his paper.
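The InfoGAN part, sketched (the standard variational lower bound on mutual information; this is the general idea, not necessarily the repository's exact loss):

$$
\min_{G,\,Q}\;\max_{D}\;\; L_{\text{GAN}}(D, G)\;-\;\lambda\,\mathbb{E}_{c \sim p(c),\, z \sim p(z)}\big[\log Q\big(c \mid G(z, c)\big)\big],
$$

where $Q$ is an auxiliary head that tries to recover the code $c$ from the generated image; maximizing that term forces each dimension of $c$ to control a distinct, recoverable factor of variation.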

The effect of turning the face left and right:


Mouth opening and closing:


Go watch it on YouTube, where it looks much better. It's amazing.

Totally unsupervised, with 80 meaningful dimensions, all of high quality.


4. Thoughts

Is GAN approaching its final form? You can imagine how strong a BigInfoGAN would be.

Feed it more data, scale up the model, and the sky really is the limit.

Of course, it still doesn't really understand anything, but it is strong.