Sept. 22, 2019, 1:34 p.m.
I started working on a variational auto-encoder (VAE) for faces a few months ago. I was easily able to make a non-variational autoencoder that reproduced images incredibly well, but since it was not variational there wasn't much you could do with it other than compress images. I wanted to be able to play with interpolation and such, and for that you need a VAE. So I converted my auto-encoder to a variational one, but the resulting images were very blurry and the quality wasn't all that great. So I thought maybe I could attach a GAN to it to make the images look more realistic. I tried that, but unfortunately it didn't work very well: the GAN was trying to generate images of what it thought were faces while the autoencoder was trying to reproduce its input, as seen in the images below:
After fighting with this for a few months I decided to try to make sure that the GAN was working properly before I added on the autoencoder, and although I had to fight with the GAN quite a bit and was never able to get it to generate really high quality images, I was sure that it was working properly. So I decided to try to hook it up to the autoencoder again.
Then I discovered the paper "Autoencoding beyond pixels using a learned similarity metric", which does the same thing I was trying to do but in a much smarter way. What I had been doing was using the MSE between the input and the generated images for my VAE loss, and training both the encoder and the decoder with the GAN loss. Obviously this did not work.
What they do in the paper is basically separate the encoder and leave the decoder and discriminator as the GAN, which is trained as usual. I had tried to think of ways to train the encoder and decoder separately, but my ideas were much more primitive and didn't work at all. What they do is train the encoder separately, using the KLD loss and - this is the brilliant part - instead of using the MSE between the input and the recreation, they use the MSE between feature maps from an intermediate layer of the discriminator for the real and generated images. So rather than trying to produce an exact duplicate of the input, the encoder is trying to produce something that the discriminator thinks is close to the input.
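As a rough sketch of the encoder objective described above, the two terms can be written out in plain Python. This is only an illustration under assumptions, not the code from the paper or from this project: the function names, the `kl_weight` parameter, and the use of flat lists of floats in place of framework tensors are all hypothetical, and the KLD term assumes the usual diagonal Gaussian posterior with a standard normal prior.

```python
import math


def kl_divergence(mu, logvar):
    """KL divergence between N(mu, exp(logvar)) and N(0, 1), summed over latent dims.

    Assumes a diagonal Gaussian posterior, the usual VAE choice.
    """
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar)
    )


def feature_mse(real_features, fake_features):
    """Mean squared error between two flattened discriminator feature maps."""
    if len(real_features) != len(fake_features):
        raise ValueError("feature maps must have the same number of elements")
    squared_errors = [(r - f) ** 2 for r, f in zip(real_features, fake_features)]
    return sum(squared_errors) / len(squared_errors)


def encoder_loss(mu, logvar, real_features, fake_features, kl_weight=1.0):
    """Encoder objective: KLD term plus feature-space reconstruction error.

    real_features / fake_features stand in for the activations of one
    intermediate discriminator layer for the input image and for its
    reconstruction. kl_weight is a hypothetical knob, not from the post.
    """
    return kl_weight * kl_divergence(mu, logvar) + feature_mse(
        real_features, fake_features
    )


if __name__ == "__main__":
    # Toy numbers only: a 2-dimensional latent and a 3-element feature map.
    loss = encoder_loss(
        mu=[0.5, -0.5],
        logvar=[0.0, 0.0],
        real_features=[1.0, 2.0, 3.0],
        fake_features=[1.0, 2.5, 2.0],
    )
    print(f"encoder loss: {loss:.4f}")
```

In a real training loop both terms would be computed on tensors so that gradients flow back into the encoder, while the decoder and discriminator keep their usual GAN updates.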
It took me a few hours to rewrite my code to make use of this new loss, and come up with a version that would be able to run without having to keep all of the graphs in memory and be able to train in a reasonable amount of time, and I think everything is finally working. Hopefully this works better than my previous attempts, and next time I will try to remember to review the literature before trying to implement a new idea on my own.
Sept. 21, 2019, 1:18 p.m.
I've now been trying to train my GANs for quite a while and still haven't been too successful, but I have learned some tricks. I found this excellent article a while ago and I didn't really understand it completely at first, but after having tried a lot of its tricks I understand them now. Here are my thoughts and some additional tricks I have used:
Some additional tips on how to construct a GAN:
Sept. 16, 2019, 10:57 a.m.
Discovering how much cheaper spot EC2 instances were than normal on-demand instances gave me the courage to try out a faster GPU. I had been using K80s, which are painfully slow but very cheap. The spot price for the V100 is about the same as the on-demand price of the K80, so running V100s as spot instances won't be any cheaper than what I was paying for on-demand K80s, but it won't be more expensive either.
I didn't think the V100s were such great GPUs, so I wasn't expecting them to be worth the extra cost. How wrong I was. Training the network I am currently playing with on a K80 with a batch size of 48 took about 8-12 hours per epoch. Training it on a V100 with a batch size of 64 is looking like it's going to take about 2 hours. With the V100s priced at about 4x the K80s, that works out to somewhere between the same cost per epoch and a little bit cheaper, depending on exactly how long an epoch took on the K80.
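The arithmetic behind that comparison can be checked with a few lines of Python. This is a sketch using only the rough figures quoted above (8-12 hours per epoch on the K80, about 2 hours on the V100, V100 at about 4x the K80 price); the function name is made up for illustration, and prices are expressed relative to one K80-hour rather than in dollars.

```python
def relative_epoch_cost(hours_per_epoch, relative_hourly_price):
    """Cost of one epoch, in units of the K80's hourly price."""
    return hours_per_epoch * relative_hourly_price


# K80 is the baseline (price 1.0); an epoch took roughly 8 to 12 hours.
k80_low = relative_epoch_cost(8, 1.0)
k80_high = relative_epoch_cost(12, 1.0)

# V100 costs about 4x as much per hour; an epoch takes roughly 2 hours.
v100 = relative_epoch_cost(2, 4.0)

print(f"K80:  {k80_low:.0f} to {k80_high:.0f} K80-hours per epoch")
print(f"V100: {v100:.0f} K80-hours per epoch")
```

So an epoch on the V100 costs about the same as the fastest K80 epoch and about a third less than the slowest one.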
When you factor in the value of not having to wait an entire day to see the results of an epoch, this is a no-brainer as far as I'm concerned. Unfortunately, I'm sure my AWS bill is going to increase substantially. That's how they get you... Once you have a taste of HPC they know you'll be back for more...
Sept. 12, 2019, 4:35 p.m.
My major complaint about using EC2 GPU instances was the cost: it gets very expensive to run a GPU instance for more than a few hours. Last week I was wondering why I wasn't using spot instances, so I set up a request and I've been running it for a few days now. It is about 1/4 the price of a normal instance, so it's not much more expensive than renting a CPU-only on-demand instance. I was hoping to get a better GPU than the K80, but I ended up settling for the K80 because it was more readily available than the better GPUs. Next time I may request a better one and see what happens.
The downside of spot instances is that they will be terminated if the capacity is needed for an on-demand instance, and my instance was terminated the other night. But then I spun up a new one in the morning and that one has been running for a few days now. I can't believe I haven't used these before.