Wednesday, March 30, 2016

Figured it out

Fixed the VAE bug. 

It turns out my VAE was implemented correctly, but at test time I forgot to invert the images I was passing in (which is what I was doing during training). Of course, that messes everything up. The reconstruction losses I was obtaining were actually converging very close to 0, which was an accurate reflection of what was going on during training.

The real issue was my sloppy programming. I could have avoided this mistake (and saved a couple of days of grief) by simplifying the data interface so that nothing has to be manually processed after calling sample().
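Something like the sketch below is what I have in mind: sample() already returns the inverted, flattened batch, so the training and testing code can never drift apart. The class and argument names are placeholders, not my actual code.

```python
import numpy as np

class MazeData:
    """Hypothetical data wrapper: every bit of preprocessing (inversion,
    flattening) happens inside sample(), so training and testing code
    are guaranteed to see identically processed images."""

    def __init__(self, images, invert=True):
        # images: (N, H, W) array; invert swaps background/agent intensities
        self.images = np.asarray(images, dtype=np.float32)
        self.invert = invert

    def sample(self, batch_size):
        idx = np.random.randint(len(self.images), size=batch_size)
        batch = self.images[idx]
        if self.invert:
            batch = 1.0 - batch  # done once, here, never at a call site
        return batch.reshape(batch_size, -1)
```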

Sent an email to Manuel Watter asking about the KL divergence business and some other questions. Here's what he says:
1. You're right, the KL can explode. Ideally, the network should stay away from the problematic values for v and r, but that's of course not guaranteed. John Assael's approximation is only valid if, as you said, there is only insignificant covariance between the state dimensions, i.e. small perturbations of v and r. This becomes problematic for example in the case of velocities, because they obviously influence the position. The network might choose a state representation that minimizes this influence then, but it's hard to say.
2. Both of your tests should work here and I think the MNIST reconstructions look not too bad. In 10 dimensions, it really can't do more than represent an average version of the numbers, and since there's no label information, some of those get blurred together. For the maze, on the other hand, it should be able to give almost perfect reconstructions, or at least get the position of the agent right. From your settings, the minibatch size seems a little odd: 1000 is quite a lot; try 128 or 64. I'm not sure about the beta1 value of Adam, but I think we always had 0.1.
About inverting the images: For our training, the background is always 0 and the agent 1; we just switched it for the visualization. The difficulty here is the sparse gradient information, which in the past led to the agent information being lost, but Adam should solve that issue.
To make sure that this really works, try to create a dataset which has an image for every possible position of the agent and sample your minibatches from that. This has to work for the VAE, and you can use it to visualize the latent spaces. You might also try to compare your implementation to e.g. Lasagne to find out where it goes wrong.
3. Correct, that's a mistake :/
4. Yes, we didn't specify anything here. We assume that the inference network handles all the noise.
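Following his suggestion in point 2, here's a rough sketch of an exhaustive-position dataset I could sample minibatches from. The square maze, 16x16 grid, and single-pixel agent are made-up parameters; background 0 and agent 1 match the convention he describes.

```python
import numpy as np

def make_exhaustive_maze_dataset(grid=16, agent=1):
    """One image per possible agent position (assumed square grid and
    square agent; background = 0, agent = 1)."""
    span = grid - agent + 1
    positions = [(r, c) for r in range(span) for c in range(span)]
    images = np.zeros((len(positions), grid, grid), dtype=np.float32)
    for i, (r, c) in enumerate(positions):
        images[i, r:r + agent, c:c + agent] = 1.0
    return images, positions

# sample minibatches from the full enumeration
images, positions = make_exhaustive_maze_dataset()
idx = np.random.randint(len(images), size=64)   # batch of 64, as he suggests
minibatch = images[idx].reshape(len(idx), -1)
```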
Here's a VAE reconstruction of *just* the images:

[image: VAE reconstructions]
Next steps:

  • Implement a way to visualize the latent space of trained models as a mapping of its "true" task space. I will need a "Variational Model" class of some sort to abstract this (a rough sketch follows this list).
  • Re-implement the VAE task with E2C-type modules.
  • I'm thinking of adding a logarithmic barrier to guard against the v^T r ≤ -1 condition (sketch below). Maybe that will help?
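For the first item, the abstraction I have in mind would look roughly like this. VariationalModel and plot_latent_vs_position are placeholder names, and the plotting choices (first two latent dimensions, colored by agent row) are just one way to do it:

```python
import numpy as np
import matplotlib.pyplot as plt

class VariationalModel:
    """Placeholder abstraction: any model that can map observations to
    latent means can plug into the same visualization code."""
    def encode(self, x):
        raise NotImplementedError

def plot_latent_vs_position(model, images, positions):
    """Scatter the first two latent dimensions, colored by the agent's true
    row, to see how the learned space maps onto the task space."""
    z = model.encode(images.reshape(len(images), -1))
    rows = np.array([p[0] for p in positions])
    plt.scatter(z[:, 0], z[:, 1], c=rows, cmap='viridis', s=10)
    plt.xlabel('z[0]')
    plt.ylabel('z[1]')
    plt.colorbar(label='true agent row')
    plt.show()
```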

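And for the log barrier, a minimal sketch of what I mean, assuming the E2C-style parameterization where the KL contains a log(1 + v^T r) term; the weight and margin values are made up:

```python
import numpy as np

def log_barrier_penalty(v, r, weight=1e-2, margin=1e-6):
    """Penalty that blows up as v^T r approaches -1 from above, keeping the
    log(1 + v^T r) term in the KL well-defined. v and r are (batch, dim)
    arrays; weight and margin are made-up knobs."""
    vr = np.sum(v * r, axis=1)              # per-sample v^T r
    gap = np.maximum(1.0 + vr, margin)      # clamp so log() never sees <= 0
    return -weight * np.mean(np.log(gap))   # standard -mu * log(g(x)) barrier
```

In the real model this would be written in the framework's symbolic ops and added to the training loss; the clamp is only there so the penalty stays finite if a batch has already crossed the boundary.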