Tuesday, March 29, 2016

Project Update

My E2C implementation is having some issues that I've been trying to debug over the last few days. In short, the training process is unstable, even with the random seed held fixed.


  • The ratio of determinants in the computation of the KL divergence involving $N(\mu_0, A\Sigma_0 A^T)$ includes a $\log(1 + v^T r)$ term. Here $vr^T$ is the perturbation of the identity matrix used to create $A$ (i.e., $A = I + vr^T$), which appears in the linearized dynamics of the robot. The vectors $v$ and $r$ are generated by an affine transformation of $h_\psi^\text{trans}$, so nothing prevents $v^T r$ from being less than $-1$, in which case the argument of the log is non-positive and the term becomes NaN. I'm not quite sure how to get around this (one possible workaround is in the first sketch after this list). The Torch implementation of E2C seems to compute the ratio of determinants via torch.log(q.sigma):sum() - torch.log(p.sigma):sum(), but that assumes $A$ is diagonal. Perhaps it's okay to assume $A$ is diagonally dominant, since $vr^T$ is supposed to be a small perturbation?
  • My plane dataset implementation generates an independent set of obstacles for each trajectory, but this is incorrect. z_dim=2, which intuitively is "just enough" to encode the position of the robot, yet we are also asking the autoencoder to learn how to represent the obstacle positions. The obvious solution is to modify the task so that the obstacle layout is fixed across all samples and gets "baked" into the network weights of the autoencoder during training (see the second sketch below).
  • Originally I had the robot as a black square on a white background (obstacles also black). It turns out that a white robot and obstacles on a black background work better.
  • Using GradientDescent instead of Adam prevented the blowup (I'm not sure whether this has to do with Adam's $\beta_1$, $\beta_2$ parameters).
  • E2C is too complicated to debug, and I think there are issues with my implementation of the KL divergence term. To simplify things, I implemented a plain VAE. It works just fine on MNIST (naturally, with z=2 the network doesn't seem to be able to learn efficient encodings for all 10 digits), but it fails to accurately encode the position of the robot. I tried downscaling the 40x40 plane task to 28x28 (the same size as MNIST), but performance is still poor.
  • The $L_z$ (latent KL) term is blowing up even in the plain VAE; see the last sketch below for the closed-form expression it should be matching.
  • Perhaps the VAE is not learning the plane task properly because the weights are not convolutional and the spatial semantics are lost? I will try a robot size of 1 pixel on the blank plane dataset and see whether that learns properly.
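
For my own reference, here is a minimal numpy sketch of the determinant term from the first bullet, assuming $\Sigma$ is diagonal and $A = I + vr^T$. Since $\det(A\Sigma A^T) = \det(A)^2\det(\Sigma)$, the log only needs $|1 + r^T v|$; the absolute value and the clamp are my own workaround, not something taken from the E2C paper.

    import numpy as np

    def logdet_A_sigma_At(v, r, log_sigma_diag, eps=1e-6):
        """log det(A Sigma A^T) with A = I + v r^T, Sigma = diag(exp(log_sigma_diag)).

        Matrix determinant lemma: det(I + v r^T) = 1 + r^T v.
        Because det(A Sigma A^T) = det(A)^2 * det(Sigma), only |1 + r^T v|
        is needed, which sidesteps taking the log of a non-positive number.
        The eps clamp guards against A being (numerically) singular.
        """
        det_A = 1.0 + np.dot(r, v)
        log_det_A = np.log(np.maximum(np.abs(det_A), eps))
        return np.sum(log_sigma_diag) + 2.0 * log_det_A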

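As a note to myself, the fixed-obstacle version of the plane dataset would look roughly like this (a sketch only; the obstacle count, layout, and helper names are placeholders, not my actual generator):

    import numpy as np

    # One obstacle layout shared by every trajectory, so the autoencoder can
    # "bake" it into its weights instead of having to encode it in z.
    _rng = np.random.RandomState(0)
    OBSTACLES = _rng.randint(low=4, high=36, size=(6, 2))  # fixed (x, y) positions

    def render_frame(robot_xy, size=40, robot_px=1):
        """White robot and obstacles on a black background (values in [0, 1])."""
        img = np.zeros((size, size), dtype=np.float32)
        for ox, oy in OBSTACLES:
            img[oy, ox] = 1.0                               # obstacles
        rx, ry = robot_xy
        img[ry:ry + robot_px, rx:rx + robot_px] = 1.0       # robot
        return img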
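
Finally, to rule out a bug in my own $L_z$ implementation, the closed-form KL between the encoder's diagonal Gaussian and the standard normal prior is small enough to test in isolation. This is just the standard VAE formula; whether my $L_z$ term actually matches it is what I need to verify.

    import numpy as np

    def vae_kl(mu, logvar):
        """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims.

        Blows up when logvar (hence exp(logvar)) or mu grows without bound,
        which is one thing to look for when the L_z term explodes.
        """
        return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))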