Thursday, March 31, 2016

On Discriminative Regularization

The paper "Discriminative Regularization for Generative Models" offers an important insight: we can use the output of a classifier as a regularization term in the objective function for training generative models.
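
For concreteness, here is a minimal sketch (in PyTorch) of what such a discriminatively regularized reconstruction loss could look like; `feature_net`, `lam`, and the exact form of the penalty are my own hypothetical choices, not taken from the paper:

```python
import torch
import torch.nn.functional as F

def dr_loss(x, x_recon, feature_net, lam=0.1):
    # Usual pixel-space reconstruction term (binary cross-entropy).
    recon = F.binary_cross_entropy(x_recon, x)
    # Features of the real image from a frozen, pretrained classifier;
    # no gradients should flow into the target.
    with torch.no_grad():
        target_feat = feature_net(x)
    # Features of the reconstruction; gradients flow back to the decoder,
    # pushing reconstructions to match in the classifier's feature space.
    recon_feat = feature_net(x_recon)
    return recon + lam * F.mse_loss(recon_feat, target_feat)
```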

Other places where this "multi-task regularization" comes up:

  • Learning Physical Intuition of Block Towers by Example - in addition to "predicting where the blocks will end up", the network is also asked to predict a binary "will the blocks fall?" label. In a hand-wavy sense, this acts as a way to "force" the internal representation to be useful for classifying an important "high-level" feature: namely, whether the blocks fell or not. As humans, we tend to be more interested in whether the blocks fell than in the exact L2 distance between where the blocks fell and where we thought they would fall. A minimal sketch of such a multi-task loss follows below.
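
As a concrete (and entirely hypothetical) illustration of that multi-task setup, the combined loss might look like the following, where `beta` weights the auxiliary "did it fall?" term; none of these names come from Lerer et al.:

```python
import torch.nn.functional as F

def block_tower_loss(pred_final, true_final, fall_logit, fell, beta=1.0):
    # Main task: L2 regression on where the blocks end up.
    position_loss = F.mse_loss(pred_final, true_final)
    # Auxiliary task: binary "did the tower fall?" prediction
    # (fell is a float tensor of 0/1 labels).
    fall_loss = F.binary_cross_entropy_with_logits(fall_logit, fell)
    return position_loss + beta * fall_loss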

Indeed, the images produced by E2C (and DRAW, to some extent!) appear blurry. D.R. could plausibly help here: penalizing reconstructions in a classifier's feature space, rather than purely in pixel space, tends to yield sharper outputs.

How might we apply D.R. to E2C?
Human perception is goal-driven. The objective term corresponding to the fidelity of the latent-space encoding/decoding should be penalized not only by deviations in prediction (i.e., binary cross-entropy between $x$ and $x_{\text{recon}}$) but also by deviations in some "goal space":

Some ideas:
  • Binary variable of whether the agent is in free fall or not (to borrow directly from Lerer et al. 2016). Only applies to situations with gravity, I think.
  • Scalar "value function" to predict the cost of a random T=10 horizon trajectory. We can compute this using only a random sample of controls, so it is actually fairly easy to generate training data. Given $x_t$ and $u_{t},u_{t+1},\dots,u_{t+T-1} \in U$, we hallucinate $\hat{X}_{t:t+T}=\{\hat{x}_{t+1},\hat{x}_{t+2},\dots,\hat{x}_{t+T}\}$. Then we feed $\hat{X}_{t:t+T}$ (T images) into some big convnet (not unlike the Atari-playing networks) and use it to predict the cost. Meanwhile, we can run the simulator on the same controls to obtain $X_{t:t+T}$ and the true cost $J$ of the local trajectory; see the sketch after this list. Of course, we don't want to bias the representation towards a very specific cost function (i.e., a very specific goal state), so we could sample a variety of goal images.
  • Approximate the value $c(z_t,u_t)$ for a randomly sampled $z_{\text{goal}}$.
  • Binary variable - "is the agent touching an obstacle?".
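
Here is a minimal sketch of the value-function idea above; `encode`, `dynamics`, `decode`, and `cost_net` are hypothetical stand-ins for the E2C encoder, latent dynamics model, decoder, and the cost-predicting convnet:

```python
import torch
import torch.nn.functional as F

def cost_prediction_loss(x_t, controls, encode, dynamics, decode, cost_net, true_cost):
    # Roll the learned latent dynamics forward under the sampled controls
    # u_t .. u_{t+T-1}, decoding a hallucinated frame at each step.
    z = encode(x_t)
    frames = []
    for u in controls:
        z = dynamics(z, u)          # \hat{z}_{t+k}
        frames.append(decode(z))    # \hat{x}_{t+k}
    traj = torch.stack(frames, dim=1)   # (batch, T, C, H, W)
    # The "big convnet" predicts a scalar trajectory cost from the
    # hallucinated frames; regress it onto the true cost J from the simulator.
    pred_cost = cost_net(traj)
    return F.mse_loss(pred_cost, true_cost)
```

Sampling fresh controls (and, per the goal-image point above, fresh goals) each minibatch should keep this penalty from biasing the representation towards any single cost function.
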
I'm currently working my way through "One-Shot Generalization in Deep Generative Models", which has some really interesting ideas whose theoretical foundations I don't fully understand, but I feel it will give me a much better grasp of how attention mechanisms might help improve E2C performance.
