The latent space seems to get pretty "knotted up" in areas where the robot doesn't explore as much. Currently the dataset's distribution is uniform over the trajectories explored by the robot (but not uniform over the possible states of the world).
Does training over more samples of the robot in the blue region result in a better unfolding?
Can we do some kind of interactive importance-sampling of $x_t$ from the dataset? If the training loop is coupled to the simulation loop, can we get the current latent manifold to inform what areas to search over?
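One way to sketch the importance-sampling idea: keep a score per training sample, draw minibatch indices in proportion to that score, and reweight each draw so the loss estimate still corresponds to the original uniform dataset. Everything here is an assumption for illustration: the name `sample_batch`, the `scores` list, and the choice of score (it could be each sample's most recent loss, or some measure of how sparse the latent manifold is around it).

```python
import random

def sample_batch(scores, batch_size, rng=random):
    """Draw dataset indices with probability proportional to `scores`.

    Returns (indices, weights). weights[k] = 1 / (N * p_i) for the drawn
    index i, which corrects for the non-uniform sampling relative to a
    uniform draw over the N samples.
    """
    n = len(scores)
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    probs = [s / total for s in scores]
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    # Samples with probability 0 are never drawn, so no division by zero.
    weights = [1.0 / (n * probs[i]) for i in indices]
    return indices, weights

# Hypothetical per-sample scores: the last sample is poorly modeled,
# so it gets drawn more often and down-weighted accordingly.
scores = [0.1, 0.1, 0.1, 2.0]
idx, w = sample_batch(scores, batch_size=8, rng=random.Random(0))
```

If the training loop is coupled to the simulator, the same scores could in principle decide which states to simulate next rather than which stored samples to replay.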
Anyway, I had a fairly random idea that I tried: extending E2C to produce not just a one-step prediction but multi-step predictions. The intuition is that the one-step error may be minimized, but as we march forward the errors accumulate quickly at $t=2,3,\ldots$, and we can squash those via gradient descent too. So far, simply extending E2C to also predict $x_2$ given $u_1, x_1$ is not very helpful: the error plateaus at about twice the single-step error.
But what if I predict a longer sequence, using each reconstruction as the input to the next step? I'd expect the results to get very poor after a few steps as the blurriness accumulates. Maybe we can squash that.
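A rough sketch of that rollout: feed each prediction back in as the next input and add up the per-step error, so the total can be used as a training loss. `step_fn` is a hypothetical stand-in for the full one-step predictor (encode, transition, decode), and plain squared error is an assumption; the real E2C objective has additional terms.

```python
def rollout_loss(step_fn, x0, controls, targets):
    """Sum of squared errors over a multi-step rollout.

    step_fn(x, u) -> predicted next observation (list of floats).
    After the first step the input is the model's own prediction, not the
    ground truth, so prediction errors are free to accumulate.
    """
    if len(controls) != len(targets):
        raise ValueError("need one target per control")
    x = x0
    per_step = []
    for u, target in zip(controls, targets):
        x = step_fn(x, u)  # the prediction becomes the next input
        per_step.append(sum((p - t) ** 2 for p, t in zip(x, target)))
    return sum(per_step), per_step

# Toy example: the true dynamics are x + u, the "model" is slightly off.
def toy_step(x, u):
    return [0.9 * xi + u for xi in x]

total, per_step = rollout_loss(
    toy_step,
    x0=[1.0],
    controls=[0.0, 0.0, 0.0],
    targets=[[1.0], [1.0], [1.0]],
)
# per_step grows with each step as the model error compounds
```

Even in this toy case the per-step error grows along the rollout, which is the accumulation the multi-step loss is meant to penalize.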