Monday, May 16, 2016

Adaptive Exploration on Plane Task

I got the Plane E2C task to work: the adaptive policy finds areas of high E2C loss faster than the random policy does.

In each figure, the left column shows the database samples (fixed size of 3000 points) and the right column shows the marginal E2C loss (the mean over all possible U for a given position).
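As a rough sketch of what "marginal" means here (this is not the code used to produce the plots), suppose the loss has been evaluated on a grid of positions and a finite set of candidate controls U. The array layout and the name `marginal_loss` are assumptions made for illustration.

```python
import numpy as np

def marginal_loss(loss):
    """Average a per-(position, control) loss table over the control axis.

    Assumed layout: loss has shape (H, W, K), where loss[i, j, k] is the
    loss at grid position (i, j) under the k-th candidate control U.
    Returns an (H, W) array, one value per position.
    """
    loss = np.asarray(loss, dtype=float)
    return loss.mean(axis=-1)

# Toy example: a 2x2 map with 3 candidate controls per position.
loss = np.array([
    [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]],
    [[4.0, 4.0, 4.0], [1.0, 0.0, 2.0]],
])
marginal = marginal_loss(loss)
print(marginal)
```

Each cell of the result is what the right column would plot at that position: the loss averaged over every control that could be applied there.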

Random policy:

Adaptive policy:

I repeated this experiment on a different environment map, one with a narrow corridor.

Random policy:


Adaptive policy:


Notice how, in both cases, the adaptive scheme's dataset covers the full training space sooner than the random policy's dataset does.
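One way to put a number on "covers the space sooner" would be to bin the database samples into a grid and track the fraction of occupied cells over training. This is only a sketch under assumptions: the post's comparison is visual, and the function name `coverage_fraction`, the bin count, and the map bounds below are hypothetical.

```python
import numpy as np

def coverage_fraction(points, bounds, bins=10):
    """Fraction of grid cells that contain at least one sample.

    points: (N, 2) array of sampled positions.
    bounds: ((xmin, xmax), (ymin, ymax)), the assumed extent of the map.
    bins:   number of cells along each axis (an arbitrary choice).
    """
    points = np.asarray(points, dtype=float)
    counts, _, _ = np.histogram2d(
        points[:, 0], points[:, 1], bins=bins, range=bounds
    )
    return np.count_nonzero(counts) / counts.size

# Toy example: three samples on a 2x2 grid occupy two of the four cells.
points = [[0.5, 0.5], [1.5, 0.5], [0.6, 0.4]]
coverage = coverage_fraction(points, bounds=((0.0, 2.0), (0.0, 2.0)), bins=2)
print(coverage)
```

Cells that lie inside obstacles can never be sampled, so on these maps the fraction would plateau below 1; comparing the two policies' curves at the same training step is what matters.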

