Left column is database samples (fixed size of 3000 points), right column is marginal E2C loss (mean over all possible U for a given position).
Random policy:
Adaptive Policy:
I repeated this experiment on a different environment map, one with a narrow corridor.
Random policy:
Adaptive policy:
Notice how in both cases, the adaptive scheme's dataset covers the full training space sooner than the random policy.
No comments:
Post a Comment