When I train the model to minimize just L_x, and then add the L_z term and re-train, the network learns fine. However, when I try to minimize L_x + L_z right from the start, the gradients blow up.
Fixed it - I had L_z implemented incorrectly (a sign error somewhere).
Now it trains on L_x + L_z from the start.
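For reference, here is roughly what the joint training step looks like, and why a flipped sign on L_z would cause exactly this blow-up. This is a minimal PyTorch sketch with placeholder definitions; the model interface and the concrete forms of L_x and L_z are my assumptions, since the post doesn't show the actual code.

```python
# Hedged sketch of the joint objective L = L_x + L_z.
# `model`, `loss_x`, and `loss_z` are placeholders, not the post's real code.
import torch
import torch.nn.functional as F

def loss_x(x_hat, x):
    # Placeholder reconstruction term.
    return F.mse_loss(x_hat, x)

def loss_z(z):
    # Placeholder latent penalty. With the sign flipped (the bug described
    # above), minimizing the total loss *rewards* large z, so the optimizer
    # pushes z toward infinity and the gradients diverge.
    return z.pow(2).mean()

def train_step(model, optimizer, x):
    optimizer.zero_grad()
    x_hat, z = model(x)  # assumed interface: returns reconstruction and latent
    loss = loss_x(x_hat, x) + loss_z(z)  # minimize both terms from the start
    loss.backward()
    optimizer.step()
    return loss.item()
```

With the correct sign, both terms are bounded below and the joint objective is as well-behaved as either term alone, which matches the post's observation that training works once the sign is fixed.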