if they existed they’d be killer for RL. RL is insanely unstable when the distribution shifts as the policy starts exploring different parts of the state space. you’d think there’d be some clean approach to learning P(Xs|Ys) that can handle continuous shift in the distribution of the Ys in the training data, but there doesn’t seem to be one. just replay buffers and other kludges.
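the replay-buffer kludge mentioned above can be sketched in a few lines (the class and names here are illustrative, not from any particular RL library): a fixed-size FIFO buffer sampled uniformly, so each training batch mixes transitions from many past policies instead of only the current, shifted one.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size FIFO buffer of transitions.

    Uniform sampling over the whole buffer blends data gathered
    under many past policies, which partially smooths over the
    distribution shift the comment complains about.
    """
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # oldest entries drop off automatically

    def add(self, transition):
        self.buf.append(transition)

    def sample(self, batch_size):
        # uniform sample, so the batch is a mixture of old and new policy data
        return random.sample(list(self.buf), min(batch_size, len(self.buf)))

# toy usage: stand-in (state, action) pairs collected as the "policy" drifts
buf = ReplayBuffer(capacity=100)
for step in range(250):
    buf.add((step, step % 7))

batch = buf.sample(32)
print(len(batch))                  # 32
print(min(s for s, _ in buf.buf))  # oldest retained transition: 150
```

the kludge part: uniform sampling treats stale transitions as if they came from the current policy's state distribution, which is exactly the mismatch the buffer is papering over.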
the neutrino’s energy was estimated at 220 PeV. goodness!