RSS Bot@lemmy.bestiver.se to Hacker News@lemmy.bestiver.se · English · 6 hours ago

Self-Distillation Enables Continual Learning [PDF]

arxiv.org

Continual learning, enabling models to acquire new skills and knowledge without degrading existing capabilities, remains a fundamental challenge for foundation models. While on-policy reinforcement learning can reduce forgetting, it requires explicit reward functions that are often unavailable. Learning from expert demonstrations, the primary alternative, is dominated by supervised fine-tuning (SFT), which is inherently off-policy. We introduce Self-Distillation Fine-Tuning (SDFT), a simple method that enables on-policy learning directly from demonstrations. SDFT leverages in-context learning by using a demonstration-conditioned model as its own teacher, generating on-policy training signals that preserve prior capabilities while acquiring new skills. Across skill learning and knowledge acquisition tasks, SDFT consistently outperforms SFT, achieving higher new-task accuracy while substantially reducing catastrophic forgetting. In sequential learning experiments, SDFT enables a single model to accumulate multiple skills over time without performance regression, establishing on-policy distillation as a practical path to continual learning from demonstrations.
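The core mechanism in the abstract (a demonstration-conditioned copy of the model serving as teacher, with the unconditioned student distilled toward it) can be illustrated with a toy numerical sketch. This is not the paper's implementation: the 4-token vocabulary, the logit values, the additive `demo_shift` used to stand in for "conditioning on a demonstration", and the learning rate are all illustrative assumptions.

```python
import math

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical base-model logits over a tiny 4-token vocabulary.
base_logits = [2.0, 1.0, 0.5, 0.0]

# "Conditioning on a demonstration" is modeled here as an additive logit
# shift that makes the teacher favor the demonstrated behavior (token 2).
demo_shift = [0.0, 0.0, 3.0, 0.0]
teacher = softmax([b + s for b, s in zip(base_logits, demo_shift)])

# The student is the same model *without* the demonstration in context;
# its distribution is distilled toward the teacher's.
student_logits = list(base_logits)
lr, steps = 1.0, 500
history = []
for _ in range(steps):
    q = softmax(student_logits)
    history.append(kl(teacher, q))
    # d/dz KL(teacher || softmax(z)) = softmax(z) - teacher
    student_logits = [z - lr * (qi - pi)
                      for z, qi, pi in zip(student_logits, q, teacher)]

final_kl = kl(teacher, softmax(student_logits))
print(f"KL(teacher || student): {history[0]:.4f} -> {final_kl:.2e}")
```

In the actual method the teacher and student share weights and the distillation targets are computed on samples drawn from the student itself (that is what makes it on-policy); this sketch only shows the distribution-matching step that internalizes the in-context behavior into the weights.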


Hacker News@lemmy.bestiver.se

Community locked: only moderators can create posts. You can still comment on posts.

Posts from the RSS Feed of HackerNews.

The feed sometimes contains ads and posts that have been removed by the mod team at HN.

Source of the RSS Bot

Visibility: Public

This community federates to other instances, and their users can post and comment in it.

  • 488 users / day
  • 1.78K users / week
  • 4.37K users / month
  • 9.78K users / 6 months
  • 1 local subscriber
  • 4.87K subscribers
  • 26.8K Posts
  • 17.9K Comments
  • Modlog
  • mods:
  • patrick@lemmy.bestiver.se
  • RSS Bot@lemmy.bestiver.se