RSS Bot@lemmy.bestiver.se to Hacker News@lemmy.bestiver.se · English · 6 hours ago

Self-Distillation Enables Continual Learning [PDF]

arxiv.org

Continual learning, enabling models to acquire new skills and knowledge without degrading existing capabilities, remains a fundamental challenge for foundation models. While on-policy reinforcement learning can reduce forgetting, it requires explicit reward functions that are often unavailable. Learning from expert demonstrations, the primary alternative, is dominated by supervised fine-tuning (SFT), which is inherently off-policy. We introduce Self-Distillation Fine-Tuning (SDFT), a simple method that enables on-policy learning directly from demonstrations. SDFT leverages in-context learning by using a demonstration-conditioned model as its own teacher, generating on-policy training signals that preserve prior capabilities while acquiring new skills. Across skill learning and knowledge acquisition tasks, SDFT consistently outperforms SFT, achieving higher new-task accuracy while substantially reducing catastrophic forgetting. In sequential learning experiments, SDFT enables a single model to accumulate multiple skills over time without performance regression, establishing on-policy distillation as a practical path to continual learning from demonstrations.
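The core mechanism in the abstract (a demonstration-conditioned copy of the model serving as teacher, with the unconditioned student distilled toward it) can be illustrated with a toy numerical sketch. This is not the paper's implementation: the 4-token vocabulary, the logit values, the additive `demo_shift` used to stand in for "conditioning on a demonstration", and the learning rate are all illustrative assumptions.

```python
import math

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical base-model logits over a tiny 4-token vocabulary.
base_logits = [2.0, 1.0, 0.5, 0.0]

# "Conditioning on a demonstration" is modeled here as an additive logit
# shift that makes the teacher favor the demonstrated behavior (token 2).
demo_shift = [0.0, 0.0, 3.0, 0.0]
teacher = softmax([b + s for b, s in zip(base_logits, demo_shift)])

# The student is the same model *without* the demonstration in context;
# its distribution is distilled toward the teacher's.
student_logits = list(base_logits)
lr, steps = 1.0, 500
history = []
for _ in range(steps):
    q = softmax(student_logits)
    history.append(kl(teacher, q))
    # d/dz KL(teacher || softmax(z)) = softmax(z) - teacher
    student_logits = [z - lr * (qi - pi)
                      for z, qi, pi in zip(student_logits, q, teacher)]

final_kl = kl(teacher, softmax(student_logits))
print(f"KL(teacher || student): {history[0]:.4f} -> {final_kl:.2e}")
```

In the actual method the teacher and student share weights and the distillation targets are computed on samples drawn from the student itself (that is what makes it on-policy); this sketch only shows the distribution-matching step that internalizes the in-context behavior into the weights.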


Hacker News@lemmy.bestiver.se

Community locked: only moderators can create posts. You can still comment on posts.

Posts from the RSS Feed of HackerNews.

The feed sometimes contains ads and posts that have been removed by the mod team at HN.

Source of the RSS Bot

Visibility: Public

This community federates to other instances, and their users can post and comment in it.

  • 488 users / day
  • 1.78K users / week
  • 4.37K users / month
  • 9.78K users / 6 months
  • 1 local subscriber
  • 4.87K subscribers
  • 26.8K Posts
  • 17.9K Comments
  • Modlog
  • mods:
  • patrick@lemmy.bestiver.se
  • RSS Bot@lemmy.bestiver.se