Mio's site
RSS Bot@lemmy.bestiver.se (MB) to Hacker News@lemmy.bestiver.se · English · 2 hours ago

ZAYA1-8B: An 8B MoE Model with 760M Active Params Matching DeepSeek-R1 on Math

firethering.com

ZAYA1-8B Matches DeepSeek-R1 on Math with Less Than 1B Active Parameters. - Firethering
Who should care

If you work with math, science problems, or complex coding tasks and you're looking for something small enough to run locally or cheaply via API, this is worth serious evaluation. The benchmark numbers at 760M active parameters are not normal, and the Markovian RSA boost means performance scales with compute budget rather than hitting a fixed ceiling.

If you're building agent workflows that need reliable tool calling or multi-step instruction following, look elsewhere for now. The agentic numbers are honest about that gap.

Researchers working on test-time compute methods will find the Markovian RSA implementation worth studying regardless of whether they deploy the model itself. The co-design approach (training the model specifically to work with the inference method rather than applying the method after the fact) is an interesting direction that most labs haven't published on at this level of detail.

The AMD training story is also worth paying attention to if you care about where the hardware ecosystem goes next. This is the most capable model trained end to end on AMD hardware that anyone has published. That matters beyond just this one release.
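For a sense of scale, the headline figures imply that only a small share of the model's weights is used per token. The sketch below is plain arithmetic on the two numbers from the title (760M active, 8B total). It treats "8B" as a nominal, rounded total, since the exact parameter count is not given here, and the function name `active_fraction` is purely illustrative.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Return the share of a mixture-of-experts model's weights used per token."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    return active_params / total_params


# Nominal figures taken from the headline; the exact totals are an assumption.
ACTIVE_PARAMS = 760e6  # 760M active parameters
TOTAL_PARAMS = 8e9     # "8B" total parameters (rounded)

fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"{fraction:.1%} of parameters active per token")  # prints 9.5%
```

Under those nominal figures, roughly one weight in ten takes part in any single forward pass, which is why the per-token compute cost is closer to that of a sub-1B dense model than an 8B one.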

