
4 days ago
Reasoning benchmarks test the model's ability to demonstrate complex problem-solving capabilities. Think math proofs, graduate-level reasoning, and some coding tasks. It’s not really suited for generating code snippets from a vague prompt.
This is a well-written post. I agree that the “friction” involved in small changes and the incompatibility with some Linux binaries are significant downsides. I think NixOS makes a lot of sense for development environments, but it’s not my preference for a personal device.