cross-posted from: https://sh.itjust.works/post/61139432

I seriously can’t believe how much progress he’s made for the FOSS community. He might actually take a bite out of the big 3’s profits with this.

  • Matt@lemmy.ml
    English
    arrow-up
    6
    ·
    3 hours ago

    He’s done the main quest. Now he’s doing the side quests.

  • nublug@piefed.blahaj.zone
    English
    arrow-up
    6
    arrow-down
    20
    ·
    3 hours ago

    fuck this nazi piece of shit and his sloppy ass slop and every one of you dipshits praising him and this garbage. “how much progress he’s made for FOSS” lol, lmao, lmfao, even.

    • Teppichbrand@feddit.org
      English
      arrow-up
      1
      ·
      edit-2
      18 minutes ago

      I get where your comment is coming from and I probably agree. Yet our planet isn’t populated with versions of you or me, sadly or luckily. There are billions who have never heard of free and open source software. If this guy manages to get like seven people to check it out, I’m happy with it. He’s not in it for the money anymore, he’s not trying to scam anyone. We don’t have to speak with the same voice, your truth is not universal. So just save your anger for something more important.

    • appauled@sh.itjust.works (OP)
      English
      arrow-up
      9
      ·
      7 hours ago

      I kinda loved his “you should self host to decentralize from big tech” and “run graphene and Linux to avoid data collection” content, but idk what the local AI stuff is good for.

      • Encrypt-Keeper@lemmy.world
        English
        arrow-up
        10
        ·
        3 hours ago

        It’s good for the same things machine learning has always been good for: language synthesis and analysis. Take self-hosting something like Paperless for document management. It has actually had a very rudimentary learning engine for document classification for a long time, but feeding document content to a local AI model for organization and tagging is very useful.

    • frongt@lemmy.zip
      English
      arrow-up
      18
      ·
      12 hours ago

      And disclosed to the public before the project maintainer, too. This is shit from every angle.

  • onlinepersona@programming.dev
    English
    arrow-up
    33
    arrow-down
    1
    ·
    14 hours ago

    How many GPUs do you even need to have a usable, self-hosted AI? It looks like he has 6 on his rig. Probably each costs 2k or something. That’s not peanuts. I have a 12GB VRAM card. It probably can’t generate anything in any meaningful amount of time. Which brings me to the question: who is this for?

    Regardless, impressive what he vibe-coded there.

    • Dultas@lemmy.world
      English
      arrow-up
      2
      ·
      37 minutes ago

      I think in one video it looked like 16 cards. I think he did multiple bifurcations of the pcie lanes. I think he is / was using it for protein folding as well.

    • realitaetsverlust@piefed.zip
      English
      arrow-up
      10
      ·
      10 hours ago

      I use a 6700 XTX and it’s working perfectly fine, depending on the model. Gemma4 takes a long time to generate answers, but the Qwen series is quick and starts generating answers in ~10 seconds.

      • onlinepersona@programming.dev
        English
        arrow-up
        2
        ·
        2 hours ago

        What’s the quality of the answers though? And how much context can it hold? I imagine it’s only good for small, short questions, but have no concept of what is needed for that.

        I’m assuming you’re using a 12b or 24b qwen model. The ones from deepseek go up to hundreds of billions of params and I can’t tell if bigger number is better or just meaningless posturing.

    • Rhaedas@fedia.io
      arrow-up
      25
      ·
      13 hours ago

      16GB is plenty for even older model setups. Now they’ve got a few models designed so you load just parts of the model onto the GPU (Mixture of Experts) and use the CPU for less referenced sections, so you get both reasonable speed and a much more complex model.
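
      To illustrate why that split works, here is a toy sketch of Mixture-of-Experts routing. It assumes a single router with 8 experts and top-2 selection, and uses random scores in place of a learned router; the function names and sizes are made up for illustration and do not come from any particular model. The point is only that each token touches a small subset of the experts, which is what makes it feasible to keep the rest of the expert weights in system RAM.

```python
import math
import random

def softmax(scores):
    """Turn raw router scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_scores, top_k=2):
    """Pick the top_k experts for one token and renormalize their weights."""
    probs = softmax(router_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    chosen_total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / chosen_total) for i in chosen]

def simulate(num_tokens=1000, num_experts=8, top_k=2, seed=0):
    """Count how often each expert is activated (random scores, toy only)."""
    rng = random.Random(seed)
    usage = [0] * num_experts
    for _ in range(num_tokens):
        scores = [rng.gauss(0.0, 1.0) for _ in range(num_experts)]
        for expert_index, _weight in route(scores, top_k):
            usage[expert_index] += 1
    return usage

usage = simulate()
print("activations per expert:", usage)
print("experts run per token: 2 of 8")
```

      In a real model the router is learned and there is one per layer, so which experts count as "less referenced" depends on the model and the workload.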

    • cecilkorik@piefed.ca
      English
      arrow-up
      19
      arrow-down
      1
      ·
      13 hours ago

      For chat usage (which is strictly a more efficient way to generate code on the LLM’s part, although you have to keep it carefully guided and compartmentalized, otherwise it typically requires a lot more testing and sometimes back-and-forth iteration on your part), 12GB is plenty to run many decent LLMs. You’ll typically want to use a Q4 quantization to make models with larger parameter counts fit into smaller memory, or sometimes an IQ2 or IQ3 if you really want a particular model.
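
      A rough sketch of the arithmetic behind that, assuming the weights dominate memory use. The bits-per-weight figures are approximate averages for each quantization level (they vary by model and tooling version), the 14B model is a hypothetical example, and the 2GB reserve for context and runtime overhead is a guess rather than a measured value.

```python
# Approximate average bits per weight for some GGUF quantization levels.
# Ballpark assumptions, not exact figures.
APPROX_BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.8,
    "IQ3_XXS": 3.1,
    "IQ2_XXS": 2.1,
}

def weight_size_gb(params_billions, quant):
    """Estimate the size of the model weights in GB (1 GB = 1e9 bytes)."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    total_bytes = params_billions * 1e9 * bits / 8
    return total_bytes / 1e9

def fits(params_billions, quant, vram_gb, reserve_gb=2.0):
    """True if the weights plus a guessed reserve fit in the given VRAM."""
    return weight_size_gb(params_billions, quant) + reserve_gb <= vram_gb

# Hypothetical 14B-parameter model on a 12GB card.
for quant in APPROX_BITS_PER_WEIGHT:
    size = weight_size_gb(14, quant)
    verdict = "fits" if fits(14, quant, vram_gb=12) else "does not fit"
    print(f"{quant:8s} ~{size:5.1f} GB -> {verdict} in 12 GB")
```

      Anything that does not fit can still be partly offloaded to system RAM, at the cost of speed.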

      For agentic usage (where the LLM is trained and optimized to use a harness like this to start requesting tool calls, getting their results, and using those results to inform what it’s trying to do) it’s quite a bit more challenging to do on consumer hardware at a tolerable speed. The tools often generate large amounts of output which then take a long time to process, and the models and harnesses are both typically quite a bit stupider about using your limited resources efficiently. If you’re used to commercial “frontier” agentic models like Claude Code, you’re going to have a bad time.

      That said, it is absolutely possible to do agentic AI on consumer hardware (just the GPU you have, not 6 of them), as long as you’re reasonably patient and using a harness properly tuned for efficiency. Out of the box, many if not most are designed for remote API usage. Even the “open source, local” ones realistically rely on free-tier APIs and are inherently wasteful: they don’t really care how many tokens you burn in these remote datacenters, and they expect to just be able to iterate over and over again until they get it right. You don’t have that luxury when you’re getting slow tokens.

      Is PewDiePie’s any better or more efficient? I don’t know, I haven’t tried it yet. I prefer more minimal harnesses personally. OpenCode is about the most usable I’ve found, although I’m starting to experiment with Pi-mono (called Pi, but that’s unsearchable), which seems very promising, and I know quite a few people who have had successful agent usage with Hermes Agent.

      I’m not going to pretend it’s going to be easy or that you’ll necessarily have very good results. I am pretty lukewarm on AI as a whole, but I am personally deeply invested in making sure I have fully local access to it in as much capacity as is currently technologically possible, as a personal digital sovereignty issue.

      As for hardware, I have a 12GB card myself and you don’t really need to fit everything into VRAM these days. I have an AMD X3D CPU, which allows me to offload some of the model to system RAM with pretty decent performance. Maybe it’s prohibitive on different architectures or configurations, I don’t know, but it’s worth a try.

      glm-4.7-flash:Q4_K_M from ollama is the model I’ve had the most consistent success with. With ollama running it with the context window set to 50,000 (the context cache should be quantized too; as far as I know ollama’s KV cache setting takes q4_0 or q8_0 rather than Q4_K_M, which is a weight format), I end up with almost half of it offloaded to system RAM and it still runs quite fast thanks to the flash attention feature.

      I’ve worked with gemma4 quite a lot too. It’s definitely really fast but it’s also a bit unstable/weird at times, at least the heretic version I’m running (hf.co/Stabhappy/gemma-4-26B-A4B-it-heretic-GGUF:Q4_K_M) is. Still, if you really do need to fit everything into a smaller amount of RAM, you might try the gemma4 E4B models, which clock in around 9GB when quantized. Qwen3.6 is, I guess, supposed to be really good too and should fit nicely on your 12GB card, but I haven’t had much opportunity to play with it yet. Qwen3 and 3.5 felt rather disappointing to me for agentic use, but YMMV.
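
      For reference, the context-window part of that setup can be sketched as a request to a local ollama server. This assumes ollama is listening on its default address and that the model tag above has already been pulled; the helper names are made up for illustration. As far as I know, flash attention and KV cache quantization are server-side settings (the `OLLAMA_FLASH_ATTENTION` and `OLLAMA_KV_CACHE_TYPE` environment variables) rather than per-request options, so they are not shown here.

```python
import json
import urllib.request

# Default address of a locally running ollama server (assumption: unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt, model="glm-4.7-flash:Q4_K_M", num_ctx=50_000):
    """Build the JSON body for a non-streaming generate call."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # return one JSON object, not chunks
        "options": {"num_ctx": num_ctx},  # context window size in tokens
    }

def generate(prompt, **kwargs):
    """Send the prompt to the local server and return the generated text."""
    body = json.dumps(build_request(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

# Example call (needs a running server):
#   print(generate("Explain what a KV cache is in one sentence."))
```

      A larger `num_ctx` costs memory on top of the weights, which is why so much of the model ends up offloaded to system RAM on a 12GB card.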

      You’re not completely going to outsource all software and all code you write to AI using a local model, the way companies are doing with those commercial models. But I consider that an advantage, not a flaw. I find it’s much more useful to have it help, suggest and advise, not to completely replace everything I’m doing. Yes, sometimes it’s slow and sometimes it’s wrong, but so are other people when I ask them sometimes. I’m prepared for it, and you should be too. Don’t get complacent.

      • onlinepersona@programming.dev
        English
        arrow-up
        2
        ·
        2 hours ago

        Thank you for that writeup.

        Do you know how important the parameter size is? 12b, 24b, 128b, etc. Does it really improve performance or is it like megapixels in a camera: more megapixels don’t necessarily mean a better picture?

        And what’s “quantisation”? Context compression or something?

        I’ve been considering buying a better card to test models (also want to be personally sovereign), but NVIDIA on Linux gives me the jeebies and, last I checked, AMD hasn’t released anything with more than 20GB in a while. In fact, figuring out hardware requirements has been tough and I’m considering just riding this whole thing out. Maybe the bubble will collapse and bring prices down to something reasonable.

    • apftwb@lemmy.world
      English
      arrow-up
      4
      ·
      edit-2
      11 hours ago

      I can tell you from personal experience, 8GB is not enough for a snappy experience. Maybe if you had it set up to churn through data overnight. My RTX 3060 Ti was not happy.

    • Korhaka@sopuli.xyz
      English
      arrow-up
      6
      ·
      13 hours ago

      Depends on what you want it to do and how well it should do it. Zero is potentially enough. A second-hand card from half a decade ago can also do quite a lot.

    • new_world_odor@lemmy.world
      English
      arrow-up
      4
      ·
      13 hours ago

      I have an RX 5600 XT (6GB), 32GB RAM, and a Ryzen 3600. The system hasn’t been updated since I built it during covid. QwenV3-vl35B is the heftiest thing I can run; it gets around 2 tokens/sec in LM Studio. It’s easier than most people seem to think.

    • artyom@piefed.social
      English
      arrow-up
      5
      arrow-down
      1
      ·
      14 hours ago

      My buddy has an older 16GB card and I installed LM Studio for fun. It’s not quite as fast as some of the web-based ones, but perfectly usable.

  • Mordikan@kbin.earth
    arrow-up
    8
    ·
    11 hours ago

    I remember getting into security about 15 years ago. This was around the time Android was really kicking off, and it was crazy all the stupid things you could get away with (and that’s just a static permission set). Now you have AI pentesting tools and AI slopcode to use them on. Depending on perspective, this is either a great time or an awful time to be working in security.

  • A Sharky Anthro@fedia.io
    arrow-up
    32
    arrow-down
    16
    ·
    14 hours ago

    I was finally becoming interested in Felix when he started dabbling in Linux and made some seriously cool shit… Now that he’s made his own slop tool, I am losing interest in him again. It’s so fucking cursed, this slop nonsense needs to stop. I wish he would’ve stuck with ricing and making fun projects instead of open source washing corporate garbage.

    • timestatic@feddit.org
      English
      arrow-up
      43
      arrow-down
      2
      ·
      14 hours ago

      Those are his fun projects. He’s not doing them for you. I honestly think it’s a cool project even if it’s not something crazy or anything. And honestly, self-hosted AI projects are imo a lot better than just using claude tokens or whatever.

      • A Sharky Anthro@fedia.io
        arrow-up
        6
        arrow-down
        4
        ·
        12 hours ago

        It’s clear that this is his own project, that he is doing this for himself… It’s just sad this is what Felix is choosing to share with the community of people that follow him on YouTube. It’s bad enough that corporate idiots are peddling slop tools, but having a content creator with history and sway in the YouTube scene do it is worse. Also, it’s an LLM, not an “AI”, as AI is actually out of human reach unless humans actually do some real multidisciplinary work to make it happen. Techbros being able to conflate LLMs with AI was the worst thing to happen to the world. If only they were forced to advertise their slop tools correctly, we’d be in a different situation. 😮‍💨

        • StarDreamer@lemmy.blahaj.zone
          English
          arrow-up
          10
          ·
          11 hours ago

          Back in the BERT days I had a physics major friend that stuffed a bunch of Norwegian names in a file and trained a Norwegian name generator. He also made a Moby Dick sentence generator for funsies.

          PewDiePie’s project is no different from a personal pet project like these. Nothing about being a YouTuber makes you an expert at machine learning. It should be treated the same way as any other pet project.

          If the concern is someone with large amounts of influence causing disproportionate harm with their personal projects by name alone, at least in this particular case, I feel it’s appropriate to blame someone who trusts a YouTuber’s pet project in the first place.

  • Decronym@lemmy.decronym.xyzB
    English
    arrow-up
    12
    arrow-down
    1
    ·
    edit-2
    10 minutes ago

    Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:

    Fewer Letters   More Letters
    CF              CloudFlare
    DNS             Domain Name Service/System
    HA              Home Assistant automation software
                    ~ High Availability
    HTTP            Hypertext Transfer Protocol, the Web
    HTTPS           HTTP over SSL
    IP              Internet Protocol
    SSL             Secure Sockets Layer, for transparent encryption
    VPN             Virtual Private Network
    VPS             Virtual Private Server (opposed to shared hosting)

    8 acronyms in this thread; the most compressed thread commented on today has 16 acronyms.

    [Thread #324 for this comm, first seen 1st Jun 2026, 16:50] [FAQ] [Full list] [Contact] [Source code]

  • somethingDotExe@lemmy.world
    English
    arrow-up
    17
    arrow-down
    8
    ·
    14 hours ago

    I love that guy. I remember hating him back in the day, when he got popular by sitting and yelling while playing games. But damn, the guy has matured and put out epic content over the past 10 years or so.

  • MissesAutumnRains@lemmy.blahaj.zone
    English
    arrow-up
    7
    ·
    edit-2
    14 hours ago

    I’m a tiny bit confused as to what this actually is. I don’t use the Codex/ClaudeCode/Cursor stuff, but it seems like this is just an interface for connecting those services, isn’t it? It doesn’t seem like that actually protects your data at all.

    Can anyone help explain it a bit?

    Edit: I realized I kinda glossed over all the stuff that seemed to be included in this. I more meant the start, where he talked about this being privacy-centric. Is he just trying to make self-hosting less painful?

    • appauled@sh.itjust.works (OP)
      English
      arrow-up
      3
      ·
      7 hours ago

      Yeah, essentially with the other tools you had to use API keys, and none of them were FOSS; they were mostly paid-only tools.

      This lets you self-host the application interface itself (which can also be an IDE) and use a self-hosted LLM.