felsiq

felsiq@piefed.zip · 3 days ago

Did you use a heavily quantized version? Those models are much smaller than the state of the art ones to begin with, and if you chop their weights from float16 to float2 or something it reduces their capabilities a lot more

felsiq@piefed.zip · 3 days ago

Yep, the OpenAI api and/or the ollama one work for this no problem in most projects. You just give it the address and port you want to connect to, and that port can be localhost, lan, another server on another network, whatever.