Will 2026 be the year of local AI use in coding, without subscription costs? And can we somehow use it to get a handle on environmental damage – or will it become even more inefficient and energy-intensive if everyone runs it on their own laptops instead of in well-cooled data centres?
These questions were brought to mind again today by this video from c’t 3003:
‘Local AI is now REALLY useful (and it runs on this hardware)’ (German)
Over the last year, I often read that local AI is fine for prompting/chatting, but not for agentic usage and the like. That seems to have changed a bit.
The models Qwen3 Coder 30B and Mistral Small 3.2 are mentioned, among others. The video suggests using the LM Studio tool instead of Ollama for downloading and running the models, but this is purely a matter of preference.
The local models can then be integrated into an IDE such as Visual Studio Code using plugins like Roo Code (Roo Code - Visual Studio Marketplace). I needed to increase the context size in my first test since it’s set to a low number by default.
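One way to raise the context window, assuming the model is served via Ollama, is a custom Modelfile. This is a sketch, not from the video: the model tag `qwen3-coder:30b` and the context value are placeholders you should adjust to whatever you actually pulled.

```shell
# Untested sketch: derive a model variant with a larger context window.
# FROM and num_ctx are standard Ollama Modelfile directives; the tag is a placeholder.
cat > Modelfile <<'EOF'
FROM qwen3-coder:30b
PARAMETER num_ctx 32768
EOF

# Register the variant under a new name, then select it in your IDE plugin.
ollama create qwen3-coder-32k -f Modelfile
```

Note that a larger context window also means noticeably higher memory use, so it is worth checking what your machine can actually hold.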
I was also told about the Zed IDE (with OpenCode) and the Continue tooling (https://www.continue.dev/).
Does anyone happen to have any current, maybe scientifically based assessments of the sustainability of local AI?
Nice to see you also like the c’t 3003 videos! A fan myself, but I seem to have missed this particular one.
What kind of scientific study are you looking for regarding sustainability? Since you will only be running the models locally, I guess only the embodied footprint of your hardware and the direct power consumption matter. Water usage / cooling should not really be a factor, and figures for development / training time could be hard to come by …? What do you say?
Since you already got the model running: what setup did you use? The 19 GB of VRAM the model needs seems really high for consumer hardware. Did you already have a special card at home? What did you buy? Or are you running the model on the CPU?
But unfortunately I haven’t had more time to test different scenarios yet.
Regarding studies:
My main question is whether we could achieve something like “sustainable AI prompting / agentic coding” on our own laptops / own hardware - or whether this is a dumb idea that does not scale, and using a company-internal AI model in a (green-energy-powered) data centre is much more efficient.
Because my heart bleeds when I currently do my daily coding with AI models in the cloud - without knowing whether I’m using coal-powered data centres.
Training models is another topic, of course, but “daily use” would be a start, I guess.
But I’m no expert at all in any of this, and I haven’t dug into the AI talks of ecocompute (yet).
For me the main question, before the sustainability discussion can even be opened, is whether we can get to a comparable setup. macOS is pretty unique here with its unified memory (RAM shared between CPU and GPU), in two ways:
You will likely not be able to create the same setup on a Linux / Windows box, as hardly anyone can afford the GPU / AI inference cards with the VRAM needed for these more capable, bigger models.
Mac unified memory is not dedicated, which also alters the whole embodied-carbon discussion: your hardware is not “sitting idle”, since it is not reserved for AI jobs only.
Having said that, I believe the discussion can only be done comparably if you try to answer the question: can a macOS user with a beefy MacBook Pro be more sustainable by ditching ChatGPT and running models locally instead of using a cloud-based model?
And for that I believe we can generate actual numbers:
You can just turn on powermetrics to measure a job on your system.
Just type `sudo powermetrics` in your shell and it will give you:
```
CPU Power: XX mW
GPU Power: XX mW
ANE Power: XX mW
Combined Power (CPU + GPU + ANE): XXX mW
```
Count the number of tokens generated, multiply the sustained power by the run time, and divide by the token count - that gives you an energy-per-token value (mW × s = mJ per token).
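As a rough sketch of that calculation (all numbers, names, and the function itself are made up for illustration, not part of powermetrics): multiply the sustained combined power reading by the wall-clock duration of the generation and divide by the token count.

```python
# Sketch: turn a powermetrics reading into an energy-per-token figure.
# Assumes you noted the sustained "Combined Power" value while the model was
# generating, plus the wall-clock duration and the token count your tool reported.

def energy_per_token_mj(combined_power_mw: float,
                        duration_s: float,
                        tokens: int) -> float:
    """Energy per token in millijoules: power (mW) x time (s) / tokens."""
    if tokens <= 0:
        raise ValueError("token count must be positive")
    return combined_power_mw * duration_s / tokens

# Example: 8500 mW sustained for 60 s while generating 900 tokens.
print(round(energy_per_token_mj(8500, 60, 900), 1))  # -> 566.7 mJ per token
```

This still ignores the machine's baseline draw while idle; subtracting an idle powermetrics reading first would give a fairer marginal-energy figure for comparison with cloud inference.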