Intro

Today I want to talk about Google Gemma 4; it sets a new standard for open-source AI models. I am astonished by the quality of it, even though it has a small model size compared to the other big ones out there.. like Kimi K2.5 from Moonshot AI, that thing is a 1 trillion parameter model.. 1 trillion! and Gemma 4’s biggest is only 31 billion.. but here is the thing, Kimi K2.5 only activates 32 billion of those parameters for any given token, so in practice they are not that far apart.. and Gemma 4 punches way above its weight for how small it actually is..

gemma4

In the Gemma models, they always announce the total and active parameter counts right in the model name, using the letters e and a. The e stands for effective, like E2B or E4B.. that means it is a Mixture of Experts model and the number is how many parameters are actually active per token. Think of it like a restaurant with 26 chefs.. but only 4 of them actually cook your order at any point, the rest stand by with their own specializations, so it runs fast like a 4B model but has the knowledge of a full 26B one.. pretty clever. The a is used similarly in some of the naming, same idea basically. And then there is the dense model.. that one just loads everything, all 31 billion parameters fire for every single token, no routing, no tricks, just brute force.. it uses more memory but is more predictable.
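To make the chef analogy concrete, here is a toy sketch of what that routing step does. This is purely my own illustration with made-up names and a random router.. not Gemma’s actual code:

    import kotlin.math.exp

    // Toy sketch of the MoE routing idea: 26 experts total, 4 active per token.
    // All names and numbers here are illustrative, not Gemma's real internals.
    const val NUM_EXPERTS = 26
    const val ACTIVE_PER_TOKEN = 4

    // Softmax over the chosen experts' scores so their outputs can be mixed.
    fun softmax(scores: List<Double>): List<Double> {
        val m = scores.maxOrNull() ?: 0.0
        val exps = scores.map { exp(it - m) }
        val sum = exps.sum()
        return exps.map { it / sum }
    }

    // The router scores every expert for this token; keep only the top 4.
    fun routeToken(routerScores: DoubleArray): List<Int> =
        routerScores.withIndex()
            .sortedByDescending { it.value }
            .take(ACTIVE_PER_TOKEN)
            .map { it.index }

    fun main() {
        val scores = DoubleArray(NUM_EXPERTS) { Math.random() } // stand-in router output
        val chosen = routeToken(scores)
        val weights = softmax(chosen.map { scores[it] })
        println("Token goes to experts $chosen with mixing weights $weights")
        // Only these 4 experts' feed-forward weights run for this token; the
        // other 22 sit idle, which is why it computes like a 4B model.
    }

Run it a few times and different experts get picked each time.. that is the router doing its thing per token.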

The release splits into two classes, with multiple models in each. Two of the four models are meant for desktop/laptop, and the other two were created for edge devices.. by which they mean mobile phones, Raspberry Pi, and other less powerful hardware.

On PC

I’ve been a long-time user of LM Studio; it offers great features.. I like it for its ease of use and the ability to manage multiple models at once.

From the official documentation, I can run up to 14 GB of graphics memory worth of model. The 31B dense model needs 60+ GB at full FP16 precision.. so that is out for me. The 26B MoE is interesting though.. even though it has 26B total params, you only compute 4B at a time, so it is lighter on the processor, but you still have to load all 26B into VRAM, which I don’t have.. at least for now. The trick people use is Q4 quantization, which squeezes each weight down to roughly 4 bits instead of 16. The 31B at Q4 fits in around 16-17 GB.. still a bit above what I’ve got, but getting closer. Maybe next GPU.
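Here is the back-of-the-envelope math behind those numbers, if you want to sanity-check your own GPU. This counts weights only (KV cache and runtime overhead come on top), and the ~4.5 bits per weight for Q4 is my rough assumption for typical Q4 variants:

    // Rough VRAM math: parameters x bits-per-weight / 8 = bytes for the weights.
    // Weights only.. KV cache and runtime overhead come on top of this.
    fun weightGb(params: Double, bitsPerWeight: Double): Double =
        params * bitsPerWeight / 8.0 / 1e9

    fun main() {
        val dense = 31e9 // the 31B dense model
        println("31B at FP16 (16-bit):  %.0f GB".format(weightGb(dense, 16.0))) // ~62 GB
        println("31B at Q8 (8-bit):     %.0f GB".format(weightGb(dense, 8.0)))  // ~31 GB
        println("31B at Q4 (~4.5-bit):  %.0f GB".format(weightGb(dense, 4.5)))  // ~17 GB
    }

So Q4 lands right around that 16-17 GB mark.. just out of reach of my 14 GB.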

On mobile

Runs surprisingly fast, and accurate to a degree. It’s not the same as the desktop models, because.. as far as I understand it, instead of doing the full computation at every layer, parts of each layer are precalculated and reused as a lookup map at inference time; that is what makes it run faster on mobile devices. Because of this, it’s hard to compare accuracy directly. To give it credit, it recognizes 140 languages easily; heck, I even used my native language and it worked great when I tried it.
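For what it’s worth, here is my mental model of that trick as a toy sketch.. just the general precompute-and-reuse idea, not what the runtime actually does:

    import kotlin.math.sin

    // Toy precompute-and-reuse sketch.. my rough mental model, not the real runtime.
    // Pay the cost of a per-layer transform once, then inference is a cheap lookup.
    fun perLayerTransform(tokenId: Int, layer: Int): Double =
        sin(tokenId * 0.01 + layer) // stand-in for real per-layer math

    fun main() {
        val vocabSize = 1_000
        val numLayers = 24
        // Precompute once, e.g. when the model is converted or loaded...
        val table = Array(numLayers) { layer ->
            DoubleArray(vocabSize) { token -> perLayerTransform(token, layer) }
        }
        // ...then at inference, each layer just looks the value up instead of computing it.
        val tokenId = 42
        var acc = 0.0
        for (layer in 0 until numLayers) acc += table[layer][tokenId] // O(1) per layer
        println("Accumulated per-layer values for token $tokenId: $acc")
    }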

To run these models locally, I am using Google AI Edge Gallery. It is an open-source, experimental mobile application from Google that showcases running Large Language Models entirely on-device (Android and iOS). Because the app uses LiteRT (formerly TensorFlow Lite) and the LLM Inference API, the models execute offline, directly on the phone’s hardware.. which means complete data privacy and no cloud API costs. The app provides great features out of the box: LLM Chat, multimodal interaction for asking questions about images, a Prompt Lab for testing ideas locally.. and much more.
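The Gallery app wraps all of this for you, but if you wanted to call the same LLM Inference API from your own Android code, a minimal sketch would look roughly like this.. going by the MediaPipe docs rather than the Gallery source, and the model path is just a placeholder:

    import android.content.Context
    import com.google.mediapipe.tasks.genai.llminference.LlmInference

    // Minimal on-device generation with the MediaPipe LLM Inference API.
    // The model path is a placeholder.. point it at a model file on the device.
    fun runLocalLlm(context: Context, prompt: String): String {
        val options = LlmInference.LlmInferenceOptions.builder()
            .setModelPath("/data/local/tmp/llm/model.task") // placeholder path
            .setMaxTokens(512) // cap on input + output tokens
            .build()

        val llm = LlmInference.createFromOptions(context, options)
        val response = llm.generateResponse(prompt) // runs fully offline, on-device
        llm.close()
        return response
    }

Everything in there executes on the phone itself.. no network call anywhere, which is the whole point.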

personal

Hey, how are you doing? I hope everything is going well. I lost my iPhone this month; it was hectic. I lost it while traveling back from the office to my home in Chennai. I’d had the phone for a while and it was old.. but it still stings. My first time losing a phone. How would I even lose an iPhone? It was a series of events that day. A few days before.. I accidentally bent the new glasses I got. On the day itself, the glasses were so loose I couldn’t wear them at all; the temples (new term.. weird to call that piece that.. anyway) would have snapped into two pieces if I had worn them. And a day before that, my mom got into a little accident. With all of this on my mind, I was in a weird headspace, and it led to me losing the phone. I felt something drop.. but when I looked, the pouch where I kept the phone was still there. On top of that, the pouch was a bit torn and I hadn’t replaced it.. which also contributed. I’m never keeping a phone in a pouch again! ..that’s what I told myself, but I’ll probably catch future me doing the same thing anyway. After that.. I lost my building entrance card too. I still had my office card, but losing the Olympia Guindy building card was another pain.. though at the time of updating this, on April 15, it had been found and I went and collected it.

It’s been a long time since I bought a phone. I used to develop for Android and iOS, but I don’t anymore, so I don’t think I need an iPhone again.. a better deal is to buy the latest Snapdragon-chipset phone and call it a day. I’m going to switch back to the bigger XL-screen phones.

Lots of things to do, yet you choose little moments of fun? At the moment, my brain can’t even remember what’s left pending.

A friend stayed over for a while at the start of April. I had to work on weekends because of horrible management at the office.. I try to do a small piece of work every day, but still.. yeah. He was on a trip.. up north and back.. and stayed here for a while. He is a childhood friend of mine; the initial plan was that he would stay for a bit and then we would both go to our hometown, but work got hectic.. I couldn’t make it the next week as we had discussed, and he had to go alone.

Last year at this time, I was playing Ghost of Tsushima in my off-time. This year, it’s Slay the Spire 2 and Crimson Desert (released on March 19th, 2026).. dang nab it, I should give up this gaming.. I’m getting old.. but what if that changes me entirely? Risky..