Intro

Today I want to talk about Google Gemma 4; it sets a new standard for open-source AI models. I am astonished by the quality of it, even though it has a small model size compared to the other big ones out there.. like Kimi K2.5 from Moonshot AI, that thing is a 1 trillion parameter model.. 1 trillion! and Gemma 4’s biggest is only 31 billion.. but here is the thing, Kimi K2.5 only activates 32 billion of those parameters for any given token, so in practice they are not that far apart.. and Gemma 4 punches way above its weight for how small it actually is..

gemma4

In the Gemma models, they always announce the total and active parameter counts right in the model name, using the letters e and a. The e stands for effective, like E2B or E4B.. that means it is a Mixture of Experts model and the number is how many parameters are actually active per token. Think of it like a restaurant with 26 chefs.. but only 4 of them actually cook your order at any point, the rest stand by with their own specializations, so it runs fast like a 4B model but has the knowledge of a full 26B one.. pretty clever. The a is used similarly in some of the naming, same idea basically. And then there is the dense model.. that one just loads everything, all 31 billion parameters fire for every single token, no routing, no tricks, just brute force.. it uses more memory but is more predictable.
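To make the chef analogy concrete, here is a toy sketch of what that routing step does. This is purely my own illustration with made-up names and a random router.. not Gemma’s actual code:

    import kotlin.math.exp

    // Toy sketch of the MoE routing idea: 26 experts total, 4 active per token.
    // All names and numbers here are illustrative, not Gemma's real internals.
    const val NUM_EXPERTS = 26
    const val ACTIVE_PER_TOKEN = 4

    // Softmax over the chosen experts' scores so their outputs can be mixed.
    fun softmax(scores: List<Double>): List<Double> {
        val m = scores.maxOrNull() ?: 0.0
        val exps = scores.map { exp(it - m) }
        val sum = exps.sum()
        return exps.map { it / sum }
    }

    // The router scores every expert for this token; keep only the top 4.
    fun routeToken(routerScores: DoubleArray): List<Int> =
        routerScores.withIndex()
            .sortedByDescending { it.value }
            .take(ACTIVE_PER_TOKEN)
            .map { it.index }

    fun main() {
        val scores = DoubleArray(NUM_EXPERTS) { Math.random() } // stand-in router output
        val chosen = routeToken(scores)
        val weights = softmax(chosen.map { scores[it] })
        println("Token goes to experts $chosen with mixing weights $weights")
        // Only these 4 experts' feed-forward weights run for this token; the
        // other 22 sit idle, which is why it computes like a 4B model.
    }

Run it a few times and different experts get picked each time.. that is the router doing its thing per token.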

The release splits into two classes, with multiple models in each. Two of the four models are meant for desktop/laptop, and the other two were created for edge devices.. by which they mean mobile phones, Raspberry Pi, and other less powerful hardware.

On PC

I’ve been a long-time user of LM Studio; it offers great features.. I like it for its ease of use and the ability to manage multiple models at once.

From the official documentation, I can run up to 14 GB of graphics memory worth of model. The 31B dense model needs 60+ GB at full FP16 precision.. so that is out for me. The 26B MoE is interesting though.. even though it has 26B total params, you only compute 4B at a time, so it is lighter on the processor, but you still have to load all 26B into VRAM, which I don’t have.. at least for now. The trick people use is Q4 quantization, which squeezes each weight down to roughly 4 bits instead of 16. The 31B at Q4 fits in around 16-17 GB.. still a bit above what I’ve got, but getting closer. Maybe next GPU.
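Here is the back-of-the-envelope math behind those numbers, if you want to sanity-check your own GPU. This counts weights only (KV cache and runtime overhead come on top), and the ~4.5 bits per weight for Q4 is my rough assumption for typical Q4 variants:

    // Rough VRAM math: parameters x bits-per-weight / 8 = bytes for the weights.
    // Weights only.. KV cache and runtime overhead come on top of this.
    fun weightGb(params: Double, bitsPerWeight: Double): Double =
        params * bitsPerWeight / 8.0 / 1e9

    fun main() {
        val dense = 31e9 // the 31B dense model
        println("31B at FP16 (16-bit):  %.0f GB".format(weightGb(dense, 16.0))) // ~62 GB
        println("31B at Q8 (8-bit):     %.0f GB".format(weightGb(dense, 8.0)))  // ~31 GB
        println("31B at Q4 (~4.5-bit):  %.0f GB".format(weightGb(dense, 4.5)))  // ~17 GB
    }

So Q4 lands right around that 16-17 GB mark.. just out of reach of my 14 GB.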

On mobile

Runs surprisingly fast, and accurate to a degree. It’s not the same as the desktop models, because.. as far as I understand it, instead of doing the full computation at every layer, parts of each layer are precalculated and reused as a lookup map at inference time; that is what makes it run faster on mobile devices. Because of this, it’s hard to compare accuracy directly. To give it credit, it recognizes 140 languages easily; heck, I even used my native language and it worked great when I tried it.
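For what it’s worth, here is my mental model of that trick as a toy sketch.. just the general precompute-and-reuse idea, not what the runtime actually does:

    import kotlin.math.sin

    // Toy precompute-and-reuse sketch.. my rough mental model, not the real runtime.
    // Pay the cost of a per-layer transform once, then inference is a cheap lookup.
    fun perLayerTransform(tokenId: Int, layer: Int): Double =
        sin(tokenId * 0.01 + layer) // stand-in for real per-layer math

    fun main() {
        val vocabSize = 1_000
        val numLayers = 24
        // Precompute once, e.g. when the model is converted or loaded...
        val table = Array(numLayers) { layer ->
            DoubleArray(vocabSize) { token -> perLayerTransform(token, layer) }
        }
        // ...then at inference, each layer just looks the value up instead of computing it.
        val tokenId = 42
        var acc = 0.0
        for (layer in 0 until numLayers) acc += table[layer][tokenId] // O(1) per layer
        println("Accumulated per-layer values for token $tokenId: $acc")
    }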

To run these models locally, I am using Google AI Edge Gallery. It is an open-source, experimental mobile application from Google that showcases running Large Language Models entirely on-device (Android and iOS). Because the app uses LiteRT (formerly TensorFlow Lite) and the LLM Inference API, the models execute offline, directly on the phone’s hardware.. which means complete data privacy and no cloud API costs. The app provides great features out of the box: LLM Chat, multimodal interaction for asking questions about images, a Prompt Lab for testing ideas locally.. and much more.
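The Gallery app wraps all of this for you, but if you wanted to call the same LLM Inference API from your own Android code, a minimal sketch would look roughly like this.. going by the MediaPipe docs rather than the Gallery source, and the model path is just a placeholder:

    import android.content.Context
    import com.google.mediapipe.tasks.genai.llminference.LlmInference

    // Minimal on-device generation with the MediaPipe LLM Inference API.
    // The model path is a placeholder.. point it at a model file on the device.
    fun runLocalLlm(context: Context, prompt: String): String {
        val options = LlmInference.LlmInferenceOptions.builder()
            .setModelPath("/data/local/tmp/llm/model.task") // placeholder path
            .setMaxTokens(512) // cap on input + output tokens
            .build()

        val llm = LlmInference.createFromOptions(context, options)
        val response = llm.generateResponse(prompt) // runs fully offline, on-device
        llm.close()
        return response
    }

Everything in there executes on the phone itself.. no network call anywhere, which is the whole point.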

personal

Hey, how are you doing? I hope everything is going well. I lost my iPhone this month; it was hectic. I lost it while traveling back from the office to my home in Chennai. I’d had the phone for a while and it was old.. but it still stings. My first time losing a phone. How would I even lose an iPhone? It was a series of events that day. A few days before.. I accidentally bent the new glasses I got. On the day itself, the glasses were so loose I couldn’t wear them at all; the temples (new term.. weird to call that piece that.. anyway) would have snapped into two pieces if I had worn them. And a day before that, my mom got into a little accident. With all of this on my mind, I was in a weird headspace, and it led to me losing the phone. I felt something drop.. but when I looked, the pouch where I kept the phone was still there. On top of that, the pouch was a bit torn and I hadn’t replaced it.. which also contributed. I’m never keeping a phone in a pouch again! ..that’s what I told myself, but I’ll probably catch future me doing the same thing anyway. After that.. I lost my building entrance card too. I still had my office card, but losing the Olympia Guindy building card was another pain.. though at the time of updating this, on April 15, it had been found and I went and collected it.

It’s been a long time since I bought a phone. I used to develop for Android and iOS, but I don’t anymore, so I don’t think I need an iPhone again.. a better deal is to buy the latest Snapdragon-chipset phone and call it a day. I’m going to switch back to the bigger XL-screen phones.

Lots of things to do, yet you choose little moments of fun? At the moment, my brain can’t even remember what’s left pending.

A friend stayed over for a while at the start of April. I had to work on weekends because of horrible management at the office.. I try to do a small piece of work every day, but still.. yeah. He was on a trip.. up north and back.. and stayed here for a while. He is a childhood friend of mine; the initial plan was that he would stay for a bit and then we would both go to our hometown, but work got hectic.. I couldn’t make it the next week as we had discussed, and he had to go alone.

Last year at this time, I was playing Ghost of Tsushima in my off-time. This year, it’s Slay the Spire 2 and Crimson Desert (released on March 19th, 2026).. dang nab it, I should give up this gaming.. I’m getting old.. but what if that changes me entirely? Risky..