Writing
Notes on shipping on-device AI
What's new in NativeLM v0.10.0: answering from the right document
v0.10 is a retrieval release — an optional EmbeddingGemma embedder tiered to your device, hybrid dense + lexical search, a flagship reranker, and a set of grounding fixes that stop the model answering from the wrong file. Still fully local, no account, no upload, no telemetry.
Jun 5, 2026
What's new in NativeLM v0.9.0: charts in chat, an adaptive UI, and a real engine library
v0.9 teaches the on-device model to answer with charts, makes the UI adapt from phone to tablet, and pulls the whole AI core out of the app into a reusable Kotlin Multiplatform library — still fully local, no account, no upload, no telemetry.
Jun 4, 2026
Your data, your key: local encrypted backup without a server
NativeLM keeps everything on your phone — which means losing the phone means losing the data. v0.7 fixes that with a passphrase-encrypted .nlmbak file you fully control: Argon2id → AES-256-GCM, no server, no account, no key we hold.
Jun 4, 2026
Talk to your local LLM: on-device voice input with Whisper
NativeLM v0.8 lets you dictate your questions — transcribed entirely on-device with Whisper (whisper.cpp), no cloud. Here's why we picked Whisper over Android's built-in recognizer, and how the Whisper model became a first-class 'Audio' entry in the model catalog.
Jun 4, 2026
The OCR library that phoned home: restoring NativeLM's zero-telemetry guarantee
Google's ML Kit gave NativeLM on-device OCR — and quietly bundled a datatransport pipeline that uploaded diagnostics to firebaselogging.googleapis.com on startup. Here's how we found it and stripped it out with a three-line manifest merge.
Jun 3, 2026
AirDrop for your LLM: building cloudless peer-to-peer sync without Google Play Services
How we built local device-to-device sync for NativeLM using mDNS and TCP sockets, keeping your private AI data completely off the cloud, and why we explicitly avoided Google's Nearby Connections API.
Jun 3, 2026
Ask in your language, about your English documents: on-device cross-lingual RAG
NativeLM v0.8 answers in Hindi, Tamil, Kannada and more — reading your English documents and replying in your language, with zero translation model. The whole feature is one prompt directive (plus one stubborn script bug).
Jun 3, 2026
Turning your documents into artifacts, on-device: NativeLM Studio
NativeLM v0.6.0 adds Studio — generate briefings, FAQs, study guides, timelines, mind maps, and even spoken audio overviews from your own documents, entirely on the phone, via a map-reduce pipeline over on-device Gemma.
Jun 2, 2026
What's new in NativeLM v0.5.0: open, highlight, zoom, OCR, better retrieval
v0.4 made on-device document chat work. v0.5 makes it usable — tap a citation to open the source at the exact page with the passage highlighted, pinch to zoom, chat with scans, and get sharper answers. Plus the bugs we fixed along the way.
Jun 2, 2026
Chatting with scanned documents: on-device OCR (no cloud)
NativeLM v0.5.0 reads scanned PDFs and photos with on-device OCR, and blends keyword + vector search so exact terms actually get retrieved — all without an image ever leaving the phone.
Jun 1, 2026
The low-end gauntlet: running a local LLM on budget Android phones
A local LLM that only runs on flagships isn't private AI for everyone — it's a toy for people with expensive phones. Here's how NativeLM tiers models across devices, why budget phones break in two different ways (RAM and the navigation bar), and what's still hard about the 4–6 GB tier.
Jun 1, 2026
Why Android's ActivityManager lies about RAM — and how litertlm-kmp works around it
Xiaomi, Realme, and OPPO inflate reported RAM with swap-to-flash. Here's how we detect it and prevent OOM crashes when loading on-device LLMs.
Jun 1, 2026
Shipping on-device RAG: Building NativeLM for Android
How we implemented fully offline document RAG using MediaPipe's USE-Lite and ObjectBox HNSW vector search to ground Gemma's chat answers in imported PDFs.
May 30, 2026
Stateful KV-cache sessions for on-device Gemma on Android
How litertlm-kmp v0.3 makes multi-turn memory lossless and free — plus what an on-device CPU/GPU/NPU benchmark actually told me.
May 26, 2026
Seeing on-device: multimodal image input for local Gemma
litertlm-kmp v0.2.4 added vision — attach an image and the local Gemma model reasons over it, on-device. Here's how image attachments flow through the engine, why we default to the CPU vision backend, and the model gotcha that bites you on init.
May 25, 2026
Wrapping Google's LiteRT-LM into a Kotlin Multiplatform engine
The engine origin story: how litertlm-kmp turns Google's LiteRT-LM into a clean KMP library — four core abstractions, a resumable SHA-256 download manager, typed-Kotlin-to-OpenAPI function calling, and the thread discipline that keeps a non-thread-safe native runtime honest.