In April I shipped an Android app «Ne pishi golosovoe!» («Don’t send voice notes!») to RuStore — it records voice messages and transcribes them to text. Speak into the mic, get a transcript. All on the device: no cloud, no account, no internet. Voice notes often carry sensitive stuff, and pushing them to someone else’s servers is a poor default.
I built the first version in two days on Expo SDK 54 and React Native 0.81. It lived for a month. During that month I tuned the chunking, hunted a couple of memory leaks on long recordings, and wrestled with the JS bridge and timers. It became clear: piling more on top of RN meant constantly fighting the layer between the app and the microphone.
I rewrote everything in Kotlin. The main argument: direct access to AudioRecord and a foreground service with no RN bridge in between. The PCM byte stream now flows from the mic straight into sherpa-onnx, with no serialisation through JS. Fewer layers mean fewer places for memory to leak. Testing also got easier: Robolectric runs Android framework code on the JVM without an emulator, so tests finish on CI in a couple of minutes. Tooling improved too: Android Studio’s profiler shows where time and memory actually go far more honestly than Flipper ever did with RN.
The stack: Kotlin 2.2.21, Jetpack Compose with Material 3, Hilt for dependency injection, Room for the notes database, Coroutines and StateFlow for state. Recording goes through native AudioRecord at 16 kHz mono PCM16. Navigation runs on Navigation Compose. Analytics through AppMetrica 8.2.0, events sent anonymously: recording started, model downloaded, recognition errors. Targets Android 7 and up, mostly tested on a Xiaomi Poco M5S.
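To make the audio format concrete, here is a minimal sketch of turning a 16 kHz mono PCM16 buffer into float samples, which is the form speech recognisers such as sherpa-onnx usually consume. It is not the app's actual code: the names `pcm16ToFloats`, `durationSeconds`, and `SAMPLE_RATE_HZ` are made up for illustration, and the little-endian byte order is an assumption about what the device delivers.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical constant: the post records at 16 kHz mono.
const val SAMPLE_RATE_HZ = 16_000

/**
 * Converts the first [length] bytes of a PCM16 buffer into floats in [-1, 1).
 * Assumes little-endian samples; a trailing odd byte is ignored.
 */
fun pcm16ToFloats(bytes: ByteArray, length: Int = bytes.size): FloatArray {
    require(length in 0..bytes.size) { "length out of range" }
    val sampleCount = length / 2 // two bytes per 16-bit sample
    val shorts = ByteBuffer.wrap(bytes, 0, sampleCount * 2)
        .order(ByteOrder.LITTLE_ENDIAN)
        .asShortBuffer()
    // Dividing by 32768 maps the signed 16-bit range onto [-1, 1).
    return FloatArray(sampleCount) { i -> shorts.get(i) / 32768f }
}

/** Duration of a mono buffer in seconds at the assumed sample rate. */
fun durationSeconds(sampleCount: Int): Double =
    sampleCount.toDouble() / SAMPLE_RATE_HZ
```

At this rate one second of audio is 16,000 samples, or 32,000 bytes, so an hour-long recording is about 115 MB of raw PCM. That is why the stream is worth processing in pieces rather than holding it in memory whole.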
The core is the same: sherpa-onnx 1.13.2 and Sber’s GigaAM v3 e2e CTC (INT8). The model is around 320 MB; it downloads once and then runs offline. On Russian it’s roughly 2.5× more accurate than Whisper-large-v3. Long recordings get cut into chunks of 22–25 seconds, each chunk is recognised separately, and the transcripts are joined back together. On top of the engine I bolted on a VAD (voice activity detection) so the recogniser doesn’t waste cycles on silence. Old recordings from the RN version get imported on first launch, with no manual export needed.
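The chunking step can be sketched roughly as follows. The post does not say how the cut point inside the 22–25 second window is chosen, so this sketch assumes one plausible rule: cut at the quietest 20 ms frame in that window, so that a word is less likely to be split in half. The energy measure is a crude stand-in and is not the VAD mentioned above; all names (`chunkBoundaries`, `joinTranscripts`, the constants) are hypothetical.

```kotlin
import kotlin.math.abs

// Hypothetical constants derived from the figures in the post.
const val SAMPLES_PER_SECOND = 16_000
const val MIN_CHUNK = 22 * SAMPLES_PER_SECOND
const val MAX_CHUNK = 25 * SAMPLES_PER_SECOND
const val FRAME = SAMPLES_PER_SECOND / 50 // 20 ms analysis frame

/** Returns contiguous sample ranges that together cover the whole recording. */
fun chunkBoundaries(samples: FloatArray): List<IntRange> {
    val chunks = mutableListOf<IntRange>()
    var start = 0
    while (samples.size - start > MAX_CHUNK) {
        // Search the 22..25 s window for the frame with the least energy.
        var bestCut = start + MAX_CHUNK
        var bestEnergy = Float.MAX_VALUE
        var frame = start + MIN_CHUNK
        while (frame + FRAME <= start + MAX_CHUNK) {
            var energy = 0f
            for (i in frame until frame + FRAME) energy += abs(samples[i])
            if (energy < bestEnergy) {
                bestEnergy = energy
                bestCut = frame + FRAME / 2 // cut in the middle of the quiet frame
            }
            frame += FRAME
        }
        chunks.add(start until bestCut)
        start = bestCut
    }
    // Whatever is left (25 s or less) becomes the final chunk.
    if (start < samples.size) chunks.add(start until samples.size)
    return chunks
}

/** Joins per-chunk transcripts, skipping chunks that produced no text. */
fun joinTranscripts(parts: List<String>): String =
    parts.map { it.trim() }.filter { it.isNotEmpty() }.joinToString(" ")
```

Each range would be fed to the recogniser on its own and the resulting strings passed to `joinTranscripts`. Every chunk except the last is between 22 and 25 seconds long, which keeps the recogniser's input bounded no matter how long the recording runs.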
Here is what got better compared to the RN version. The test count went from about six in Jest to 295 across JUnit, Robolectric, and Compose UI. Settings moved from AsyncStorage to Preferences DataStore: each key now declares its value type, so reads and writes are type-checked at compile time, where AsyncStorage stored everything as strings looked up by string keys at runtime. Light, Dark, and System theming runs on CompositionLocal and Material 3 in parallel, so my own palette and the system dialogs stay consistent. The foreground service is now explicit, with its own notification channel and a proper cancel path. Hour-long recordings no longer crash when the phone goes to sleep.
The UI stayed the same: a list of notes, a floating record button, a live waveform of 44 bars pulsing with loudness, swipe-left to delete. Swipes and pulse animations are now native Compose instead of Reanimated. Themes switch right in settings, including a System mode that listens to the OS theme and picks up changes without a restart. I left the negolosom.ru landing alone — a one-pager with buttons to RuStore and contacts, it was already doing its job.
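For the waveform, a minimal sketch of how a window of samples could be reduced to 44 bar heights. The post does not describe how loudness is computed, so RMS per bucket is an assumption here, and `waveformBars` and `BAR_COUNT` are illustrative names rather than anything from the app.

```kotlin
import kotlin.math.sqrt

// The post mentions 44 bars; the constant name is hypothetical.
const val BAR_COUNT = 44

/**
 * Splits [samples] (floats in [-1, 1]) into [barCount] equal buckets and
 * returns the RMS loudness of each bucket, clamped to 0..1.
 */
fun waveformBars(samples: FloatArray, barCount: Int = BAR_COUNT): FloatArray {
    require(barCount > 0) { "barCount must be positive" }
    val bars = FloatArray(barCount)
    if (samples.isEmpty()) return bars
    for (bar in 0 until barCount) {
        // Long arithmetic avoids overflow on very long buffers.
        val from = (bar.toLong() * samples.size / barCount).toInt()
        val to = ((bar + 1).toLong() * samples.size / barCount).toInt()
        if (to <= from) continue // fewer samples than bars: leave the bar at 0
        var sumSquares = 0.0
        for (i in from until to) sumSquares += samples[i] * samples[i]
        bars[bar] = sqrt(sumSquares / (to - from)).toFloat().coerceIn(0f, 1f)
    }
    return bars
}
```

The returned values can be used directly as height fractions for the bars. RMS reacts to overall loudness rather than to single peaks, which tends to give a calmer pulse than taking the maximum sample per bucket.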
Technologies: Kotlin 2.2.21, Jetpack Compose, Material 3, Hilt, Room, Kotlin Coroutines, sherpa-onnx 1.13.2, GigaAM v3 (NeMo CTC, INT8), AppMetrica 8.2.0
negolosom.ru