The Evolution of Auditex

March 2026


The only reason I ever opened a tech blog was because I needed an answer to something. It was almost always a Medium article. I would skim it, find the part I needed, and try to read it aloud to myself.

Medium eventually added a "listen to article" feature. I tried it once in 2023. It was alright, but I wasn't really the audience for it, so I moved on.

Then it was summer of 2025. I wanted to listen to an article about how Netflix scaled to handle massive user numbers. It was on Medium. Behind a paywall.

I started searching for alternatives. I quickly found that Medium's TTS was powered by Speechify, so I downloaded that. Another paywall. The few free voices they offered weren't great, and the free alternatives I tried were worse. This frustrated me. With everything companies like ElevenLabs, Google, and OpenAI are doing with voice, shouldn't there be something better available? And even if we're just talking about reading articles aloud, shouldn't there be some basic preprocessing? When I want to listen to a Medium article, I don't want to hear "Get unlimited access, sign up today" read out loud. It seemed like such an obvious thing that everyone was ignoring.

That's when I first wanted to build a truly free TTS.

Getting sidetracked, then getting serious

The idea was simple, but I had a more pressing problem: finding a job. I spent time building GitTrack, a tool for tracking job applications and keeping each other accountable. It wasn't until around December 2025 that I came back to the TTS problem.

I started by studying how existing products worked. The bad extensions just used Chrome's native TTS engine. Speechify did its processing on the cloud and streamed the audio back. I assumed that was the default approach for good-sounding TTS, so my goal became: build something that sounds at least as good as Speechify, with proper text preprocessing, and make it free.

I spent time researching the best open-source models for audio generation, comparing their tradeoffs, and settled on Kokoro. It sounded great while being small enough to be practical.

I built the backend over about a week. And then I stopped and asked myself: do people actually want this?

I went looking for discussion online. What I found was a lot of people who had simply accepted the situation: either you pay for a decent product, or you use the bad free ones. That was enough to keep going.

I finished the backend and it worked well. Before building the extension, I wanted to test quality, so I manually generated TTS for articles my friends wanted to listen to and sent them samples. The reviews were generally good. They were surprised. Clearly not as polished as voices from Gemini or Siri, but genuinely decent.

The wall

Then I hit my first real roadblock, and this is where I had a realization.

Speechify isn't charging you for no reason. There is no way to offer cloud-based TTS at that quality for free. My goal was to make something truly free, meaning I shouldn't be paying for it out of my own pocket either. Even though I had a server set up at home, the approach I'd originally planned wasn't sustainable.

This is when I stopped being frustrated with Speechify and understood something: only when you start solving a problem do you truly understand why the problem exists in the first place.

Making it work anyway

I was stuck for a while trying to figure out how to deliver good TTS without any server costs. Then I remembered ONNX.

I immediately searched for a Kokoro TTS ONNX model. The model existed, but there was a problem: the phonemizer bundled with the local model wasn't available in the ONNX deployment. I found an existing Node package for phonemization, but its output format didn't match what Kokoro expected. After testing, I found the two outputs were mostly identical, differing in only a few phonemes, so I wrote a normalization step that maps the package's output to what Kokoro expects.
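The normalization step can be sketched as a small lookup table applied symbol by symbol. The specific mappings below are illustrative placeholders, not the actual differences between the two phonemizers:

```typescript
// Map phonemes the Node package emits to the symbols Kokoro expects.
// These entries are hypothetical examples, not the real mismatches.
const PHONEME_MAP: Record<string, string> = {
  "ɚ": "ɜɹ", // hypothetical: rhotic vowel written as two symbols
  "ɾ": "t",  // hypothetical: flap collapsed to a plain stop
};

function normalizePhonemes(phonemes: string): string {
  // Replace each mismatched symbol; everything else passes through unchanged.
  return [...phonemes].map((p) => PHONEME_MAP[p] ?? p).join("");
}
```

Because the two phoneme sets mostly agree, a pass-through with a handful of exceptions is enough; unknown symbols are left untouched rather than dropped.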

It worked. I could finally generate TTS audio that sounded good and was truly free. Everything running in the browser, no server needed.

But it wasn't the complete story. I had to make some sacrifices.

The extension generates all audio locally, which means that, naively, the user would have to wait for the entire article to finish generating before playback could start. That's not how TTS is supposed to feel. So I implemented chunking: generate each sentence as a separate chunk and maintain a buffer that the user can seek through.
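A minimal sketch of that chunking scheme, assuming each sentence is synthesized independently. `synthesize` stands in for the actual Kokoro ONNX call, and the sentence splitter is deliberately naive:

```typescript
type Chunk = { text: string; audio: Float32Array };

function splitIntoSentences(text: string): string[] {
  // Naive splitter: break on sentence-ending punctuation followed by whitespace.
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

async function buildBuffer(
  article: string,
  synthesize: (sentence: string) => Promise<Float32Array>,
): Promise<Chunk[]> {
  const buffer: Chunk[] = [];
  for (const sentence of splitIntoSentences(article)) {
    // Each sentence becomes its own seekable chunk; playback can begin
    // as soon as the first chunk lands in the buffer.
    buffer.push({ text: sentence, audio: await synthesize(sentence) });
  }
  return buffer;
}
```

The tradeoff is exactly the one described next: because each sentence is synthesized with no knowledge of its neighbors, prosody resets at every chunk boundary.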

This had a subtle effect on quality. Each sentence was processed independently, so they all started with the same neutral tone. The reading lost its flow. It wasn't a dramatic drop, but if you were listening for it, you could hear it.

Then came another problem. The audio buffer was built on Chrome's offscreen document, which Chrome silently cleans up after 60 seconds. I never encountered this until I was nearly done with the implementation. It required a significant architectural change: I had to move the entire buffer to IndexedDB storage and rework how chunks were loaded and played back.
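Persisting chunks in IndexedDB so they survive the offscreen document being torn down might look like the sketch below. The database name, store name, and key scheme are illustrative assumptions, not Auditex's actual schema:

```typescript
const DB_NAME = "auditex"; // hypothetical database name
const STORE = "chunks";    // hypothetical object store name

function chunkKey(articleId: string, index: number): string {
  // Zero-pad the index so keys sort in playback order.
  return `${articleId}:${String(index).padStart(6, "0")}`;
}

function openDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open(DB_NAME, 1);
    req.onupgradeneeded = () => req.result.createObjectStore(STORE);
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

async function saveChunk(articleId: string, index: number, audio: ArrayBuffer): Promise<void> {
  const db = await openDb();
  await new Promise<void>((resolve, reject) => {
    const tx = db.transaction(STORE, "readwrite");
    tx.objectStore(STORE).put(audio, chunkKey(articleId, index));
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}

async function loadChunk(articleId: string, index: number): Promise<ArrayBuffer | undefined> {
  const db = await openDb();
  return new Promise((resolve, reject) => {
    const req = db.transaction(STORE).objectStore(STORE).get(chunkKey(articleId, index));
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}
```

With the chunks keyed and persisted this way, the playback side can reload any chunk by article and index even after Chrome has recycled the offscreen document.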

What's next

Auditex is live on the Chrome Web Store. I'm going to make my friends use it, collect feedback on where it falls short, and keep improving it for anyone who needs a free TTS.

I think this fills a genuine need, even if I'm a bit late to it. I'm honestly not sure how many people still go to blogs to read long articles. But I know that when they do, something like this should exist, and it should be free.

Auditex on the Chrome Web Store