Earlier this year, I decided to force myself to read more. It was not a New Year’s resolution, because those never last. I had to force it because, as a child and young teenager, reading often felt like punishment. My mum required my siblings and me to read a set number of pages from a designated book every day throughout elementary school, and missing a day meant punishment. By the time I was in boarding secondary school, this had hardened into a stubborn, subconscious resistance to non-essential reading. Over the six years I spent there, I probably read only five to ten non-academic fiction books (though Artemis Fowl was a delight). So it is not hard to see where my indifference to reading came from.

During the COVID-19 pandemic, however, I fell deep into podcasts. As an avid sports fan and TV show buff (meeting Walter White at Anfield would be the ultimate dream), I listened to everything: sports recaps, tech podcasts, expert interviews, the works. So when I decided to read more this year, audiobooks felt like the natural bridge. I already had a few EPUBs from my reading list in the Apple Books app and wondered: Can I listen to these EPUBs using Apple’s Speak Screen feature (the two-finger swipe down)? Unfortunately, it only works for the current page. It is quite janky, not very user-friendly, and frankly does not work well for my use case.

I recently evaluated how a state-of-the-art (SOTA) automatic speech recognition (ASR) model from Facebook handles Igbo tones, trying to see whether it actually “listens” properly. So I have been dabbling with audio quite a bit this year. You could say I have been thinking about listening a lot. In the past, I also experimented with WaveNet (a generative model for raw audio) and its fundamental building block, the dilated causal convolution.

With these experiences in mind, I wondered: Can I build an iPhone Shortcut that lets me listen to EPUBs properly? That question eventually led to Inkcast. The goal was not to build a Speechify competitor. I simply wanted a low-effort, frictionless tool that solved a personal problem, so I made a GitHub repository and started building. Within a few days, I had a working website that could take EPUBs and PDFs and read them aloud, with chapters listed in a sidebar for one-tap navigation. It included basic controls such as play/pause, rewind (15 seconds), forward (30 seconds), playback speed control (0.75× to 2×), and voice selection.

It worked well on desktop, so I used the URL to create a Shortcut on my iPhone. On mobile it also worked, but there was one problem: the reader voices sounded robotic and monotone, which is not ideal for long-form listening. The irony was that only weeks earlier I had been evaluating how machines handle speech. So I went back to the drawing board to figure out how to get more natural-sounding reader voices on Inkcast.

While researching audiobook-quality text-to-speech, I came across several APIs (OpenAI, ElevenLabs, Google Cloud). None were free, and for a personal project I wanted something that required no subscriptions or API keys. Most resources suggested that human-quality narration requires a dedicated TTS service. Eventually I discovered that the Web Speech API can access premium voices already installed on the device. These voices are free, require no API keys, and remain available offline. They are not state of the art, but they are surprisingly good. Many people do not realize that higher-quality Siri voices can be downloaded. The voice quality improved, but the project also started evolving in another direction.
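
For the curious, here is a minimal sketch of how a page might pick one of these on-device voices through the Web Speech API. It is illustrative rather than the exact code in Inkcast. The `VoiceLike` type and the `pickVoice` and `speak` helpers are hypothetical names, and ranking voices by "Premium" or "Enhanced" in their name is an assumption based on how Apple devices tend to label higher-quality voices; other platforms may label them differently.

```typescript
// Minimal shape of the fields read from a browser SpeechSynthesisVoice.
interface VoiceLike {
  name: string;
  lang: string;
  localService: boolean;
}

// Heuristic (an assumption): infer quality from hints in the voice name.
function qualityScore(voice: VoiceLike): number {
  const name = voice.name.toLowerCase();
  if (name.includes("premium")) return 2;
  if (name.includes("enhanced")) return 1;
  return 0;
}

// Pick the best-scoring on-device voice for a language prefix such as "en".
function pickVoice(voices: VoiceLike[], langPrefix: string): VoiceLike | undefined {
  const prefix = langPrefix.toLowerCase();
  let best: VoiceLike | undefined = undefined;
  for (const v of voices) {
    // Skip remote voices and voices for other languages.
    if (!v.localService || !v.lang.toLowerCase().startsWith(prefix)) continue;
    if (best === undefined || qualityScore(v) > qualityScore(best)) best = v;
  }
  return best;
}

// Browser-only: speak text with the chosen voice. Does nothing elsewhere.
function speak(text: string, rate: number): void {
  const g = globalThis as any;
  if (!g.speechSynthesis || !g.SpeechSynthesisUtterance) return;
  const utterance = new g.SpeechSynthesisUtterance(text);
  const voice = pickVoice(g.speechSynthesis.getVoices(), "en");
  if (voice) utterance.voice = voice;
  // Keep the rate inside the 0.75x to 2x range mentioned above.
  utterance.rate = Math.min(2, Math.max(0.75, rate));
  g.speechSynthesis.speak(utterance);
}
```

In a browser, `speechSynthesis.getVoices()` can return an empty list until the `voiceschanged` event has fired, so a real page would usually wait for that event before calling `speak`.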

I have always wanted to work through Paul Graham’s essays properly. There are 229 of them, and they read almost like long-form podcasts. But they live on webpages, which raised another question: Why limit the input to EPUBs and PDFs? So I added URL support. I pasted Paul Graham’s archive page into Inkcast, and it automatically pulled all 200+ essays into the sidebar. That was the moment I realized the idea actually worked.
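
As a rough illustration of the idea (not Inkcast's actual implementation), turning an archive page into a list of sidebar entries can be as simple as collecting its links. The sketch below assumes the page's HTML has already been fetched as a string. `ChapterLink` and `extractLinks` are hypothetical names, and keeping only same-site links is a simplifying assumption. A regular expression is used to keep the example short; in a browser, `DOMParser` would be the more robust choice.

```typescript
// One sidebar entry: a display title plus the absolute URL to load.
interface ChapterLink {
  title: string;
  url: string;
}

// Collect unique same-site links from an HTML string.
function extractLinks(html: string, baseUrl: string): ChapterLink[] {
  const anchor = /<a\s[^>]*href\s*=\s*["']([^"']+)["'][^>]*>([\s\S]*?)<\/a>/gi;
  const baseOrigin = new URL(baseUrl).origin;
  const seen = new Set<string>();
  const links: ChapterLink[] = [];
  let match: RegExpExecArray | null;
  while ((match = anchor.exec(html)) !== null) {
    // Strip any nested tags (for example <b>) from the link text.
    const title = match[2].replace(/<[^>]*>/g, "").trim();
    let url: URL;
    try {
      // Resolve relative hrefs against the archive page's own URL.
      url = new URL(match[1], baseUrl);
    } catch (_err) {
      continue; // Skip hrefs that are not valid URLs.
    }
    if (!title || url.origin !== baseOrigin || seen.has(url.href)) continue;
    seen.add(url.href);
    links.push({ title, url: url.href });
  }
  return links;
}
```

Calling `extractLinks(pageHtml, "https://example.com/articles.html")` returns one `{ title, url }` entry per unique same-site link, in page order, which maps naturally onto a chapter sidebar.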

The entire project lives in a single HTML file. There are no accounts, no installations, and files never leave the user’s device. Because the app has no server dependencies, it ended up functioning as a privacy-preserving tool by default. In a way, I started the year studying whether machines listen well. Along the way, I realized that humans do not have many good free tools for listening either. Speechify costs $139 per year and Audible requires a subscription, so I built something that worked for me.

If this sounds useful, please try it, star it, or buy me a coffee if it saves you a Speechify subscription.