About
Come on down to Harmontown; turn that frown upside down.
There’s something special about this podcast. Like many Harmontown fans, I keep coming back to it. The mix of comedy, philosophy, roleplaying, and improv among an ensemble of complementary friends with the occasional celebrity (or niche Hollywood person) just hasn’t gotten old.
Back when the podcast ended in 2019, I wanted to transcribe all the episodes and make them searchable for us fans to easily find and revisit favorite moments. Problem was, there’s a lot of Harmontown, and high-quality transcription services were pricey.
That changed in 2022 when OpenAI, the company behind ChatGPT, released Whisper. It’s open-source software you can run on your own computer that, in their words, “approaches human level robustness and accuracy on English speech recognition.” It’s not perfect, but it’s pretty darn good.
With the cost barrier overcome and a desire to create a new side project that I was passionate about, I built this site in the last few months of 2023. I hope you enjoy it.
How it works (the nerd stuff)
Episodes were transcribed with OpenAI’s Whisper small.en model between July and November 2023. Minor find-and-replace corrections are applied to the transcripts to fix common errors, such as “Harmontown” being transcribed as “Herman Town.”
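As a rough illustration of that correction pass, here is a minimal Python sketch. The correction table and the function name are hypothetical and only cover the one error mentioned above; the real list of fixes lives in the project's code.

```python
import re

# Hypothetical correction table: regex for a common mis-transcription -> intended text.
# Only the "Herman Town" example comes from this page; real entries may differ.
CORRECTIONS = {
    r"\bHerman\s+Town\b": "Harmontown",
}


def apply_corrections(text: str) -> str:
    """Apply every find-and-replace correction to a chunk of transcript text."""
    for pattern, replacement in CORRECTIONS.items():
        # Case-insensitive so "herman town" is caught as well.
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text
```

For example, apply_corrections("Welcome to Herman Town!") gives back "Welcome to Harmontown!".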
The transcripts are uploaded to Amazon S3 in a structured data format (TSV) and indexed for search using Typesense, which runs on a Google Compute Engine server. When you search across all episodes, Typesense returns fast dialog matches through its API.
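To give a feel for the transcript data, here is a small Python sketch that parses a TSV transcript into timed segments, the kind of records a search index could ingest. It assumes the column layout Whisper's own TSV writer produces (start, end, text, with times in milliseconds); the files on this site may be shaped differently, and the Segment type and function name are made up for the example.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start_ms: int  # segment start time in milliseconds
    end_ms: int    # segment end time in milliseconds
    text: str      # words spoken during the segment


def parse_transcript_tsv(tsv_text: str) -> List[Segment]:
    """Parse TSV text with an assumed 'start', 'end', 'text' header row."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in dialog as plain text
    )
    return [
        Segment(int(row["start"]), int(row["end"]), row["text"])
        for row in reader
    ]
```

Each Segment could then be sent to the search index as one document, tagged with its episode number so a match can link back to the right moment.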
The frontend is React, running on a static-export Next.js site. When you visit the site, a list of all episode titles and other metadata is downloaded and indexed in the browser using Fuse.js, which returns near-instant episode search results for queries that match episode titles, descriptions, and numbers. Fuse is also used for searching individual transcripts while playing an episode.
Audio episodes are streamed from Cloudflare R2 because podcast providers use dynamic ad insertion, which unpredictably shifts the synchronization between the audio and the corresponding transcript. Video episodes are streamed from YouTube, where a fan has generously uploaded them.
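To see why a shifted audio file matters, here is a hedged Python sketch of how a player might pick the transcript line for the current playback position. It is not the site's actual code (the frontend is JavaScript); the function name and the millisecond start times are assumptions for illustration.

```python
from bisect import bisect_right
from typing import List, Optional


def active_segment_index(start_times_ms: List[int], position_ms: int) -> Optional[int]:
    """Return the index of the last segment starting at or before position_ms.

    start_times_ms must be sorted ascending. Returns None if playback has
    not yet reached the first segment.
    """
    i = bisect_right(start_times_ms, position_ms) - 1
    return i if i >= 0 else None
```

With segment starts of [0, 2500, 4000], a position of 3000 ms maps to index 1. If a 30-second ad were inserted at the start of the file, the same spoken moment would sit at 33000 ms, which maps to index 2, so the highlighted line would no longer match the audio. Serving a fixed, ad-free file avoids that drift.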
The code for this website plus the transcription and indexing process is open source and available on GitHub.
Special thanks
Thank you to @JonesyCat for uploading all of the Harmontown video episodes to YouTube so that they live on after the closure of harmontown.com. Thanks also to Zach Manson for the heads-up and the recommendation to embed the Harmontown videos from this channel.
Server costs
Here’s everything I’m paying to run this:
- GCP hosting for Typesense search API (fast searches across the 6M+ words spoken during Harmontown): ~$2/month
- Static file hosting for website (the web UI you’re reading this on): ~$1/month
- Static file hosting for audio podcasts: ~$2/month
- Domain name: $1/month
Total cost: ~$72/year
I’m paying this out of pocket. If you get value and enjoyment from this site, consider throwing me a few bucks if you can spare it. I’d love to keep this archive online for as long as possible. Thank you.