Larynx - A viable Linux TTS

I spent some time over the weekend experimenting with voice2json and Rhasspy, trying to set up a fully offline voice assistant. The setup uses Mozilla DeepSpeech for speech recognition, a template file containing all known phrases and their mappings to intents, an intent recognizer, and a local shell script that parses the recognized intent and invokes commands (think opening a website or folder when an intent is recognized). Rhasspy was easy to use and really fun. It’s amazing how far open-source tools in the TTS and speech recognition space have come.

Along the way, I discovered Larynx, a TTS system for Linux with high-quality voices from Glow-TTS and others, with intonations that sound human. I’ve kept an eye on the Linux TTS space for years and have been disappointed by the limited options for everyday consumer use. Often, the pre-trained TTS voices sound too robotic to listen to for long. I suppose that’s understandable given the dearth of open-source voice datasets (which is why projects like Mozilla Common Voice are so exciting!). It’s nice to have a pleasant pre-trained TTS model natively available on Linux.

My use case is to select text in a browser or a Thunderbird RSS article, hit a shortcut, and have the TTS system read it aloud so I can look away from the screen and just listen.

Setup

I followed the Debian installation instructions: I downloaded the tts, lang-en_us and Harvard Glow-TTS .deb files and installed them.

cd ~/Downloads  # or /path/to/downloaded/deb/files
sudo apt install ./larynx*.deb

TTS Shortcut

To create a shortcut that invokes Larynx on selected text, I added aliases in my ~/.bash_aliases file. They use xclip to access clipboard and selection data. On Debian-based systems, you should be able to install it with sudo apt install xclip.

# Speak text passed as argument
# Usage: speak "This is a test"
alias speak="larynx --voice harvard-glow_tts --interactive"

# Speak clipboard text
# Usage: speak-clipboard
alias speak-clipboard="xclip -out -selection clipboard | speak"

# Speak currently selected text
# Usage: speak-selection
alias speak-selection="xclip -out -selection primary | speak"
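
Since these aliases only work when both larynx and xclip are on the PATH, a quick check can save some head-scratching. This is a minimal sketch; require_cmds is a hypothetical helper name of my own, not something shipped with either tool.

```shell
# Hypothetical helper: report every required command that is missing from PATH.
# Returns 0 if all commands were found, 1 otherwise.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        # command -v is the POSIX way to test whether a command exists
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "Missing command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running require_cmds larynx xclip in a terminal should print nothing if both are installed, and name whichever one is missing otherwise.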

Under Settings –> Keyboard on GNOME, I added a custom keybinding for Super+S to invoke bash -i -c "speak-selection". The -i flag matters here: aliases are only loaded and expanded in interactive shells. This lets me select any text and hit Super+S to have Larynx read it aloud.
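
An alternative to going through an interactive shell is a small function that does not depend on aliases at all. The following is a rough sketch rather than what I actually run: speak_selection, SELECTION_CMD and SPEAK_CMD are names I made up for illustration, and the defaults simply reuse the xclip and larynx invocations from the aliases above. It also skips empty selections instead of handing them to Larynx.

```shell
# Hypothetical helper: read the current selection and pipe it to a TTS command.
# SELECTION_CMD and SPEAK_CMD are illustrative variables (not part of xclip or
# Larynx); they default to the same commands used in the aliases.
speak_selection() {
    # Capture the selection; a failing or missing xclip yields an empty string.
    selection_text=$(${SELECTION_CMD:-xclip -out -selection primary} 2>/dev/null) || selection_text=""

    # Bail out if the selection is empty or contains only whitespace.
    if [ -z "$(printf '%s' "$selection_text" | tr -d '[:space:]')" ]; then
        echo "Nothing selected" >&2
        return 1
    fi

    # Feed the text to the TTS command on stdin, as the aliases do.
    printf '%s\n' "$selection_text" | ${SPEAK_CMD:-larynx --voice harvard-glow_tts --interactive}
}
```

Saved in a script that ends with a call to speak_selection, this could be bound to Super+S directly, without bash -i. Overriding the two variables (for example SPEAK_CMD=cat) makes it easy to try out without any audio.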
