Text to Speech Documentation

Find the technical specifications for our Server and Embedded solutions.

rSpeak Server

rSpeak Server is a server-based solution that uses a supported text-to-speech engine to produce streaming audio or audio files (depending on the license). Your application can interact with rSpeak Server through a command line tool, a REST API, or a TCP/IP communication protocol (Linux only). The package also includes a pronunciation lexicon tool and documentation.

Technical Specification

Feature | Windows | Linux
Supported operating systems | Windows Vista, Server 2003 and later, Windows 7/8/10 | CentOS 5, Fedora 5, RHEL 5 (or higher)
Supported architecture | Intel 64 bit (x86_64) | Intel 32 and 64 bit (x86/x86_64)
Run-time memory (RAM) recommendation | 4 GB or more | 4 GB or more
Voice footprints (high quality) | 250-500 MB per voice | 250-500 MB per voice
Supported audio formats | PCM, Wav, Ogg, mp3 (*) | PCM, Wav, Ogg, mp3 (*)
Audio quality (PCM/Wav): 16 kHz | YES | YES
Sampling depth (PCM): 16 bit | YES | YES
Ogg qualities supported: All; VBR, All; CBR | YES | YES
MP3 (*) quality: All; CBR | YES | YES
Pronunciation lexicon | YES | YES
GUI lexicon editor | YES (exe) | YES (requires Java)
Support UTF-8 text and SSML as input (**) | YES | YES
Phonetic alphabet for lexicon transcriptions | IPA | IPA
DSP (speech rate, pitch and volume) | YES | YES
Command line interface | YES | YES
REST API | YES (***) | YES (***)
TCP/IP protocol (****) | NO | YES
Configuration file support | YES | YES
Voice/language switch (via SSML) | YES | YES
Create marks file (to use for event notification, e.g. highlighting) | YES, using SSML marks | YES, using SSML marks
Requires license file | YES | YES

*) The mp3 format is supported but requires you to download and install the LAME package.
**) According to the separate rSpeak TTS documentation.
***) The REST web API requires that the command line application is reachable via a CGI-enabled web server application such as Apache.
****) Requires that xinetd or a similar service is installed.
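
The table lists SSML input, SSML marks for event notification, DSP control and voice/language switching via SSML. As a rough illustration, the Python sketch below builds a small SSML document that uses standard W3C SSML 1.0 elements (mark, prosody, and xml:lang on a sentence) and lists the mark names it contains. The sample text, the mark names and the helper mark_names are made up for this example, and which SSML elements rSpeak actually accepts is an assumption here; the separate rSpeak TTS documentation is authoritative.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML 1.0 recommendation.
SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Illustrative input only: the text and mark names are hypothetical, and
# support for each element by rSpeak is an assumption, not a documented fact.
ssml = f"""<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">
  <mark name="sentence_1"/>
  <prosody rate="slow" pitch="+10%" volume="loud">Welcome to the demo.</prosody>
  <mark name="sentence_2"/>
  <s xml:lang="de-DE">Guten Tag.</s>
</speak>
"""


def mark_names(ssml_text: str) -> list[str]:
    """Return the name of every <mark> element, in document order."""
    # Parsing also confirms that the document is well-formed XML.
    root = ET.fromstring(ssml_text.encode("utf-8"))
    return [mark.get("name") for mark in root.iter(f"{{{SSML_NS}}}mark")]


if __name__ == "__main__":
    print(mark_names(ssml))
```

Checking the input for well-formedness before sending it to the engine is a cheap way to catch broken markup early; the mark names are what an application would later match against entries in the marks file, for example to highlight the sentence being spoken.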

rSpeak Embedded Text to Speech

rSpeak Embedded Text to Speech provides a Software Development Kit (SDK) for embedding TTS in third-party software and hardware products. The SDK consists of speech engine libraries, voice-specific files and documentation. The SDK is shipped for multiple platforms. Use of the SDK is governed by a separate, required License Agreement.

Technical Specification

Supported operating systems | Windows XP SP3, Vista, Server 2003, Server 2007, Windows 7/8/10, Linux CentOS 5, Fedora 5, RHEL 5, Android 2.3 (Gingerbread) or later, Mac OS X, iOS
Supported architecture | Windows 32/64 bit (x86/x86_64); Linux 32/64 bit (x86/x86_64); Android ARM (armv7/armv8); ARM and Intel 32 and 64 bit
Supported languages and voices | American English (Sophie, Mark, Jeff), British English (Alice), Australian English (Jack), Spanish (Pilar), German (Max), French (Elise), Dutch (Ilse) and Swedish (Maja)
Run-time memory (RAM) recommendation | 4 GB or more
Processor speed recommendation | 1 GHz or faster
Voice footprints (depending on quality) | 100-400 MB per voice
API wrappers | Java
Supported audio formats | PCM mono
Audio quality (PCM/Wav): 8 kHz, 11 kHz, 16 kHz, 22 kHz, 44.1 kHz | YES
Sampling depth | 16 bit
Support SSML as input | YES (*)
DSP (speech rate, pitch and volume) | YES
Voice/language switch (via SSML) | YES
Documentation | YES

(*) With limitations specified in the separate rSpeak TTS documentation.
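
Because the SDK output is specified as 16-bit mono PCM, the raw data rate follows directly from the sample rate (sample rate x 2 bytes x 1 channel), and the samples need a header before ordinary audio players can open them. The Python sketch below shows both steps using only the standard library wave module. It does not call the rSpeak SDK: the function names are hypothetical, and a buffer of silence stands in for audio that the engine would return.

```python
import io
import wave

SAMPLE_WIDTH_BYTES = 2  # 16-bit samples, as listed in the specification
CHANNELS = 1            # mono, as listed in the specification


def pcm_bytes_per_second(sample_rate_hz: int) -> int:
    """Raw data rate of 16-bit mono PCM at the given sample rate."""
    return sample_rate_hz * SAMPLE_WIDTH_BYTES * CHANNELS


def wrap_pcm_as_wav(pcm: bytes, sample_rate_hz: int) -> bytes:
    """Return the raw 16-bit mono PCM samples preceded by a WAV header."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


if __name__ == "__main__":
    # One second of silence at 16 kHz stands in for synthesized speech.
    silence = bytes(pcm_bytes_per_second(16000))
    wav_bytes = wrap_pcm_as_wav(silence, 16000)
    with open("example.wav", "wb") as out_file:
        out_file.write(wav_bytes)
```

At 16 kHz this works out to 32,000 bytes of PCM per second of speech, and at 44.1 kHz to 88,200 bytes per second, which is useful when sizing audio buffers on memory-constrained devices.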