Text to Speech Documentation

Find out about the technical specifications of our Server and Embedded solutions.

rSpeak Server

rSpeak Server is a server-based solution that uses a supported text-to-speech engine to produce streaming audio or audio files (depending on the license). Your application can interact with rSpeak Server in several ways: a command line tool, a REST API, or a TCP/IP communication protocol (Linux only). The package also includes a pronunciation lexicon tool and documentation.

Technical Specification

	Windows	Linux
Supported Operating System	Windows Vista, Server 2003 and later, Windows 7/8/10	CentOS 5, Fedora 5, RHEL 5 (or higher)
Supported Architecture	Intel 64 bit (x86_64)	Intel 32 and 64 bit (x86/x86_64)
Run-time memory (RAM) recommendation	4 GB or more	4 GB or more
Voice footprints (High quality)	250-500 MB per voice	250-500 MB per voice
Supported Audio formats	PCM, Wav, Ogg, mp3*	PCM, Wav, Ogg, mp3*
Audio Quality (PCM/Wav) 16 kHz	YES	YES
Sampling Depth (PCM) 16 bit	YES	YES
Ogg qualities supported: All; VBR, All; CBR	YES	YES
MP3 (*) quality All; CBR	YES	YES
Pronunciation lexicon	YES	YES
GUI Lexicon editor	YES (exe)	YES (requires Java)
Supports UTF-8 text and SSML as input **	YES	YES
Phonetic alphabet for lexicon transcriptions	IPA	IPA
DSP (speech rate, pitch and volume)	YES	YES
Command line interface	YES	YES
REST API	YES (***)	YES (***)
TCP/IP protocol ****	NO	YES
Configuration file support	YES	YES
Voice/Language Switch (via SSML)	YES	YES
Create marks file (for event notification, e.g. highlighting)	YES, using SSML marks	YES, using SSML marks
Requires license file	YES	YES

*) MP3 output is supported but requires you to download and install the LAME package.
**) As specified in the separate rSpeak TTS documentation.
***) The REST web API requires that the command line application is reachable through a CGI-enabled web server such as Apache.
****) Requires that xinetd or a similar service is installed.
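
The table above lists SSML input, voice/language switching via SSML, and SSML marks for event notification. As a rough illustration of what such input can look like, the sketch below builds a standard W3C SSML document in Python with one `<voice>` element per segment and a `<mark>` at the start of each. The helper name `build_ssml`, the mark naming scheme (`seg1`, `seg2`, ...), and the use of the voice names Sophie and Max as `name` attribute values are assumptions made for illustration; the separate rSpeak TTS documentation defines which SSML elements and voice identifiers the engine actually accepts.

```python
import xml.etree.ElementTree as ET


def build_ssml(segments, lang="en-US"):
    """Build an SSML string from (voice_name, text) pairs.

    Hypothetical helper: each segment is wrapped in a <voice> element
    and preceded by a <mark> named seg1, seg2, ... so that a marks
    file could later be matched back to the segments.
    """
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": lang,
    })
    for index, (voice_name, text) in enumerate(segments, start=1):
        # Assumption: the engine selects a voice by its name attribute.
        voice = ET.SubElement(speak, "voice", {"name": voice_name})
        mark = ET.SubElement(voice, "mark", {"name": f"seg{index}"})
        # Text that follows an empty element is stored in its tail.
        mark.tail = text
    # ElementTree escapes &, < and > in the text automatically.
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml([("Sophie", "Hello & welcome."), ("Max", "Guten Tag.")]))
```

The resulting string is UTF-8-safe text that could be saved to a file or passed to whichever interface (command line, REST, or TCP/IP) your license and platform provide.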

rSpeak Embedded Text to Speech

rSpeak Embedded Text to Speech provides a Software Development Kit (SDK) for embedding TTS in third-party software and hardware products. The SDK consists of speech engine libraries, voice-specific files, and documentation, and it ships for multiple platforms. Use of the SDK is governed by a separate, required License Agreement.

Technical Specification

Supported Operating System	Windows XP SP3, Vista, Server 2003, Server 2007, Windows 7/8/10, Linux CentOS 5, Fedora 5, RHEL 5, Android 2.3 (Gingerbread) or later, Mac OS X, iOS
Supported Architecture	Windows 32/64-bit (x86/x86_64); Linux 32/64-bit (x86/x86_64); Android ARM (armv7/armv8); ARM and Intel 32- and 64-bit
Supported languages and voices	American English (Sophie, Mark, Jeff), British English (Alice), Australian English (Jack), Spanish (Pilar), German (Max), French (Elise), Dutch (Ilse), and Swedish (Maja)
Run-time memory (RAM) recommendation	4 GB or more
Processor speed recommendation	1 GHz or faster
Voice footprints (depending on quality)	100-400 MB per voice
API	C/C++
API Wrappers	Java
Supported Audio formats	PCM Mono
Audio Quality (PCM/Wav) 8 kHz, 11 kHz, 16 kHz, 22 kHz, 44.1 kHz	YES
Sampling Depth	16 bit
Support SSML as input	YES (*)
DSP (speech rate, pitch and volume)	YES
Voice/Language Switch (via SSML)	YES
Documentation	YES

(*) with limitations specified in separate rSpeak TTS documentation.
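
Because the SDK outputs 16-bit mono PCM, the size of the audio it produces follows directly from the sampling rate: bytes per second = sample rate × 2 bytes × 1 channel. The Python sketch below works through that arithmetic and wraps a raw PCM buffer in a WAV container using the standard `wave` module. It does not call the rSpeak SDK; the function names and the silent test buffer are hypothetical stand-ins for audio returned by the engine.

```python
import io
import wave


def pcm_bytes_per_second(sample_rate_hz, bits_per_sample=16, channels=1):
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def wrap_pcm_as_wav(pcm_bytes, sample_rate_hz, bits_per_sample=16, channels=1):
    """Return the raw PCM bytes wrapped in a WAV (RIFF) container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(bits_per_sample // 8)
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()


if __name__ == "__main__":
    for rate in (8000, 16000, 44100):
        print(rate, "Hz:", pcm_bytes_per_second(rate), "bytes per second")

    # One second of 16-bit mono silence at 16 kHz, standing in for
    # audio produced by the speech engine.
    one_second = b"\x00\x00" * 16000
    wav_data = wrap_pcm_as_wav(one_second, 16000)
    print("WAV size:", len(wav_data), "bytes")
```

At 16 kHz this comes to 32,000 bytes per second, so a minute of speech needs a buffer of just under 2 MB before any compression.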
