Text to Speech Documentation
Find out about the technical specifications for our Server and Embedded solutions.
rSpeak Server
rSpeak Server is a server-based solution that uses a supported text-to-speech engine to produce streaming audio or audio files (depending on the license). Your application can interact with rSpeak Server through a command line tool, a REST API, or a TCP/IP communication protocol (Linux only). The package also includes a pronunciation lexicon tool and documentation.
Technical Specification
Feature | Windows | Linux |
Supported Operating System | Windows Vista, Server 2003 and later, Windows 7/8/10 | CentOS 5, Fedora 5, RHEL 5 (or higher) |
Supported Architecture | Intel 64-bit (x86_64) | Intel 32- and 64-bit (x86/x86_64) |
Run-time memory (RAM) recommendation | 4 GB or more | 4 GB or more |
Voice footprints (High quality) | 250-500 MB per voice | 250-500 MB per voice |
Supported Audio formats | PCM, WAV, Ogg, MP3 (*) | PCM, WAV, Ogg, MP3 (*) |
Audio Quality (PCM/WAV): 16 kHz | YES | YES |
Sampling Depth (PCM): 16 bit | YES | YES |
Ogg qualities supported: all VBR, all CBR | YES | YES |
MP3 (*) qualities supported: all CBR | YES | YES |
Pronunciation lexicon | YES | YES |
GUI Lexicon editor | YES (exe) | YES (requires Java) |
Supports UTF-8 text and SSML as input (**) | YES | YES |
Phonetic alphabet for lexicon transcriptions | IPA | IPA |
DSP (speech rate, pitch and volume) | YES | YES |
Command line interface | YES | YES |
REST API | YES (***) | YES (***) |
TCP/IP protocol (****) | NO | YES |
Configuration file support | YES | YES |
Voice/Language Switch (via SSML) | YES | YES |
Create marks file (for event notification, e.g. highlighting) | YES, using SSML marks | YES, using SSML marks |
Requires license file | YES | YES |
(*) MP3 output is supported but requires you to download and install the LAME package.
(**) As specified in the separate rSpeak TTS documentation.
(***) The REST web API requires that the command line application be reachable through a CGI-enabled web server such as Apache.
(****) Requires xinetd or a similar service to be installed.
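To illustrate the SSML-related rows in the table above (DSP control, voice/language switch, and marks), the following Python sketch builds a small SSML 1.0 document from standard W3C elements only. Which elements and attribute values rSpeak actually accepts is defined in the separate rSpeak TTS documentation, so the element choice, the mark name, and the helper `build_ssml` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the W3C SSML 1.0 and XML specifications.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(english: str, german: str) -> str:
    """Return an SSML 1.0 document as a string (hypothetical helper).

    The document contains a prosody change, a named mark, and a
    language switch. Support for each element in rSpeak is an assumption.
    """
    # Serialize SSML elements without a namespace prefix.
    ET.register_namespace("", SSML_NS)

    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": "en-US"},
    )

    # DSP controls: speech rate, pitch and volume via <prosody>.
    prosody = ET.SubElement(
        speak,
        f"{{{SSML_NS}}}prosody",
        {"rate": "slow", "pitch": "high", "volume": "loud"},
    )
    prosody.text = english

    # A named mark that an engine can report for event notification.
    ET.SubElement(speak, f"{{{SSML_NS}}}mark", {"name": "after_english"})

    # Language switch: SSML 1.0 <voice> with an xml:lang attribute.
    voice = ET.SubElement(
        speak, f"{{{SSML_NS}}}voice", {f"{{{XML_NS}}}lang": "de-DE"}
    )
    voice.text = german

    # ElementTree escapes special characters such as '&' for us.
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Hello & welcome.", "Guten Tag."))
```

Building the document with an XML library rather than string concatenation keeps the input well-formed when the text contains characters such as `&` or `<`.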
rSpeak Embedded Text to Speech
rSpeak Embedded Text to Speech provides a Software Development Kit (SDK) for embedding TTS in third-party software and hardware products. The SDK consists of speech engine libraries, voice-specific files, and documentation. The SDK ships for multiple platforms. Its use is governed by a separate License Agreement, which is required.
Technical Specification
Supported Operating System | Windows XP SP3, Vista, Server 2003, Server 2007, Windows 7/8/10; Linux CentOS 5, Fedora 5, RHEL 5; Android 2.3 (Gingerbread) or later; Mac OS X; iOS |
Supported Architecture | Windows 32/64-bit (x86/x86_64); Linux 32/64-bit (x86/x86_64); Android ARM (armv7/armv8); ARM and Intel 32- and 64-bit |
Supported languages and voices | American English (Sophie, Mark, Jeff), British English (Alice), Australian English (Jack), Spanish (Pilar), German (Max), French (Elise), Dutch (Ilse), and Swedish (Maja) |
Run-time memory (RAM) recommendation | 4 GB or more |
Processor speed recommendation | 1 GHz or faster |
Voice footprints (depending on quality) | 100-400 MB per voice |
API | C/C++ |
API Wrappers | Java |
Supported Audio formats | PCM Mono |
Audio Quality (PCM/WAV): 8 kHz, 11 kHz, 16 kHz, 22 kHz, 44.1 kHz | YES |
Sampling Depth | 16 bit |
Support SSML as input | YES (*) |
DSP (speech rate, pitch and volume) | YES |
Voice/Language Switch (via SSML) | YES |
Documentation | YES |
(*) With limitations specified in the separate rSpeak TTS documentation.
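The audio rows above imply a simple storage calculation: 16-bit mono PCM takes two bytes per sample, so one second at 16 kHz occupies 32,000 bytes. The Python sketch below computes that figure and wraps raw PCM in a WAV container using the standard `wave` module. It assumes the PCM data is little-endian signed 16-bit, which is what the WAV format expects. The byte order produced by the rSpeak engine is not stated here, and the helper names are hypothetical.

```python
import io
import wave

SAMPLE_WIDTH_BYTES = 2  # 16-bit sampling depth
CHANNELS = 1            # mono


def pcm_bytes_per_second(sample_rate_hz: int) -> int:
    """Data rate of 16-bit mono PCM at the given sample rate."""
    return sample_rate_hz * SAMPLE_WIDTH_BYTES * CHANNELS


def wrap_pcm_as_wav(pcm: bytes, sample_rate_hz: int) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container (hypothetical helper).

    Assumes little-endian signed samples; the data itself is not modified.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(pcm)
    return buf.getvalue()


if __name__ == "__main__":
    rate = 16000
    one_second_of_silence = bytes(pcm_bytes_per_second(rate))
    wav_data = wrap_pcm_as_wav(one_second_of_silence, rate)
    print(pcm_bytes_per_second(rate), len(wav_data))
```

The same functions apply to the other listed rates by passing the rate in hertz, for example 44100 for 44.1 kHz.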
