Over the past few years, voice synthesis has made impressive progress. Synthetic speech has come a long way since its early, robotic days. Today, the leading text-to-speech (TTS) engines feature human-sounding tone and inflection, rising intonation at the end of sentences where appropriate, and a whole spectrum of expressive “colour” that can make it quite challenging to distinguish the human from the machine. This is true both for the ear and when comparing the sound waves of synthetic and human speech.
One of the core attributes to look for in TTS is accuracy: the degree of closeness of the synthetic soundwave values to the “true value”, the human voice. When assessing the quality of TTS software, experts rate how it performs on challenging input such as acronyms, abbreviations, numbers, words in other languages, names and addresses, and heteronyms. How accurate a speech engine is depends on an intricate combination of a complex voice creation process and the resources the engine has available to produce accurate speech output in tricky situations.
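To make the idea of rating performance by category concrete, here is a minimal sketch of how per-category and overall accuracy rates could be tallied once each test phrase has been judged correct or incorrect. The function name, the category labels and the sample judgements are illustrative assumptions; they do not describe how any particular benchmark scores its results.

```python
from collections import defaultdict


def accuracy_by_category(results):
    """Return (per-category accuracy, overall accuracy) as percentages.

    `results` is an iterable of (category, is_correct) pairs, one per
    test phrase. This layout is a hypothetical choice for illustration.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, is_correct in results:
        totals[category] += 1
        if is_correct:
            correct[category] += 1

    if not totals:
        raise ValueError("no test results supplied")

    per_category = {c: 100.0 * correct[c] / totals[c] for c in totals}
    overall = 100.0 * sum(correct.values()) / sum(totals.values())
    return per_category, overall


# Made-up judgements, not real benchmark data.
sample = [
    ("acronyms", True),
    ("acronyms", False),
    ("numbers", True),
    ("heteronyms", True),
]
per_category, overall = accuracy_by_category(sample)
```

Reporting the per-category figures alongside the overall rate shows where an engine struggles, which a single headline number would hide.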
rSpeak Technologies applies a unique combination of text normalization, lexical lookup, grapheme-to-phoneme modelling, prosody modelling, manuscript creation, recording of voice talent and acoustic database creation to ensure that rSpeak Text to Speech is extremely accurate. This meticulous approach is appreciated by rSpeak customers, and its success is also demonstrated by the fact that rSpeak’s American English female voice Sophie beat the leading speech engines in a recent industry benchmark conducted by ASR News (2016 Text-to-Speech Accuracy Testing report), achieving an outstanding overall accuracy rate of 98.6 on a test corpus of over fifteen hundred phrases.
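As a rough illustration of the first three stages named above (text normalization, lexical lookup and a grapheme-to-phoneme fallback), the toy sketch below expands abbreviations and small numbers, then looks each word up in a pronunciation lexicon. Everything in it is an assumption made for illustration: the tables, the function names and the letter-by-letter fallback are hypothetical and are not rSpeak's implementation.

```python
# Hypothetical, deliberately tiny tables; a real engine uses far larger ones.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}
LEXICON = {"doctor": "D AA K T ER", "street": "S T R IY T"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]


def normalize(text):
    """Text normalization: expand abbreviations and numbers below 100."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            # Context-free expansion; a real system must disambiguate,
            # for example "St." as "street" versus "saint".
            words.append(ABBREVIATIONS[token])
            continue
        cleaned = token.strip(".,;:!?")
        if cleaned.isdigit() and int(cleaned) < 100:
            words.append(number_to_words(int(cleaned)))
        elif cleaned:
            words.append(cleaned)
    return " ".join(words)


def to_phonemes(word):
    """Lexical lookup first, then a placeholder fallback."""
    if word in LEXICON:
        return LEXICON[word]
    # Stand-in for a trained grapheme-to-phoneme model: spell by letter.
    return " ".join(word.upper())


normalized = normalize("Dr. Smith lives at 42 Main St.")
phonemes = [to_phonemes(w) for w in normalized.split()]
```

The ordering is the point of the sketch: normalization turns raw text into plain words, the lexicon handles words with known pronunciations, and the fallback only runs for words the lexicon does not cover.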