This paper describes an approach to predicting pauses in a text utterance that is to be synthesized, with the aim of increasing the naturalness of the synthesized voice. We propose that pauses in an utterance depend on both the syntax of the language and the lexical structure of the sentence. The lexical approach uses sub-word information, such as the syllable sequence and related features, to predict pauses. The syntax-based approach, in contrast, uses linguistic information such as part-of-speech information. We describe experiments that predict pauses in a sentence from both lexical and syntactic information using two statistical methods, Conditional Random Fields (CRF) and Classification and Regression Trees (CART). All experiments were conducted on a Telugu corpus. Pause prediction performance on this corpus is 83.872% using CART and 84.178% using CRF. We also provide examples and observations on the improved quality of the text-to-speech system when this pause prediction module is used.