For synthesizing high-quality speech, a concatenative Text-To-Speech (TTS) system requires a large number of segments that are well annotated at the phone level. Manual segmentation, though reliable, is tedious, time-consuming and can be inconsistent. This correspondence presents an automatic phone segmentation algorithm that force-aligns the phonetic transcriptions with the utterances of the corresponding Indian-language sentences. The algorithm uses the distance function obtained from the output of the recently proposed Bach scale filter bank, together with statistical knowledge of phone lengths, to force-align the boundaries between successive stop consonants. Preliminary results on a Hindi database show that 85.2% of the boundaries detected by the algorithm lie within 20 ms of the manually segmented boundaries. The Frame Error Rate (FER), i.e., the fraction of misclassified 20 ms frames per sentence, is 20.4%.
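The abstract does not give the exact cost function, so the following is only a minimal sketch of the general idea under stated assumptions: forced alignment by dynamic programming that combines a per-frame boundary "distance" score with a duration prior per phone, followed by the two evaluation measures quoted above. The names `force_align`, `boundary_accuracy`, `frame_labels` and `frame_error_rate` are hypothetical. The sketch also assumes a precomputed distance curve (the Bach scale filter bank itself is not implemented), a Gaussian duration model in frames, an additive `weight` between the two terms, one-to-one pairing of automatic and manual boundaries, and frame labels taken at each frame's centre.

```python
import math
from typing import List, Sequence, Tuple


def force_align(distance: Sequence[float],
                dur_stats: Sequence[Tuple[float, float]],
                weight: float = 1.0) -> List[int]:
    """Place len(dur_stats) - 1 phone boundaries on a frame grid by dynamic programming.

    distance[t]: boundary evidence at frame t (larger means more likely a boundary);
                 assumed to be precomputed, e.g. from a filter-bank spectral distance.
    dur_stats[i]: assumed Gaussian (mean, std) of phone i's length in frames, std > 0.
    Returns [b_1, ..., b_{N-1}]; phone i occupies frames [b_i, b_{i+1}), b_0 = 0, b_N = T.
    """
    T, N = len(distance), len(dur_stats)
    if T < N:
        raise ValueError("need at least one frame per phone")
    NEG = -math.inf

    def dur_score(i: int, length: int) -> float:
        # Log of an unnormalised Gaussian duration prior for phone i (an assumption).
        mu, sd = dur_stats[i]
        return -0.5 * ((length - mu) / sd) ** 2

    # best[i][t]: best score with phones 0..i-1 covering frames [0, t).
    best = [[NEG] * (T + 1) for _ in range(N + 1)]
    back = [[-1] * (T + 1) for _ in range(N + 1)]
    best[0][0] = 0.0
    for i in range(1, N + 1):
        # Leave at least one frame for each of the N - i phones still to come.
        for t in range(i, T - (N - i) + 1):
            for s in range(i - 1, t):
                if best[i - 1][s] == NEG:
                    continue
                score = best[i - 1][s] + dur_score(i - 1, t - s)
                if i < N:  # t is an internal boundary, so add its distance value
                    score += weight * distance[t]
                if score > best[i][t]:
                    best[i][t], back[i][t] = score, s

    # Trace back from the end of the utterance to recover b_{N-1}, ..., b_1.
    bounds, t = [], T
    for i in range(N, 1, -1):
        t = back[i][t]
        bounds.append(t)
    return bounds[::-1]


def boundary_accuracy(auto_ms: Sequence[float], manual_ms: Sequence[float],
                      tol_ms: float = 20.0) -> float:
    """Fraction of automatic boundaries within tol_ms of the paired manual boundary."""
    hits = sum(abs(a - m) <= tol_ms for a, m in zip(auto_ms, manual_ms))
    return hits / len(manual_ms)


def frame_labels(bounds_ms: Sequence[float], total_ms: float,
                 frame_ms: float = 20.0) -> List[int]:
    """Phone index of each frame, taken at the frame centre (an assumption)."""
    centres = ((k + 0.5) * frame_ms for k in range(int(total_ms // frame_ms)))
    return [sum(c >= b for b in bounds_ms) for c in centres]


def frame_error_rate(auto_ms: Sequence[float], manual_ms: Sequence[float],
                     total_ms: float, frame_ms: float = 20.0) -> float:
    """Fraction of frames whose automatic phone label differs from the manual one."""
    a = frame_labels(auto_ms, total_ms, frame_ms)
    m = frame_labels(manual_ms, total_ms, frame_ms)
    return sum(x != y for x, y in zip(a, m)) / len(m)
```

As a toy check, consider a 12-frame utterance with three phones, each expected to last 4 frames, and distance peaks of 5.0 at frames 4 and 8. Here `force_align` returns `[4, 8]`. With an assumed 10 ms frame hop, those boundaries fall at 40 and 80 ms. Against hypothetical manual boundaries at 50 and 110 ms, `boundary_accuracy` gives 0.5, and `frame_error_rate` over 120 ms gives 1/6. Dynamic programming suits this problem because the boundaries are ordered and the duration term decomposes phone by phone. That structure lets a strong distance peak pull a boundary away from its expected position only when the peak outweighs the duration penalty.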
Added on April 4, 2014
Product Type : Research Paper
License Type : Freeware
Author : Ranjani H G, Ananthakrishnan G, A G Ramakrishnan