Bilingualism, almost universal in India, routinely appears in communication in many forms. Code-switching with English is common among city dwellers, with the matrix language typically being the speaker’s native tongue. While a number of English words have made their way into the lexicon of Indian languages, also prevalent is insertional code-switching, i.e. switching at sentence or clause level. We consider an interesting and widely encountered variety of code-switched speech in the form of public discourses by a popular motivational speaker who uses English, probably for effect, in her Hindi-language speeches. We observe three categories of segments in the discourse: Hindi, Hindi with embedded English words, and English. In this work, we present the characteristics of our data and investigate the discrimination potential of lexical and prosodic cues on manually segmented fragments. Lexical cues are obtained via the Google Speech API for Indian English recognition. Prosodic cues computed from pitch, intensity and syllable duration estimates show significant differences between Hindi and English segments, indicating more careful articulation of the embedded language.
This study analyses the source characteristics of voiced and voiceless nasals in Mizo, a Tibeto-Burman language spoken in North-East India. Mizo is one of the few languages that have both voiced and voiceless nasals in their phoneme inventory. This analysis is motivated by the interaction between breathiness and nasality reported in a number of speech perception studies using synthetic stimuli. However, there are no studies examining this interaction in vowels after voiced and voiceless nasals. Existing research has also documented the interaction between breathy phonation and vowel height. The current study is an acoustic analysis of breathiness in high and low vowels following voiced and voiceless nasals in Mizo. The acoustic measures are: H1H2 ratio, spectral balance (SB), strength of excitation (SoE), and waveform peak factor (WPF). The values obtained for all four acoustic measures suggest that vowels following voiceless nasals exhibit stronger acoustic characteristics associated with breathy phonation than vowels following voiced nasals. In addition, the degree of acoustic breathiness is affected by vowel height.
Added on March 31, 2020
Contributed by: Consortium
Product Type: Research Paper
License Type: Freeware
Author: Pamir Gogoi, Sishir Kalita, Parismita Gogoi, Ratree Wayland, Priyankoo Sarmah, S. R. M. Prasanna
In this paper, we consider breathy to tense voices, which are often considered to be opposite ends of a voice quality continuum. Along with these, other aspects of a speaker’s voice play an important role in conveying information such as mood, attitude and emotional state to the listener. The glottal pulse characteristics in different phonation types vary due to the tension of the laryngeal muscles together with the respiratory effort. In the present study, we derive features that capture the effects of excitation on the vocal tract system through a signal processing method called the zero-time windowing (ZTW) method. The ZTW method gives an instantaneous spectrum that captures changes in the speech production mechanism while providing higher spectral resolution. The cepstral coefficients derived from the ZTW method are used for the classification of phonation types. Along with zero-time windowing cepstral coefficients (ZTWCCs), we use excitation source features derived from the zero frequency filtering (ZFF) method. The excitation features used are: strength of excitation, energy of excitation, loudness measure and ZFF signal energy. Classification experiments using ZTWCC and excitation features reveal a significant improvement in the detection of phonation type compared to the existing voice quality features and MFCC features.
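As a rough illustration of how an excitation feature such as strength of excitation can be obtained with zero frequency filtering, the sketch below follows the commonly described ZFF recipe: difference the signal, pass it through two cascaded zero-frequency resonators, remove the resulting trend by repeated local-mean subtraction, and read epochs off the negative-to-positive zero crossings. The function name `zff_epochs`, the default pitch period, the window of about 1.5 pitch periods and the number of trend-removal passes are assumptions made for this example, not settings taken from the paper.

```python
def zff_epochs(signal, fs, avg_pitch_period_s=0.01, passes=3):
    """Sketch of zero frequency filtering (ZFF).

    Returns (zff_signal, epoch_indices, strength_of_excitation).
    All parameter defaults are illustrative assumptions.
    """
    n = len(signal)
    if n == 0:
        return [], [], []

    # 1. First-order difference removes any DC offset in the recording.
    x = [float(signal[0])] + [float(signal[i] - signal[i - 1]) for i in range(1, n)]

    # 2. Two cascaded zero-frequency resonators; each one is
    #    y[k] = x[k] + 2*y[k-1] - y[k-2], i.e. a double integrator.
    y = x
    for _ in range(2):
        out = [0.0] * n
        for i in range(n):
            prev1 = out[i - 1] if i >= 1 else 0.0
            prev2 = out[i - 2] if i >= 2 else 0.0
            out[i] = y[i] + 2.0 * prev1 - prev2
        y = out

    # 3. Trend removal: subtract the local mean over a window of roughly
    #    1.5 average pitch periods, repeated a few times.
    half = max(1, int(round(0.75 * avg_pitch_period_s * fs)))
    for _ in range(passes):
        prefix = [0.0]
        for v in y:
            prefix.append(prefix[-1] + v)
        detrended = []
        for i in range(n):
            lo = max(0, i - half)
            hi = min(n, i + half + 1)
            detrended.append(y[i] - (prefix[hi] - prefix[lo]) / (hi - lo))
        y = detrended

    # 4. Epochs are the negative-to-positive zero crossings; the slope of the
    #    ZFF signal at each epoch serves as the strength of excitation.
    epochs = [i for i in range(1, n) if y[i - 1] < 0.0 <= y[i]]
    soe = [y[i] - y[i - 1] for i in epochs]
    return y, epochs, soe
```

Which zero-crossing direction marks the epochs depends on the polarity of the recording, and the plain Python loops here are meant for short excerpts only; an array library would be the usual choice for full utterances.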
Fricatives are produced by creating turbulence in the airflow by passing it through a stricture in the vocal tract cavity. Fricatives are characterized by their noise-like behavior, which makes them difficult to analyze. Differences in the place of articulation lead to different classes of fricatives. Identification of fricative segment boundaries in speech helps in improving the performance of several applications. The present study attempts the identification and classification of fricative segments in continuous speech, based on the statistical behavior of instantaneous spectral characteristics. The proposed method uses parameters such as the dominant resonance frequencies and the center of gravity, along with the statistical moments of the spectrum obtained using the zero-time windowing (ZTW) method. The ZTW spectra exhibit a high temporal resolution and therefore give accurate segment boundaries in speech. The proposed algorithm is tested on the TIMIT dataset for the English language. A high identification rate of 97.5% is achieved for segment boundaries of the sibilant fricative class. Voiced nonsibilants show a lower identification rate than their voiceless counterparts due to their vowel-like spectral characteristics. A high classification rate of 93.2% is achieved between sibilants and nonsibilants.
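The center of gravity and statistical moments mentioned above can be sketched as follows for a single spectral frame. This is a minimal illustration that treats the normalized power spectrum as a probability distribution over frequency; the function name `spectral_moments`, the choice of power weighting, and the use of non-excess kurtosis are assumptions for the example, and the spectrum here could come from any analysis method rather than ZTW specifically.

```python
import math


def spectral_moments(freqs_hz, magnitudes):
    """Return (center of gravity, std dev, skewness, kurtosis) of one spectrum.

    The power spectrum is normalized to sum to 1 and treated as a
    probability distribution over frequency (an assumed weighting).
    """
    if len(freqs_hz) != len(magnitudes):
        raise ValueError("freqs_hz and magnitudes must have the same length")
    power = [m * m for m in magnitudes]
    total = sum(power)
    if total <= 0.0:
        raise ValueError("spectrum has no energy")
    weights = [p / total for p in power]

    # First moment: spectral center of gravity in Hz.
    cog = sum(f * w for f, w in zip(freqs_hz, weights))

    # Second central moment gives the spread around the center of gravity.
    var = sum(((f - cog) ** 2) * w for f, w in zip(freqs_hz, weights))
    std = math.sqrt(var)
    if std == 0.0:
        return cog, 0.0, 0.0, 0.0

    # Standardized third and fourth central moments (kurtosis is not excess).
    skew = sum(((f - cog) ** 3) * w for f, w in zip(freqs_hz, weights)) / std ** 3
    kurt = sum(((f - cog) ** 4) * w for f, w in zip(freqs_hz, weights)) / std ** 4
    return cog, std, skew, kurt
```

Sibilants typically concentrate energy at higher frequencies than nonsibilants, so frame-level values like these are plausible inputs to a classifier, though the decision rule used in the study is not specified here.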
Mizo is an under-resourced tonal language that is mainly spoken in North-East India. It has four canonical tones along with tone sandhi. In Mizo, a majority of words carry tone information. As a result, it exhibits higher acoustic variability, like other tonal languages. In this work, we investigate the impact of tonal information on robust Mizo continuous speech recognition (CSR). First, separate baseline CSR systems are developed employing Mel-frequency cepstral coefficient (MFCC) based acoustic features and salient acoustic modeling paradigms. For further improvement, tonal information is incorporated in each of the CSR systems. For this purpose, 3-dimensional tonal features are derived, comprising pitch, pitch-difference, and probability of voicing values. Our experimental study reveals that with the inclusion of tonal information, the robustness of the Mizo CSR system improves across all acoustic modeling paradigms.
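One plausible way to incorporate the 3-dimensional tonal features is to append them frame by frame to the MFCC vectors, as sketched below. The helper name `append_tonal_features` is hypothetical, and computing pitch-difference as a simple first-order difference between consecutive frames is an assumption; the abstract does not say how that value is derived or how the streams are combined.

```python
def append_tonal_features(mfcc_frames, pitch_hz, voicing_prob):
    """Append [pitch, pitch-difference, P(voicing)] to every MFCC frame.

    mfcc_frames:  list of per-frame MFCC vectors (lists of floats)
    pitch_hz:     per-frame pitch estimates
    voicing_prob: per-frame probability of voicing
    """
    n = len(mfcc_frames)
    if not (len(pitch_hz) == len(voicing_prob) == n):
        raise ValueError("all streams must have the same number of frames")

    combined = []
    for t in range(n):
        # Assumed definition: first-order difference of the pitch track.
        # The first frame has no predecessor, so its difference is set to 0.
        delta = pitch_hz[t] - pitch_hz[t - 1] if t > 0 else 0.0
        combined.append(list(mfcc_frames[t]) + [pitch_hz[t], delta, voicing_prob[t]])
    return combined
```

With 13-dimensional MFCCs, each frame becomes 16-dimensional, and the acoustic model is then trained on the combined vectors in the same way as on the baseline features.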