Bodo Raw Speech Corpus

Contributor: CIIL Mysore
Product Code: CIIL-BRX-RAW-Speech-120

Added on: 29 Jul 2019

176:53:28 hours (113 GB) of speech data | 456 speakers | 77443 audio segments | 48 kHz | 16-bit WAV

Bodo, one of the scheduled languages of India, is a tonal language. It has two clearly distinguishable tones, known as Low and High. The language belongs to the Tibeto-Burman linguistic family. It is the language of the Bodos, a major tribe of the Indian state of Assam.

The LDC-IL speech data was collected in the Chirang, Baksa, Sonitpur, Udalguri, Kamrup, Barpeta, and Kokrajhar districts of the state of Assam, India, and covers the Bwrdwnari, Eastern, and Standard dialects. The data was collected from speakers of both genders and of different age groups.

The LDC-IL Bodo speech data set consists of several types of recorded material: word lists, sentences, running texts, and date formats.

Details of the available speech corpus:

Total of 456 speakers (220 female and 236 male)
Contemporary Text (News) - 411 Audio Segments - 53:47:56 Hours
Creative Text - 413 Audio Segments - 26:43:07 Hours
Sentence - 10257 Audio Segments - 9:38:58 Hours
Date - 938 Audio Segments - 1:16:54 Hours
Command and Control Words - 12348 Audio Segments - 14:19:32 Hours
Person Name - 8222 Audio Segments - 14:49:44 Hours
Place Name - 4115 Audio Segments - 5:17:14 Hours
Most Frequent Word-Part - 12397 Audio Segments - 14:34:05 Hours
Most Frequent Word-Full - 15999 Audio Segments - 20:07:33 Hours
Phonetically Balanced - 5960 Audio Segments - 7:50:00 Hours
Form and Function Word - 6383 Audio Segments - 8:28:25 Hours
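
The per-category figures above can be cross-checked against the headline totals (77443 audio segments, 176:53:28 hours). The following is a minimal sketch of that arithmetic, not a tool shipped with the corpus; the names `CATEGORIES`, `parse_hms`, and `format_hms` are illustrative and the figures are copied by hand from the list.

```python
# Per-category figures copied from the listing: (name, segments, "H:MM:SS").
CATEGORIES = [
    ("Contemporary Text (News)", 411, "53:47:56"),
    ("Creative Text", 413, "26:43:07"),
    ("Sentence", 10257, "9:38:58"),
    ("Date", 938, "1:16:54"),
    ("Command and Control Words", 12348, "14:19:32"),
    ("Person Name", 8222, "14:49:44"),
    ("Place Name", 4115, "05:17:14"),
    ("Most Frequent Word-Part", 12397, "14:34:05"),
    ("Most Frequent Word-Full", 15999, "20:07:33"),
    ("Phonetically Balanced", 5960, "7:50:00"),
    ("Form and Function Word", 6383, "8:28:25"),
]


def parse_hms(text):
    """Convert an "H:MM:SS" string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds):
    """Convert a number of seconds back to "H:MM:SS"."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in CATEGORIES)
total_seconds = sum(parse_hms(duration) for _, _, duration in CATEGORIES)

print("Segments:", total_segments)
print("Duration:", format_hms(total_seconds))
```

Both sums agree with the totals stated in the listing, so the category breakdown accounts for the whole corpus.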

Speech Data Attributes
Annotation	Raw Speech Corpus
Language	Bodo
Duration	176:53:28
Speaker Type	Native
File Size	113 GB
No. of Audio Segments	77443
Speaker Gender	Male and Female
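
As a rough sanity check on the stated file size, the volume of uncompressed PCM audio can be estimated from the duration, sampling rate (48 kHz), and bit depth (16 bit). The sketch below is only an estimate: the listing does not state the channel count, so both mono and stereo are computed as assumptions, and WAV headers and any accompanying metadata files are ignored. The function name `pcm_bytes` is illustrative.

```python
SAMPLE_RATE_HZ = 48_000       # from the listing: 48 kHz
BYTES_PER_SAMPLE = 2          # from the listing: 16-bit samples
DURATION_SECONDS = 176 * 3600 + 53 * 60 + 28  # 176:53:28


def pcm_bytes(seconds, channels):
    """Size in bytes of raw PCM audio, excluding WAV headers."""
    return seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * channels


# The channel count is an assumption; the listing does not give it.
for channels in (1, 2):
    size = pcm_bytes(DURATION_SECONDS, channels)
    print(f"{channels} channel(s): {size / 1e9:.1f} GB (decimal), "
          f"{size / 2**30:.1f} GiB (binary)")
```

Comparing the printed estimates with the stated 113 GB gives an indication of how the recordings may be stored, but the actual layout should be confirmed against the distributed files.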

Tags: Boro, Bodo, Raw Speech Corpus
