Dataset Description
138:06:18 hours | 89 GB | 474 Speakers | 73,418 Audio segments | 48 kHz | 16 bit wav
Odia is an Indo-Aryan language spoken mainly in the state of Odisha and in parts of the neighbouring states of West Bengal, Jharkhand, Chhattisgarh and Andhra Pradesh. It has been granted Classical Language status by the Government of India. The LDC-IL Odia speech data was collected in the Central and Northern parts of Odisha from speakers of both genders and different age groups. The corpus comprises several types of datasets, including word lists, sentences, running texts and date formats.
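The stated audio format (48 kHz, 16 bit wav) can be checked on any downloaded segment. The following is a minimal sketch using Python's standard `wave` module; the function names `describe_wav` and `matches_corpus_format` are illustrative and not part of any LDC-IL tooling, and the channel count is reported rather than checked because the description above does not state it.

```python
import wave


def describe_wav(path):
    """Read basic format information from the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "bit_depth": wav.getsampwidth() * 8,  # sample width is in bytes
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }


def matches_corpus_format(info):
    """True if the file has the sample rate and bit depth stated for the corpus."""
    return info["sample_rate_hz"] == 48000 and info["bit_depth"] == 16
```

Calling `describe_wav` on the path of a segment and passing the result to `matches_corpus_format` gives a quick sanity check before further processing; the actual file and folder names depend on how the corpus is delivered.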
The available Speech Corpus details:
Total Speakers 474 (239 Female and 235 Male)
| Domains | Audio Segments per Domain | Duration (hh:mm:ss) |
| Contemporary Text (News) | 449 | 42:49:56 |
| Creative Text | 450 | 19:43:50 |
| Sentence | 11,248 | 8:22:57 |
| Date Format | 900 | 1:27:49 |
| Command and Control Words | 13,499 | 14:18:49 |
| Person Name | 8,998 | 5:01:40 |
| Place Name | 4,496 | 13:22:45 |
| Most Frequent Word - Part | 8,994 | 9:40:04 |
| Most Frequent Word - Full Set | 10,989 | 10:21:04 |
| Phonetically Balanced | 10,438 | 10:05:10 |
| Form and Function - Word | 2,957 | 2:52:14 |
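The per-domain figures listed above can be cross-checked against the headline totals (73,418 audio segments, 138:06:18 hours). The following is a small sketch that sums both columns; the numbers are copied from this page, and the helper names are illustrative.

```python
# Per-domain figures copied from the listing above:
# (domain, audio segments, duration as "H:MM:SS").
DOMAINS = [
    ("Contemporary Text (News)", 449, "42:49:56"),
    ("Creative Text", 450, "19:43:50"),
    ("Sentence", 11248, "8:22:57"),
    ("Date Format", 900, "1:27:49"),
    ("Command and Control Words", 13499, "14:18:49"),
    ("Person Name", 8998, "5:01:40"),
    ("Place Name", 4496, "13:22:45"),
    ("Most Frequent Word - Part", 8994, "9:40:04"),
    ("Most Frequent Word - Full Set", 10989, "10:21:04"),
    ("Phonetically Balanced", 10438, "10:05:10"),
    ("Form and Function - Word", 2957, "2:52:14"),
]


def to_seconds(hms):
    """Convert an "H:MM:SS" string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds):
    """Convert a number of seconds back to an "H:MM:SS" string."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in DOMAINS)
total_seconds = sum(to_seconds(duration) for _, _, duration in DOMAINS)

print(total_segments)          # 73418
print(to_hms(total_seconds))   # 138:06:18
```

Both sums agree with the totals given in the dataset description.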
A detailed explanation of the Odia Speech Corpus will be available in the Odia Raw Speech Data Documentation.
For any research-based citations, please use the following citations:
Tags: Odia, Raw Speech Corpus, Speech Corpus