Dataset Description
57:17:08 Hours | 37 GB | 204 Speakers | 25,712 Audio Segments | 48 kHz | 16-bit WAV
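You can check a downloaded segment against the format listed above by reading its header with Python's standard `wave` module. The sketch below is a minimal example. The function name `check_segment` is hypothetical. The description does not state the channel count, so the sketch reports it but does not check it.

```python
import wave

# Format parameters as listed in the dataset description.
EXPECTED_RATE_HZ = 48_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit


def check_segment(path_or_file):
    """Return (ok, duration_seconds, channels) for one WAV file.

    ok is True when the file is 48 kHz, 16-bit PCM. The wave module
    raises wave.Error for files it cannot parse as PCM WAV.
    """
    with wave.open(path_or_file, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    ok = rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_WIDTH_BYTES
    return ok, duration, channels
```

You can run this over every file and sum the returned durations. The result should come out close to the listed 57:17:08, assuming each file holds one segment.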
Gujarati is one of the major literary languages of India. It is the official language of Gujarat state and of the union territories of Daman and Diu and Dadra and Nagar Haveli. For convenience, LDC-IL considered four dialects of Gujarati: South Gujarat, Central Gujarat, North Gujarat and Saurashtra.
LDC-IL has 57:17:08 hours of raw Gujarati speech data. The LDC-IL Gujarati Raw Speech dataset consists of several types of material: word lists, sentences, texts and date formats. Approximately 15 minutes of speech per speaker was recorded from 204 native Gujarati speakers (96 female and 108 male) of different age groups. Each speaker recorded material randomly selected from a master dataset.
Speech corpus details:
Total speakers: 204 (96 female and 108 male)
Domain | Audio Segments | Duration (h:mm:ss)
--- | --- | ---
Contemporary Text (News) | 204 | 15:21:28
Creative Text | 202 | 11:34:29
Sentence | 5081 | 5:48:32
Date | 404 | 0:41:39
Command and Control Words | 6006 | 7:17:22
Person Name | 4079 | 6:36:02
Place Name | 2041 | 2:33:20
Most Frequent Word (Part) | 4236 | 5:18:47
Most Frequent Word (Full Set) | 2000 | 1:13:39
Phonetically Balanced | 1378 | 0:51:50
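The per-domain figures can be totalled and compared with the summary line at the top of the page. The Python sketch below does this. The dictionary re-keys the table values, and the helper names `to_seconds` and `to_hms` are illustrative only.

```python
# Per-domain figures from the table: (audio segments, duration "h:mm:ss").
DOMAINS = {
    "Contemporary Text (News)": (204, "15:21:28"),
    "Creative Text": (202, "11:34:29"),
    "Sentence": (5081, "5:48:32"),
    "Date": (404, "0:41:39"),
    "Command and Control Words": (6006, "7:17:22"),
    "Person Name": (4079, "6:36:02"),
    "Place Name": (2041, "2:33:20"),
    "Most Frequent Word (Part)": (4236, "5:18:47"),
    "Most Frequent Word (Full Set)": (2000, "1:13:39"),
    "Phonetically Balanced": (1378, "0:51:50"),
}


def to_seconds(hms):
    """Convert an 'h:mm:ss' string to a whole number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds):
    """Convert whole seconds back to an 'h:mm:ss' string."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


total_segments = sum(n for n, _ in DOMAINS.values())
total_seconds = sum(to_seconds(d) for _, d in DOMAINS.values())

print(total_segments)         # 25631
print(to_hms(total_seconds))  # 57:17:08
```

The summed durations match the listed 57:17:08. The per-domain segment counts, however, sum to 25,631, which is 81 fewer than the 25,712 audio segments given in the summary line. The description does not account for the difference.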
A detailed explanation of the Gujarati Raw Speech Corpus will be available in the Gujarati Raw Speech Documentation.
For research citations, please use the following citation: