Gujarati Raw Speech Corpus (Mono Recordings)
  • Contributor: CIIL Mysore
  • Product Code: CIIL-GUJ-RAW-Speech-135
Sample Download | size: 380.7KB | type: zip
Added on: 26 Aug 2021

Dataset Description 

64:44:02 hours | 7.1 GB | 233 speakers | 26,223 audio segments | 16 kHz | 16-bit WAV

Gujarati is one of the major literary languages of India. It is the official language of the state of Gujarat and of the union territories of Daman and Diu and Dadra and Nagar Haveli. For convenience, LDC-IL groups Gujarati into four dialects: South Gujarat, Central Gujarat, North Gujarat, and Saurashtra.

LDC-IL has 64:44:02 hours of Gujarati raw speech data recorded in mono. The data set is made up of several content types: word lists, sentences, texts, and date formats. Approximately 15 minutes of speech per speaker was collected from 124 female and 109 male speakers of different age groups whose mother tongue is Gujarati. Each speaker recorded items randomly selected from a master dataset.
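The stated audio format (16 kHz, 16-bit, mono WAV) can be checked on a downloaded file before further processing. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` is illustrative and is not part of any LDC-IL tooling, and the sketch assumes the files are plain PCM WAV.

```python
import wave

# Expected format, taken from the dataset description above.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
EXPECTED_CHANNELS = 1            # mono


def check_wav_format(path):
    """Return (format_ok, duration_in_seconds) for one PCM WAV file."""
    with wave.open(path, "rb") as wav:
        format_ok = (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )
        duration = wav.getnframes() / wav.getframerate()
    return format_ok, duration
```

Calling `check_wav_format` on each segment and summing the returned durations gives a figure that can be compared with the stated total duration.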

The available speech corpus details:

Total Speakers: 233 (124 Female and 109 Male)

Domain | Audio Segments | Duration per Domain (h:mm:ss)
Contemporary Text (News) | 233 | 12:52:46
Creative Text | 232 | 13:30:15
Sentence | 5824 | 7:12:17
Date Format | 466 | 0:59:31
Command and Control Words | 6985 | 9:43:07
Person Name | 4644 | 8:34:44
Place Name | 2322 | 3:17:06
Phonetically Balanced | 4131 | 6:28:15
Form and Function - Word | 1386 | 2:06:01
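The per-domain figures can be cross-checked against the headline totals (26,223 audio segments, 64:44:02 hours). The following is a minimal Python sketch with the values copied from the table above; the helper names `to_seconds` and `to_hms` are illustrative only.

```python
# Per-domain figures copied from the table above: (audio segments, duration).
DOMAINS = {
    "Contemporary Text (News)": (233, "12:52:46"),
    "Creative Text": (232, "13:30:15"),
    "Sentence": (5824, "7:12:17"),
    "Date Format": (466, "0:59:31"),
    "Command and Control Words": (6985, "9:43:07"),
    "Person Name": (4644, "8:34:44"),
    "Place Name": (2322, "3:17:06"),
    "Phonetically Balanced": (4131, "6:28:15"),
    "Form and Function - Word": (1386, "2:06:01"),
}


def to_seconds(hms):
    """Convert an 'h:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds):
    """Convert a number of seconds back to an 'h:mm:ss' string."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for count, _ in DOMAINS.values())
total_seconds = sum(to_seconds(duration) for _, duration in DOMAINS.values())

print(total_segments)         # 26223
print(to_hms(total_seconds))  # 64:44:02
```

Both sums agree with the totals stated in the dataset description.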




A detailed explanation of the Gujarati Raw Speech Corpus (Mono Recordings) will be available in the Gujarati Raw Speech (Mono Recordings) Documentation. 

For research citations, please cite the following:

• Ramamoorthy L., Narayan Kumar Choudhary, Mona Parakh, Rejitha KS, Rajesha N., Manasa G. 2021. Gujarati Raw Speech Corpus (Mono Recordings). Central Institute of Indian Languages, Mysore.

• Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

 

Speech Data Attributes
Annotation: Raw Speech Corpus
Language: Gujarati
Duration: 64:44:02
Speaker Type: Native
No. of Audio Segments: 26,223
Speaker Gender: Male and Female


Tags: Gujarati, Raw Speech Corpus, Mono Recordings, Speech Corpus

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.