Kashmiri Raw Speech Corpus
  • Contributor: CIIL Mysore
  • Product Code: CIIL-KAS-RAW-Speech-136
Sample Download | size: 1.6MB | type: zip
Added on: 26 Aug 2021

Dataset Description 

28:10:07 Hours | 18 GB speech data | 150 Speakers | 16,380 Audio segments | 48 kHz | 16 bit wav. 
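
As a rough sanity check on the figures above, the size of uncompressed PCM audio can be estimated from duration, sample rate and bit depth. The sketch below is illustrative only: the channel count is not stated on this page, so it is treated as an assumption and evaluated for both one and two channels, and WAV header overhead is ignored. The function names are hypothetical.

```python
def hms_to_seconds(text):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def pcm_bytes(duration_s, sample_rate_hz, bits_per_sample, channels):
    """Estimate raw PCM payload size in bytes (no file headers)."""
    return duration_s * sample_rate_hz * (bits_per_sample // 8) * channels


duration_s = hms_to_seconds("28:10:07")  # total duration stated on this page

# Channel count is an assumption: the page does not say mono or stereo.
for channels in (1, 2):
    size = pcm_bytes(duration_s, 48_000, 16, channels)
    print(f"{channels} channel(s): {size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
```

Under these assumptions the two-channel estimate is the one closer to the stated 18 GB, but this page does not confirm the channel layout.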

Kashmiri belongs to the Dardic group of the Indo-Aryan family and is known by the names ‘Kashur’ and ‘Kashmiri’. It is primarily spoken in the Kashmir valley and the Pir-Panchal range of the Jammu region. Kashmiri has two types of dialects: regional and social. Varieties spoken outside the valley are considered regional dialects of Kashmiri; these are Kishtawari, Poguli and Rambani. Kashmiri also has three social dialects, known as Yamraz, Marak and Kamraz.

The LDC-IL speech data was collected in the Kashmir Valley, from Pulwama, Srinagar, and Anantnag, from speakers of both genders and of different age groups. The LDC-IL Kashmiri speech data consists of several types of datasets, made up of words, sentences, running texts and date formats. Each speaker recorded items randomly selected from a master dataset.

 

The available speech corpus details:

Total Speakers: 150 (78 Female and 72 Male)

Domains | Audio Segments | Duration (Each Domain)
Contemporary Text (News) | 147 | 3:56:57
Creative Text | 148 | 12:41:33
Sentence | 3704 | 2:40:24
Date Format | 281 | 0:10:36
Command and Control Words | 4288 | 3:04:32
Person Name | 2065 | 1:53:21
Place Name | 1468 | 1:04:37
Most Frequent Word - Part | 4279 | 2:38:07
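
The per-domain figures can be cross-checked against the totals stated on this page (16,380 audio segments, 28:10:07 hours). The following is a minimal sketch that sums them; the values are copied from the domain listing above, and the helper names are hypothetical.

```python
from datetime import timedelta

# (domain, audio segments, duration as H:MM:SS), copied from this page
DOMAINS = [
    ("Contemporary Text (News)", 147, "3:56:57"),
    ("Creative Text", 148, "12:41:33"),
    ("Sentence", 3704, "2:40:24"),
    ("Date Format", 281, "0:10:36"),
    ("Command and Control Words", 4288, "3:04:32"),
    ("Person Name", 2065, "1:53:21"),
    ("Place Name", 1468, "1:04:37"),
    ("Most Frequent Word - Part", 4279, "2:38:07"),
]


def parse_hms(text):
    """Parse an 'H:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta):
    """Format a timedelta as 'H:MM:SS', letting hours exceed 24."""
    total = int(delta.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in DOMAINS)
total_duration = sum((parse_hms(d) for _, _, d in DOMAINS), timedelta())

print(total_segments)              # 16380
print(format_hms(total_duration))  # 28:10:07
```

Both sums agree with the totals given in the dataset description.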

 

A detailed explanation of the Kashmiri Speech Corpus will be available in the Kashmiri Speech Data Documentation.

For research-based citations, please cite the following:

  • Narayan Kumar Choudhary, Shahid Mushtaq Bhatt, Rajesha N., Manasa G. 2021. Kashmiri Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
  • Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

 

Speech Data Attributes
Annotation: Raw Speech Corpus
Language: Kashmiri
Duration: 28:10:07
Speaker Type: Native
No. of Audio Segments: 16380
Speaker Gender: Male and Female

Tags: Kashmiri, Raw Speech Corpus, Speech Corpus

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.