Urdu Raw Speech Corpus

Contributor: CIIL Mysore
Product Code: CIIL-URD-RAW-Speech-131


Added on: 29 Jul 2019

99:18:21 hours, 64.2 gigabytes of speech data | 499 Speakers | 88,708 Audio Segments | 48 kHz | 16-bit WAV
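
After downloading, it may be useful to confirm that a file matches the stated format (48 kHz, 16 bit). The following is a minimal sketch using Python's standard `wave` module. The function name `check_wav` and the returned keys are illustrative only and are not part of the corpus distribution. The listing does not state the channel count, so the sketch only reports it.

```python
import wave

EXPECTED_RATE_HZ = 48_000      # sampling rate stated in the listing
EXPECTED_SAMPLE_WIDTH = 2      # 16 bit = 2 bytes per sample


def check_wav(path: str) -> dict:
    """Read a WAV header and compare it with the listed format.

    Hypothetical helper for illustration; not shipped with the corpus.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "rate_ok": rate == EXPECTED_RATE_HZ,
        "width_ok": width == EXPECTED_SAMPLE_WIDTH,
        "channels": channels,          # not specified in the listing
        "seconds": frames / rate,      # duration of this segment
    }
```

Calling `check_wav` on each segment and summing the `seconds` values would give a duration that can be compared with the figure quoted above.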

Urdu is one of the Modern Indo-Aryan languages of India. It evolved from Shaurseni Apabhramsha and is written in the Perso-Arabic script. The language spoken in a region is influenced by the other languages of that region, the mother tongue of the speaker, and similar factors. Reading speed, loudness, and frequency also vary with factors such as age and gender. The Linguistic Data Consortium collected this speech corpus through fieldwork. The read speech was recorded from male and female native speakers of various age groups. The data includes texts, sentences, date formats, and several wordlists.

Details of the available speech corpus are as follows.

Total Speakers - 499 (252 Female and 247 Male)
News - 431 Audio Segments - 25:35:02 Hours
Creative Text - 433 Audio Segments - 19:40:11 Hours
Sentence - 10646 Audio Segments - 8:00:38 Hours
Date - 846 Audio Segments - 0:43:37 Hours
Command and Control Words - 13580 Audio Segments - 9:21:01 Hours
Person Name - 6577 Audio Segments - 2:55:41 Hours
Place Name - 4273 Audio Segments - 1:09:17 Hours
Most Frequent Word (Part) - 12802 Audio Segments - 7:46:28 Hours
Most Frequent Word (Full) - 18927 Audio Segments - 11:38:30 Hours
Phonetically Balanced Vocabulary - 13646 Audio Segments - 8:13:20 Hours
Form and Function Word - 6547 Audio Segments - 4:14:36 Hours
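
The per-category figures can be cross-checked against the headline totals. The sketch below copies the segment counts and durations from the list above and sums them. The names `CATEGORIES`, `to_seconds`, and `to_hms` are illustrative only.

```python
# Per-category figures copied from the listing: (audio segments, "H:MM:SS").
CATEGORIES = {
    "News": (431, "25:35:02"),
    "Creative Text": (433, "19:40:11"),
    "Sentence": (10646, "8:00:38"),
    "Date": (846, "0:43:37"),
    "Command and Control Words": (13580, "9:21:01"),
    "Person Name": (6577, "2:55:41"),
    "Place Name": (4273, "1:09:17"),
    "Most Frequent Word (Part)": (12802, "7:46:28"),
    "Most Frequent Word (Full)": (18927, "11:38:30"),
    "Phonetically Balanced Vocabulary": (13646, "8:13:20"),
    "Form and Function Word": (6547, "4:14:36"),
}


def to_seconds(hms: str) -> int:
    """Convert an "H:MM:SS" string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Convert a number of seconds back to "H:MM:SS"."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for count, _ in CATEGORIES.values())
total_duration = to_hms(sum(to_seconds(d) for _, d in CATEGORIES.values()))

print(total_segments)  # 88708
print(total_duration)  # 99:18:21
```

Both sums agree with the totals given at the top of the listing (88,708 audio segments and 99:18:21 hours).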

Speech Data Attributes
Language	Urdu

Tags: Urdu, Raw Speech Corpus
