Urdu Raw Speech Corpus
  • Contributor: CIIL Mysore
  • Product Code: CIIL-URD-RAW-Speech-131
Sample download | size: 1.6 MB | type: zip
Added on: 29 Jul 2019

99:18:21 hours, 64.2 gigabytes of speech data | 499 speakers | 88,708 audio segments | 48 kHz | 16-bit WAV
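Anyone receiving the audio may want to confirm that a file matches the stated 48 kHz, 16-bit WAV format. The following is a minimal sketch using Python's standard `wave` module. The helper names `matches_listing` and `write_silence` are hypothetical, and the short silent files are synthetic stand-ins, not corpus data. The listing does not state the channel count, so the sketch does not check it.

```python
import os
import tempfile
import wave

EXPECTED_RATE_HZ = 48_000   # "48 kHz" from the listing
EXPECTED_SAMPLE_BYTES = 2   # "16 bit" from the listing


def matches_listing(path):
    """Return True if the WAV header reports 48 kHz, 16-bit samples."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == EXPECTED_RATE_HZ
                and wav.getsampwidth() == EXPECTED_SAMPLE_BYTES)


def write_silence(path, rate_hz, sample_bytes, seconds=0.1, channels=1):
    """Write a short silent WAV file (a synthetic stand-in for corpus audio)."""
    n_frames = int(rate_hz * seconds)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_bytes)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00" * (n_frames * sample_bytes * channels))


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.wav")   # matches the listing
    bad = os.path.join(tmp, "bad.wav")     # wrong sample rate on purpose
    write_silence(good, 48_000, 2)
    write_silence(bad, 16_000, 2)
    good_ok = matches_listing(good)
    bad_ok = matches_listing(bad)

print(good_ok, bad_ok)
```

In practice, `matches_listing` would be called on each extracted `.wav` file instead of the synthetic ones.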


Urdu is one of the Modern Indo-Aryan languages of India. It evolved from Shauraseni Apabhramsha and is written in the Perso-Arabic script. The Urdu spoken in a given region is influenced by the other languages of that region, by the speaker's mother tongue, and by similar factors. Reading speed, loudness, frequency, and other speech characteristics also vary with factors such as age and gender. The Linguistic Data Consortium collected this speech corpus through fieldwork. The read speech was recorded from male and female native speakers across various age groups. The data includes texts, sentences, date formats, and different wordlists.


The speech corpus breaks down as follows.

  • Total Speakers - 499 (252 Female and 247 Male)
  • News - 431 Audio Segments - 25:35:02 Hours
  • Creative Text - 433 Audio Segments - 19:40:11 Hours
  • Sentence - 10,646 Audio Segments - 8:00:38 Hours
  • Date - 846 Audio Segments - 0:43:37 Hours
  • Command and Control Words - 13,580 Audio Segments - 9:21:01 Hours
  • Person Name - 6,577 Audio Segments - 2:55:41 Hours
  • Place Name - 4,273 Audio Segments - 1:09:17 Hours
  • Most Frequent Word (Part) - 12,802 Audio Segments - 7:46:28 Hours
  • Most Frequent Word (Full) - 18,927 Audio Segments - 11:38:30 Hours
  • Phonetically Balanced Vocabulary - 13,646 Audio Segments - 8:13:20 Hours
  • Form and Function Word - 6,547 Audio Segments - 4:14:36 Hours
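
The per-category figures above can be cross-checked against the headline totals (88,708 audio segments and 99:18:21 hours). The sketch below sums the segment counts and durations exactly as listed. The dictionary layout and the helper names `to_seconds` and `format_hms` are illustrative choices, not part of the corpus distribution.

```python
# Segment counts and durations (H:MM:SS) copied from the listing.
CATEGORIES = {
    "News": (431, "25:35:02"),
    "Creative Text": (433, "19:40:11"),
    "Sentence": (10646, "8:00:38"),
    "Date": (846, "0:43:37"),
    "Command and Control Words": (13580, "9:21:01"),
    "Person Name": (6577, "2:55:41"),
    "Place Name": (4273, "1:09:17"),
    "Most Frequent Word (Part)": (12802, "7:46:28"),
    "Most Frequent Word (Full)": (18927, "11:38:30"),
    "Phonetically Balanced Vocabulary": (13646, "8:13:20"),
    "Form and Function Word": (6547, "4:14:36"),
}


def to_seconds(hms):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds):
    """Format a number of seconds as 'H:MM:SS' (hours may exceed 24)."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for count, _ in CATEGORIES.values())
total_seconds = sum(to_seconds(duration) for _, duration in CATEGORIES.values())
total_duration = format_hms(total_seconds)

print(total_segments, total_duration)  # 88708 99:18:21
```

Both sums agree with the totals stated at the top of the listing.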
Speech Data Attributes
  • Language: Urdu


Tags: Urdu, Raw Speech Corpus

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.