Malayalam Raw Speech Corpus
  • Contributor: CIIL Mysore
  • Product Code: CIIL-MAL-RAW-Speech-125
Sample Download | size: 1.8MB | type: zip
Added on : 29 Jul 2019

164 hours; 43670 segments; 458 speakers 

Malayalam is the official language of Kerala and the Laccadive Islands. It belongs to the Dravidian language family. Given how Kerala was formed, the language of the Travancore, Cochin and Malabar regions has been influenced by different internal and external factors. LDC-IL therefore treats Malayalam as having three distinct varieties and collected speech data from Thiruvananthapuram, Ernakulam and Kozhikode.

LDC-IL holds 164 hours of Malayalam speech data. The LDC-IL Malayalam speech data set comprises several types of material: word lists, sentences, running texts and date formats.

Approximately 15 minutes of speech per speaker was recorded from 231 female and 227 male native speakers of different age groups. Each speaker recorded items selected at random from a master dataset.


Corpus details:

    • 458 speakers in total (231 female and 227 male)
    • 43670 audio segments
    • 105 gigabytes of .wav files and metadata .txt files
    • 164:01:02 (hh:mm:ss) of speech data

Speech Data Attributes:

    • Annotation: Raw Speech Corpus
    • Language: Malayalam
    • Duration: 164:01:02
    • Speaker Type: Native
    • File Size: 105 GB
    • No. of Audio Segments: 43670
    • Speaker Gender: Male and Female
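
As a quick consistency check on the figures listed above, the following Python sketch derives a few averages from them. It assumes the duration is given in hh:mm:ss form. The function and variable names are illustrative only and are not part of any LDC-IL tooling, and the averages are plain arithmetic on the listed totals, not statistics published with the corpus.

```python
def parse_duration(hms: str) -> int:
    """Convert an 'hh:mm:ss' string into a total number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Figures copied from the corpus details on this page.
TOTAL_DURATION = "164:01:02"  # assumed to be hh:mm:ss
NUM_SEGMENTS = 43670
NUM_SPEAKERS = 458
NUM_FEMALE = 231
NUM_MALE = 227

total_seconds = parse_duration(TOTAL_DURATION)

# The gender breakdown should add up to the stated speaker total.
speakers_consistent = (NUM_FEMALE + NUM_MALE == NUM_SPEAKERS)

# Derived averages (simple division of the listed totals).
avg_segment_seconds = total_seconds / NUM_SEGMENTS
avg_minutes_per_speaker = total_seconds / NUM_SPEAKERS / 60
avg_segments_per_speaker = NUM_SEGMENTS / NUM_SPEAKERS

print(f"Speaker counts consistent: {speakers_consistent}")
print(f"Average segment length: {avg_segment_seconds:.1f} s")
print(f"Average speech per speaker: {avg_minutes_per_speaker:.1f} min")
print(f"Average segments per speaker: {avg_segments_per_speaker:.1f}")
```

These are corpus-wide means. Individual speakers and segments may differ considerably, so the per-file metadata remains the authoritative source.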

Tags: Malayalam, Raw Speech Corpus

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.