CIIL Mysore Repository

List of linguistic resources developed by Linguistic Data Consortium for Indian Languages (LDC-IL), CIIL Mysore. 

Repository Last Crawled Date: 26/08/2021

Manipuri Raw Speech Corpus

156:28:32 hours of Manipuri Raw Speech Corpus | 100 GB | 620 Speakers | 66,231 Audio segments | 48 kHz | 16-bit WAV. Manipuri is the Adminis..

Sample Download | size: 1.4MB | type: zip

Added on : 29 Jul 2019

Malayalam Raw Speech Corpus

164 hours; 43670 segments; 458 speakers. Malayalam is the official language of Kerala and Laccadive Islands. It belongs to the Dravidian language ..

Sample Download | size: 1.8MB | type: zip

Added on : 29 Jul 2019

Maithili Raw Speech Corpus

LDC-IL Maithili raw speech data of 72:02:12 (hh:mm:ss). The LDC-IL Maithili speech data set consists of different typ..

Sample Download | size: 1.2MB | type: zip

Added on : 29 Jul 2019

Konkani Raw Speech Corpus

156:37:51 hours of 100 Gigabytes speech data | 503 Speakers | 72,938 Audio segments | 48 kHz | 16-bit WAV. Konkani belonging to the Indo-European family..

Sample Download | size: 2.9MB | type: zip

Added on : 29 Jul 2019

Kannada Raw Speech Corpus

179:32:52 hours of 115 Gigabytes speech data | 656 Speakers | 99109 Audio segments | 48 kHz | 16-bit WAV. Kannada is one of the ancient Indian languages..

Sample Download | size: 1.5MB | type: zip

Added on : 29 Jul 2019

Hindi Raw Speech Corpus

Hindi is a major Indo-Aryan language, a descendant of Sanskrit, spoken in central and northern India. LDC-IL Hindi speech data of 11..

Sample Download | size: 1.4MB | type: zip

Added on : 29 Jul 2019

Bodo Raw Speech Corpus

176:53:28 hours of 113 Gigabytes speech data | 456 Speakers | 77443 Audio segments | 48 kHz | 16-bit WAV. Bodo, one of the scheduled languages of In..

Sample Download | size: 2MB | type: zip

Added on : 29 Jul 2019

Bengali Raw Speech Corpus

Bengali is the official language of West Bengal and Tripura. It belongs to the Indo-Aryan language family. LDC-IL Bengali Speech Data set consists of d..

Sample Download | size: 1.9MB | type: zip

Added on : 29 Jul 2019

A Gold Standard Urdu Raw Text Corpus

Unicode Standard Urdu text corpus of 5161927 Words | 739 Titles | Data and Metadata in XML format | 5 Text domains. Urdu is one am..

Sample Download | size: 15.5KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Telugu Raw Text Corpus

Standard Telugu Text Corpus of 30,10,993 words | 859 Titles | Data and Metadata in XML format | 6 Text Domains. Telugu Text Corpus encoded in a machine re..

Sample Download | size: 39.8KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Tamil Raw Text Corpus

Tamil is one of the longest-surviving classical languages in the world. It belongs to the Dravidian language family. Tamil Text Corpus encoded in a machine reada..

Sample Download | size: 33.1KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Punjabi Raw Text Corpus

Punjabi Text Corpus encoded in a machine readable form and stored in a standard format. The major encoding being used is Unicode and stored in XM..

Sample Download | size: 46.8KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Odia Raw Text Corpus

LDC-IL Odia Raw Text Corpus developed according to various factors such as quality of the text, representativeness, retrievable format, size of corpus..

Sample Download | size: 19.9KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Nepali Raw Text Corpus

Nepali is one of the 22 scheduled languages of India. It is a descendant of Sanskrit. Nepali Text Corpus encoded in a machine readable form and stored in ..

Sample Download | size: 14KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Marathi Raw Text Corpus

Marathi is an Indo-Aryan language. It is the official language of the state of Maharashtra, India. Marathi Text Corpus encoded in a machine readable f..

Sample Download | size: 59.9KB | type: zip

Added on : 26 Jul 2019

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.