List of linguistic resources developed by the Linguistic Data Consortium for Indian Languages (LDC-IL), CIIL Mysore.
Repository Last Crawled Date: 26/08/2021
Marathi is an Indo-Aryan language. It is the official language of the state of Maharashtra in India. Marathi Text Corpus encoded in a machine-readable f..
63,70,954 words taken from 1,119 different titles. Malayalam is a highly agglutinative and morphologically rich language. The actual pattern of la..
Maithili Raw Text Corpus encoded in a machine-readable form and stored in a standard format. Maithili is an Indo-Aryan language, a direct descendant o..
Konkani is the principal and administrative language of Goa. Konkani is an Indo-Aryan language belonging to the Indo-European family of languages and ..
Kashmiri is one of the 22 scheduled languages of India and is a part of the Eighth Schedule in the constitution of Jammu and Kashmir. Kashmiri..
Kannada text Corpus of 77,63,124 words | 1,772 Titles | Data and Metadata in XML format | 6 text domains. Kannada is one of the Ancient Indian..
Hindi is a major Indo-Aryan language, a descendant of Sanskrit, spoken in central and northern India. Hindi Text Corpus encoded in a mach..
Gujarati is a major Indo-Aryan language and the administrative language of Gujarat and of the union territories of Daman and Diu and Dadra and Nagar Haveli. Guj..
Dogri is an Indo-Aryan language spoken by about five million people in India and Pakistan, particularly in Jammu. Dogri Text Corpus encoded in a m..
Unicode Standard Bodo text Corpus of 29,15,544 words | 80 Titles | Data and Metadata in XML format | 5 text domains. Bodo is a major tribal language..
Bengali is the official language of West Bengal and Tripura. It belongs to the Indo-Aryan language family. Bengali Text Corpus encoded in a machine r..