National Platform for Language Technology

Search

Products meeting the search criteria

A Gold Standard Urdu Raw Text Corpus

Unicode-standard Urdu text corpus of 51,61,927 words | 739 titles | Data and metadata in XML format | 5 text domains. Urdu is one of the prominent languages used in the Indian sub-cont...

Contributor: CIIL Mysore
Tags: Urdu, Raw Text Corpus

A Gold Standard Telugu Raw Text Corpus

Standard Telugu text corpus of 30,10,993 words | 859 titles | Data and metadata in XML format | 6 text domains | Telugu text corpus encoded in a machine-readable form and stored in a standard format. The m...

Contributor: CIIL Mysore
Tags: Telugu, Raw Text Corpus

A Gold Standard Tamil Raw Text Corpus

Tamil is one of the longest-surviving classical languages in the world. It belongs to the Dravidian language family. Tamil text corpus encoded in a machine-readable form and stored in a standard format. The majo...

Contributor: CIIL Mysore
Tags: Tamil, Raw Text Corpus

A Gold Standard Punjabi Raw Text Corpus

Punjabi text corpus encoded in a machine-readable form and stored in a standard format. The text is encoded mainly in Unicode and stored in XML format. The data is embedded with metadata infor...

Contributor: CIIL Mysore
Tags: Punjabi, Raw Text Corpus

A Gold Standard Odia Raw Text Corpus

LDC-IL Odia Raw Text Corpus, developed according to factors such as quality of the text, representativeness, retrievable format, and size of the corpus. Odia (formerly Oriya) is the official l...

Contributor: CIIL Mysore
Tags: Odia, Raw Text Corpus

A Gold Standard Nepali Raw Text Corpus

Nepali is one of the 22 scheduled languages of India. It is a descendant of Sanskrit. Nepali text corpus encoded in a machine-readable form and stored in a standard format. The major encoding being used i...

Contributor: CIIL Mysore
Tags: Nepali, Raw Text Corpus

A Gold Standard Manipuri Raw Text Corpus

Manipuri text corpus encoded in a machine-readable form and stored in a standard format. The text is encoded mainly in Unicode and stored in XML format. The data is embedded with metadata inform...

Contributor: CIIL Mysore
Tags: Manipuri, Raw Text Corpus

A Gold Standard Marathi Raw Text Corpus

Marathi is an Indo-Aryan language and the official language of the state of Maharashtra, India. Marathi text corpus encoded in a machine-readable form and stored in a standard format. The major enc...

Contributor: CIIL Mysore
Tags: Marathi, Raw Text Corpus

A Gold Standard Malayalam Raw Text Corpus

63,70,954 words taken from 1,119 different titles. Malayalam is a highly agglutinative and morphologically rich language. The actual pattern of language use in natural texts reveals the evidence o...

Contributor: CIIL Mysore
Tags: Malayalam, Raw Text Corpus

A Gold Standard Maithili Raw Text Corpus

Maithili raw text corpus encoded in a machine-readable form and stored in a standard format. Maithili is an Indo-Aryan language, a direct descendant of Sanskrit, which is spoken in the states of South...

Contributor: CIIL Mysore
Tags: Maithili, Raw Text Corpus

A Gold Standard Konkani Raw Text Corpus

Konkani is the principal and administrative language of Goa. Konkani is an Indo-Aryan language belonging to the Indo-European family of languages and is spoken along the western coast of India. Konkani...

Contributor: CIIL Mysore
Tags: Konkani, Raw Text Corpus

A Gold Standard Kashmiri Raw Text Corpus

The Kashmiri language is one of the 22 scheduled languages of India and is a part of the Eighth Schedule in the constitution of Jammu and Kashmir. Kashmiri text has been typed in Unicode by using the In Sc...

Contributor: CIIL Mysore
Tags: Kashmiri, Raw Text Corpus

A Gold Standard Kannada Raw Text Corpus

Kannada text corpus of 77,63,124 words | 1,772 titles | Data and metadata in XML format | 6 text domains. Kannada is one of the ancient Indian languages and belongs to the Dravidian family. It ha...

Contributor: CIIL Mysore
Tags: Kannada, Raw Text Corpus

A Gold Standard Hindi Raw Text Corpus

Hindi is a major Indo-Aryan language, a descendant of Sanskrit, spoken in central and northern India. Hindi text corpus encoded in a machine-readable form and stored in a standard format....

Contributor: CIIL Mysore
Tags: Hindi, Raw Text Corpus

A Gold Standard Gujarati Raw Text Corpus

Gujarati is a major Indo-Aryan language and the administrative language of Gujarat and the Union territories of Daman and Diu and Dadra and Nagar Haveli...

Contributor: CIIL Mysore
Tags: Gujarati, Raw Text Corpus
Copyright @ All Rights Reserved
National Platform for Language Technology © 2026