A Gold Standard Kannada Raw Text Corpus
  • Contributor: CIIL Mysore
  • Product Code: CIIL-KAN-RAW-TEXT-106
Sample Download | size: 18.8KB | type: zip
Added on : 26 Jul 2019

Kannada text corpus of 77,63,124 words | 1772 titles | Data and metadata in XML format | 6 text domains


Kannada is one of the ancient Indian languages and belongs to the Dravidian family. It has its own script. Although Kannada is considered a classical language because of its long literary history, this text corpus is drawn from contemporary sources. To keep the corpus balanced, the text was collected either by keying in and proofreading extracts from books in various domains or by crawling news websites. The corpus is encoded in Unicode, and the data and its metadata are provided in XML format.

Text Corpus Attributes
Language: Kannada
Parallel or Monolingual: Monolingual
Annotation: Raw Text Corpus
Word-Count: 7763124
Encoding: UTF-8
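
Because the corpus is distributed as UTF-8 encoded XML, a word count like the one listed above could be approximated with a few lines of Python. The following is only a minimal sketch: this page does not document the XML schema, so the element names `doc`, `meta`, `title`, `domain`, and `text`, the `count_words` helper, and the sample document are all hypothetical and would need to be adapted to the actual files. Words are counted as whitespace-separated tokens, which may not match the method used to produce the official figure.

```python
import xml.etree.ElementTree as ET


def count_words(xml_bytes: bytes, text_tag: str = "text") -> int:
    """Count whitespace-separated tokens inside every <text_tag> element.

    The tag name is an assumption; the real corpus schema may differ.
    """
    root = ET.fromstring(xml_bytes)
    total = 0
    for elem in root.iter(text_tag):
        # Join the element's own text and that of any nested children.
        content = "".join(elem.itertext())
        total += len(content.split())
    return total


# Hypothetical sample document, encoded as UTF-8 like the corpus itself.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<doc>
  <meta><title>ಉದಾಹರಣೆ</title><domain>news</domain></meta>
  <text>ಕನ್ನಡ ಒಂದು ದ್ರಾವಿಡ ಭಾಷೆ</text>
</doc>""".encode("utf-8")

if __name__ == "__main__":
    print(count_words(SAMPLE))  # 4
```

Passing bytes rather than a decoded string lets the parser honor the encoding named in the XML declaration. Restricting the count to one element keeps metadata fields such as titles out of the total.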


Tags: Kannada, Raw Text Corpus

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.