This is an author-wise Marathi-language text corpus for research in data and text mining, including but not limited to author identification, author profiling, sentiment analysis, and text summarization of Marathi text. It comprises two datasets organized by category: one of comedy articles, and one of mixed articles (comedy, novels, Lalit lekhan, etc.) by well-known Marathi authors.

Dataset–I is a collection of comedy articles by 5 different authors. One file is prepared per author, containing all articles by that author, so Dataset–I contains 5 files, each named after its author. The word count per author ranges from a minimum of 7,006 to a maximum of 10,411.
Dataset–II is composed of mixed-category articles by 10 different authors, with per-author word counts ranging from a minimum of 26,874 to a maximum of 33,722. One file is prepared per author, containing all articles by that author and named after the author. These files are encoded in UTF-8.
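
The one-file-per-author layout described above can be read with a few lines of Python. This is a minimal sketch, not part of the resource: the function names `load_author_corpus` and `word_counts` are hypothetical, the `.txt` extension is an assumption (the listing does not state one), and the whitespace-based word count may not match the method used to produce the figures quoted above.

```python
from pathlib import Path


def load_author_corpus(dataset_dir):
    """Map each author name (taken from the file name) to that author's text."""
    corpus = {}
    for path in sorted(Path(dataset_dir).iterdir()):
        if not path.is_file():
            continue
        author = path.name
        # Assumption: files may carry a ".txt" extension; strip it if present.
        if author.lower().endswith(".txt"):
            author = author[:-4]
        # The corpus files are stated to be UTF-8 encoded.
        corpus[author] = path.read_text(encoding="utf-8")
    return corpus


def word_counts(corpus):
    """Count whitespace-separated tokens per author (a rough approximation)."""
    return {author: len(text.split()) for author, text in corpus.items()}
```

Calling `word_counts(load_author_corpus(path))` on the directory holding one of the datasets gives a per-author count that can be compared against the ranges stated above.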

This resource was contributed by Mr. Sunil Digamberrao Kale and Dr. Rajesh S. Prasad.

Disclaimer: Responsibility for the performance of this resource/tool lies with the contributor.

Last updated on August 1, 2018


  More Details
  • Contributed by : Kale Sunil Digamberrao, Dr. Rajesh S. Prasad
  • Product Type : Text Corpora
  • License Type : Research
  • System Requirement : Not Applicable
