This is an author-wise Marathi text corpus intended for research in data and text mining. Research areas include, but are not limited to, author identification, author profiling, sentiment analysis, and text summarization of Marathi text. The corpus consists of two datasets covering two categories: comedy articles, and mixed articles (comedy, novels, Lalit lekhan, etc.) by well-known Marathi authors.
Dataset–I is a collection of comedy articles by 5 different authors. One file is prepared for each author containing all of that author's articles, so Dataset–I contains 5 files, each named after its author. The number of words per author ranges from a minimum of 7,006 to a maximum of 10,411.
Dataset–II is composed of articles from the mixed category, written by 10 different authors, with a minimum of 26,874 and a maximum of 33,722 words per author. One file is prepared for each author containing all of that author's articles, and each file is named after its author. These files are encoded in UTF-8.
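Because each dataset stores one UTF-8 file per author, named after that author, loading it for tasks such as author identification reduces to mapping file names to texts. The following is a minimal Python sketch, not part of the original corpus release. It assumes that each dataset sits in its own directory and that every author file has a `.txt` extension; the source states neither the directory names nor the extension. It also assumes that words are counted as whitespace-separated tokens, which may not match how the reported word counts were computed. The names `load_author_corpus`, `word_counts`, `counts_in_range`, and the directory `Dataset-I` are hypothetical.

```python
from pathlib import Path


def load_author_corpus(directory):
    """Map each author name (file stem) to the full UTF-8 text of that author's file.

    Assumption: one .txt file per author inside `directory`.
    """
    corpus = {}
    for path in sorted(Path(directory).glob("*.txt")):
        corpus[path.stem] = path.read_text(encoding="utf-8")
    return corpus


def word_counts(corpus):
    """Count whitespace-separated tokens per author.

    Assumption: naive whitespace tokenization; a Devanagari danda (।)
    surrounded by spaces is counted as a separate token.
    """
    return {author: len(text.split()) for author, text in corpus.items()}


def counts_in_range(counts, minimum, maximum):
    """Return True if every author's count lies within [minimum, maximum]."""
    return all(minimum <= n <= maximum for n in counts.values())


if __name__ == "__main__":
    # Hypothetical directory name; adjust to wherever Dataset–I is stored.
    data_dir = Path("Dataset-I")
    if data_dir.is_dir():
        counts = word_counts(load_author_corpus(data_dir))
        for author, n in counts.items():
            print(author, n)
        # Reported range for Dataset–I: 7,006 to 10,411 words per author.
        print(counts_in_range(counts, 7006, 10411))
```

The same functions apply to Dataset–II by pointing `load_author_corpus` at its directory and checking against the reported range of 26,874 to 33,722 words. If the whitespace-based counts fall outside the stated ranges, the original counts were likely computed with a different tokenizer.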