
Search Results | Total Results found :   135

Search refined by: Text Corpora
This is an author-wise Marathi text corpus intended for research in data and text mining, including but not limited to author identification, author profiling, sentiment analysis and text summarization on Marathi text. It comprises two datasets: one of comedy articles and one of mixed articles (comedy, novels, Lalit lekhan, etc.) by well-known Marathi authors.

Dataset–I is a collection of comedy articles by 5 different authors. One file is prepared per author, containing all articles by that author and named after the author, so Dataset–I contains 5 files. The word count per author ranges from 7,006 to 10,411.
Dataset–II is composed of articles of the mixed category by 10 different authors, with word counts per author ranging from 26,874 to 33,722. As in Dataset–I, one file is prepared per author, containing all articles by that author and named after the author. These files are encoded in UTF-8.
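
The one-file-per-author layout described above lends itself to a quick sanity check of the word counts. The following is a minimal sketch, not the contributors' tooling: the function name `count_words_per_author` and the directory argument are hypothetical, UTF-8 is assumed for every file, and words are counted by simple whitespace splitting, which may differ from how the published counts were obtained.

```python
from pathlib import Path


def count_words_per_author(corpus_dir):
    """Map each author (taken from the file name) to a whitespace-token count.

    Assumes corpus_dir holds one UTF-8 text file per author, named after
    the author; any file extension is ignored.
    """
    counts = {}
    for path in sorted(Path(corpus_dir).iterdir()):
        if path.is_file():
            text = path.read_text(encoding="utf-8")
            # str.split() with no arguments splits on any run of whitespace.
            counts[path.stem] = len(text.split())
    return counts
```

Calling `count_words_per_author` on the folder that holds one dataset should return one entry per author, which can then be compared against the ranges quoted in the description.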

This resource is contributed by Mr. Sunil Digamberrao Kale & Dr. Rajesh S. Prasad

Disclaimer: The responsibility of performance of this resource/tool lies with the contributor.

Last updated on August 1, 2018

  More Details
  • Contributed by : Kale Sunil Digamberrao & Dr. Rajesh S. Prasad
  • Product Type : Text Corpora
  • License Type : Research
  • System Requirement : Not Applicable

The corpus was created in 2015 by the research group at Shrimad Rajchandra Institute of Management and Computer Application. Its main purpose is lexical and morphological analysis of Gujarati text.
The corpus covers news articles from reputed Gujarati-language newspapers. It currently contains 156, 210, 101 and 50 news articles in the business, crime, politics and sports domains, respectively.

This resource is contributed by Dr. Jikitsha Sheth & Dr. Bankim Patel, SRIMCA Research

Disclaimer: The responsibility of performance of this resource/tool lies with the contributor.

Last updated on July 16, 2018

  More Details
  • Contributed by : Dr. Jikitsha Sheth & Dr. Bankim Patel, SRIMCA Resea
  • Product Type : Text Corpora
  • License Type : Research
  • System Requirement : Not Applicable

Malayalam tree bank data is in Shakti Standard Format (SSF). SSF is a common representation for data. SSF allows information in a sentence to be represented in the form of one or more trees together with a set of attribute-value pairs attached to the nodes of the trees. The attribute-value pairs allow features or properties to be specified for every node. Sentence level SSF is used to store the analysis of a sentence. It occurs as part of text level SSF. The analysis of a sentence may mark any or all of the following kinds of information as appropriate: part of speech of the words in the sentence; morphological analysis of the words including properties such as root, gender, number, person, tense, aspect, modality; phrase-structure or dependency structure of the sentence; and properties of units such as chunks, phrases, local word groups, tags, etc. SSF is theory neutral and allows both phrase structure and dependency structure to be coded, and even mixed in well-defined ways. The SSF representation for a sentence consists of a sequence of trees. Each tree is made up of one or more related nodes.
Total size of the Malayalam tree bank corpus is 9512 monolingual sentence ids, 6010 parallel sentence ids and approx. 251 verb frames. The following supporting documents are provided:
1. BIS Tag set
2. Chunk Guidelines
3. Dependency guidelines_Malayalam
4. Malayalam_VerbFrames
5. morph guidelines final
6. pos guidelines
7. SSF Format
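
The tree-plus-attributes structure described for SSF can be pictured with a small in-memory model. This is only an illustrative sketch of the idea (a sentence as a sequence of trees whose nodes carry attribute-value pairs); it does not reproduce the SSF file syntax, and the class names `Node` and `Sentence`, the helper `leaves`, and all tags and feature names in the example are hypothetical placeholders. The SSF Format document supplied with the corpus remains the authority on the actual notation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node: a label, its attribute-value pairs, and child nodes."""
    label: str                                     # chunk tag or word form
    features: dict = field(default_factory=dict)   # attribute-value pairs
    children: list = field(default_factory=list)


@dataclass
class Sentence:
    """A sentence analysis stored as a sequence of trees."""
    sentence_id: str
    trees: list = field(default_factory=list)


def leaves(node):
    """Yield the leaf nodes (typically words) under a node, left to right."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


# Hypothetical two-chunk sentence; tags and feature names are placeholders.
sentence = Sentence(
    sentence_id="1",
    trees=[
        Node("NP", {"head": "w1"}, [Node("w1", {"pos": "NN", "number": "sg"})]),
        Node("VGF", {"head": "w2"}, [Node("w2", {"pos": "VM", "tense": "past"})]),
    ],
)

# Collect the word-level labels across all trees in sentence order.
words = [leaf.label for tree in sentence.trees for leaf in leaves(tree)]
```

Keeping features in a plain dictionary mirrors the open-ended nature of the attribute-value pairs: part-of-speech, morphological properties and dependency relations can all sit on the same node without changing the structure.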

Tags: TreeBank, Malayalam treebank, Malayalam Treebank Corpus, Tree Bank Data

Added on February 27, 2018

  More Details
  • Contributed by : IL dependency tree bank, IIIT Hyd
  • Product Type : Text Corpora
  • License Type : Research
  • System Requirement : Not Applicable

Kannada tree bank data is in Shakti Standard Format (SSF). SSF is a common representation for data. SSF allows information in a sentence to be represented in the form of one or more trees together with a set of attribute-value pairs attached to the nodes of the trees. The attribute-value pairs allow features or properties to be specified for every node. Sentence level SSF is used to store the analysis of a sentence. It occurs as part of text level SSF. The analysis of a sentence may mark any or all of the following kinds of information as appropriate: part of speech of the words in the sentence; morphological analysis of the words including properties such as root, gender, number, person, tense, aspect, modality; phrase-structure or dependency structure of the sentence; and properties of units such as chunks, phrases, local word groups, tags, etc. SSF is theory neutral and allows both phrase structure and dependency structure to be coded, and even mixed in well-defined ways. The SSF representation for a sentence consists of a sequence of trees. Each tree is made up of one or more related nodes.
Total size of the Kannada tree bank corpus is 19550 sentence ids and approx. 215 verb frames. The following supporting documents are provided:
1. Dependency_Guidelines_Kannada.pdf
2. morph_kannada.pdf
3. pos_chunk_guidelines_kannada.pdf
4. SSF format of tree bank.pdf

Tags: TreeBank, Kannada treebank, Treebank Corpus, Tree bank Data

Added on February 27, 2018

  More Details
  • Contributed by : IL Dependency Treebank, IIIT Hyd
  • Product Type : Text Corpora
  • License Type : Research
  • System Requirement : Not Applicable

Marathi treebank data is in Shakti Standard Format (SSF). SSF is a common representation for data. SSF allows information in a sentence to be represented in the form of one or more trees together with a set of attribute-value pairs attached to the nodes of the trees. The attribute-value pairs allow features or properties to be specified for every node. Sentence level SSF is used to store the analysis of a sentence. It occurs as part of text level SSF. The analysis of a sentence may mark any or all of the following kinds of information as appropriate: part of speech of the words in the sentence; morphological analysis of the words including properties such as root, gender, number, person, tense, aspect, modality; phrase-structure or dependency structure of the sentence; and properties of units such as chunks, phrases, local word groups, tags, etc. SSF is theory neutral and allows both phrase structure and dependency structure to be coded, and even mixed in well-defined ways. The SSF representation for a sentence consists of a sequence of trees. Each tree is made up of one or more related nodes.
Total size of the Marathi tree bank corpus is 10852 monolingual sentence ids, 3450 parallel sentence ids and approx. 380 verb frames. The following supporting documents are provided:
1. Marathi POS Tag set
2. Marathi_Morph Guidelines
3. Guidelines for Marathi Verb frames
4. Dependency -Marathi

Tags: TreeBank, Marathi treebank, Treebank Corpus, Tree bank Data

Added on February 27, 2018

  More Details
  • Contributed by : IL dependency tree bank, IIIT Hyderabad
  • Product Type : Text Corpora
  • License Type : Research
  • System Requirement : Not Applicable