
Search Results | Total results found: 1164

Extraction and recognition of Bangla text from video frame images is challenging due to variation in font type and style, complex color backgrounds, low resolution, low contrast, etc. In this paper, we propose an algorithm for extraction and recognition of Bangla and Devanagari text from video frames with complex backgrounds. A two-step approach is proposed. After text localization, the text line is segmented into words using information based on line contours. First-order gradient values of the text blocks are used to find the word gaps. Next, an Adaptive SIS binarization technique is applied to each word. Finally, the binarized text block is sent to a state-of-the-art OCR engine for recognition.

Added on March 14, 2018

  More Details
  • Contributed by : OCR Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Purnendu Banerjee, B. B. Chaudhuri
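
The word-gap step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a text line that has already been localized and is given as a 2D list of pixel intensities, sums the absolute horizontal first-order differences per column, and treats runs of at least `min_gap` columns without gradient energy as word gaps. The function names and the `min_gap` and `threshold` parameters are hypothetical.

```python
def column_gradient_energy(img):
    """Sum of absolute horizontal first-order differences, per column.

    Each difference between neighbouring pixels is credited to both
    columns involved, so a stroke edge marks the ink and its border.
    """
    width = len(img[0])
    energy = [0] * width
    for row in img:
        for x in range(width - 1):
            d = abs(row[x + 1] - row[x])
            energy[x] += d
            energy[x + 1] += d
    return energy


def split_words(img, min_gap=3, threshold=0):
    """Return (start, end) column spans, end exclusive, one per word.

    Assumption: a run of at least `min_gap` columns whose gradient
    energy is <= `threshold` separates two words.
    """
    energy = column_gradient_energy(img)
    words = []
    start = None  # first column of the word being collected
    end = 0       # last column seen with gradient energy
    gap = 0       # length of the current low-energy run
    for x, e in enumerate(energy):
        if e > threshold:
            if start is None:
                start = x
            end = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                words.append((start, end + 1))
                start = None
    if start is not None:
        words.append((start, end + 1))
    return words
```

Because edges are credited to both neighbouring columns, each returned span includes a one-pixel border around the ink. Real glyphs produce dense gradients inside a word; a large flat region inside a word would be mistaken for a gap by this simple rule, which is why `min_gap` would have to be tuned to the script and resolution.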

Skew correction of a scanned document page is an important preprocessing step in document image analysis. We propose here a fast and robust skew estimation algorithm based on rank analysis in the Farey sequence. Our target document class comprises two major Indian scripts with headlines, namely Devanagari and Bangla. First, our algorithm detects straight edge segments in the edge map of the document page using properties of digital straightness. The straight edges derived in this manner are binned by Farey rank according to their slopes. The principal bin, identified from these bins using the strength of accumulated edge points, represents the principal direction along the headlines, from which the gross skew angle is estimated. A fast refinement algorithm is then applied with a finer tuning of Farey ranks to detect the skew up to the desired level of precision.

Added on March 14, 2018

  More Details
  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Sanjoy Pratihar, Partha Bhowmick, Shamik Sural, Jayanta Mukhopadhyay
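
The binning idea in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: edge segments are assumed to be already detected and given as hypothetical `(dx, dy, n_points)` triples with `0 <= dy <= dx`, the rank of a slope is taken to be the index of the nearest fraction in the Farey sequence of a chosen order, and the bin with the most accumulated edge points gives the gross skew angle. All function names are hypothetical.

```python
import math
from bisect import bisect_left
from collections import Counter
from fractions import Fraction


def farey_sequence(n):
    """All reduced fractions in [0, 1] with denominator <= n, ascending."""
    a, b, c, d = 0, 1, 1, n
    seq = [Fraction(a, b)]
    while c <= n:
        k = (n + b) // d
        a, b, c, d = c, d, k * c - a, k * d - b
        seq.append(Fraction(a, b))
    return seq


def farey_rank(slope, seq):
    """Index of the Farey fraction nearest to `slope` (assumed in [0, 1])."""
    i = bisect_left(seq, slope)
    if i == 0:
        return 0
    if i == len(seq):
        return len(seq) - 1
    return i if seq[i] - slope < slope - seq[i - 1] else i - 1


def principal_skew(segments, order):
    """Bin segments by Farey rank, weighted by their edge-point counts.

    Returns the slope of the strongest bin and its angle in degrees.
    """
    seq = farey_sequence(order)
    bins = Counter()
    for dx, dy, n_points in segments:
        bins[farey_rank(Fraction(dy, dx), seq)] += n_points
    rank, _ = bins.most_common(1)[0]
    slope = seq[rank]
    return slope, math.degrees(math.atan(slope))
```

A higher `order` gives a denser set of candidate slopes, which is one way to picture the finer tuning of Farey ranks mentioned in the abstract; the refinement procedure actually used in the paper is not described here.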

Malayalam tree bank data is in Shakti Standard Format (SSF), a common representation for data. SSF allows information in a sentence to be represented in the form of one or more trees, together with a set of attribute-value pairs associated with the nodes of the trees. The attribute-value pairs allow features or properties to be specified for every node. Sentence-level SSF is used to store the analysis of a sentence and occurs as part of text-level SSF. The analysis of a sentence may mark any or all of the following kinds of information, as appropriate: part of speech of the words in the sentence; morphological analysis of the words, including properties such as root, gender, number, person, tense, aspect, and modality; phrase structure or dependency structure of the sentence; and properties of units such as chunks, phrases, local word groups, tags, etc. SSF is theory-neutral and allows both phrase structure and dependency structure to be coded, and even mixed in well-defined ways. The SSF representation for a sentence consists of a sequence of trees. Each tree is made up of one or more related nodes.
The Malayalam tree bank corpus contains 9512 monolingual sentence ids, 6010 parallel sentence ids, and approx. 251 verb frames. The following supporting documents are provided:
1. BIS Tag set
2. Chunk Guidelines
3. Dependency guidelines_Malayalam
4. Malayalam_VerbFrames
5. morph guidelines final
6. pos guidelines
7. SSF Format

Tags: TreeBank, Malayalam treebank, Malayalam Treebank Corpus, Tree Bank Data

Added on February 27, 2018

  More Details
  • Contributed by : IL dependency tree bank, IIIT Hyd
  • Product Type : Text Corpora
  • License Type : Research
  • System Requirement : Not Applicable
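
The structure described for SSF (a sentence as a sequence of trees whose nodes carry attribute-value pairs) can be modelled in memory roughly as below. This is a hedged sketch of the data model only, not the SSF file syntax, which is defined in the "SSF Format" document listed above. The class and field names are hypothetical, and the tags and tokens in the usage note are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node: a chunk (with children) or a word (a leaf)."""
    tag: str                                      # e.g. chunk label or POS tag
    token: str = ""                               # surface form, empty for chunks
    attrs: dict = field(default_factory=dict)     # attribute-value pairs
    children: list = field(default_factory=list)

    def walk(self):
        """Yield this node and all descendants in pre-order."""
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class Sentence:
    """Sentence-level analysis: an id plus a sequence of trees."""
    sent_id: str
    trees: list = field(default_factory=list)

    def leaves(self):
        """Word-level nodes of all trees, in sentence order."""
        return [n for t in self.trees for n in t.walk() if not n.children]
```

For example, `Sentence("1", [Node("NP", children=[Node("NN", "w1", {"root": "w1", "number": "sg"})])])` builds a one-chunk sentence, and `leaves()` then returns its single word node with the morphological properties attached as attributes.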

Kannada tree bank data is in Shakti Standard Format (SSF), a common representation for data. SSF allows information in a sentence to be represented in the form of one or more trees, together with a set of attribute-value pairs associated with the nodes of the trees. The attribute-value pairs allow features or properties to be specified for every node. Sentence-level SSF is used to store the analysis of a sentence and occurs as part of text-level SSF. The analysis of a sentence may mark any or all of the following kinds of information, as appropriate: part of speech of the words in the sentence; morphological analysis of the words, including properties such as root, gender, number, person, tense, aspect, and modality; phrase structure or dependency structure of the sentence; and properties of units such as chunks, phrases, local word groups, tags, etc. SSF is theory-neutral and allows both phrase structure and dependency structure to be coded, and even mixed in well-defined ways. The SSF representation for a sentence consists of a sequence of trees. Each tree is made up of one or more related nodes.
The Kannada tree bank corpus contains 19550 sentence ids and approx. 215 verb frames. The following supporting documents are provided:
1. Dependency_Guidelines_Kannada.pdf
2. morph_kannada.pdf
3. pos_chunk_guidelines_kannada.pdf
4. SSF format of tree bank.pdf

Tags: TreeBank, Kannada treebank, Treebank Corpus, Tree bank Data

Added on February 27, 2018

  More Details
  • Contributed by : IL Dependency Treebank, IIIT Hyd
  • Product Type : Text Corpora
  • License Type : Research
  • System Requirement : Not Applicable
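
Reading the attribute-value pairs attached to a node might look like the sketch below. It assumes, as is commonly seen in SSF files, that the pairs are written as `name='value'` inside an `<fs ...>` tag; the authoritative syntax is in the "SSF format of tree bank.pdf" document listed above, so this pattern is an assumption. The helper name `parse_fs` and the sample string are hypothetical.

```python
import re

# Assumption: pairs look like name='value' or name="value".
_PAIR = re.compile(r"""(\w+)\s*=\s*(['"])(.*?)\2""")


def parse_fs(text):
    """Return the attribute-value pairs found in a feature-structure string."""
    return {m.group(1): m.group(3) for m in _PAIR.finditer(text)}
```

Calling `parse_fs("<fs af='w1,n,m,sg' name='NP'>")` would yield a dictionary with the keys `af` and `name`; how a comma-separated value should be split into individual properties depends on the field order given in the format document.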

Marathi treebank data is in Shakti Standard Format (SSF), a common representation for data. SSF allows information in a sentence to be represented in the form of one or more trees, together with a set of attribute-value pairs associated with the nodes of the trees. The attribute-value pairs allow features or properties to be specified for every node. Sentence-level SSF is used to store the analysis of a sentence and occurs as part of text-level SSF. The analysis of a sentence may mark any or all of the following kinds of information, as appropriate: part of speech of the words in the sentence; morphological analysis of the words, including properties such as root, gender, number, person, tense, aspect, and modality; phrase structure or dependency structure of the sentence; and properties of units such as chunks, phrases, local word groups, tags, etc. SSF is theory-neutral and allows both phrase structure and dependency structure to be coded, and even mixed in well-defined ways. The SSF representation for a sentence consists of a sequence of trees. Each tree is made up of one or more related nodes.
The Marathi tree bank corpus contains 10852 monolingual sentence ids, 3450 parallel sentence ids, and approx. 380 verb frames. The following supporting documents are provided:
1. Marathi POS Tag set
2. Marathi_Morph Guidelines
3. Guidelines for Marathi Verb frames
4. Dependency -Marathi

Tags: TreeBank, Marathi treebank, Treebank Corpus, Tree bank Data

Added on February 27, 2018

  More Details
  • Contributed by : IL dependency tree bank, IIIT Hyderabad
  • Product Type : Text Corpora
  • License Type : Research
  • System Requirement : Not Applicable