  •    Freeware
  •    Shareware
  •    Research
  •    Localization Tools 20
  •    Publications 687
  •    Validators 2
  •    Mobile Apps 22
  •    Fonts 31
  •    Guidelines / Draft Standards 3
  •    Documents 13
  •    General Tools 31
  •    NLP Tools 105
  •    Linguistic Resources 239

Search Results | Total results found: 1153

Refined by: All Results / Catalogue
The Bengali treebank data is in Shakti Standard Format (SSF). SSF is a common representation for data: it allows the information in a sentence to be represented as one or more trees, together with a set of attribute-value pairs associated with the nodes of the trees. The attribute-value pairs allow features or properties to be specified for every node. Sentence-level SSF is used to store the analysis of a sentence, and occurs as part of text-level SSF. The analysis of a sentence may mark any or all of the following kinds of information, as appropriate: the part of speech of the words in the sentence; the morphological analysis of the words, including properties such as root, gender, number, person, tense, aspect and modality; the phrase structure or dependency structure of the sentence; and the properties of units such as chunks, phrases, local word groups and tags. SSF is theory-neutral: it allows both phrase structure and dependency structure to be coded, and even mixed in well-defined ways. The SSF representation of a sentence consists of a sequence of trees, each made up of one or more related nodes.

The total size of the Bengali treebank corpus is 15,725 monolingual sentences and approximately 425 verb frames.
The following supporting documents are provided:
1. Bengali sentences with relation following Hindi.pdf
2. POS TAG guideline draft.pdf
3. SSF format of tree bank.pdf
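The SSF layout described above can be illustrated with a small parser sketch. The sample sentence, tag names and feature strings below are made-up illustrations, not taken from the corpus; SSF lines are assumed to be tab-separated columns of address, token, tag and feature structure:

```python
# Minimal sketch of reading a sentence-level SSF fragment.
# Columns: address <TAB> token <TAB> tag <TAB> feature structure.
# The sample sentence and features are hypothetical.

SAMPLE_SSF = """\
1\t((\tNP\t<fs name='NP'>
1.1\tami\tPRP\t<fs af='ami,pn,,sg,1,,,'>
\t))\t\t
2\t((\tVGF\t<fs name='VGF'>
2.1\tjai\tVM\t<fs af='ja,v,,sg,1,,,'>
\t))\t\t
"""

def parse_ssf(text):
    """Return a list of (address, token, tag, fs) tuples, skipping tree closers."""
    nodes = []
    for line in text.splitlines():
        cols = line.split("\t")
        if not cols[0]:          # lines like "\t))" only close a tree
            continue
        address, token, tag = cols[0], cols[1], cols[2]
        fs = cols[3] if len(cols) > 3 else ""
        nodes.append((address, token, tag, fs))
    return nodes

nodes = parse_ssf(SAMPLE_SSF)
# Chunk nodes carry a single-number address; leaf nodes use dotted addresses.
chunks = [n for n in nodes if "." not in n[0]]
leaves = [n for n in nodes if "." in n[0]]
```

This reflects how SSF lets chunk-level and word-level analyses coexist in one tree: the dotted addressing encodes the tree shape, while the fourth column holds the attribute-value pairs for each node.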

Tags: TreeBank, Bengali treebank, Treebank Corpus, Tree bank Data, Bangla

Added on February 27, 2018

  More Details
  • Contributed by : IL dependency tree bank, IIIT Hyd
  • Product Type : Text Corpora
  • License Type : Research
  • System Requirement : Not Applicable

A necessary step in the recognition of scanned documents is binarization, which is essentially the segmentation of the document. Several binarization algorithms can be found in the literature. Which gives the best binarization result for a given document image? To answer this question, a user needs to check different binarization algorithms for suitability, since different algorithms may work better for different types of documents. Manually choosing the best from a set of binarized documents is time-consuming. To automate the selection of the best segmented document, we either need the ground truth of the document or must propose an evaluation metric.
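As a concrete instance of binarization, Otsu's method picks the grayscale threshold that maximizes the between-class variance; a minimal NumPy sketch (the toy 8x8 "document" image is an assumption for illustration):

```python
import numpy as np

def otsu_threshold(gray):
    """Return the 0-255 threshold maximizing between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue                      # one class is empty at this split
        mu0 = (np.arange(t) * prob[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Toy "document": dark ink (value 20) on light paper (value 200).
img = np.full((8, 8), 200, dtype=np.uint8)
img[2:6, 2:6] = 20
t = otsu_threshold(img)
binary = img < t   # True where ink
```

Comparing such per-algorithm outputs against a ground-truth mask (or a proposed metric) is what the automated selection described above would operate on.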

Added on December 19, 2017

  More Details
  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Deepak Kumar, M. N. Anil Prasad, A.G. Ramakrishnan

In this paper, we report a breakthrough result on the difficult task of segmentation and recognition of coloured text from the word image dataset of the ICDAR Robust Reading Competition Challenge 2: Reading Text in Scene Images. We split the word image into individual colour, gray and lightness planes and enhance the contrast of each of these planes independently by a power-law transform. The discrimination factor of each plane is computed as the maximum between-class variance used in Otsu thresholding. The plane with the maximum discrimination factor is selected for segmentation. The trial version of Omnipage OCR is then used on the binarized words for recognition. Our recognition results on the ICDAR 2011 and ICDAR 2003 word datasets are compared with those reported in the literature.
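The plane-selection step described above can be sketched as follows. The power-law exponent and the toy planes are assumptions for illustration; the discrimination factor is the maximum between-class variance from the Otsu criterion, as the abstract states:

```python
import numpy as np

def max_between_class_variance(plane):
    """Otsu criterion: maximum between-class variance over all thresholds."""
    hist = np.bincount(plane.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best = 0.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        best = max(best, w0 * w1 * (mu0 - mu1) ** 2)
    return best

def select_plane(planes, gamma=1.5):
    """Power-law contrast enhancement, then pick the most discriminative plane."""
    scores = []
    for p in planes:
        enhanced = (255.0 * (p / 255.0) ** gamma).astype(np.uint8)
        scores.append(max_between_class_variance(enhanced))
    return int(np.argmax(scores))

# Toy word image planes: plane 0 is flat (no contrast),
# plane 1 separates text from background cleanly.
flat = np.full((10, 10), 128, dtype=np.uint8)
contrasty = np.full((10, 10), 230, dtype=np.uint8)
contrasty[3:7, 2:8] = 30
best = select_plane([flat, contrasty])
```

The flat plane yields zero between-class variance, so the contrasty plane is selected; in the paper's pipeline, the winning plane would then be binarized and passed to the OCR.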

Added on December 18, 2017

  More Details
  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  •    Author : Deepak Kumar, M. N. Anil Prasad, A. G. Ramakrishnan

A versatile tool with a user-friendly interface has been created for the rapid and efficient conversion of printed Tamil books into Braille books for persons with visual disability. The tool has been developed in Java using Eclipse SWT and runs on the Linux, Windows and Mac operating systems. It has been developed as an open-source project and is available under the Apache 2.0 license from code.google.com. An individual scanned page or all the pages of a whole book can be recognized by this tool. The average time taken to digitize a Tamil page is two seconds. The output can be saved in RTF, XML or BRF (Braille) format directly, at the click of a button. There is a provision for manually selecting the individual columns of a two-column printed page, or even marking the individual rectangular text blocks of a page with a more complex Manhattan layout. The user can modify the reading order of the selected text blocks. This ordering of text blocks is passed on to the Tamil OCR integrated at the backend of the tool, so the recognized Tamil text in Unicode is put together in the same reading order. For books with identical or very similar text layouts across their pages, such a user-defined layout can be saved as a custom layout and automatically applied to segment the other pages of the same book, or even a different book.
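The reading-order mechanism described above can be sketched as follows; the block representation, the `recognize` stub and the layout format are hypothetical stand-ins for the tool's internals:

```python
# Sketch of applying a saved custom layout: the user marks rectangular
# text blocks and a reading order, and the same ordered layout is reused
# on every page.  Block format (x, y, w, h) and recognize() are assumptions.

def apply_layout(page_blocks, reading_order, recognize):
    """Run OCR on each block and join the results in the user's reading order."""
    texts = [recognize(page_blocks[i]) for i in reading_order]
    return "\n".join(texts)

# Toy page: two columns marked as blocks, read the left column first.
blocks = [(0, 0, 300, 800), (320, 0, 300, 800)]       # (x, y, w, h)
fake_ocr = {blocks[0]: "column one text", blocks[1]: "column two text"}
result = apply_layout(blocks, reading_order=[0, 1],
                      recognize=lambda b: fake_ocr[b])
```

Saving the layout amounts to persisting `blocks` and `reading_order`, which is why it can be replayed across pages with the same geometry.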

Added on December 13, 2017

  More Details
  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  •    Author : Shiva Kumar H R, A G Ramakrishnan

A competition was organized by the authors to detect text in scene images. The motivation was to look for script-independent algorithms that detect text and extract it from scene images, and which may be applied directly to an unknown script. The competition had four distinct tasks: (i) text localization and (ii) text segmentation from scene images containing one or more of Kannada, Tamil, Hindi, Chinese and English words, and (iii) English and (iv) Kannada word recognition from scene word images. There were four submissions in total for the text localization and segmentation tasks. For the other two tasks, we evaluated two algorithms already published by us, namely nonlinear enhancement and selection of plane, and midline analysis and propagation of segmentation. The relative standing of each algorithm is discussed, and suggestions are provided to improve the quality of the algorithms. A graphical depiction of the f-scores of individual images, in the form of benchmark values, is proposed to show the strength of an algorithm.
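The per-image f-score used for such benchmarking is the harmonic mean of precision and recall; a minimal sketch, with illustrative match counts (the numbers are assumptions, not competition results):

```python
def f_score(num_matches, num_detected, num_ground_truth):
    """Harmonic mean of precision and recall for one image."""
    if num_detected == 0 or num_ground_truth == 0:
        return 0.0
    precision = num_matches / num_detected
    recall = num_matches / num_ground_truth
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example image: 8 detected boxes, 10 ground-truth words, 6 correct matches.
score = f_score(6, 8, 10)
```

Plotting this score per image, rather than a single dataset-level average, is what makes the proposed benchmark depiction informative about where an algorithm fails.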

Added on December 13, 2017

  More Details
  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  •    Author : Deepak Kumar, M. N. Anil Prasad, A. G. Ramakrishnan