
Search Results | Total results found: 1181

In this paper, we report a breakthrough result on the difficult task of segmenting and recognizing coloured text from the word image dataset of the ICDAR Robust Reading Competition Challenge 2: Reading Text in Scene Images. We split the word image into individual colour, gray and lightness planes and enhance the contrast of each of these planes independently by a power-law transform. The discrimination factor of each plane is computed as the maximum between-class variance used in Otsu thresholding. The plane with the maximum discrimination factor is selected for segmentation. The trial version of Omnipage OCR is then used on the binarized words for recognition. Our recognition results on the ICDAR 2011 and ICDAR 2003 word datasets are compared with those reported in the literature.
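The plane-selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `gamma` value, the use of an unnormalized between-class variance as the discrimination factor, and the "brighter than threshold is foreground" polarity are all assumptions made for illustration. Planes are represented as flat lists of 8-bit intensities.

```python
def power_law(plane, gamma):
    """Power-law (gamma) transform s = 255 * (r / 255) ** gamma on 0..255 values."""
    return [round(255 * (p / 255) ** gamma) for p in plane]


def otsu(plane):
    """Return (threshold, maximum between-class variance) for 0..255 integer pixels."""
    hist = [0] * 256
    for p in plane:
        hist[p] += 1
    total = len(plane)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, 0.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]            # pixels with value <= t form class 0
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0          # remaining pixels form class 1
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var = (w0 / total) * (w1 / total) * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t, best_var


def select_plane(planes, gamma=1.5):
    """Enhance each plane, score it by Otsu's criterion, and binarize the best one.

    `planes` maps a plane name to its pixel list. `gamma` is a hypothetical
    default; the abstract does not state the exponent used.
    """
    best = None
    for name, pixels in planes.items():
        enhanced = power_law(pixels, gamma)
        threshold, score = otsu(enhanced)
        if best is None or score > best[1]:
            best = (name, score, threshold, enhanced)
    name, score, threshold, enhanced = best
    binary = [1 if p > threshold else 0 for p in enhanced]  # assumed polarity
    return name, binary
```

Calling `select_plane({"red": red_pixels, "gray": gray_pixels, "lightness": l_pixels})` returns the name of the plane with the largest score together with its binarized pixels, which would then be handed to an OCR engine.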

Added on December 18, 2017

  More Details
  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Deepak Kumar, M. N. Anil Prasad, A. G. Ramakrishnan

A versatile tool with a user-friendly interface has been created for the rapid and efficient conversion of printed Tamil books to Braille books for use by persons with visual disability. The tool has been developed in Java using Eclipse SWT and runs on Linux, Windows and Mac operating systems. It has been developed as an open source project and is available under the Apache 2.0 license from code.google.com. The tool can recognize an individual scanned page or all the pages of a whole book. The average time taken to digitize a Tamil page is two seconds. The output can be saved directly in RTF, XML or BRF (Braille) format at the click of a button. There is a provision for manually selecting the individual columns of a two-column printed page, or even marking the individual rectangular text blocks of a page with a more complex Manhattan layout. The user can modify the reading order of the text blocks so selected. This ordered text-block information is passed on to the Tamil OCR integrated at the backend of the tool, so the recognized Tamil text in Unicode is put together in the same reading order. For books with identical or very similar text layouts across their pages, such a user-defined layout can be saved as a custom layout and automatically applied to segment the other pages of the same book, or of a different book.

Added on December 13, 2017

  More Details
  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Shiva Kumar H R, A G Ramakrishnan

A competition was organized by the authors to detect text from scene images. The motivation was to look for script-independent algorithms that detect the text and extract it from the scene images, and which may be applied directly to an unknown script. The competition had four distinct tasks: (i) text localization and (ii) segmentation from scene images containing one or more of Kannada, Tamil, Hindi, Chinese and English words, and (iii) English and (iv) Kannada word recognition from scene word images. There were four submissions in total for the text localization and segmentation tasks. For the other two tasks, we have evaluated two algorithms already published by us, namely nonlinear enhancement and selection of plane, and midline analysis and propagation of segmentation. A complete picture of where each algorithm stands is discussed, and suggestions are provided to improve the quality of the algorithms. A graphical depiction of the f-score of individual images, in the form of benchmark values, is proposed to show the strength of an algorithm.

Added on December 13, 2017

  More Details
  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Deepak Kumar, M. N. Anil Prasad, A. G. Ramakrishnan

A script-independent, font-size-independent scheme is proposed for detecting bold words in printed pages. In OCR applications such as minor modifications of an existing printed form, it is desirable to reproduce the font size and characteristics such as bold and italics in the OCR-recognized document. In this morphological opening based detection of bold (MOBDoB) method, the binarized image is segmented into sub-images with uniform font sizes, using the word height information. A rough estimate of the stroke width of the characters in each sub-image is obtained from the density. Each sub-image is then opened with a square structuring element whose size is determined by the respective stroke width. The union of all the opened sub-images is used to determine the locations of the bold words. Extracting all such words from the binarized image gives the final image. At least 98% of bold words were detected from a total of 65 Tamil, Kannada and English pages, and the false alarm rate is less than 0.4%.
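The opening step at the core of this method can be sketched as follows. This is only an illustration under stated assumptions, not the MOBDoB implementation: the function name `open_square` is hypothetical, the structuring-element size `k` is passed in directly rather than derived from a density-based stroke-width estimate, and the segmentation into uniform-font sub-images is omitted. The image is a list of rows of 0/1 values with 1 as ink.

```python
def open_square(img, k):
    """Morphological opening of a binary image with a k x k square.

    Opening equals the union of every placement of the square that fits
    entirely inside the foreground, so strokes thinner than k pixels vanish
    while thicker (bold) strokes survive.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            # Erosion test: does the square with top-left corner (y, x) fit?
            fits = all(img[y + dy][x + dx] for dy in range(k) for dx in range(k))
            if fits:
                # Dilation: paint the whole square back into the output.
                for dy in range(k):
                    for dx in range(k):
                        out[y + dy][x + dx] = 1
    return out


# A 1-pixel-wide stroke (column 0) next to a 3-pixel-wide stroke (columns 3-5).
page = [[1, 0, 0, 1, 1, 1, 0] for _ in range(5)]
bold_only = open_square(page, 2)
```

With `k = 2`, the thin stroke in column 0 is removed and the thick stroke is kept intact. Choosing `k` just above the stroke width of regular text in a sub-image is one plausible way to separate bold from non-bold words, but the exact rule used in the paper is not given in the abstract.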

Added on December 13, 2017

  More Details
  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Pedamalli Saikrishna, A. G. Ramakrishnan

Conventional optical character recognition systems, designed to recognize linearly aligned text, perform poorly on document images that contain multi-oriented text lines. This paper describes a novel technique that can extract text lines of arbitrary curvature and align them horizontally. By invoking the spatial regularity properties of text, adjacent components are grouped together to obtain the text lines present in the image. To align each identified text line, we fit a B-spline curve to the centroids of the constituent characters and compute normal vectors all along the resulting curve. Each character is then individually rotated so that the corresponding normal vector is aligned with the vertical axis. The method has been tested on images that contain text laid out in various forms, namely arc, wave, triangular, and combinations of these with linearly skewed text lines. It yields 97.3% recognition accuracy on text strings where state-of-the-art OCRs fail before alignment.
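The per-character rotation can be illustrated with a simplified sketch. Note the substitution: instead of fitting a B-spline, the code below estimates the local tangent of the text line by finite differences between neighbouring centroids, which is a cruder stand-in chosen to keep the example dependency-free. The function name and the angle convention (degrees, counterclockwise positive, in the same coordinate frame as the centroids) are assumptions.

```python
import math


def alignment_angles(centroids):
    """Return, for each character, the rotation in degrees that makes the
    local tangent horizontal (equivalently, the local normal vertical).

    `centroids` is an ordered list of (x, y) character centroids along one
    text line and must contain at least two points. Tangents come from
    central differences, with one-sided differences at the two ends.
    """
    n = len(centroids)
    if n < 2:
        raise ValueError("need at least two centroids to estimate a tangent")
    angles = []
    for i in range(n):
        x0, y0 = centroids[max(i - 1, 0)]
        x1, y1 = centroids[min(i + 1, n - 1)]
        tangent = math.atan2(y1 - y0, x1 - x0)
        # Rotating by the negative tangent angle undoes the local slope.
        angles.append(-math.degrees(tangent))
    return angles


# Characters lying on a line that rises at 45 degrees.
skewed = alignment_angles([(0, 0), (1, 1), (2, 2)])
```

For the skewed line above, every character receives a rotation of about -45 degrees. A spline fit, as described in the abstract, would give smoother tangent estimates on arcs and waves than these local differences.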

Added on December 13, 2017

  More Details
  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : T. Kasar, A. G. Ramakrishnan