
Search Results | Total Results found :   1153

  Catalogue
This paper presents an implementation of an OCR system for the Meetei Mayek script. The script has been newly reintroduced, and a growing set of documents is now available in it. Our system accepts an image of the textual portion of a page and outputs the text as Unicode. It incorporates preprocessing, segmentation and classification stages. However, no post-processing is applied to the output. The system achieves an accuracy of about 96% on a moderately sized database.

Added on March 26, 2018


  More Details
  • Contributed by : OCR Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Subhankar Ghosh, Ujjwol Barman, P. K. Bora

Segmenting a text document into lines, words and characters is considered a crucial preprocessing stage in Optical Character Recognition (OCR). It is traditionally carried out on uncompressed documents, although most real-world documents are stored in compressed form for reasons such as transmission and storage efficiency. Working on them therefore requires decompression first, which demands additional computing resources. This limitation has motivated us to take up research in document image analysis using compressed documents. In this paper, we propose a new way to carry out segmentation at the line, word and character level in run-length compressed printed text documents. We extract the horizontal projection profile curve from the compressed file and perform line segmentation using its local minima points. However, tracing the vertical information needed to track words and characters in a run-length compressed file is not straightforward.

Added on March 14, 2018


  More Details
  • Contributed by : OCR Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Mohammed Javed, P. Nagabhushan, B.B. Chaudhuri
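The line-segmentation idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a hypothetical run-length format in which each image row is a list of alternating run lengths that begins with a background run, and it splits lines wherever the profile drops to zero, which is only the simplest case of the local minima the abstract mentions. The names `horizontal_profile` and `segment_lines` are illustrative.

```python
from typing import List, Tuple


def horizontal_profile(rle_rows: List[List[int]]) -> List[int]:
    """Foreground pixel count per row, read from the runs without decompressing.

    Assumed format: alternating run lengths per row, starting with a
    background run, so the odd positions hold the foreground runs.
    """
    return [sum(row[1::2]) for row in rle_rows]


def segment_lines(profile: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) for each band where profile > threshold."""
    lines = []
    start = None
    for y, value in enumerate(profile):
        if value > threshold and start is None:
            start = y                      # a text line begins
        elif value <= threshold and start is not None:
            lines.append((start, y - 1))   # the text line ended on the previous row
            start = None
    if start is not None:                  # text runs to the bottom of the page
        lines.append((start, len(profile) - 1))
    return lines


# Toy 7-row page, 10 pixels wide, in the assumed run-length format.
rle_page = [
    [10],             # blank row
    [2, 3, 5],        # 3 foreground pixels
    [1, 4, 1, 2, 2],  # 4 + 2 foreground pixels
    [10],
    [10],
    [0, 10],          # fully foreground row
    [10],
]
profile = horizontal_profile(rle_page)
lines = segment_lines(profile)
print(profile, lines)
```

Raising `threshold` above zero would let the sketch tolerate noise in the gaps between lines, at the cost of trimming faint ascenders or descenders.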

Document Image Analysis, like any Digital Image Analysis, requires the identification and extraction of proper features. These are generally extracted from uncompressed images, though in reality images are made available in compressed form for reasons such as transmission and storage efficiency. The compressed image must therefore be decompressed first, which demands additional computing resources. This limitation motivates research into extracting features directly from the compressed image. In this research, we propose to extract essential features such as the projection profile, run-histogram and entropy for text document analysis directly from run-length compressed text documents.

Added on March 14, 2018


  More Details
  • Contributed by : OCR Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Mohammed Javed, P. Nagabhushan, B.B. Chaudhuri
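As a rough illustration of computing features directly on run-length data, as described in the abstract above, the following Python sketch builds a run-histogram and an entropy value. It makes two assumptions that do not come from the paper: each row is a list of alternating run lengths starting with a background run, and entropy is taken as the Shannon entropy of the run-length histogram, which stands in for whatever definition the authors use. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def run_histogram(rle_rows: List[List[int]], foreground: bool = True) -> Counter:
    """Count how often each run length occurs, without decompressing.

    Assumed format: alternating run lengths per row, starting with a
    background run, so foreground runs sit at the odd positions.
    """
    first = 1 if foreground else 0
    hist = Counter()
    for row in rle_rows:
        for length in row[first::2]:
            if length > 0:          # skip empty placeholder runs
                hist[length] += 1
    return hist


def shannon_entropy(hist: Counter) -> float:
    """Shannon entropy (in bits) of the run-length distribution."""
    total = sum(hist.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in hist.values())


# Toy 3-row image, 10 pixels wide, in the assumed run-length format.
rle_rows = [
    [2, 3, 5],        # one foreground run of length 3
    [1, 3, 1, 2, 3],  # foreground runs of length 3 and 2
    [0, 10],          # one foreground run of length 10
]
hist = run_histogram(rle_rows)
entropy = shannon_entropy(hist)
print(dict(hist), entropy)
```

Passing `foreground=False` gives the histogram of background runs instead, which reflects gaps between strokes rather than stroke widths.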

In this paper we present how Bag-of-Features Hidden Markov Models can be applied to printed Bangla word spotting. These statistical models allow for easy adaptation to different problem domains. This is possible due to the integration of automatically estimated visual appearance features and Hidden Markov Models for spatial sequential modeling. In our evaluation we are able to report high retrieval scores on a new printed Bangla dataset. Furthermore, we outperform state-of-the-art results on the well-known George Washington word spotting benchmark. Both results have been achieved using an almost identical parametric method configuration.

Added on March 14, 2018


  More Details
  • Contributed by : OCR Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : L. Rothacker, G. A. Fink, P. Banerjee

Extraction and recognition of Bangla text from video frame images is challenging due to variation in font type and style, complex color backgrounds, low resolution, low contrast, etc. In this paper, we propose an algorithm for extraction and recognition of Bangla and Devanagari text from video frames with complex backgrounds. Here, a two-step approach has been proposed. After text localization, the text line is segmented into words using information based on line contours. First-order gradient values of the text blocks are used to find the word gap. Next, an Adaptive SIS binarization technique is applied to each word. Finally, the binarized text block is sent to a state-of-the-art OCR engine for recognition.

Added on March 14, 2018


  More Details
  • Contributed by : OCR Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Purnendu Banerjee, B. B. Chaudhuri