In this paper, we discuss the issues related to word recognition in born-digital word images. We introduce a novel method of power-law transformation on the word image for binarization. We show the improvement in image binarization and the consequent increase in the recognition performance of OCR engine on the word image. The optimal value of gamma for a word image is automatically chosen by our algorithm with fixed stroke width threshold. We have exhaustively experimented our algorithm by varying the gamma and stroke width threshold value. By varying the gamma value, we found that our algorithm performed better than the results reported in the literature.
Scene word images undergo degradations due to motion blur, uneven illumination, shadows and defocusing, which lead to difficulty in segmentation. As a result, the recognition results reported on the scene word image datasets of ICDAR have been low. We introduce a novel technique, where we choose the middle row of the image as a subimage and segment it first. Then, the labels from this segmented sub-image are used to propagate labels to other pixels in the image. This approach, which is unique and distinct from the existing methods, results in improved segmentation. Bayesian classification and Max-flow methods have been independently used for label propagation.
In this paper, we describe a method for feature extraction and classification of characters manually isolated from scene or natural images. Characters in a scene image may be affected by low resolution, uneven illumination or occlusion. We propose a novel method to perform binarization on gray scale images by minimizing energy functional. Discrete Cosine Transform and Angular Radial Transform are used to extract the features from characters after normalization for scale and translation.
We have benchmarked the maximum obtainable recognition accuracy on five publicly available standard word image data sets using semi-automated segmentation and a commercial OCR. These images have been cropped from camera captured scene images, born digital images (BDI) and street view images. Using the Matlab based tool developed by us, we have annotated at the pixel level more than 3600 word images from the five data sets. The word images binarized by the tool, as well as by our own midline analysis and propagation of segmentation (MAPS) algorithm are recognized using the trial version of Nuance Omnipage OCR and these two results are compared with the best reported in the literature.
Added on September 11, 2017
Contributed by : OCR Consortium
Product Type : Research Paper
License Type : Freeware
System Requirement :
Author : Deepak Kumar,M N Anil Prasad,A G Ramakrishnan
Script identification in a multi-lingual document environment has numerous applications in the field of document image analysis, such as indexing and retrieval or as an initial step towards optical character recognition. In this paper, we propose a novel hierarchical framework for script identification in bi-lingual documents. The framework presents a top-down approach by performing page, block/paragraph and word level script identification in multiple stages. We utilize texture and shape based information embedded in the documents at different levels for feature extraction. The prediction task at different levels of hierarchy is performed by Support Vector Machine (SVM) and Rejection based classifier defined using AdaBoost. Experimental evaluation of the proposed concept on document collections of Hindi/English and Bangla/English scripts have shown promising results.