
Search Results | Total results found: 1131

In this paper we present an approach for correcting the character recognition errors of an OCR system that recognises Indic scripts. A suffix tree indexes the lexicon in lexicographical order to facilitate probabilistic search. To obtain the most probable match for a mis-recognised string, the string is compared with sub-strings (edges of the suffix tree) using weighted Levenshtein distance as the similarity measure, with the confusion probabilities of characters (Unicode code points) serving as substitution costs; the comparison stops once the cost exceeds the specified cost k. Retrieved candidates are sorted and selected on the basis of lowest edit cost. Exploiting this information, the system can correct non-word errors and achieves a maximum error rate reduction of 33% over a simple character recognition system.
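The matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it scans a small lexicon linearly instead of walking a suffix tree, the confusion table and its values are invented for illustration, and the mapping from a confusion probability p to a substitution cost of 1 - p (so that likely confusions are cheap) is an assumption, as are the unit costs for insertion and deletion.

```python
from typing import Dict, List, Tuple

# Hypothetical confusion probabilities P(OCR emits b | true character is a).
# The values are invented for illustration, not taken from the paper.
CONFUSION: Dict[Tuple[str, str], float] = {
    ("ब", "व"): 0.5,   # BA misread as VA
    ("घ", "ध"): 0.25,  # GHA misread as DHA
}


def sub_cost(true_ch: str, ocr_ch: str) -> float:
    """Substitution cost; assumed to be 1 - confusion probability."""
    if true_ch == ocr_ch:
        return 0.0
    return 1.0 - CONFUSION.get((true_ch, ocr_ch), 0.0)


def weighted_levenshtein(word: str, ocr: str, k: float) -> float:
    """Weighted edit cost between a lexicon word and an OCR string.

    Returns infinity as soon as every alignment is known to cost more than k.
    """
    prev = [float(j) for j in range(len(ocr) + 1)]
    for i, wc in enumerate(word, start=1):
        cur = [float(i)]
        for j, oc in enumerate(ocr, start=1):
            cur.append(min(
                prev[j] + 1.0,                   # character missing from OCR
                cur[j - 1] + 1.0,                # extra character in OCR
                prev[j - 1] + sub_cost(wc, oc),  # match or substitution
            ))
        # Row minima never decrease, so the search can stop early.
        if min(cur) > k:
            return float("inf")
        prev = cur
    return prev[-1] if prev[-1] <= k else float("inf")


def candidates(ocr: str, lexicon: List[str], k: float) -> List[Tuple[float, str]]:
    """Lexicon words within cost k of the OCR string, cheapest first."""
    scored = [(weighted_levenshtein(w, ocr, k), w) for w in lexicon]
    return sorted((c, w) for c, w in scored if c <= k)
```

For example, `candidates("वल", ["बल", "कल", "जल"], 0.75)` keeps only "बल", because the ब/व confusion is the only substitution cheap enough to stay within the cost limit.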

Added on September 8, 2017

  • Contributed by : OCR Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Rupi Jain, Santanu Chaudhury

The paper presents the application of multiple features to word-based document image indexing and retrieval. A novel framework that performs Multiple Kernel Learning (MKL) for indexing using kernel-based Distance Based Hashing is proposed. A genetic algorithm based framework is used for optimization. Two different features representing the structural organization of word shape are defined. The optimal combination of the two features for indexing is learned by performing MKL. Retrieval results for a document collection in Devanagari script are presented.

Added on September 8, 2017

  • Contributed by : OCR Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Ehtesham Hassan, Santanu Chaudhury, M. Gopal

IISc Bangalore has developed a recognition engine for Tamil printed text, which has been tested on 1000 document images of pages scanned from books printed between 1950 and 2000. IIIT Hyderabad has developed an XML-based annotated database for storing the 5000 images of scanned pages and the corresponding typed text in Unicode. CDAC, Noida has developed an efficient evaluation tool, which compares the OCR output text to the reference typed text (ground truth) and shows the substitution, deletion and insertion errors in different colours on the screen, so that the design team can quickly identify issues with the OCR and take corrective steps to improve its performance.
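The comparison performed by such an evaluation tool can be sketched as a character-level alignment between ground truth and OCR output. This is only an illustrative sketch, not the CDAC tool: the function name `align_errors` is hypothetical, unit edit costs are assumed, and the on-screen colouring is left out. A "deletion" here means a ground-truth character missing from the OCR output, and an "insertion" means an extra character in the OCR output.

```python
from typing import List, Tuple


def align_errors(truth: str, ocr: str) -> List[Tuple[str, str, str]]:
    """List the edits (kind, truth_char, ocr_char) that turn truth into ocr."""
    n, m = len(truth), len(ocr)

    # Standard Levenshtein table with unit costs.
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dist[i - 1][j - 1] + (truth[i - 1] != ocr[j - 1])
            dist[i][j] = min(diag, dist[i - 1][j] + 1, dist[i][j - 1] + 1)

    # Walk back from the bottom-right corner to recover one optimal alignment.
    ops: List[Tuple[str, str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and dist[i][j] == dist[i - 1][j - 1] + (truth[i - 1] != ocr[j - 1])):
            if truth[i - 1] != ocr[j - 1]:
                ops.append(("substitution", truth[i - 1], ocr[j - 1]))
            i -= 1
            j -= 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("deletion", truth[i - 1], ""))
            i -= 1
        else:
            ops.append(("insertion", "", ocr[j - 1]))
            j -= 1
    ops.reverse()
    return ops
```

A display layer could then map each error kind to a colour, and counting the returned operations per kind gives the substitution, deletion and insertion totals for a page.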

Added on September 8, 2017

  • Contributed by : OCR Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Tushar Patnaik, Santanu Chaudhury, A.G. Ramakrishnan

India is a land of many languages, and consequently one often encounters documents that contain elements in multiple languages and scripts. This chapter presents an approach towards designing a bilingual OCR that can process documents containing both English and Kannada text; Kannada is the language of the southern Indian state of Karnataka. We report an efficient script identification scheme for discriminating Kannada from Roman script. We also propose a novel segmentation and recognition scheme for Kannada, which could possibly be applied to many other Indian languages as well.

Added on September 8, 2017

  • Contributed by : OCR Consortium
  • Product Type : Research Paper
  • License Type :
  • System Requirement :
  • Author : R.S. Umesh, A.G. Ramakrishnan

Skew correction of complex document images is a difficult task. We propose an edge-based connected component approach for robust skew correction of documents with complex layout and content. The algorithm essentially consists of two steps: an 'initialization' step to determine the image orientation from the centroids of the connected components, and a 'search' step to find the actual skew of the image. During initialization, we choose two different sets of points regularly spaced across the image, one from left to right and the other from top to bottom.

Added on September 8, 2017

  • Contributed by : OCR Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : T Kasar, A G Ramakrishnan, J Kumar