•    Freeware
  •    Shareware
  •    Research
  •    Localization Tools 20
  •    Publications 707
  •    Validators 2
  •    Mobile Apps 22
  •    Fonts 31
  •    Guidelines/ Draft Standards 3
  •    Documents 13
  •    General Tools 38
  •    NLP Tools 105
  •    Linguistic Resources 255
In this paper we present an approach for correcting character recognition errors of an OCR which can recognise Indic Scripts. Suffix tree is used to index the lexicon in lexicographical order to facilitate the probabilistic search. To obtain the best probable match against the mis-recognised string, it is compared with the sub-strings (edges of suffix tree) using similarity measure as weighted Levenshtein distance, where Confusion probabilities of characters (Unicodes) are used as substitution cost, until it exceeds the specified cost k. Retrieved candidates are sorted and selected on the basis of their lowest edit cost. Exploiting this information, the system can correct non-word errors and achieves maximum error rate reduction of 33% over simple character recognition system.

Added on September 8, 2017


  More Details
  • Contributed by : OCR Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Rupi Jain,Santanu Chaudhury
Author Community Profile :