Identification of the script of the text in multi-script documents is one of the important steps in the design of an OCR system for the analysis and recognition of the page. Much work has already been reported in this area relating to Roman, Arabic, Chinese, Korean and Japanese scripts. In the Indian context, though some results have been reported, the task is still in its infancy. In the work presented in this paper, a successful attempt has been made to identify the scripts, at the word level, in a bilingual document containing Roman and Tamil scripts. Two different approaches have been proposed and thoroughly tested. In the first method, words are divided into three spatial zones. The spatial spread of a word in the upper and lower zones, together with the character density, is used to identify the script. The second technique analyses the directional energy distribution of a word. Words with various font styles and sizes have been used for testing the proposed algorithms, and the results are quite encouraging.
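
To make the first method more concrete, the following is a minimal sketch of how zone-based features might be computed for a single word image. It is not the authors' implementation: the function name `zone_features`, the `frac` threshold, and the rule for locating the middle zone from the horizontal projection profile are all illustrative assumptions, and the abstract does not say how the zones are actually delimited or how the features are classified.

```python
def zone_features(word, frac=0.5):
    """Return (upper_spread, lower_spread, density) for a binary word image.

    `word` is a list of equal-length rows, with 1 for ink and 0 for background.
    `frac` is a hypothetical threshold, not a value taken from the paper.
    """
    # Horizontal projection profile: ink pixels per row.
    rows = [sum(r) for r in word]
    total = sum(rows)
    if total == 0:
        return 0.0, 0.0, 0.0  # blank image: no ink in any zone

    # Assumption: rows holding at least `frac` of the peak ink count
    # belong to the middle zone; everything above is the upper zone,
    # everything below is the lower zone.
    peak = max(rows)
    dense = [i for i, count in enumerate(rows) if count >= frac * peak]
    top, bottom = dense[0], dense[-1]

    upper_spread = sum(rows[:top]) / total         # share of ink above the middle zone
    lower_spread = sum(rows[bottom + 1:]) / total  # share of ink below the middle zone
    density = total / (len(word) * len(word[0]))   # ink pixels per unit area
    return upper_spread, lower_spread, density


# Toy 5x4 "word": one ascender row, two dense rows, two descender rows.
sample = [
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]
upper, lower, density = zone_features(sample)
```

A classifier would then compare these three values across words; which decision rule separates Roman from Tamil words is left open here, since the abstract does not specify it.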

Added on September 25, 2014

  More Details
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Dhanya D, A G Ramakrishnan, Peeta Basa Pati