•    Freeware
  •    Shareware
  •    Research
  •    Localization Tools 20
  •    Publications 707
  •    Validators 2
  •    Mobile Apps 22
  •    Fonts 31
  •    Guidelines/ Draft Standards 3
  •    Documents 13
  •    General Tools 38
  •    NLP Tools 105
  •    Linguistic Resources 255
  Catalogue
Text segmentation and localization algorithms are proposed for the born-digital image dataset. Binarization and edge detection are separately carried out on the three colour planes of the image. Connected components (CC’s) obtained from the binarized image are thresholded based on their area and aspect ratio. CC’s which contain sufficient edge pixels are retained. A novel approach is presented, where the text components are represented as nodes of a graph. Nodes correspond to the centroids of the individual CC’s. Long edges are broken from the minimum spanning tree of the graph. Pairwise height ratio is also used to remove likely non-text components. A new minimum spanning tree is created from the remaining nodes. Horizontal grouping is performed on the CC’s to generate bounding boxes of text strings. Overlapping bounding boxes are removed using an overlap area threshold. Non-overlapping and minimally overlapping bounding boxes are used for text segmentation. Vertical splitting is applied to generate bounding boxes at the word level. The proposed method is applied on all the images of the test dataset and values of precision, recall and H-mean are obtained using different approaches.

Added on August 11, 2014

8

  More Details
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Deepak Kumar , A G Ramakrishnan
Author Community Profile :