Text segmentation and localization algorithms are proposed for the born-digital image dataset. Binarization and edge detection are separately carried out on the three colour planes of the image. Connected components (CC's) obtained from the binarized image are thresholded based on their area and aspect ratio. CC's which contain sufficient edge pixels are retained. A novel approach is presented, where the text components are represented as nodes of a graph. Nodes correspond to the centroids of the individual CC's. Long edges are broken from the minimum spanning tree of the graph. Pair wise height ratio is also used to remove likely non-text components.