This paper proposes a DNN-based keyword spotting framework that uses both the spectral and the prosodic information present in the speech signal. A DNN is first trained to learn a set of hierarchical non-linear transformation parameters that project the original spectral and prosodic feature vectors onto a feature space where the distance between similar syllable pairs is small and the distance between dissimilar syllable pairs is large. These transformed features are then fused using an attention-based long short-term memory (LSTM) network. As a side result, a fine-tuning technique based on a deep denoising autoencoder is used to improve the performance of sequence predictions.
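The distance objective described above (similar syllable pairs close together, dissimilar pairs far apart) can be illustrated with a margin-based contrastive loss. This is only a minimal sketch: the abstract does not state the loss the authors actually use, and the function names, the Euclidean distance, and the `margin` value are assumptions made here for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a, b, similar, margin=1.0):
    """Margin-based contrastive loss for one pair of embedded vectors.

    Assumption: this is a generic pairwise loss, not necessarily the
    objective used in the paper.
    """
    d = euclidean(a, b)
    if similar:
        # Similar pairs are penalized by their squared distance.
        return 0.5 * d ** 2
    # Dissimilar pairs are penalized only when closer than the margin.
    return 0.5 * max(0.0, margin - d) ** 2
```

Minimizing this quantity over many pairs pulls similar pairs together and pushes dissimilar pairs apart until they are at least `margin` away, which matches the behaviour the abstract describes for the learned feature space.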
Many Inscript-based standalone keyboard applications for typing Punjabi are available online, but they are restrictive, and most do not offer formatting of the keyed-in content in the text area the application provides. The main problem with these keyboard applications is that, in the absence of audio feedback for the keys pressed on the Punjabi Inscript keyboard, visually impaired users cannot tell whether the content is being typed correctly unless they have been trained to use that keyboard. To overcome this problem, we developed the Punjabi Unicode Inscript Keyboard with sound embedded on every keystroke. This keyboard can be used for typing directly in Microsoft Word.
OCR technology for Indian documents is at an emerging stage, and most Indian OCR systems can read documents written in only a single script. Because many commercial and official documents of different states of India are tri-lingual, identification of the script and/or language is one of the elementary tasks for multi-script document recognition. A script recognizer simplifies the task of multi-lingual OCR by improving accuracy and reducing computational complexity. Script recognition may be performed at the line, word or character level, depending on how the different scripts are interlaced.
Script identification is one of the challenging steps in an Optical Character Recognition system for multi-script documents. Some results have been reported in both Indian and non-Indian contexts, but research in this field is still emerging. This paper presents work on the identification of Gurmukhi and English scripts at the word level. It also identifies English numerals within Gurmukhi text. Gabor feature extraction is one of the most popular methods for script recognition. This paper presents a zone-based Gabor feature extraction technique.
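As a rough illustration of what zone-based Gabor features could look like, the sketch below splits a word image into a grid of zones and records, for each zone and orientation, the magnitude of the zone's response to a complex Gabor kernel. The abstract gives no details of the actual technique, so everything here is an assumption: the function names, the 2x2 zone grid, the four orientations, the `wavelength` and `sigma` values, and the simplification of using one zone-sized kernel per zone instead of filtering the whole image.

```python
import cmath
import math


def gabor_kernel(height, width, theta, wavelength, sigma):
    """Complex Gabor kernel of the given size, centred on the zone."""
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    kernel = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = x - cx, y - cy
            # Rotate coordinates so the carrier wave runs along theta.
            xr = dx * math.cos(theta) + dy * math.sin(theta)
            yr = -dx * math.sin(theta) + dy * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            carrier = cmath.exp(1j * 2.0 * math.pi * xr / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def zone_gabor_features(image, zones=(2, 2), orientations=4,
                        wavelength=4.0, sigma=2.0):
    """Return one response magnitude per (zone, orientation).

    `image` is a list of rows of grey-level values. Hypothetical helper:
    the grid size and filter parameters are illustrative defaults only.
    """
    zone_rows, zone_cols = zones
    zh = len(image) // zone_rows
    zw = len(image[0]) // zone_cols
    features = []
    for i in range(zone_rows):
        for j in range(zone_cols):
            for k in range(orientations):
                theta = k * math.pi / orientations
                kernel = gabor_kernel(zh, zw, theta, wavelength, sigma)
                response = 0j
                for y in range(zh):
                    for x in range(zw):
                        response += image[i * zh + y][j * zw + x] * kernel[y][x]
                features.append(abs(response))
    return features


# Usage: an 8x8 image of vertical stripes with a period of 4 pixels.
stripes = [[1.0 if x % 4 < 2 else 0.0 for x in range(8)] for _ in range(8)]
stripe_features = zone_gabor_features(stripes)
```

With the defaults above the feature vector has 2 x 2 x 4 = 16 entries. For the striped image, the orientation whose carrier runs across the stripes responds more strongly than the one running along them, which is the kind of directional cue a script classifier could use.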
Digitization of newspaper articles is important for registering historical events. Layout analysis of Indian newspapers is a challenging task due to the presence of different font sizes and font styles and the random placement of text and non-text regions. In this paper we propose a novel framework for learning optimal parameters for text-graphic separation in the presence of complex layouts. The learning problem is formulated as an optimization problem that uses the EM algorithm to learn optimal parameters depending on the nature of the document content.
Added on August 28, 2018
Contributed by : Consortium
Product Type : Research Paper
License Type : Freeware
System Requirement :
Author : Ritu Garg, Anukriti Bansal, Santanu Chaudhury, Sumantra Dutta Roy