Letter-to-sound (LTS) rules play a vital role in building a speech synthesis system. In this paper, we apply various machine learning approaches, such as Classification and Regression Trees (CART), decision forests, forests of Artificial Neural Networks (ANNs), and Auto-Associative Neural Networks (AANNs), to learn LTS rules. We use these techniques mainly for schwa deletion in Hindi.
Product attribute extraction is the task of automatically discovering attributes of products from text descriptions. In this paper, we propose a new approach to attribute extraction that is both unsupervised and domain independent. With this approach, we achieve 92% precision and 62% recall in our experiments.
This paper describes our cross-language information retrieval (CLIR) experiments performed for the FIRE workshop. We submitted runs for ad hoc monolingual document retrieval in Hindi and English, and for ad hoc cross-lingual document retrieval from Hindi to English and from English to Hindi. In this paper, we describe our English-to-Hindi and Hindi-to-English CLIR systems and the experiments conducted on them using the FIRE-2008 dataset.
Telugu is the official language of the state of Andhra Pradesh and one of the widely spoken languages in the world. However, no standard input method is in widespread use among Telugu users on computers. In this paper, we describe the design of Telugu soft keyboards based on a set of design principles. We also evaluate these designs alongside existing designs and compare their performance.
In this paper, we propose an approach for automatic language and subject identification for the books in a digital library. The important characteristics of function words are explored for language identification. A heuristic search approach is explored for subject identification by matching title words with the keywords of the subjects. The language identification system is developed for five languages, namely English, French, German, Italian, and Spanish.
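The two ideas in this abstract, function-word counting for language identification and title-keyword matching for subject identification, can be illustrated with a minimal sketch. It is not the authors' system: the word lists, the subject keywords, and the names `FUNCTION_WORDS`, `SUBJECT_KEYWORDS`, `best_match`, `identify_language`, and `identify_subject` are all hypothetical, and a plain overlap count stands in for the heuristic search mentioned in the abstract.

```python
import re

# Hypothetical, deliberately tiny function-word lists (assumption, not from the paper).
FUNCTION_WORDS = {
    "English": {"the", "of", "and", "in", "to", "is", "for", "with"},
    "French": {"le", "les", "des", "et", "du", "dans", "est", "pour"},
    "German": {"der", "die", "das", "und", "von", "mit", "ist", "ein"},
    "Italian": {"il", "di", "che", "per", "della", "gli", "nel", "sono"},
    "Spanish": {"el", "los", "las", "del", "y", "por", "es", "para"},
}

# Hypothetical subject keyword lists (assumption, not from the paper).
SUBJECT_KEYWORDS = {
    "Mathematics": {"algebra", "geometry", "calculus", "theorem"},
    "History": {"history", "war", "empire", "revolution"},
    "Medicine": {"anatomy", "disease", "surgery", "medicine"},
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def best_match(tokens, table):
    """Return the label whose word set matches the most tokens, or None if nothing matches.

    Ties are broken by the order of labels in the table.
    """
    scores = {
        label: sum(1 for token in tokens if token in words)
        for label, words in table.items()
    }
    label = max(scores, key=scores.get)
    return label if scores[label] > 0 else None


def identify_language(text):
    """Guess the language by counting function-word occurrences."""
    return best_match(tokenize(text), FUNCTION_WORDS)


def identify_subject(title):
    """Guess the subject by matching title words against subject keywords."""
    return best_match(tokenize(title), SUBJECT_KEYWORDS)
```

For example, `identify_language("Die Geschichte der Kunst und das Leben")` counts four German function words and none from the other lists, and `identify_subject("The History of the Roman Empire")` matches two keywords under the hypothetical "History" entry. Function words suit this task because they are frequent in any text and largely independent of its topic.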