A dictionary, compiled by language scholars, that provides a common platform for 14 Indian languages. It is a collection of nearly 5000 words and their corresponding meanings in the other 13 languages. The words were compiled keeping in mind those used in day-to-day life and those pertaining to Indian culture and tradition. File Size: 75.6MB. True Type Support
Added on November 11, 2010
Product Type : Tool
License Type : Freeware
System Requirement :
By : RK Shrivastava
August 25, 2016
File size written is 75 MB. The rar file downloads as 13 MB only and does not install correctly.
A million-page multilingual parallel text corpus in English and 11 Indian languages (Assamese, Bengali, Gujarati, Hindi, Kannada, Marathi, Malayalam, Oriya, Punjabi, Tamil & Telugu) based on Unicode encoding. A useful resource for applications such as improving translation systems, translation memory, spell checkers, dictionaries, morphological analyzers, and cross-language information retrieval (CLIR).
Recognition of Multi-word Expressions (MWEs) and their relative compositionality are crucial to Natural Language Processing. Various statistical techniques have been proposed to recognize MWEs. In this paper, we integrate all the existing statistical features and investigate a range of classifiers for their suitability for recognizing the non-compositional Verb-Noun (V-N) collocations.
In this paper we describe Part Of Speech (POS) tagging and Chunking using Conditional Random Fields (CRFs) and Transformation Based Learning (TBL) for Telugu, Hindi and Bengali. We show here how to train CRFs to achieve good performance over any other ML techniques.
This paper, submitted as an entry for the NERSSEAL-2008 shared task, describes a system built for Named Entity Recognition for South and South East Asian Languages. Our paper combines machine learning techniques with language-specific heuristics to model the problem of NER for Indian languages.