In this paper we present a language-independent statistical transliteration technique. The technique uses statistical alignment models and Conditional Random Fields (CRF). Statistical alignment models maximize the probability of the observed (source, target) word pairs using the expectation maximization algorithm, and the character-level alignments are then set to the maximum posterior predictions of the model.
Added on August 2, 2010
Product Type : Research Paper
License Type : Freeware
Author : Praneeth M Shishtla, Surya Ganesh, Sethuramalingam S, Vasudeva Varma
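To make the alignment step of the transliteration abstract above concrete, here is a minimal sketch of character-level alignment trained with expectation maximization. It assumes an IBM Model 1 style model, which is a deliberate simplification and not necessarily the alignment model used in the paper, and it leaves out the CRF stage entirely. The function names `train_char_alignment` and `align` and the toy word pairs are hypothetical.

```python
from collections import defaultdict


def train_char_alignment(pairs, iterations=10):
    """Estimate p(target_char | source_char) with EM (IBM Model 1 style).

    Assumption: every source and target word in `pairs` is non-empty.
    """
    src_chars = {c for s, _ in pairs for c in s}
    tgt_chars = {c for _, t in pairs for c in t}
    # Start from a uniform distribution over target characters.
    prob = {(t, s): 1.0 / len(tgt_chars) for s in src_chars for t in tgt_chars}

    for _ in range(iterations):
        count = defaultdict(float)  # expected (target, source) co-alignments
        total = defaultdict(float)  # expected alignments per source character
        # E-step: distribute each target character over the source characters.
        for src, tgt in pairs:
            for t in tgt:
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    delta = prob[(t, s)] / norm
                    count[(t, s)] += delta
                    total[s] += delta
        # M-step: renormalise the expected counts into probabilities.
        for (t, s) in prob:
            prob[(t, s)] = count[(t, s)] / total[s] if total[s] > 0 else 0.0
    return prob


def align(src, tgt, prob):
    """For each target position, return the source index with maximum posterior."""
    return [
        max(range(len(src)), key=lambda i: prob[(tgt[j], src[i])])
        for j in range(len(tgt))
    ]


# Toy (source, target) word pairs, invented for illustration only.
pairs = [("ab", "xy"), ("a", "x"), ("b", "y")]
prob = train_char_alignment(pairs)
alignment = align("ab", "xy", prob)
```

Under this simplified model the posterior for each target character factorises over source positions, so taking the per-position argmax gives the maximum posterior alignment. In a pipeline like the one the abstract describes, such alignments would then serve as training data for the CRF.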
The paper describes the overall design of a new two-stage, constraint-based hybrid approach to dependency parsing. We define the two stages and show how different grammatical constructs are parsed at the appropriate stage. This division leads to selective identification and resolution of specific dependency relations at each of the two stages. Furthermore, we show how the use of hard constraints and soft constraints helps us build an efficient and robust hybrid parser.
In this paper, we propose a modular cascaded approach to data-driven dependency parsing. Each module or layer leading to the complete parse produces a linguistically valid partial parse. We do this by introducing an artificial root node in the dependency structure of a sentence and by catering, at different layers, to distinct dependency label sets that reflect the function of the set-internal labels vis-à-vis a distinct and identifiable linguistic unit.
A new method for sentence extraction based on language models and relative entropy is presented in this paper. The proposed technique first builds a sentence language model for each sentence and a document cluster language model for the documents. The sentences are then ranked according to the relative entropies of the estimated document language model with respect to the estimated sentence language model.
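The ranking step described above can be sketched as follows. This is a minimal illustration under several assumptions that the abstract does not state: unigram language models, add-one smoothing over a shared vocabulary, whitespace tokenization, and ranking in ascending order of KL divergence (so the sentence whose model is closest to the document model comes first). The function names are hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """Relative entropy D(p || q) for two distributions over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def rank_sentences(sentences):
    """Return (score, sentence) pairs, lowest divergence from the document first."""
    tokenized = [s.lower().split() for s in sentences]
    doc_tokens = [tok for toks in tokenized for tok in toks]
    vocab = set(doc_tokens)
    doc_model = unigram_model(doc_tokens, vocab)
    scored = []
    for sentence, toks in zip(sentences, tokenized):
        sent_model = unigram_model(toks, vocab)
        scored.append((kl_divergence(doc_model, sent_model), sentence))
    return sorted(scored, key=lambda pair: pair[0])


# Toy input, invented for illustration only.
sentences = [
    "the cat sat on the mat",
    "the cat sat",
    "stock prices fell sharply",
]
ranked = rank_sentences(sentences)
```

Smoothing matters here because a short sentence assigns zero raw probability to most of the document vocabulary, which would make the divergence infinite. An extractive summary could then take the top few entries of `ranked`.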
Cross Language Information Retrieval (CLIR) between languages of the same origin is an interesting topic of research. The similarity of the writing systems used for these languages can be used effectively not only to improve CLIR, but also to overcome, to an extent, the problems of textual variations, textual errors, and even the lack of linguistic resources such as stemmers.