All languages mark tense, aspect and modality (TAM) in some way, but the markers do not map one-to-one across languages. Many errors in machine translation (MT) stem from incorrect translation of TAM markers, so reducing these errors can improve the performance of an MT system. We used about 9000 sentence pairs from an English-Hindi parallel corpus, manually annotated with TAM markers and their mappings. Based on this corpus, we identify the factors responsible for ambiguity in translation. We present results for learning TAM marker translation using conditional random fields (CRFs), achieving an improvement of 17.88% over the baseline.