Interlingua based English-Hindi Machine Translation and Language Divergence

Shachi Dave, Jignashu Parikh, Pushpak Bhattacharyya

Department of Computer Science and Engineering,
Indian Institute of Technology, Bombay.

Abstract

Interlingua-based and transfer-based approaches to machine translation have long been in use in competing and complementary ways. The former proves economical when translation among multiple languages is involved, while the latter is used for pair-specific translation tasks. An additional attraction of an interlingua is that it can be used as a knowledge representation scheme. But the adoption of a particular interlingua depends on its ability to (a) capture the knowledge in texts precisely and accurately and (b) handle cross-language divergences. This paper studies the language divergence between English and Hindi and its implications for machine translation between these languages using the Universal Networking Language (UNL). UNL was introduced by the United Nations University (UNU), Tokyo, to facilitate the transfer and exchange of information over the internet in the natural languages of the world. The representation works at the level of single sentences and defines a semantic-net-like structure in which nodes are word concepts and arcs are semantic relations between these concepts. Hindi belongs to the Indo-European family of languages. The language divergences between Hindi and English can be considered representative of the divergences between the SOV and SVO classes of languages. To our knowledge, the work presented here is the only one that describes language divergence phenomena in the framework of computational linguistics through a South Asian language.