Machine translation in NLP

January 7, 2016

Machine translation (MT) is the process of translating text in a source language into a target language. The following diagram shows the phases involved.

Figure: A Typical Machine Translation Process

Text Input

Text input is the first phase of the machine translation process and the first module in any MT system. Sentences can be classified by how difficult they are to translate. Sentences that express relations, expectations, assumptions, and conditions are very difficult for an MT system to understand. The speaker's intentions and mental state, as expressed in the sentences, require discourse analysis to interpret, because adjacent sentences are inter-related. Some sentences also require world knowledge and commonsense knowledge to interpret.

Deformatting and Reformatting

Deformatting and reformatting make the machine translation process easier and improve its quality. The source language text may contain figures, flowcharts, etc. that do not require translation, so only the portions to be translated should be identified. Once the text is translated, the target text is reformatted after post-editing. Reformatting ensures that the target text also contains the non-translated portions.
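
The deformatting and reformatting steps can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular MT system does it: the `[[...]]` marker for non-translatable spans, the `__NT0__` placeholder format, and the function names are all assumptions made for the example, and the "translation" is a hard-coded stand-in.

```python
import re

# Assumed convention: non-translatable spans are marked as [[...]] in the source.
NON_TRANSLATABLE = re.compile(r"\[\[.*?\]\]")

def deformat(text):
    """Replace each non-translatable span with a numbered placeholder."""
    kept = []

    def stash(match):
        kept.append(match.group(0))
        return f"__NT{len(kept) - 1}__"

    return NON_TRANSLATABLE.sub(stash, text), kept

def reformat(translated, kept):
    """Put the stored spans back in place of their placeholders."""
    for i, span in enumerate(kept):
        translated = translated.replace(f"__NT{i}__", span)
    return translated

source = "See [[Figure 2]] for the flowchart."
masked, kept = deformat(source)

# Stand-in for the MT step: an assumed output that preserves the placeholder.
translated = "Voir __NT0__ pour l'organigramme."
restored = reformat(translated, kept)
print(restored)
```

The translator only ever sees the masked text, so the figure reference cannot be altered by the translation step.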

Pre-editing and Post-editing

The level of pre-editing and post-editing depends on the efficiency of the particular MT system. For some systems, long sentences may need to be segmented into shorter ones. Fixing punctuation marks and blocking out material that does not require translation are also done during pre-editing. Post-editing ensures that the quality of the translation is up to the mark. It is unavoidable, especially when translating crucial information such as health information. Post-editing will remain necessary until MT systems reach human-like quality.
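
As a rough illustration of pre-editing, the sketch below tidies punctuation spacing and segments long sentences. The `pre_edit` function, the `max_words` threshold, and the choice to split long sentences at semicolons are assumptions for this example. A real system would use far more careful rules.

```python
import re

def pre_edit(text, max_words=8):
    """Tidy punctuation spacing, then split long sentences at semicolons."""
    text = re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace
    text = re.sub(r" ([,.;:!?])", r"\1", text)  # no space before punctuation
    sentences = re.split(r"(?<=[.!?]) ", text)  # naive sentence boundaries
    segments = []
    for sentence in sentences:
        if len(sentence.split()) > max_words:
            # Assumed heuristic: a semicolon is a safe place to cut.
            parts = (part.strip() for part in sentence.split(";"))
            segments.extend(part for part in parts if part)
        else:
            segments.append(sentence)
    return segments

raw = ("The drug is safe ;  however it must not be taken with alcohol "
       "or on an empty stomach . Ask a doctor .")
segments = pre_edit(raw)
for segment in segments:
    print(segment)
```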

Analysis, Transfer and Generation

Morphological analysis determines word-form properties such as inflection, tense, number, and part of speech. Syntactic analysis determines the grammatical role of each word, for example whether it is a subject or an object. Semantic and contextual analysis derive a proper interpretation of a sentence from the results of the syntactic analysis. Syntactic and semantic analysis are often executed simultaneously and produce a syntactic tree structure and a semantic network respectively. The result is an internal structure of the sentence. The sentence generation phase is just the reverse of the analysis process.
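
To make the three stages concrete, here is a toy sketch. Everything in it is hypothetical: the target is a made-up verb-final language whose words are written as uppercase glosses, the lexicon has three entries, and the analysis step only handles sentences of the form "the X verb the Y".

```python
# Hypothetical bilingual lexicon: English word -> gloss in a made-up target language.
LEXICON = {"dog": "DOG", "cat": "CAT", "sees": "SEE"}

def analyse(sentence):
    """Analysis: turn a simple sentence into an internal structure."""
    words = [w for w in sentence.lower().rstrip(".").split() if w != "the"]
    subject, verb, obj = words  # assumes exactly subject, verb, object
    return {"subject": subject, "verb": verb, "object": obj}

def transfer(structure):
    """Transfer: replace source words with target words, keeping the roles."""
    return {role: LEXICON[word] for role, word in structure.items()}

def generate(structure):
    """Generation: linearise the structure in the target's assumed verb-final order."""
    return " ".join(structure[role] for role in ("subject", "object", "verb"))

internal = analyse("The dog sees the cat.")
output = generate(transfer(internal))
print(output)
```

Note that word order is decided only at generation time. The internal structure stores roles, not positions, which is what lets the same structure be written out in a different order.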

Morphological analysis and generation

Computational morphology deals with the recognition, analysis and generation of words. Morphological processes include inflection, derivation, affixation and combining forms. Inflection is the most regular and productive morphological process across languages. It alters the form of a word for number, gender, mood, tense, aspect, person, and case. A morphological analyser gives information about the morphological properties of the words it analyses.
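
A very small suffix-based analyser shows the idea for regular English inflection. The rule table and feature names below are assumptions for illustration. The sketch ignores irregular forms, spelling changes, and ambiguity (for example, "walks" can also be a verb).

```python
# Hypothetical suffix rules, checked in order: (suffix, features it signals).
RULES = [
    ("ing", {"pos": "verb", "aspect": "progressive"}),
    ("ed", {"pos": "verb", "tense": "past"}),
    ("s", {"pos": "noun", "number": "plural"}),
]

def analyse(word):
    """Analysis: strip a known suffix and report the features it signals."""
    for suffix, features in RULES:
        # Require at least two characters of stem so "sing" is not split.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return {"stem": word[: -len(suffix)], **features}
    return {"stem": word, "pos": "unknown"}

def generate(analysis):
    """Generation: the reverse step, from stem plus features to a word form."""
    for suffix, features in RULES:
        if all(analysis.get(key) == value for key, value in features.items()):
            return analysis["stem"] + suffix
    return analysis["stem"]

print(analyse("walked"))
print(generate(analyse("books")))
```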

Syntactic analysis and generation

If words are the foundation of speech and language processing, syntax can be considered the skeleton. Syntactic analysis is concerned with how words are grouped into classes called parts of speech, how they group with their neighbours into phrases, and how words depend on other words in a sentence.

Grammar formalism

Grammar formalism is a framework to explain the basic structure of a language. Researchers propose the following grammar formalisms:

Phrase Structure Grammar (PSG)
Dependency Grammar
Case Grammar
Systemic Grammar
Montague Grammar

The variants of PSG are

Context Free PSG
Context Sensitive PSG
Augmented Transition Network Grammar (ATN)
Definite Clause (DC) Grammar
Categorial Grammar
Lexical Functional Grammar (LFG)
Generalised PSG
Head Driven PSG
Tree Adjoining Grammar (TAG)

Not every grammar suits every language. PSG, for example, does not suit Japanese, while dependency grammar does. Case grammar is popular because sentences in different languages that express the same content may have the same case frames.
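
As an illustration of one of the PSG variants listed above, context-free PSG, the sketch below recognises sentences with a toy grammar using the CYK algorithm. The grammar rules and vocabulary are invented for the example and are written in Chomsky normal form, which CYK requires.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form.
BINARY = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
LEXICAL = {"the": "Det", "a": "Det", "dog": "N", "cat": "N", "sees": "V"}

def recognise(words):
    """Return True if the grammar derives the word sequence (CYK recognition)."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the categories that can span words[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        if word in LEXICAL:
            table[i][i].add(LEXICAL[word])
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # split point between left and right parts
                for left, right in product(table[i][k], table[k + 1][j]):
                    parent = BINARY.get((left, right))
                    if parent:
                        table[i][j].add(parent)
    return "S" in table[0][n - 1]

print(recognise("the dog sees a cat".split()))
print(recognise("a cat the dog sees".split()))
```

Because the rules fix the order of constituents, this grammar accepts the first sentence and rejects the second, even though the second uses exactly the same words.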

Parsing and Tagging

Tagging is the identification of the linguistic properties of individual words, and parsing is the assessment of the functions of the words in relation to each other.
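
The difference between the two can be seen in a toy example. The lexicon, the tag names, and the rule "a noun before the verb is the subject, a noun after it is the object" are simplifying assumptions that only hold for plain English subject-verb-object sentences containing one verb.

```python
# Hypothetical lexicon: word -> part-of-speech tag.
LEXICON = {"the": "DET", "dog": "NOUN", "cat": "NOUN", "chases": "VERB"}

def tag(words):
    """Tagging: label each word on its own; unknown words default to NOUN."""
    return [(word, LEXICON.get(word, "NOUN")) for word in words]

def parse(tagged):
    """Parsing: relate each noun to the verb (assumes the sentence has a verb)."""
    verb_index = next(i for i, (_, pos) in enumerate(tagged) if pos == "VERB")
    verb = tagged[verb_index][0]
    relations = []
    for i, (word, pos) in enumerate(tagged):
        if pos == "NOUN":
            role = "subject" if i < verb_index else "object"
            relations.append((word, role, verb))
    return relations

tagged = tag("the dog chases the cat".split())
relations = parse(tagged)
print(tagged)
print(relations)
```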

Semantic and Contextual analysis and Generation

Semantic analysis composes meaning representations and assigns them to the linguistic inputs. Semantic analysers use a lexicon and a grammar to create context-independent meanings. The sources of knowledge are the meanings of words, the meanings associated with grammatical structures, knowledge about the discourse context, and commonsense knowledge.
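
A minimal sketch of how word meanings and the meaning of a grammatical structure combine is given below. The lexicon, the predicate-argument notation, and the `compose` function are assumptions for illustration. The point is only that the active and passive structures map to the same context-independent representation.

```python
# Hypothetical lexicon: word form -> context-independent meaning symbol.
MEANINGS = {"dog": "dog", "cat": "cat", "chases": "chase", "chased": "chase"}

def compose(verb, subject, other, passive=False):
    """Build predicate(agent, patient) from word meanings and sentence structure."""
    # Meaning carried by the structure: in a passive sentence the grammatical
    # subject is the patient and the by-phrase names the agent.
    agent, patient = (other, subject) if passive else (subject, other)
    return f"{MEANINGS[verb]}({MEANINGS[agent]}, {MEANINGS[patient]})"

active = compose("chases", "dog", "cat")                 # "The dog chases the cat."
passive = compose("chased", "cat", "dog", passive=True)  # "The cat is chased by the dog."
print(active)
print(passive)
```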
