How machine translation works
Google Translate and other machine translators operate on statistics rather than rules. That is, they look for patterns in hundreds of millions of documents that human translators have already translated. Google Translate makes special use of UN documents, which are translated into all six official UN languages and thus provide ample linguistic data. This way, machine translators can weigh the many renderings of a phrase found across different human translations and make an educated guess by selecting the one that occurs most frequently. For example, they detect that the Spanish phrase “darse cuenta” is usually translated into English as “realize”. Therefore, based on statistics, Google Translate will correctly translate the phrase as “realize”, rather than word for word, which would produce something more like “give account”.
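To make the “most frequent wins” idea concrete, here is a deliberately simplified Python sketch. The `ALIGNED_PAIRS` list is a hypothetical, hand-made stand-in for a parallel corpus (not real Google data), and the helper names are invented for illustration. Real statistical systems also weigh fluency, word order, and context, so treat this as a toy model of the selection step only.

```python
from collections import Counter, defaultdict

# Hypothetical toy "parallel corpus": (Spanish phrase, English rendering) pairs
# as they might be collected from existing human translations.
ALIGNED_PAIRS = [
    ("darse cuenta", "realize"),
    ("darse cuenta", "realize"),
    ("darse cuenta", "notice"),
    ("darse cuenta", "give account"),
    ("darse cuenta", "realize"),
]


def build_phrase_table(pairs):
    """Count how often each source phrase was rendered as each target phrase."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def most_frequent_translation(table, phrase):
    """Return the rendering seen most often for `phrase`, or None if unseen."""
    counts = table.get(phrase)
    if not counts:
        return None
    best, _count = counts.most_common(1)[0]
    return best


table = build_phrase_table(ALIGNED_PAIRS)
print(most_frequent_translation(table, "darse cuenta"))
```

With this toy data the function picks “realize”, because it appears three times against one occurrence each for the alternatives; a phrase that never appears in the corpus gets no answer at all, which hints at why the size of the data matters so much.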
Finding linguistic data large enough to support legitimate statistical analysis is no easy feat. Because more documents are available in English than in any other language, Google Translate almost always uses English as an intermediary step when translating between two languages that aren’t English. For example, when translating from Russian to Spanish, Google Translate will first translate the text from Russian into English, and then from English into Spanish. As a result, translating between two languages other than English actually involves two rounds of translation.
In fact, some language pairs involve even more rounds. If you want to translate some text from, say, Catalan to Japanese, Google will first translate it into Spanish, as most existing translations of Catalan texts are in Spanish. This Spanish version of the original Catalan text will then be translated into English. Finally, the English version of the Spanish version of the Catalan text will make it to Japanese, and if you’re lucky, it will still bear some resemblance to the original meaning.
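The chain of intermediary languages described above can be sketched as a small route-finding function. The `PIVOT_FOR` table and the language codes are assumptions drawn only from this article’s examples (Catalan goes through Spanish, everything else through English); this is a hypothetical model, not Google’s actual routing logic.

```python
# Hypothetical routing table based on this article's examples: the language
# through which each source language is usually translated.
PIVOT_FOR = {"ca": "es", "es": "en", "ru": "en"}
HUB = "en"  # English, the central intermediary language


def pivot_route(source: str, target: str) -> list[str]:
    """Return the chain of languages a text passes through, source first."""
    route = [source]
    current = source
    # Step toward the hub via each language's usual pivot,
    # stopping early if the target is reached on the way.
    while current != target and current != HUB:
        nxt = PIVOT_FOR.get(current, HUB)  # unknown languages pivot on English
        if nxt in route:
            raise ValueError(f"pivot cycle detected at {nxt!r}")
        route.append(nxt)
        current = nxt
    # From the hub, translate directly into the target language.
    if current != target:
        route.append(target)
    return route


for src, tgt in [("ru", "es"), ("ca", "ja")]:
    route = pivot_route(src, tgt)
    print(" -> ".join(route), f"({len(route) - 1} translation steps)")
```

Under these assumptions, Russian to Spanish takes two steps (ru, en, es) and Catalan to Japanese takes three (ca, es, en, ja), and every extra step is another chance for meaning to drift.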
Why it doesn’t make the cut
Google Translate does a good job with very basic translations — especially those whose target language is English — and now even offers alternative interpretations for certain words and phrases. However, the very methodology upon which Google Translate is based prevents it from ever competing with human translators. Here’s why:
Statistics don’t have feelings. Google Translate is based on statistics: it chooses the “best” translation based on how certain words and phrases have been translated in other documents. Consequently, machine translators choose the most probable translation, not the most interesting or poetic one. So even when their translations are accurate (which they often aren’t), they adopt a robotic, lifeless tone. It takes a human translator, with feelings and creativity, to reproduce the tone, color, and vibrancy of the original text.
Machine translations struggle with complex grammar. Grammar is governed by rules, so a statistics-based translator like Google will struggle with complex grammatical concepts, such as the difference between the imperfect and preterite past tenses in Romance languages. This is especially true because Google almost always uses English as an intermediary step when translating into Romance languages, and English does not grammatically distinguish between the preterite and the imperfect. Therefore, Google Translate often renders the imperfect as the preterite (and vice versa), making ongoing or habitual actions seem like one-time, completed events.
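A toy lookup shows how routing through English can erase the imperfect/preterite distinction. Both dictionaries below are hypothetical, hand-made examples (Portuguese into Spanish via English), chosen only to illustrate the information loss; real systems also use context, so they do not always fail in exactly this way.

```python
# Hypothetical Portuguese -> English table: the imperfect ("comia", ongoing or
# habitual) and the preterite ("comeu", one-time) both collapse to "he ate".
PT_TO_EN = {
    "ele comia": "he ate",
    "ele comeu": "he ate",
}

# Hypothetical English -> Spanish table holding only the most frequent
# rendering of "he ate", which is the preterite "él comió".
EN_TO_ES = {
    "he ate": "él comió",
}


def translate_via_english(portuguese: str) -> str:
    """Translate Portuguese to Spanish by pivoting through English."""
    english = PT_TO_EN[portuguese]
    return EN_TO_ES[english]


# The imperfect input comes out as the preterite, so the habitual sense is lost.
# A faithful Spanish rendering of "ele comia" would be "él comía".
print(translate_via_english("ele comia"))
```

Once both tenses have become the same English phrase, the second step has no way to tell them apart, so it can only fall back on whichever Spanish form is most common.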
Google can’t write for an audience. Every translator knows that you need to tailor your work to whom you’re writing for. For example, if this article were written for a casual blog, my use of the word “whom” in the previous sentence might come off as overly formal. However, given that this article appears on a language interest blog, grammarians and language experts may applaud my correct distinction between “who” and “whom” (though they may scoff at my decision to end a sentence with a preposition). Machines cannot make such judgment calls: Google cannot take into account the intended audience of the text it translates. Only a human translator can make that kind of decision.
Google Translate vs. a human being
To illustrate the difference between Google Translate and a living, breathing human translator, I will employ both to translate the following Spanish text, which appears on a website selling Argentine wine, into English. Try to guess which translation was written by a human and which was produced by a machine (spoiler alert: it won’t be hard).
Después de una excelente cosecha como la que le precedió, la cosecha 2009 muestra sus virtudes en este vino base Cabernet Sauvignon, mas el ensamble de tres variedades de gran personalidad que encontraron en San Rafael el terruño ideal para la expresión de sus mejores cualidades. Vino aun de color rojo violáceo intenso a pesar de los años en botella, ya en la copa se nos muestra intenso y seductor con aromas especiados que se entremezclan con nítidos y frescos aromas a frutas de ciruelas, cerezas negras y moras, mientras que se van desprendiendo lentamente los aromas tostados que recuerdan a granos de café molidos.