The Google Translate Blog - The official source for news on Google's translation technologies

Endangered Languages to Endure on YouTube

Thursday, October 28, 2010 | 9:30 AM

(Cross-posted from the Google.org blog)


Many of the world's smallest and most endangered languages have no written form and have never been recorded or scientifically documented. Today, the National Geographic Enduring Voices YouTube channel will launch and allow many of these tongues to have a presence on the Internet for the very first time. Linguists Dr. K. David Harrison and Dr. Gregory Anderson from the Living Tongues Institute have teamed up with Google.org to allow small and endangered languages that may have never been heard outside of a remote village to reach a global audience. Using YouTube as a platform, researchers, academics and communities can now collaborate more effectively on promoting language revitalization.

The YouTube channel features videos such as hip-hop performed by Songe Nimasow in the Aka language of India, songs by Aydyng Byrtan-ool, a talented young Tuvan singer and epic storyteller in Southern Siberia, and videos demonstrating how the Foe language of Papua New Guinea uses body parts to count from 1 to 37.

The launch of the channel comes on the heels of Harrison and Anderson's announcement of a “hidden” language of India, known locally as Koro, that is new to science and had never been documented outside its rural community. Koro belongs to the half of the world’s languages that are likely to vanish in the next 100 years.

In addition to using YouTube to help revitalize endangered and minority languages, communities can also take advantage of Google Translator Toolkit, which added 284 new languages last year to make translation faster and easier.

In the midst of a language extinction crisis, we are also seeing a global grassroots movement for language revitalization. Speakers are leveraging new technologies, such as social networking and YouTube, to sustain small languages. As Harrison describes in his book "The Last Speakers," we are all impoverished when a language dies, and all enriched by the human knowledge base found in the world's smallest tongues.

Learn more about Harrison and Anderson's efforts to document languages through the Enduring Voices Project.

Poetic Machine Translation

Tuesday, October 5, 2010 | 3:33 PM

Once upon a midnight dreary, long we pondered weak and weary,
Over many a quaint and curious volume of translation lore.
When our system does translation, lifeless prose is its creation;
Making verse with inspiration no machine has done before.
So we want to boldly go where no machine has gone before.
Quoth now Google, "Nevermore!"
Robert Frost once said, “Poetry is what gets lost in translation.” Translating poetry is very hard even for humans, and it is clearly beyond the capability of current machine translation systems. So, out of academic curiosity, we set about testing the limits of translating poetry, and we were pleasantly surprised by the results!

We will present a paper on poetry translation at the EMNLP conference this year. In this paper, we investigate the purely technical challenges around generating translations with fixed rhyme and meter schemes. 

The value of preserving meter and rhyme in poetic translation has been hotly debated. Vladimir Nabokov famously claimed that, since it is impossible to preserve both the meaning and the form of a poem in translation, one must abandon the form altogether. Another authority, Douglas Hofstadter (perhaps the more familiar one to us computer scientists), argues that preserving the form is very important to maintaining the feeling and the sound of a poem. It is in this spirit that we decided to experiment with translating not only a poem's meaning but also its form.

A statistical machine translation system like Google Translate typically translates by searching through a multitude of possible translations, guided by a statistical model of translation accuracy. To translate poetry, however, we had to consider not only accuracy but also meter and rhyme scheme. Our paper describes in more detail how we altered our translation model; in general, we chose to sacrifice a little of the translation’s accuracy to get the poetic form right.
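The post leaves the details to the paper, so the following is only a minimal sketch, under our own assumptions, of how form constraints can be traded against translation accuracy: it reranks a fixed list of candidate lines by subtracting a weighted meter-and-rhyme penalty from a translation score. This is a simplified stand-in, not the method from the paper (which altered the translation model itself), and the stress lexicon, rhyme keys, candidate lines, scores, and `weight` parameter are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy lexicon for this sketch: word -> (stress pattern, rhyme key).
# "1" marks a stressed syllable and "0" an unstressed one; the rhyme keys are
# hand-written stand-ins for what a pronunciation dictionary would provide.
LEXICON = {
    "the": ("0", "UH"),
    "gentle": ("10", "ENTLE"),
    "child": ("1", "ILD"),
    "is": ("0", "IZ"),
    "sleeping": ("10", "EEPING"),
    "sleeps": ("1", "EEPS"),
    "here": ("1", "EER"),
}

# Iambic tetrameter: four "da-DUM" feet, i.e. eight syllables.
IAMBIC_TETRAMETER = "01010101"


@dataclass
class Candidate:
    text: str
    log_prob: float  # score from a hypothetical translation model; higher is better


def stress_pattern(line: str) -> str:
    """Concatenate the per-word stress patterns of a line."""
    return "".join(LEXICON[word][0] for word in line.lower().split())


def meter_errors(line: str, meter: str) -> int:
    """Syllables whose stress disagrees with the meter, plus one error
    for every missing or extra syllable."""
    pattern = stress_pattern(line)
    mismatches = sum(a != b for a, b in zip(pattern, meter))
    return mismatches + abs(len(pattern) - len(meter))


def rhyme_key(line: str) -> str:
    """Rhyme key of the line's final word."""
    return LEXICON[line.lower().split()[-1]][1]


def poetic_score(cand: Candidate, meter: str,
                 rhyme_with: Optional[str], weight: float) -> float:
    """Translation score minus a weighted penalty for violating the form."""
    penalty = meter_errors(cand.text, meter)
    if rhyme_with is not None and rhyme_key(cand.text) != rhyme_with:
        penalty += 1
    return cand.log_prob - weight * penalty


def pick_line(candidates: List[Candidate], meter: str,
              rhyme_with: Optional[str] = None, weight: float = 2.0) -> Candidate:
    """Return the candidate with the best combined score."""
    return max(candidates, key=lambda c: poetic_score(c, meter, rhyme_with, weight))


# Invented candidates for one line that must end on the rhyme key "EER"
# (e.g. to rhyme with an earlier line ending in "near").
CANDIDATES = [
    Candidate("the child is sleeping", -1.0),
    Candidate("the child sleeps here", -1.8),
    Candidate("the gentle child is sleeping here", -2.5),
]

print(pick_line(CANDIDATES, IAMBIC_TETRAMETER, rhyme_with="EER").text)
# -> the gentle child is sleeping here
```

Setting `weight=0.0` switches the form penalty off, and the candidate with the best translation score alone ("the child is sleeping") wins instead; a positive weight gives up some of that score in exchange for correct meter and rhyme, which is the kind of trade-off described above.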

As a pleasant side-effect, the system is also able to translate anything into poetry, allowing us to specify the genre (say, limericks or haikus), or letting the system pick the one it thinks fits best. At the moment, the system is too slow to be made publicly accessible, but we thought we’d share some excerpts:

A stanza from Essai monographique sur les Dianthus des Pyrénées françaises by Edouard Timbal-Lagrave and Eugène Bucquoy, translated to English as a pair of couplets in iambic tetrameter:
So here's the dear child under land,
will not reflect her beauty and
besides the Great, no alter dark,
the pure ray, fronts elected mark.

Voltaire’s La Henriade, translated as a couplet in dactylic tetrameter:
These words compassion forced the small to lift her head
gently and tell him to whisper: “I'm not dead."

Le Miroir des simples âmes, an Old French poem by Marguerite Porete, translated to Modern French by M. de Corberon, and then to haiku by us:
“Well, gentle soul”, said
Love, “say whatever you please,
for I want to hear.”
More examples and technical details can be found in our research paper (as well as clever commentary).


Posted by Dmitriy Genzel, Software Engineer

Veni, Vidi, Verba Verti

Friday, October 1, 2010 | 10:43 AM

[We’ve added Latin as an alpha language to translate.google.com! Alpha languages aren’t perfect, but we think the addition will help unlock many classic Latin texts and documents. Learn more from our programmer Jakob in the post below. Don’t speak Latin? Good thing there is now an easy way to translate the language...]

To tear down the walls between languages and make the world's knowledge accessible and useful, we have built translation tools for the languages of many nations. Today we are announcing our first tool for translating a language that no native speakers use today: Latin. Although few people speak Latin every day, more than 100,000 American students take the National Latin Exam each year. Beyond that, many people from all the nations of the world study Latin.

We understand that this Latin translation tool will rarely be used to translate emails or the captions of YouTube videos. However, many old books on philosophy, physics, and mathematics were written in Latin. Indeed, there are many thousands of books in Google Books that contain notable Latin passages.

Machine translation from Latin is difficult, and we recognize that our grammar is not flawless. But Latin is unusual: most Latin books were written long ago, and few new ones will be written from now on. Many have been translated into other languages, and we use those translations to train our translation tools. Because the tool translates most easily texts similar to those it learned from, our ability to translate celebrated works (such as Caesar's Commentarii de Bello Gallico) is already good.

The next time you come across a Latin passage or need help with Latin writing, give this a try.

Jakob Uszkoreit, Software Engineer, and Ben Bayer, Master of Space and Time