Tuesday, July 29, 2003

DVDs the Rosetta Stones of the 21st Century

From Slashdot, "Romancing the Rosetta Stone": "Och's method uses matched bilingual texts, the computer-encoded equivalents of the famous Rosetta Stone inscriptions. Or, rather, gigabytes and gigabytes of Rosetta Stones."

"Our approach uses statistical models to find the most likely translation for a given input," Och explained

"It is quite different from the older, symbolic approaches to machine translation used in most existing commercial systems, which try to encode the grammar and the lexicon of a foreign language in a computer program that analyzes the grammatical structure of the foreign text, and then produces English based on hard rules," he continued.

"Instead of telling the computer how to translate, we let it figure it out by itself. First, we feed the system it with a parallel corpus, that is, a collection of texts in the foreign language and their translations into English."

Cool, and I know just where to get them: DVDs. Websites provide subtitles for DivX (like The Princess Bride), and they're great because they're indexed by timestamp, so the same line can be matched up across languages. For the record, the Spanish version (see the DVDs link) is: "El más famoso es no involucrarse en una guerra terrestre en Asia."
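Here is a rough sketch of how two subtitle files could be turned into a parallel corpus by matching timestamps. I am assuming SubRip-style (.srt) text; other subtitle formats would need a different parser. The timestamps, the "Hello."/"Hola." cue, and the English rendering of the line are made up for illustration, and `parse_srt` and `align` are hypothetical helper names. Only the Spanish sentence comes from the post above.

```python
import re

TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")

def parse_srt(text):
    """Return a list of (start_seconds, text) cues from SubRip-style text."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:  # timing line, e.g. 00:00:01,000 --> 00:00:03,000
                m = TIME.search(line)  # first timestamp on the line is the start
                if m:
                    h, mins, s, ms = (int(g) for g in m.groups())
                    start = h * 3600 + mins * 60 + s + ms / 1000
                    cues.append((start, " ".join(lines[i + 1:]).strip()))
                break
    return cues

def align(cues_a, cues_b, tolerance=1.0):
    """Pair each cue in cues_a with the nearest-starting cue in cues_b."""
    pairs = []
    for start_a, text_a in cues_a:
        best = min(cues_b, key=lambda c: abs(c[0] - start_a), default=None)
        # Keep the pair only if the start times are within `tolerance` seconds.
        if best is not None and abs(best[0] - start_a) <= tolerance:
            pairs.append((text_a, best[1]))
    return pairs

# Illustrative input: timestamps and the English text are invented.
EN_SRT = """1
00:00:01,000 --> 00:00:03,000
Hello.

2
00:00:05,000 --> 00:00:08,500
The most famous is never get involved in a land war in Asia.
"""

ES_SRT = """1
00:00:01,200 --> 00:00:03,100
Hola.

2
00:00:05,100 --> 00:00:08,400
El más famoso es no involucrarse en una guerra terrestre en Asia.
"""

pairs = align(parse_srt(EN_SRT), parse_srt(ES_SRT))
for en, es in pairs:
    print(en, "|", es)
```

Matching on start time alone is crude: subtitles made from different releases can drift or split lines differently, so a real pipeline would probably need to correct for offsets before trusting the pairs.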
