February 7, 2009

Thai language added to Google Translate

I just learned that Google Translate, a machine translation tool from everyone's favorite search engine company, has added support for the Thai language, plus six more. This brings the total up to 41 languages, with the capability to translate between any two of them. You can either cut-and-paste text or feed it a URL to have it translate entire pages.

The accuracy of machine translation improves the more data Google has, usually in the form of bilingual corpora. So expect the results to be a huge grab bag for now, but also expect them to get better over time.

Notice that after you paste text to be translated, there's a link on the bottom right of the results page inviting you to "Suggest a better translation." This is great, because it harvests human knowledge to augment the machine translation.

The simple phrase ผมชื่อสมชาย returned "I named Somchai", so I suggested "My name is Somchai" instead. Piece of cake. Then I tried อภิสิทธิ์ เวชชาชีวะ (knowing it would likely translate it literally). It returned "Prerogative physician Charles existence." So I suggested "Abhisit Vejjajiva".

I've never really explored all the features of Google Translate, since I don't really know any of the languages it has hitherto supported. Now that I'm exploring it with Thai, though, I've found another really cool feature: Translated Search. It's your basic cross-language information retrieval. You search in your native language, specify the target language, and it returns pages in that language about the thing you searched.

For example, if you want to find Thai websites about Thai history (for, say, the images), search "Thai history", with Thai as the target language. It returns pages like the Thai Wikipedia entry on Thai history, with the first few lines in both languages. You can then click through to see the whole page translated, or view it in the original language. Quite handy.

Unfortunately, these new tools suffer from the same problem that Thai searching in regular Google does: tone marker blindness. Type ชา "tea" into Google Translate and you get "slow" (which is actually spelled ช้า), type ดิ้น "squirm" and it returns "soil" (actually ดิน), ไหม "silk" returns "burn" (actually ไหม้), and so forth. It's a problem.
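To make the collision concrete, here's a minimal Python sketch. I have no idea how Google actually handles this internally, so treat it purely as an illustration of the symptom: if you throw away the four Thai tone marks (Unicode U+0E48 through U+0E4B) before matching, each pair of words above collapses into a single string. The names `TONE_MARKS` and `strip_tone_marks` are my own, made up for this example.

```python
# Illustration only: this is NOT Google's code, just a sketch of what
# "tone marker blindness" amounts to when tone marks are ignored.

# The four Thai tone marks: mai ek, mai tho, mai tri, mai chattawa.
TONE_MARKS = {"\u0e48", "\u0e49", "\u0e4a", "\u0e4b"}


def strip_tone_marks(text: str) -> str:
    """Return text with every Thai tone mark removed."""
    return "".join(ch for ch in text if ch not in TONE_MARKS)


# (word without a tone mark, different word spelled with one)
pairs = [
    ("ชา", "ช้า"),    # "tea" vs. "slow"
    ("ดิน", "ดิ้น"),   # "soil" vs. "squirm"
    ("ไหม", "ไหม้"),  # "silk" vs. "burn"
]

for plain, marked in pairs:
    # Different words as typed, identical once tone marks are dropped.
    print(plain, marked, plain != marked, strip_tone_marks(marked) == plain)
```

Every pair starts out as two different strings and ends up identical after stripping, which is exactly why a tone-blind lookup can't tell "tea" from "slow".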

Still, though, I'm really glad to see Thai on Google Translate. It can only get better with time.

2 comments:

  1. Strange it has tone marker blindness. I think Google search didn't used to but they changed it. Not really sure why. Can't be that hard to program it in...

  2. They consider it a feature. It's supposed to help bad spellers find words they want. But when it clobbers real words, that's a big fail.

    Google is aware of the issue; they just haven't done anything about it yet...
