August 22, 2007

More on searching the SEAlang dictionary

After the other day's post, I figured it would be good to do some exploring of search functions in SEAlang*'s Thai-English dictionary.

First up: IPA characters
Sometimes you know how a word is pronounced, but you aren't sure how it's spelled in Thai. SEAlang's IPA search box lets you search by the sound of a word. IPA stands for the International Phonetic Alphabet, although what SEAlang uses is really a modified form of it. The six special characters are: ə ɛ ɔ ʉ ŋ ʰ

ə = The "schwa" is an upside-down letter e, and represents the vowel เออะ (əə = เออ).
For example, เรอ would be rəə.
ɛ = The "script e" looks like a backwards 3, and represents the vowel แอะ (ɛɛ = แอ).
For example, แม่ would be mɛɛ.
ɔ = The "open o" looks like a backwards letter c, and represents the vowel เอาะ (ɔɔ = ออ).
For example, พ่อ would be pʰɔɔ.
ʉ = The "u-bar" is a u with a line through it, representing the vowel อึ (ʉʉ = อือ).
For example, คือ would be kʰʉʉ.
ŋ = The "eng" is an n with a tail, which represents the consonant ง.
For example, โง่ would be ŋoo (and งู would be ŋuu).
ʰ = This superscript h represents an aspirated sound. It must follow c, k, p, or t.
For example, ทัน = tʰan while ตัน = tan. Likewise, ชัน = cʰan while จันทร์ = can; พัน = pʰan while ปัน = pan; and คัน = kʰan while กัน = kan.

You can produce these either by clicking the buttons above the search box, or by using the keyboard shortcut (note you must be typing in the IPA box or this won't work, and neither will the search):
ə = shift + a
ɛ = shift + e
ɔ = shift + o
ʉ = shift + u
ŋ = shift + n
ʰ = shift + h
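
The shortcut list above amounts to a small lookup table. Here is a minimal Python sketch of that idea. It assumes that each shift + letter combination arrives as the corresponding capital letter; that is my assumption for illustration, not a description of how the SEAlang page is implemented, and the names SHORTCUTS and to_ipa are made up.

```python
# Hypothetical mapping: assumes Shift+letter arrives as the uppercase letter.
SHORTCUTS = {"A": "ə", "E": "ɛ", "O": "ɔ", "U": "ʉ", "N": "ŋ", "H": "ʰ"}


def to_ipa(typed: str) -> str:
    """Replace each shortcut capital with its special character."""
    return "".join(SHORTCUTS.get(ch, ch) for ch in typed)


print(to_ipa("saaN-san"))  # the query for สร้างสรรค์ used later in this post
print(to_ipa("Noo"))       # the query for โง่
```

Every other character passes through unchanged, so plain letters and the syllable hyphen can be typed as usual.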

Now I'm going to present some scenarios and explore how to get the desired results.

Q: You heard the word สร้างสรรค์ and you want to know how it's spelled.
A: There are a couple of ways to do this with the IPA search box. The easiest is to type the phonetics: saaŋ-san (you must separate syllables with a hyphen). Up comes สร้างสรรค์. But say you heard the word and weren't sure about the vowel length. You searched saŋ-san, but that only returned สังสรรค์, which isn't what you were looking for. Try the search again, this time clicking the V button in the "Approximate matching" section. This tells it to find words that differ in vowel length or in other plausible vowel variations. Sure enough, it returns two results: สร้างสรรค์ and สังสรรค์.
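
To make the vowel-length part of that concrete, here is a rough Python sketch. It only models length (treating aa and a as the same); the "other plausible vowel variations" that SEAlang also allows are not modeled, and the function names and vowel list are my own assumptions rather than anything taken from SEAlang.

```python
import re

# Vowel symbols used in this post's transcriptions (an assumed, partial list).
VOWELS = "aeiouəɛɔʉ"


def collapse_length(ipa: str) -> str:
    """Reduce every doubled vowel (aa, əə, ...) to a single vowel."""
    return re.sub(rf"([{VOWELS}])\1", r"\1", ipa)


def same_ignoring_length(query: str, entry: str) -> bool:
    """True if the two transcriptions differ at most in vowel length."""
    return collapse_length(query) == collapse_length(entry)


print(same_ignoring_length("saŋ-san", "saaŋ-san"))
print(same_ignoring_length("saŋ-san", "saŋ-san"))
```

Both comparisons succeed, which mirrors the two results the approximate search returns for saŋ-san.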

Q: You want to know how many words begin with the letter ษ.
A: The phonetic searching is quite handy, but some of the search syntax can also be used with the native orthography. This is a good example of such a situation. The search ษ.* will match ษ followed by any combination of characters. (Turns out SEAlang isn't the best place to look for rare words like this, since it's based on a student's dictionary, not a comprehensive dictionary. But you can use this same technique, searching กระ.* for all the กระ-words, or ปฏิ.* for all the words that begin with ปฏิ-, etc.)
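
If you know regular expressions, this prefix syntax will look familiar. The Python sketch below shows the same idea on a tiny word list I made up for illustration; it is not SEAlang data, and whether SEAlang uses a regex engine internally is not something I can confirm.

```python
import re

# Toy word list for illustration only, not SEAlang data.
words = ["กระดาษ", "กระเป๋า", "กระรอก", "กอก", "ชรา"]


def search(pattern: str, entries: list[str]) -> list[str]:
    """Return the entries that match the whole pattern."""
    rx = re.compile(pattern)
    return [w for w in entries if rx.fullmatch(w)]


print(search("กระ.*", words))  # every word that starts with กระ
print(search("ษ.*", words))    # empty: กระดาษ ends in ษ but doesn't start with it
```

Matching the whole word is what makes the pattern act as a "begins with" search; a word that merely contains the letters somewhere in the middle is not returned.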

Q: You want to find all words that begin and end with the sound /k/, with any vowel in the middle.
A: Time to use phonetic search again. First, let's see how the search kV*k does. The V represents any vowel, and the * means any number of them. This returns 52 results, but it only matches words like กอก, not words like กรอก, which has a consonant (ร) after the initial k. To match both types of words, search k.*k, which returns 182 results, matching any consonants and vowels in between. Note, though, that it only matches single-syllable words, so a word like กระรอก isn't going to come up. To search for two-syllable words that begin and end with k, change your search term to k.*-.*k instead. This finds a first syllable with k followed by anything, and a last syllable with anything ending in k.
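
Here is one way to picture those three queries as ordinary regular expressions, in Python. This is a sketch under my own assumptions: V becomes a class of vowel symbols, and the dot is kept from crossing the syllable hyphen, which is how the behavior described above looks from the outside. The transcriptions are written without tones, following this post's conventions, and to_regex and matches are hypothetical names.

```python
import re

# Assumed vowel inventory, based on the symbols used in this post.
VOWEL_CLASS = "[aeiouəɛɔʉ]"


def to_regex(query: str) -> re.Pattern:
    """Translate a SEAlang-style query into a Python regex (rough sketch)."""
    parts = []
    for ch in query:
        if ch == "V":
            parts.append(VOWEL_CLASS)   # any vowel
        elif ch == ".":
            parts.append("[^-]")        # any sound, but stay inside the syllable
        elif ch == "*":
            parts.append("*")           # repeat the previous item
        else:
            parts.append(re.escape(ch)) # literal sound or hyphen
    return re.compile("".join(parts))


def matches(query: str, word: str) -> bool:
    return to_regex(query).fullmatch(word) is not None


for query in ("kV*k", "k.*k", "k.*-.*k"):
    hits = [w for w in ("kɔɔk", "krɔɔk", "kra-rɔɔk") if matches(query, w)]
    print(query, hits)
```

With these three sample words (กอก, กรอก, กระรอก), kV*k finds only the first, k.*k finds the first two, and k.*-.*k finds only the last.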

Also notice that you can adjust the type of results you get using the radio buttons in the "Approximate matching" section. If you search kV*k, and select "syllable or longer," you'll get all results that contain a syllable that matches your criteria, such as มะกอก or ตะกัก.
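
The "syllable or longer" option can be pictured as testing each syllable on its own instead of the whole word. A minimal Python sketch, assuming syllables are separated by hyphens as in the IPA box and writing the kV*k query out by hand as a regular expression (both the helper name and the vowel class are my assumptions):

```python
import re

# kV*k written out as a regex, with an assumed vowel inventory.
K_VOWELS_K = re.compile("k[aeiouəɛɔʉ]*k")


def any_syllable_matches(rx: re.Pattern, word: str) -> bool:
    """True if at least one hyphen-separated syllable matches entirely."""
    return any(rx.fullmatch(syllable) for syllable in word.split("-"))


print(any_syllable_matches(K_VOWELS_K, "ma-kɔɔk"))   # มะกอก
print(any_syllable_matches(K_VOWELS_K, "ta-kak"))    # ตะกัก
print(any_syllable_matches(K_VOWELS_K, "kra-rɔɔk"))  # กระรอก has no such syllable
```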

Q: You want to know some Thai words that come from Chinese.
A: Clear all the search fields, and head straight down to the "Tags" section. In the Etymology menu, select "Chinese," and click "Show all". Voila. It only has 60 words tagged as Chinese, but you can imagine how combining the search capability of SEAlang with the breadth of some other dictionaries would be a powerful combination.

Q: You want to find all the กระ- words that are nouns.
A: Head back down to the Tags, making sure to "Reset all" first, and select "noun" from the "Part of speech" drop-down menu. Now head back up and type กระ.* and then press "Go!" You'll get 506 items returned, pared down from the 1299 you get if you search กระ.* for any part of speech. Try the search again, setting part of speech to "classifier". This time you'll get 9 classifiers that begin with กระ. Or clear out the search field and select "classifier" to see a list of all classifiers in the SEAlang dictionary. Piece of cake.
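
Combining a tag with a pattern is just two filters applied together. The Python sketch below shows the logic on a handful of entries; the entries and their part-of-speech labels are a toy example I put together, not SEAlang's data or tagging, and the search function is hypothetical.

```python
import re

# Toy entries for illustration only; the tags are assumptions, not SEAlang data.
entries = [
    {"word": "กระดาษ", "pos": "noun"},
    {"word": "กระโดด", "pos": "verb"},
    {"word": "กระบอก", "pos": "classifier"},
    {"word": "ตัว", "pos": "classifier"},
]


def search(entries, pattern=None, pos=None):
    """Keep entries that satisfy both the pattern and the tag, when given."""
    rx = re.compile(pattern) if pattern else None
    return [
        e["word"]
        for e in entries
        if (rx is None or rx.fullmatch(e["word"]))
        and (pos is None or e["pos"] == pos)
    ]


print(search(entries, "กระ.*", pos="noun"))        # pattern plus tag
print(search(entries, "กระ.*"))                    # pattern only
print(search(entries, pos="classifier"))           # tag only, empty search field
```

Leaving either filter empty corresponds to clearing the search field or resetting the tags.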

Q: Finally, someone wrote me to say they ran across the word ชรัว, which the dictionary says is pronounced [ชฺรัว]. The cluster /chr/ isn't supposed to exist in Thai, if you believe all the books out there. But lo and behold, here is a real live example. So how would we find other words pronounced with this cluster?
A: Doing a search for cʰr.* in the IPA field is easiest. It only returns one result, because, again, it's not a comprehensive dictionary, so it's missing a lot of obscure words. And yes, ชรัว is an example of an obscure word. I also don't think it's really pronounced with the cluster /chr/ in the real world. As is, it returns barely 100 Google hits. These seem to be mostly surnames, false hits (where Google has incorrectly detected word boundaries), and jocular spellings of the word ชัวร์ (meaning "sure," แน่นอน). You can search ชร.* for other words spelled with this combination of letters, and find ชรา, pronounced [ชะรา]. If you browse the ชร section of RID (you'll need to scroll down a bit in that link), you'll find more, but they're mostly used in poems, and have variants without this odd cluster.

The search tools are extremely powerful, and while they can take a while to get used to, they make the dictionary vastly more useful. I'm interested to see what cool things other folks have been able to find using SEAlang's search tools. Drop them in the comments, if you please, along with any search quirks you can't figure out. I'm only a user myself (see the footnote), but I've played around with the dictionary long enough (and clicked all the ? buttons to read the instructions enough times) that I may be able to help out.

*In the interest of full disclosure, I should mention that I have worked as a research contractor for CRCL, the parent organization of the SEAlang projects, since February 2007. Previous to my employment, I used data provided by CRCL for research projects while working on my bachelor's degree at Dartmouth College and did a chunk of unpaid spec work. To date I haven't had any involvement with the dictionaries, although that might change at some future point. Right now, I'm just an avid (longtime) user. This is why I use the third person pronoun with respect to SEAlang.


  1. Rikker, this blog is so fascinating. I'm glad you're doing it. I hope you don't mind if I link to it from my blog.

    Sister Grimmius (Jade)

  2. I'm glad you think so. Thanks for reading, and by all means, feel free to link.