February 14, 2008

Know Your Dictionary: Etymology in RID99

We're nearing the end of my translations of the frontmatter from RID99 (พจนานุกรม ฉบับราชบัณฑิตยสถาน พ.ศ. ๒๕๔๒). Etymology's on the agenda today. The original Thai for this section is here. There is actually more to translate in the dead tree version of the RID99, including a synopsis of the history of RID. But the electronic text of that extra stuff isn't included in the web version, which makes it a little harder to work with. Maybe I'll get around to translating that in the future, though.

In this post I've tried something new: the blue text is direct quotes of full or partial entries from the dictionary, for easier scanning. Scattered individual Thai words are still black.

Part 6: Etymology

1. The origin of a word is given at the end of the entry for that word, as an abbreviation in parentheses e.g. สทึง [สะ-] น. แม่น้ำ, ใช้ว่า จทึง ฉทึง ชทึง ชรทึง สทิง หรือ สรทึง ก็มี. (ข. สทึง ว่า คลอง). or การะบุหนิง น. ดอกแก้ว. (ช).

2. Any word that is given as being from another language in fact does not correspond exactly with the source word, because the words from languages such as Pali, Sanskrit, or Khmer that are borrowed in Thai are usually either shortened, changed orthographically, or changed phonetically, e.g. ธมฺม (Pali) and ธรฺม (Sanskrit) correspond to Thai ธรรม; โปฺรส (Khmer) corresponds to Thai โปรด. In giving the etymology, sometimes the spelling in the original language is given as well, e.g. ธรรม has (ส. ธรฺม; ป. ธมฺม), or โปรด has (ข. โปฺรส), in order to compare the original spelling with the spelling used in Thai. For loanwords which are written very close to the original language, only the source language is given, e.g. กฏุก only gives (ป.), and ศิขร only gives (ส.). If a word is both Pali and Sanskrit, then both languages are given, e.g. รจนา has (ป., ส.). If a word is partially Pali and partially Sanskrit, then the original spellings of both Pali and Sanskrit are given, e.g. ปราโมทย์ gives (ส. ปฺรโมทฺย; ป. ปาโมชฺช). If the spelling is Pali, but is very similar to Sanskrit, e.g. หทัย, then it is given as (ป.; ส. หฺฤทย), or if the spelling is Sanskrit but very similar to Pali, e.g. สตัมภ์, then it is given as (ส. สฺตมฺภ, สฺตมฺพ; ป. ถมฺภ).

3. For any word whose language of origin is uncertain but which is written similarly to a word in another language, a note in parentheses compares it with that language or languages, e.g. กำปั่น น. เรือเดินทะเลขนาดใหญ่ชนิดหนึ่ง... (เทียบมลายู หรือฮินดูสตานี ว่า capel).

4. Some archaic words are written one way, but nowadays the spelling has changed, in which case both the archaic and the modern spelling may be included, e.g. วงษ์ (โบ) น. วงศ์. วงศ- and วงศ์ [วงสะ-, วง] น. เชื้อสาย, เหล่ากอ, ตระกูล. (ส. วํศ; ป. วํส). Or only the modern spelling may have an entry, and the archaic spelling is given in parentheses at the end of the definition, e.g. กำสรวล [–สวน] (แบบ) ก. โศกเศร้า, คร่ำครวญ, ร้องไห้, เช่น ไทกำสรดสงโรธ ท้ยนสงโกจกำสรวลครวญไปพลาง. (ม. คำหลวง ทานกัณฑ์). (โบ กำสรวญ).

One of the key things I learned from this section is why sometimes RID gives the spelling of the original word, like (ข. โปฺรส), and why sometimes it just gives the language, like (ป., ส.). If no spelling of the source word is given, then (so RID claims) the Thai spelling maintains the original spelling. In other words, it's a transliteration from the source language, which is often the case with Pali and Sanskrit words.
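
To make that convention concrete, here is a minimal Python sketch that splits an etymology note into languages and source spellings. It is an illustration only, not anything RID provides: the function name parse_note and the LANGUAGES table are hypothetical, and it covers only the three abbreviations spelled out in the examples above (ป., ส., ข.). It also assumes a note contains nothing but abbreviations and spellings, so a note with an added gloss, like (ข. สทึง ว่า คลอง), would need extra handling.

```python
import re

# Only the abbreviations spelled out in the frontmatter examples above.
LANGUAGES = {"ป": "Pali", "ส": "Sanskrit", "ข": "Khmer"}

# A piece that opens with a known abbreviation, e.g. "ส. ธรฺม" or just "ป."
ABBREVIATED = re.compile(r"^([ปสข])\.\s*(.*)$")


def parse_note(note):
    """Split a note like '(ส. ธรฺม; ป. ธมฺม)' into (language, spellings) pairs."""
    sources = []
    body = note.strip().strip("()")
    for piece in re.split(r"[;,]", body):
        piece = piece.strip()
        if not piece:
            continue
        match = ABBREVIATED.match(piece)
        if match:
            abbreviation, spelling = match.groups()
            sources.append((LANGUAGES[abbreviation], [spelling] if spelling else []))
        elif sources:
            # No abbreviation: treat it as another spelling for the previous language.
            sources[-1][1].append(piece)
    return sources


for note in ["(ส. ธรฺม; ป. ธมฺม)", "(ป., ส.)", "(ป.; ส. หฺฤทย)"]:
    print(note, parse_note(note))
```

An empty spelling list is the case described above: the note names only the language, so the Thai spelling is claimed to match the source spelling.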

By doing a simple analysis of the (very flawed) full online text of the dictionary, here's a count of Thai word origins from several languages:
Pali: 5183
Sanskrit: 4838
English: 982
Khmer: 427
Chinese: 109
French: 23
Malay: 22

Mistakes in my counting aside, these numbers make it clear that etymology is a significant weakness of RID.

Point 3 notwithstanding, it often holds that if no connection is certain, no etymological info is given. The (เทียบ X) note is used just over 100 times. Even older but well-known loanwords, like the Thai numbers เอ็ด, ยี่ and สอง through เก้า, which were borrowed from Chinese, are implicitly claimed as Thai. This may have been acceptable 50+ years ago, when the conventional wisdom among Thai scholars was that Thai was a relative of Chinese, but it's hard to excuse nowadays.

In addition, to say there are only about 400 words from Khmer in Thai is comical, and I'd be surprised if Malay has really had less of an influence than French--even in Bangkok. The English figure above can't be trusted at all, because RID's notation gives no automatic way to distinguish between words which are transliterations from English (e.g. โฮเต็ล) and words which are translated from English (e.g. โทรทัศน์). I'd have to go through and do a manual count to know that.

The number of Indic loans, at roughly 10,000 words, makes up 25% of the dictionary's total entries, which sounds reasonable. I'm sure that a relatively small number of these make up more than 25% of actual word usage in Thai based on frequency, however. It's also worth mentioning that many Indic words--or alternate versions of them--came into Thai by way of Khmer, which is ignored in RID.

All said, RID is a decent beginning source, if woefully incomplete. Mostly their analysis is simplistic, ignoring how and when a particular word came into Thai, as well as failing to give the meaning of the word in the original language.

Oh, yeah. And Happy Valentine's Day!


  1. I'm interested in how you got those figures, Rikker. Is there a way of searching the RID by etymology that I'm missing?

  2. I derived them from the full text of the dictionary available on the site. I have a roughly parsed and tagged version of it that I use for quick searches like this.

    The basic idea is to search for patterns like (ป. or (อ. as well as things like (เทียบเขมร. It doesn't quite catch them all, and in some cases returns false positives.

    I tried to check them as much as I reasonably could without spending hours on it. They're not perfect, but they're roughly accurate, I think.

    Technically speaking, you can do a full text search on the RID's actual site (the lower search box) using a string like \(จ\. but they've got that ridiculous setup whereby they give you relevance-ranked results limited to five hits per letter of the alphabet. Not particularly useful.
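
A minimal Python sketch of the kind of pattern search described in the reply above, for anyone who wants to try it on their own copy of the text. This is not the script actually used for the figures in the post. The names count_origins and count_compare_notes are hypothetical, the sample lines are cut-down examples assembled from the frontmatter rather than full dictionary text, and reading อ. as English and จ. as Chinese is an assumption. The sample also includes (ม. คำหลวง ทานกัณฑ์), which looks like a language tag but is a literary citation; it is the sort of false positive a pattern for ม. would pick up.

```python
import re
from collections import Counter

ABBREVIATIONS = {
    "ป": "Pali",
    "ส": "Sanskrit",
    "ข": "Khmer",
    "อ": "English",  # assumption: not spelled out in the frontmatter
    "จ": "Chinese",  # assumption: not spelled out in the frontmatter
}

# Text inside one pair of parentheses.
PAREN = re.compile(r"\(([^()]*)\)")
# An abbreviation at the start of a note, or right after "," or ";".
ABBR = re.compile(r"(?:^|[;,])\s*([" + "".join(ABBREVIATIONS) + r"])\.")


def count_origins(text):
    """Count notes per language; each language counts once per note."""
    counts = Counter()
    for note in PAREN.findall(text):
        for abbreviation in set(ABBR.findall(note)):
            counts[ABBREVIATIONS[abbreviation]] += 1
    return counts


def count_compare_notes(text):
    """Count (เทียบ ...) notes, the 'compare with language X' form."""
    return len(re.findall(r"\(เทียบ", text))


SAMPLE = "\n".join([
    "สทึง [สะ-] น. แม่น้ำ, ใช้ว่า จทึง ฉทึง ชทึง ชรทึง สทิง หรือ สรทึง ก็มี. (ข. สทึง ว่า คลอง).",
    "ธรรม (ส. ธรฺม; ป. ธมฺม)",
    "โปรด (ข. โปฺรส)",
    "รจนา (ป., ส.)",
    "กำสรวล (ม. คำหลวง ทานกัณฑ์). (โบ กำสรวญ).",
    "กำปั่น น. เรือเดินทะเลขนาดใหญ่ชนิดหนึ่ง... (เทียบมลายู หรือฮินดูสตานี ว่า capel).",
])

for language, total in count_origins(SAMPLE).most_common():
    print(language, total)
print("compare notes:", count_compare_notes(SAMPLE))
```

Counting each language once per note keeps a note like (ส. สฺตมฺภ, สฺตมฺพ; ป. ถมฺภ) from inflating the Sanskrit total, but it still says nothing about whether a word was transliterated or translated.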

  3. You are a genius. I don't know if I would ever have been able to dig into this 'know your dictionary' stuff without your help.