Showing posts with label know your dictionary. Show all posts

February 14, 2008

Know Your Dictionary: Etymology in RID99

[Note: See also in this series introduction, sorting, orthography, symbols, pronunciation guides, and word senses. Forthcoming: list of abbreviations.]

We're nearing the end of my translations of the frontmatter from RID99 (พจนานุกรม ฉบับราชบัณฑิตยสถาน พ.ศ. ๒๕๔๒). Etymology's on the agenda today. The original Thai for this section is here. There is actually more to translate in the dead tree version of the RID99, including a synopsis of the history of RID. But the electronic text of that extra stuff isn't included in the web version, which makes it a little harder to work with. Maybe I'll get around to translating that in the future, though.

In this post I've tried something new: the blue text is direct quotes of full or partial entries from the dictionary, for easier scanning. Scattered individual Thai words are still black.

Part 6: Etymology

1. The origin of a word is given at the end of the entry for that word, as an abbreviation in parentheses e.g. สทึง [สะ-] น. แม่น้ำ, ใช้ว่า จทึง ฉทึง ชทึง ชรทึง สทิง หรือ สรทึง ก็มี. (ข. สทึง ว่า คลอง). or การะบุหนิง น. ดอกแก้ว. (ช).

2. Any word that is given as being from another language in fact does not correspond exactly with the source word, because the words from languages such as Pali, Sanskrit, or Khmer that are borrowed in Thai are usually either shortened, changed orthographically, or changed phonetically, e.g. ธมฺม (Pali) and ธรฺม (Sanskrit) correspond to Thai ธรรม; โปฺรส (Khmer) corresponds to Thai โปรด. In giving the etymology, sometimes the spelling in the original language is given as well, e.g. ธรรม has (ส. ธรฺม; ป. ธมฺม), or โปรด has (ข. โปฺรส), in order to compare the original spelling with the spelling used in Thai. For loanwords which are written very close to the original language, only the source language is given, e.g. กฏุก only gives (ป.), and ศิขร only gives (ส.). If a word is both Pali and Sanskrit, then both languages are given, e.g. รจนา has (ป., ส.). If a word is partially Pali and partially Sanskrit, then the original spellings of both Pali and Sanskrit are given, e.g. ปราโมทย์ gives (ส. ปฺรโมทฺย; ป. ปาโมชฺช). If the spelling is Pali, but is very similar to Sanskrit, e.g. หทัย, then it is given as (ป.; ส. หฺฤทย), or if the spelling is Sanskrit but very similar to Pali, e.g. สตัมภ์, then it is given as (ส. สฺตมฺภ, สฺตมฺพ; ป. ถมฺภ).

3. For any word whose language of origin is uncertain but which is written similarly to a word in another language, a note is given in parentheses to compare it with this language or that language, e.g. กำปั่น น. เรือเดินทะเลขนาดใหญ่ชนิดหนึ่ง... (เทียบมลายู หรือฮินดูสตานี ว่า capel).

4. Some archaic words are written one way, but nowadays the spelling has changed, in which case both the archaic and the modern spelling may be included, e.g. วงษ์ (โบ) น. วงศ์. วงศ- and วงศ์ [วงสะ-, วง] น. เชื้อสาย, เหล่ากอ, ตระกูล. (ส. วํศ; ป. วํส). Or only the modern spelling may have an entry, and the archaic spelling is given in parentheses at the end of the definition, e.g. กำสรวล [–สวน] (แบบ) ก. โศกเศร้า, คร่ำครวญ, ร้องไห้, เช่น ไทกำสรดสงโรธ ท้ยนสงโกจกำสรวลครวญไปพลาง. (ม. คำหลวง ทานกัณฑ์). (โบ กำสรวญ).
One of the key things I learned from this section is why sometimes RID gives the spelling of the original word, like (ข. โปฺรส), and why sometimes it just gives the language, like (ป., ส.). If no spelling of the source word is given, then (so RID claims) the Thai spelling maintains the original spelling. In other words, it's a transliteration from the source language, which is often the case with Pali and Sanskrit words.

By doing a simple analysis of the (very flawed) full online text of the dictionary, here's a count of Thai word origins from several languages:
Pali: 5183
Sanskrit: 4838
English: 982
Khmer: 427
Chinese: 109
French: 23
Malay: 22

Mistakes in my counting aside, clearly etymology is a significant weakness of RID.
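
For the curious, here is a minimal sketch of the kind of count I mean. It is not the script I actually ran. It assumes each entry is available as one string, and it only knows the three abbreviations glossed in the translation above (ป., ส., ข.). The names `ORIGINS` and `count_origins` are made up for illustration, and the sample entries are abbreviated.

```python
import re
from collections import Counter

# Only the abbreviations glossed in the translation above; RID's own list is longer.
ORIGINS = {"ป": "Pali", "ส": "Sanskrit", "ข": "Khmer"}

PARENS = re.compile(r"\(([^()]*)\)")      # text inside one pair of parentheses
TAG = re.compile(r"^([ก-ฮ])\.(?=\s|$)")   # a single consonant followed by a period


def count_origins(entries):
    """Count, per language, how many entries carry that etymology tag."""
    counts = Counter()
    for entry in entries:
        found = set()  # count each language at most once per entry
        for group in PARENS.findall(entry):
            # Tags inside one parenthesis are separated by commas or semicolons.
            for piece in re.split(r"[;,]", group):
                match = TAG.match(piece.strip())
                if match and match.group(1) in ORIGINS:
                    found.add(ORIGINS[match.group(1)])
        counts.update(found)
    return counts


sample = [
    "ธรรม ... (ส. ธรฺม; ป. ธมฺม).",
    "โปรด ... (ข. โปฺรส).",
    "รจนา ... (ป., ส.).",
    "เข้า ๒ (โบ) น. ข้าว; ขวบปี.",
]
print(count_origins(sample))
```

A pattern like this is easy to fool. Citations also sit in parentheses and can begin with an abbreviation, as in (ม. คำหลวง ทานกัณฑ์) above, so any real count needs a pass to separate sources from languages.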

Point 3 notwithstanding, it often holds that if no connection is certain, no etymological info is given. The (เทียบ X) note is used just over 100 times. Even older but well-known loanwords, like the Thai numbers เอ็ด, ยี่ and สอง through เก้า, which were borrowed from Chinese, are implicitly claimed as Thai. This may have been acceptable 50+ years ago, when the conventional wisdom among Thai scholars was that Thai was a relative of Chinese, but it's hard to excuse nowadays.

In addition, to say there are only 400 words from Khmer in Thai is comical, and I'd be surprised if Malay has really had less of an influence than French--even in Bangkok. The English figure above can't be trusted at all, because RID doesn't have an automatic way to distinguish between words which are transliterations from English (e.g. โฮเต็ล) and words which are translated from English (e.g. โทรทัศน์). I'd have to go through and do a manual count to know that.

The number of Indic loans, at roughly 10,000 words, makes up 25% of the dictionary's total entries, which sounds reasonable. I'm sure that a relatively small number of these make up more than 25% of actual word usage in Thai based on frequency, however. It's also worth mentioning that many Indic words--or alternate versions of them--came into Thai by way of Khmer, which is ignored in RID.

All said, RID is a decent beginning source, if woefully incomplete. Mostly their analysis is simplistic, ignoring how and when a particular word came into Thai, as well as failing to give the meaning of the word in the original language.

Oh, yeah. And Happy Valentine's Day!

January 17, 2008

Know Your Dictionary: Word Senses in RID99

[Note: See also in this series introduction, sorting, orthography, symbols, and pronunciation guides. Forthcoming: etymology and list of abbreviations.]

For this installment of Know Your Dictionary we have the section called simply "ความหมาย" in the original introduction to Royal Institute Dictionary 1999 (พจนานุกรม ฉบับราชบัณฑิตยสถาน พ.ศ. ๒๕๔๒). Now, ความหมาย means "meaning(s)". However, I decided to go with the word "sense" because it's commonly used in a dictionary context to refer to distinct meanings of an entry. Also, sometimes RID99's use of sense or meaning (ความหมาย) vs. definition (บทนิยาม) doesn't perfectly match how I would use them in English. As in other installments, I have tried to translate closely and consistently, but I may have made a few concessions for style. (You can read the original Thai here.)

Part 5: Word senses
1. In defining words with many senses, definitions that are constantly used and are thought to have prominent meanings are normally ordered first. But there are some exceptions where it is intended to show the history of senses, in which case the sense that is thought to be the original sense might be first, followed by the origin, opposites, or collocates (if there are any). Effort has been made to give specific examples for uncommon words.

2. For words with numerous senses that include plant or animal names, definitions of plants and animal names are separated from other definitions, by elevating that word as a separate headword, see for example แก้ว.

3. Abbreviations in parentheses tell the characteristics of a word used in a specific context, e.g. (โบ) (แบบ) (กฎ). If it comes before the part of speech, that means that every definition is used only in the context specified in parentheses, e.g. เข้า ๒ (โบ) น. ข้าว; ขวบปี. If it comes after the part of speech, only the definitions before the semicolon are used in the context specified in parentheses, e.g. ข้าราชการ น. (โบ) คนที่ทำราชการ ตามทำเนียบ; ผู้ปฏิบัติราชการในส่วนราชการ;...

4. Words that are plant and animal names follow these definition rules:
A. Plants from different families with the same name are defined under the same headword, but are differentiated by a number in parentheses, e.g. กระโดงแดง น. (๑) ชื่อไม้ต้นขนาดใหญ่... (๒) ชื่อไม้ต้นขนาดกลาง ...

B. Animals with the same name that are the same type of animal, but are from different species or families, are defined under the same headword, but separated by a number in parentheses, e.g. กด ๒ น. (๑) ชื่อปลาไม่มีเกล็ด ... (๒) ชื่อปลาน้ำจืดบางชนิด ...

C. Animals with the same name that are a different type of animal are not defined under the same headword, but are given different headwords by type along with separate definitions. See for example แก้ว and จะละเม็ด.


5. Inclusion of sub-entries of plant and animal names follows these rules:
A. Sub-entries of plants or animals are of the same family as the headword, e.g. กระโดน has the subhead กระโดนดิน, which is a plant of the same family, or, หมอ ๓ has the subhead หมอตาล, which is an animal of the same family.

B. If a plant or animal name is a class term for plants or animals with similar features of many species or many families, sub-entries must have a meaning related to the headword, which may be of a different species or family from the headword, e.g. จันทน์ has the subheads จันทน์กะพ้อ, จันทน์ขาว, จันทน์ชะมด, etc., or, เขียว ๓ has the subhead เขียวหางไหม้.

From point 1 we learn about sense ordering. This is important because dictionaries generally fall into one of two camps: historical order or frequency order. In the first camp, senses are ordered by earliest known text citation (OED, for example); in the second camp, the most commonly used senses are listed first. We learn that RID is a mix of both. I'm not sure how common this type of organization is, but I find it distressing because so far as I know there is no way to know which is which. We can't assume one or the other without some kind of notation. So as I see it, this boils down to mean that, unfortunately, there is not a ton we can learn from sense ordering in RID99. If the first sense is a commonly used sense, it's probably safe to assume that they are ordered by (rough) frequency. Unfortunately again, I don't think there is any empirical evidence behind their ordering, so even the frequency ordering is probably based only on the sense of the committee members as to which words are more common.

The rest is mostly explanation of notation, which is useful. Like how to interpret the scope of a given usage note based on surrounding punctuation (see point 3), or the genetic relationships of certain flora and fauna with their sub-entries.
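
To make point 3 concrete, here is a rough sketch of how the scope rule could be applied mechanically. It assumes the text after the headword is already isolated, that senses are separated by semicolons, and that a label at the start of a sense covers only that sense. The function name `sense_labels` is hypothetical, and the part-of-speech list contains only abbreviations that appear in the examples in this series.

```python
import re

LABEL = re.compile(r"^\(([^()]+)\)\s*")  # a usage label such as (โบ)
PARTS_OF_SPEECH = ("น.", "ก.", "ว.")      # only those seen in the examples


def sense_labels(body):
    """Return (sense, labels) pairs for the text that follows a headword."""
    entry_labels = []
    match = LABEL.match(body)
    if match:
        # Label before the part of speech: it covers every sense.
        entry_labels.append(match.group(1))
        body = body[match.end():]
    for pos in PARTS_OF_SPEECH:
        if body.startswith(pos):
            body = body[len(pos):].lstrip()
            break
    result = []
    for sense in body.rstrip(".").split(";"):
        sense = sense.strip()
        if not sense:
            continue
        labels = list(entry_labels)
        match = LABEL.match(sense)
        if match:
            # Label after the part of speech: it stops at the semicolon.
            labels.append(match.group(1))
            sense = sense[match.end():]
        result.append((sense, labels))
    return result


print(sense_labels("(โบ) น. ข้าว; ขวบปี."))
print(sense_labels("น. (โบ) คนที่ทำราชการ ตามทำเนียบ; ผู้ปฏิบัติราชการในส่วนราชการ"))
```

In the first call both senses come back tagged โบ. In the second only the sense before the semicolon is tagged, which is the distinction the rule draws.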

January 5, 2008

Know Your Dictionary: Pronunciation Guides in RID99

[Note: See also in this series introduction, sorting, orthography, symbols, and word senses. Forthcoming: etymology and list of abbreviations.]

I'm back again with another installment in the Know Your Dictionary series, in which I am translating the introduction to the Royal Institute Dictionary, 1999 Edition. The remaining installments will be coming down the pipeline quickly. They're all translated, but I've been staggering them to avoid entirely chasing away readers who prefer less, ahem, arcane topics. The original Thai for this translation is here.

Part 4: Pronunciation guides

Pronunciation guides follow these rules:
1. Words that end with the basic final consonants, e.g. แม่กน* spelled with น, แม่กบ spelled with บ, as in the words คน or พบ, are not given a pronunciation guide.

2. Words with ญ ณ ร ล ฬ pronounced like น, words with ข ค ฆ pronounced like ก, words with จ ช ฎ ฏ ฐ ฑ ฒ ต ถ ท ธ ศ ษ ส pronounced like ด, words with ป พ ฟ ภ pronounced like บ--all four types are given pronunciation guides if the spelling causes ambiguous pronunciation.

3. Pali and Sanskrit words that are samasa (สมาส) compounds must usually be pronounced according to compounding rules, i.e. pronouncing the final syllable with a compound อะ vowel; this type of word is given a pronunciation guide, e.g. ทารุณกรรม [ทารุนนะกำ], สุขนาฏกรรม [สุกขะนาดตะกำ], รูปธรรม [รูบปะทำ]. For words which have developed two pronunciations, i.e. a rule pronunciation and a popular pronunciation, the rule pronunciation is given first, e.g. ประวัติศาสตร์ [ปฺระหฺวัดติสาด, ปฺระหฺวัดสาด], มัธยมศึกษา [มัดทะยมมะ-, มัดทะยม-], อุดมการณ์ [อุดมมะ-, อุดม-].

4. A phinthu (พินทุ) dot beneath a letter in a pronunciation guide means either:
A. That letter is a "leading letter" and is not pronounced, i.e. a phinthu dot is placed below ห to prevent alternate pronunciations that have different meanings, e.g. เหลา [เหฺลา], เหย [เหฺย], แหงน [แหฺงน].
B. That letter is part of a consonant cluster, which there are three of in Thai, i.e. ร ล ว; the phinthu dot is placed below the first letter of the cluster to cause the two consonants to be read as a cluster, e.g. ไพร [ไพฺร], ปลอบ [ปฺลอบ], กว่า [กฺว่า].

5. Words from other languages which formerly were pronounced with consonant clusters and when borrowed into Thai are pronounced as two syllables are given Thai pronunciation guides, e.g. เสด็จ [สะเด็ด], พยาบาท [พะยาบาด], แสตมป์ [สะแตม].

6. For sub-entries with possibly ambiguous pronunciation because the pronunciation is not the same as that of the headword, a pronunciation guide is given for the sub-entry, e.g. กล, กล- [กน, กนละ- ] น. ... กลไก [กน-] น. ... กลฉ้อฉล [กน-] น. ... กลบท [กนละบด] น. ...
To be honest, I have trouble understanding why the Royal Institute doesn't do a better job with pronunciation guides in their dictionary. It betrays the underlying assumption that anything significant in the pronunciation of a Thai word can be captured by the existing orthography. But this ignores some important things. Among them:

1. Syllable stress. This can be non-transparent in compounds, so it should be made explicit.

2. Vowels written long but pronounced short (for example, เส้น is short but เสน is long; แห่ง is short but แห้ง is long). While there is a regular pattern to this sort of thing, their approach assumes the reader inherently knows it, but this isn't obvious and is rarely taught to second-language learners.

3. Words with short vowels but no final glottal stop. For example, the particle นะ, among others. Compare with ณ, which has a final glottal stop, and you'll see what I mean. This is also important because some words must always be pronounced with a glottal stop, in both careful and fast speech (e.g. สะใจ), while huge numbers of words lose their glottal stops in normal/fast speech or in compounds, but not always predictably so (e.g. สาระ--yes glottal vs. สารพัด--no glottal vs. สารสนเทศ--yes glottal). It's time for a publication as venerated as RID to start differentiating. Assuming users know the right answer is not the answer.

In English dictionaries, for as long as I've been using them (that's my cop-out way of saying I don't know when this innovation was introduced), the basic alphabet has been augmented in order to systematically represent all of the sounds possible in the language. Lexicographers of English use breves and macrons and other diacritics and punctuation to try to best represent the spoken language on paper. RID gets away with using only the พินทุ (the little dot placed below a letter to indicate a consonant cluster). Unfortunately, that doesn't really cut it. It's in instances like this that it's clear that RID is intended for native users. I'm sure the scholars at the Royal Institute could devise a better system using Thai orthography. And it's okay if it takes users some learning to be able to interpret. I remember often having to flip to the pronunciation guide on the inside front cover of my old Merriam-Webster Collegiate as a kid to check if a certain symbol stood for a as in father or a as in cat. Nothing invested, nothing gained.

Well, enough of my cranky ranting. If you can think of any other insights into the RID's system, leave a comment. Tune in next time for the section about word senses. Before that, though, I have a couple of gems from the Bangkok Recorder to share, in the long-awaited (by me, if no one else) return of the feature Old News [see previous installments of Old News about Siam's first advertising and the original Siamese twins].


*This format of แม่ X (แม่กก แม่กง แม่กน แม่กด etc.) is used in schools to teach Thai children about which final consonants make what sound. So แม่กด consists of words ending with ด ต ท ถ, and so forth. It's difficult to succinctly translate, but when it says "แม่กน spelled with น" what it means is final consonants that are pronounced น that are also spelled with น. I chose to translate this concept as "words that end with the basic final consonants".

November 30, 2007

Know Your Dictionary: Symbols in RID99

[Note: See also in this series introduction, sorting, orthography, pronunciation guides, and word senses. Forthcoming: etymology and list of abbreviations.]

Lest you, dear reader, should run out of reading material, let's plow ahead with part four of the Know Your Dictionary series (the original Thai can be found here):

Part 3: Symbols
1. Comma ( , )
A. Used between similar senses of a word within a definition, e.g. กระตือรือร้น ก. รีบร้อน, เร่งรีบ, ขมีขมัน, มีใจฝักใฝ่เร่งร้อน.
a. Separates the definition from synonyms, e.g. เข้าโกศ ก. บรรจุศพลงในโกศ, ลงโกศ ก็ว่า.
b. When a synonym comes after a sense followed by a semicolon (;), it means that synonym applies only to the sense after the semicolon, e.g. ไข่ข้าว น. ไข่ที่ฟักไม่เป็นตัว ต้มแล้วแข็งและเหนียวผิดปรกติ; ไข่ปอกเสียบ ไม้ปักไว้บนยอดบายศรี, ไข่ขวัญ ก็เรียก. (ดู ขวัญ).

B. Used after the last sense before an illustrative example to show that the example applies to all preceding senses, e.g. ขวย ก. กระดาก, อาย, เช่น แก้ขวย ขวยใจ. If there is no comma, it shows that the example applies only to the immediately preceding sense, e.g. ขีดคั่น ก. ขีดกั้นไว้, กำหนดไว้โดยเฉพาะ เช่น อ่านหนังสือไปถึงไหนแล้ว ให้ทำเครื่องหมายขีดคั่นไว้.

C. Used after the last definition before more details about the headword, e.g. ถลอก [ถะหฺลอก] ก. ลอกออกไป, ปอกออกไป, เปิดออกไป, (มักใช้แก่สิ่งที่มีผิว) เช่น หนังถลอก สีถลอก.

D. Used between etymological abbreviations, particularly those from Pali and Sanskrit whose orthography is the same as the headword, e.g. ทวิ has parentheses giving the etymology as (ป., ส.).


2. Semi-colon ( ; )
A. Used between the definitions of a word that has many senses, where those senses are different but still related to the original sense, e.g. กิ่ง น. ส่วนที่แยกออกจากต้น, แขนง; ใช้เรียกส่วนย่อยที่แยกออกไปจากส่วนใหญ่ แต่ยังขึ้นอยู่กับส่วนใหญ่ เช่น กิ่งอำเภอ กิ่งสถานีตำรวจ; ลักษณนามเรียกงาช้างว่ากิ่ง; ชื่อเรือชนิดหนึ่งในกระบวนพยุหยาตรา.

B. Used between definitions of a word with unrelated meanings, e.g. เจริญ [จะเริน] ก. เติบโต, งอกงาม, ทำให้งอกงาม, เช่น เจริญทางพระราชไมตรี เจริญสัมพันธไมตรี, มากขึ้น; ทิ้ง เช่น เจริญยา, จำเริญยา ก็ว่า; ตัด เช่น เจริญเกศา, จำเริญเกศา ก็ว่า; สาธยาย, สวด, (ในงานมงคล) เช่น เจริญพระพุทธมนต์.

C. Used at the end of a definition, before the synonyms, for headwords with various definitions, to show that those synonyms apply to all senses of the headword, e.g. ปทัสถาน น. แบบแผนสำหรับยึดถือเป็นแนวทางปฏิบัติ; เหตุที่ตั้งเป็นเครื่องถึง, เหตุอันใกล้ที่สุด; บรรทัดฐาน หรือ ปทัฏฐาน ก็ว่า. (ส.; ป. ปทฏฺฐาน).

D. Used between etymological abbreviations, particularly those from Pali and Sanskrit whose orthography is different from the headword, e.g. ศีรษะ has parentheses giving the etymology as (ส.; ป. สีส), and เขม-, เขมา has the etymology (ป.; ส. เกฺษม).


3. Hyphen ( - )
A. Used in place of the omitted first part of a double, e.g. –กระส่าย ใช้เข้าคู่กับคำ กระสับ เป็น กระสับกระส่าย. –กระเฟียด ใช้เข้าคู่กับคำ กระฟัด เป็น กระฟัดกระเฟียด.

B. Used after Pali or Sanskrit words to show that other words can be affixed to them, e.g. อัคร- สม- ศาสตร-

C. Used in place of unambiguous syllables of a pronunciation guide, e.g. ชบา [ชะ-] or ยี่หร่า [–หฺร่า]

D. Used between each syllable of a pronunciation guide for words with ambiguous pronunciation, e.g. เพลา [เพ-ลา] or เสมา [เส-มา]


4. Period ( . )
A. Used at the end of a definition, e.g. กำแหง [–แหง] ว. แข็งแรง, กล้าแข็ง, เข้มแข็ง. ก. อวดดี.

B. Used after parentheses which give the etymology of a word or source of a citation, e.g. กำจาย ๑ ก. กระจาย. (ข. ขฺจาย). กระโสง (กลอน) น. ปลากระสง เช่น กระโสงสังควาดหว้าย ชลา. (สรรพสิทธิ์).
And the commentary:

Insofar as these rules are consistently followed in the dictionary, these symbols are very important to understand for making fine distinctions within the dictionary entry. There are still some unfortunate ambiguities, like points A and B under the semi-colon section (it would be nice if they distinguished these).
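
As a small illustration of how much weight a single comma carries, here is a sketch of rule 1B for the comma. It assumes one sense group with no semicolons and a single เช่น marker, and the helper name `example_scope` is my own invention, not anything from RID.

```python
def example_scope(definition):
    """Report which senses the examples after เช่น are meant to illustrate."""
    head, marker, examples = definition.partition("เช่น")
    if not marker:
        return None  # no illustrative example in this definition
    head = head.rstrip()
    # A comma right before เช่น means the examples cover every preceding sense.
    covers_all = head.endswith(",")
    senses = [s.strip() for s in head.rstrip(",").split(",") if s.strip()]
    scope = senses if covers_all else senses[-1:]
    return scope, examples.strip().rstrip(".")


print(example_scope("กระดาก, อาย, เช่น แก้ขวย ขวยใจ."))
print(example_scope("ขีดกั้นไว้, กำหนดไว้โดยเฉพาะ เช่น อ่านหนังสือไปถึงไหนแล้ว ให้ทำเครื่องหมายขีดคั่นไว้."))
```

For ขวย the examples attach to both senses. For ขีดคั่น they attach only to the last one. A real parser would also have to cope with semicolons, nested parentheses, and เช่น turning up more than once.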

Also, the use of the hyphen in headwords is extremely important and one of the things RID does very well, compared with many or most other dictionaries.

If a complex dictionary like this one, complete with this explanation, isn't an argument for the introduction of more Western-style punctuation into Thai, I don't know what is. Thai would lose some of its mystique, perhaps, and no doubt some would claim its "Thainess", but communication would certainly be improved. Reading flowing Thai text without tripping up is hard! So, good thing the Royal Institute took the time to think through and decide to use punctuation.

November 28, 2007

Know Your Dictionary: Orthography in RID99

[Note: See also in this series introduction, sorting, symbols, pronunciation guides, and word senses. Forthcoming: etymology and list of abbreviations.]

In part three of this series I present my translation of the Royal Institute's rules for Thai orthography from the introduction to RID99 (พจนานุกรม ฉบับราชบัณฑิตยสถาน พ.ศ. 2542). The original Thai can be found here.

Part 2: Orthography

1. The following new rules apply to reduplicated and repetitive final consonants:
A. For final consonants which are reduplicated, e.g. กิจจ, เขตต, จิตต, in the case that the final consonant has no accompanying vowel, cut off one of the consonants, leaving กิจ, เขต, จิต. Even if it is the prefix of a samasa compound, the final consonant can be pronounced a little bit without reduplicating the letter, e.g. กิจกรรม, นิจศีล, จิตวิทยา. The final consonant is to be reduplicated only when accompanied by a vowel or other consonant, e.g. วักกะ, กิจจา, อัคคี, รัชชูปการ, บุคคล, ประภัสสร.
B. For final consonants with repetitive consonants from the final consonant group ฏ, e.g. รัฏฐ, อัฑฒ, in the case that the last consonant has no accompanying vowel, cut off the first of the two final consonants, leaving only the last consonant, e.g. รัฐ, อัฒ. Even if it is the prefix of a samasa compound, the final consonant can be pronounced without repetition, e.g. รัฐบาล, อัฒจันทร์. The final consonant is to be repeated only when accompanied by a vowel or other consonant, e.g. รัฏฐาภิปาลโนบาย, กุฏฐัง, unless the vowel on the last consonant is ิ, e.g. วุฑฒิ, อัฏฐิ, ทิฏฐิ, in which case cut off the first of the two final consonants, using only the last consonant, e.g. วุฒิ, อัฐิ, ทิฐิ. The full form from the original language is given for words of this type in parentheses after the word. When you find a word written differently from these rules, compare how the word would be written according to these rules, how it would be spelled, and look up that word, e.g. for the word จิตต์ or ทิฏฐิ, look up จิต or ทิฐิ.

2. The following rules apply to use of the vowel symbol ะ:

A. Words which in the original language have two consecutive initial consonants, but in Thai an extra อะ vowel is pronounced, are not to use ะ, e.g. ผจญ, ผทม.
B. Words from Pali and Sanskrit, for which the final syllable is to be pronounced with an อะ vowel, are to use ะ, e.g. ลักษณะ, ศิลปะ, สาธารณะ, หิมะ.
C. Words from other languages which have customarily been written with ะ will continue to be written with ะ, e.g. ระเบียบ is not written รเบียบ after the Khmer. Words which are of uncertain origin, if pronounced with the vowel อะ, are to use ะ in keeping the traditional Thai spelling.
D. Words beginning with the letter ส which has been altered to ตะ or กระ, even if not written with ะ in the original language, in Thai are to be written with ะ, e.g. สะพาน = ตะพาน, สะเทือน = กระเทือน.
E. Various words with an added ร, which are mostly used in poetry, if the original word uses ะ, are to use ะ after adding ร as well, e.g. จะเข้ = จระเข้, ทะนง = ทระนง. If the original word does not use ะ, it is not necessary to use ะ after adding ร, e.g. จมูก = จรมูก.

Therefore, for words that have previously been written with ะ, if not found, look them up without ะ.


3. The following rules apply to use of ไม้ไต่คู้:
A. Do not use ไม้ไต่คู้ for words which are modified from Pali and Sanskrit, e.g. เบญจ, เพชร.
B. Use ไม้ไต่คู้ for words which are pronounced short.

4. For words which have an initial consonant or consonant cluster, when expanded to two syllables, the second syllable will have the same tone as the original word, thus no silent ห is necessary, e.g. กลับ = กระลับ, กวัด = กระวัด, ตรวจ = ตำรวจ. Even if the word is borrowed from Pali and altered along these lines, no silent ห is used, e.g. กนก = กระนก.


5. For words from Pali and Sanskrit which have several pronunciations and are commonly compounded with other words, several forms are given for convenience, e.g. ศิลป gives three forms: ศิลป-, ศิลป์, ศิลปะ. The form ศิลป- is used for compounding with other words, e.g. ศิลปกรรม, ศิลปศาสตร์; the form ศิลป์ is used for the pronunciation “สิน”, e.g. นาฏศิลป์; and the form ศิลปะ is used for standalone use, and for the desired pronunciation “สินละปะ”, e.g. ศิลปะการแสดง, งานศิลปะ.


6. For words written according to ancient orthography, e.g. วงง, วยง, อนน, เกรอก, which nowadays are written as วัง, เวียง, อัน, เกริก, if not found under the archaic spelling, look them up under the modern spelling.
And a bit of commentary:

These guidelines are particularly interesting and significant insofar as they do not simply apply to the organization of the dictionary. As the official dictionary of the Thai language, RID also functions as a standardizing, normative dictionary. If a word's spelling is changed in RID, the new spelling is supposed to be followed. By whom? Everyone, although in reality this is not the case. Several competing spellings may co-exist. But the Royal Institute doesn't generally go changing common spellings. In fact, perhaps the most significant spelling policy change between RID82 and RID99 is the explication of all word-final short /a/ vowels, in Indic-derived words. Whereas RID82 would, say, have the headword ภว with the pronunciation [พะวะ], RID99 would simply have the headword ภวะ, with no pronunciation, since the new spelling's pronunciation is transparent.

Another thing that occurs to me is that since people often take license with them, or because they may have been designated when a different spelling was in use, proper nouns (particularly surnames) are both a historical spelling record and a rich source of former or alternate spellings. For example, in rule 3A for ไม้ไต่คู้ above, เพชร is given as an example. It was once alternately spelled เพ็ชร, and is still pronounced as if it were spelled that way. But that spelling hasn't been common for decades. So why does it turn up so darn many hits on Google? Take a look for yourself and you'll see all the proper nouns.

This also reminds me: In another post for another day, I'll discuss the Royal Institute's rules for transcribing English loanwords, and how these are reflected (or not) in the recent Dictionary of New Words. (Sneak preview: In their attempt to impose new systematic spellings for words that already have de facto standards in the media, the Royal Institute is really only confusing the situation and undermining standardization.)

November 17, 2007

Know Your Dictionary: Sorting in RID99

[Note: See also in this series introduction, orthography, symbols, pronunciation guides, and word senses. Forthcoming: etymology and list of abbreviations.]

Today, in part two of this series, is my English translation of the first part of RID99's introductory material. I'm happy to entertain questions about my translation (the original Thai is here). It is, for the most part, strict. I haven't generally attempted to revise the material to make it clearer, though I have occasionally translated loosely where I thought Thai structure or word choice unnecessarily obscured the meaning.


Part 1: Sorting and word collection method

1. Consonants are ordered by letter, i.e. ก ข ฃ ค up through อ ฮ. They are not ordered by sound, i.e. if you look up the word ทราบ, you must look in the ท section; if you look up the word เหมา, you must look in the ห section. ฤ ฤๅ are ordered after ร and ฦ ฦๅ are ordered after ล.

2. Vowels are not ordered by sound, but are ordered by symbol as follows: -ะ - ั -า - ิ - ี -ึ - ื -ุ -ู เ- แ- โ- ใ- ไ-. The many combined vowel symbols are sorted by the linear vowel symbol order as given above, resulting in the following order:


-ะ

-ั (กัน)

-ั-ะ (ผัวะ)

-า

-ำ

- ิ

- ี

- ึ

- ื

-ุ

-ู

เ-

เ-ะ (เกะ)

เ-า (เขา)

เ-าะ (เจาะ)

เ- ิ (เกิน)

เ- ี (เสีย)

เ- ีะ (เดียะ)

เ- ื (เสือ)

เ- ืะ (เกือะ)

แ-

แ-ะ (แพะ)

โ-

โ-ะ (โป๊ะ)

ใ-

ไ-


The letters ย ว อ are always sorted as consonants.


3. Word sorting is ordered foremost by initial consonant, and then by vowel symbol. Words without a vowel symbol thus come first, e.g. กก comes before กะ, or ขลา comes before ขะข่ำ.
Words which have both consonants and vowel symbols are sorted by the same method as above, e.g. จริก จริม จรี จรึง จรุก, and normally are not sorted by tone marker, e.g. ไต้ก๋ง ไต้ฝุ่น ไต่ไม้; tone markers are counted in sorting order only for words which otherwise are spelled the same, e.g. ไต ไต่ ไต้ ไต๋, or กระตุ่น กระตุ้น. Words with the symbol -็ (ไม้ไต่คู้) are ordered before tone markers, e.g. เก็ง เก่ง เก้ง เก๋ง.

4. Among words which begin with กระ-, some words can only be spelled with กระ-, but others can also be spelled with กะ-. Those which can alternately be spelled with กะ- are also collected under กะ-, but only the words are given, without definitions. Therefore, for a word beginning with กะ in that section, see the definition under กระ-, e.g. กระทะ กระเปาะ, except for those which are spelled both ways with different definitions, e.g. กระแจะ-กะแจะ กระด้าง-กะด้าง, in which case definitions are given for both words.


5. There are words with an extra prefixed syllable as used in ancient compositions, e.g. มี่ as มะมี่, ริก as ระริก, ครื้น as คะครื้น or คระครื้น, แย้ม as ยะแย้ม, etc., according to the method called in Pali อัพภาส, and in Sanskrit อัภยาส, which translates as "method of overlapping letters", e.g. ททาติ or ททามิ. There are large numbers of these words, in some cases they are included under the prefix, e.g. คะครื้น is included under คะ, which states "prefixed to words which begin with the letter ค, with the same meaning as the original word". In other cases they are included according to their spelling, e.g. มะมี่, but all instances are probably not included, therefore if a word cannot be found under its spelling, see the original word, e.g. ยะแย้ม see แย้ม.


6. Some regional dialects truncate their speech, e.g. กะดะ shortened to ดะ (without กะ), กะง้อนกะแง้น shortened to ง้อนแง้น (without กะ), but the meaning is the same as the full word with กะ. Such words are kept only under กะ.


7. Words with reversed pronunciations, e.g. ตะกรุด as กะตรุด, ตะกร้อ as กะตร้อ, and ตะกรับ as กะตรับ, are normally kept in both ก and ต, but if not found in ก, look in ต.

8. The following words are used often in poetry:
A. Words which append อา, อี or อิน to the end, e.g. กายา, กายี, กายิน.
B. Words which append เอศ to the end (in poetry terms this is called ศ เข้าลิลิต, making the word called a "toneless word" into a "first tone word" according to khlong poetry rules), e.g. กมเลศ, มยุเรศ.
C. Words which append อาการ to the end, e.g. จินตนาการ, คมนาการ, ทัศนาการ.
D. Words which append ชาติ to the end, e.g. กิมิชาติ, คชาชาติ.

These words typically have the same meaning as the original, and are collected in this dictionary, but perhaps incompletely, because there are so many. If not found under a given spelling, look under the spelling of the original, e.g. กายา, กายี, if not found under กายา or กายี, look under กาย. Whatever the meaning of กาย is, กายา and กายี have the same meaning. Look up other words after this same pattern.


9. Words with the same root that can take many forms, e.g. หิมวัต can take the forms หิมวันต์, หิมวา, หิมวาต, หิมวาน, and หิมพาน without a change in meaning, are defined only under the original word, in this case หิมวัต. Words which have changed form from the original word are collected separately, but include a note to see the original word, e.g. หิมวันต์, หิมวา, หิมวาต, หิมวาน [หิมมะ]- น. หิมวัต.


10. Words which are subordinate names, e.g. ตะนอย, ช่อน, คา, are not included with the corresponding common noun as used in speech, e.g. มดตะนอย, ปลาช่อน, หญ้าคา. Rather, the common nouns มด, ปลา, หญ้า are included according to their spelling, and the subordinate names ตะนอย, ช่อน, and คา are included separately according to their spellings. The exception is words which cannot be separated because the whole word is the name of something, e.g. แมลงภู่, which is the name of a type of mussel or fish, and thus is included whole under the letter ม; or ปลากริม, which is a type of sweet, not a fish, and is included whole under the letter ป. Nevertheless, there are some words which cannot be sorted according to these rules, so if a word after this pattern is not found under the subordinate name, look it up under the common name, e.g. น้ำตาลกรวด is not found under กรวด, so see น้ำตาล.


11. When two words are compounded, and the first word is the same as the headword and the compound has a meaning related to the headword, the compound is a subhead of that headword, e.g. กดขี่, กดคอ, กดหัว are subheads of กด. The exception is compound words which have an independent meaning or a different meaning from the headword, which are included as separate headwords. For example, ขวัญอ่อน, meaning an easily startled person, i.e. a child or woman who tends to become frightened frequently, is included as a subhead of ขวัญ, whereas ขวัญอ่อน, referring to a type of dancing song, is a separate headword, because it has a different meaning. Words of this type are also numbered, e.g. ขวัญอ่อน ๑ and ขวัญอ่อน ๒. Compound words which have the same word-for-word meaning as the original words are not included, e.g. ข้าวผัด is not a subhead of ข้าว, because its meaning is the sum of its parts.


12. Some words can be compounded to either the beginning or end of other words, e.g. น้ำ is compounded in various words such as แม่น้ำ, ลูกน้ำ, น้ำใจ, น้ำต้อย. If a different word comes before น้ำ, the compound is ordered by its own spelling. For example, the word แม่น้ำ is ordered under ม; ลูกน้ำ is ordered under ล; they are not ordered under น. But if the word น้ำ comes first, it is ordered under น, as a subhead of the word น้ำ, e.g. น้ำกรด, น้ำแข็ง, น้ำย่อย.
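
As a rough illustration of rule 12, here is a small Python sketch of where a compound ends up. The names `filing_letter` and `placement` are hypothetical, and the sketch deliberately ignores the meaning test from rule 11, so it treats every compound that begins with the headword as a subhead.

```python
# Vowel signs written before the consonant they belong to: เ แ โ ใ ไ
LEADING_VOWELS = "\u0e40\u0e41\u0e42\u0e43\u0e44"


def filing_letter(word):
    """Return the first character that is not a leading vowel sign."""
    for ch in word:
        if ch not in LEADING_VOWELS:
            return ch
    raise ValueError("no consonant found in word")


def placement(compound, headword):
    """Return (letter, parent headword or None) for a compound."""
    if compound != headword and compound.startswith(headword):
        # Headword comes first: filed as a subhead of the headword.
        return filing_letter(headword), headword
    # Otherwise the compound is ordered by its own spelling.
    return filing_letter(compound), None
```

Here `placement("แม่น้ำ", "น้ำ")` gives the letter ม with no parent headword, while `placement("น้ำแข็ง", "น้ำ")` gives the letter น with น้ำ as the parent, matching the examples in the rule.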
[Disclaimer: I am not affiliated with the Royal Institute, nor have I made them aware of my translations. This is intended for educational and research purposes.]

November 16, 2007

Know Your Dictionary: RID99

[Note: See also in this series sorting, orthography, symbols, pronunciation guides, and word senses. Forthcoming: etymology and list of abbreviations.]

Today I'm kicking off a series of posts about what I often refer to on this blog as RID99. That's right, once again I'm talking about the venerable Royal Institute Dictionary. The most recent edition is the 2542 edition, which corresponds to the year 1999.* As the official standard dictionary of Thai, it is a must-own for any serious student of the language. That is, unless the somewhat flawed online edition is sufficient for you.**

It's known in Thai as พจนานุกรม ฉบับราชบัณฑิตยสถาน พ.ศ. ๒๕๔๒.*** And while the print version has plenty of flaws of its own, it's the best monolingual dictionary out there (although I'm fond of Matichon's dictionary, too).


I don't know about you, but I had been using the dictionary a long time before I ever took the time to carefully read the introductory material. And as it turns out, there's quite a bit of good stuff to learn in those introductory pages. You can find them in the original Thai linked from the main page of RID99 online, underneath the headings for all the letters of the alphabet. It includes these sections:
  • การเรียงลำดับคำและวิธีเก็บคำ (Sorting and word collection method)
  • อักขรวิธี (Orthography)
  • เครื่องหมายต่าง ๆ (Symbols)
  • การบอกคำอ่าน (Pronunciation guides)
  • ความหมาย (Senses)
  • ประวัติของคำ (Etymology)
  • บัญชีอักษรย่อและคำย่อที่ใช้ในพจนานุกรมนี้ (List of abbreviations used in the dictionary)
My small contribution is that I have translated these into English for personal use, and I'm going to post my translations here, in hopes that they help you to get more out of your dictionary experience, as they did me. These explanations provide a better understanding of the thought process and effort that has gone into the structure and contents of the Royal Institute Dictionary, and the structure of the Thai language in general. As I have said, RID is not without its flaws, but see if you don't just learn a thing or two from it. Stay tuned.


*It was actually first published in 2003, but is still branded as such so as to associate itself with the auspicious year of H.M. The King's sixth cycle--72nd--birthday in that year, and to honor him.
**See my previous posts about RID99 online and its problems here, here, here and here.
***You can read a bit about the Royal Institute on Wikipedia, but be warned--I wrote most of the article, so you only have my word to take for it. Feel free to add to the article.