August 8, 2007

More issues with RID online

Following up on two previous posts about the new online version of the 1999 Royal Institute Dictionary (พจนานุกรม ฉบับราชบัณฑิตยสถาน พ.ศ. 2542), I have a bit more to report.

There's something off with their wildcard searching. Specifically, if you use a wildcard at the beginning of the search string, it doesn't work properly.

Compare, for example, the searches for ใจ* and *ใจ. One would expect the first to return results such as ใจร้อน and ใจเย็น. The second, words like น้อยใจ and สบายใจ.

Indeed, the search for ใจ* returns 60 results, including both the words I expected. (I had to count them by hand, so there's another feature request already: tell me the number of search results!) It appears to be a complete and thorough search of the dictionary.

But *ใจ returns six results: ดลใจ ดีใจ ดีเนื้อดีใจ ดีอกดีใจ ดูใจ ได้ใจ, all from the letter ด. When you put the wildcard at the beginning of the string in the main search window, somehow it searches only this one letter of the alphabet.

There is a (tiresome) workaround--you can click on any letter of the alphabet to do a letter-specific search. I tried น and got 6 results, ช returns 7 results, etc. But it's highly inefficient to go clicking through every letter of the alphabet.

So why is there a problem with this? Well, if you think about it, this is a special kind of search. From the regular search box on the main page, this is the only type of search that requires searching the entire contents of the dictionary, since the desired results could begin with any letter. Any other regular word search allows the site to return quick results, by simply looking at the first letter and kicking the query straight to that letter of the alphabet.
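To make that concrete, here is a minimal Python sketch of the routing logic. It is only my guess at the general approach, not the Royal Institute's actual code: the names `section_of`, `build_index` and `search` are hypothetical, and the word list holds just the examples from this post. It assumes headwords are filed under their first consonant, skipping a leading preposed vowel (เ แ โ ใ ไ), so ใจ* gets routed to จ.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Vowels written before the consonant they belong to (assumption: the
# dictionary files such words under the consonant, not the vowel).
PREPOSED_VOWELS = "เแโใไ"


def section_of(text):
    """Return the letter section a headword (or query) would be filed under."""
    for ch in text:
        if ch not in PREPOSED_VOWELS:
            return ch
    return text[:1]


def build_index(words):
    """Group headwords by letter section (hypothetical storage layout)."""
    index = defaultdict(list)
    for word in words:
        index[section_of(word)].append(word)
    return index


def search(index, pattern):
    """Wildcard search: jump to one section if possible, else scan everything."""
    if pattern[:1] in ("*", "?", "["):
        # Leading wildcard: a match could start with any letter,
        # so every section has to be scanned.
        candidates = [w for section in index.values() for w in section]
    else:
        # Fixed first letter: only one section can contain matches.
        candidates = index.get(section_of(pattern), [])
    return sorted(w for w in candidates if fnmatchcase(w, pattern))


words = ["ใจร้อน", "ใจเย็น", "น้อยใจ", "สบายใจ", "ดีใจ", "ดูใจ"]
index = build_index(words)
print(search(index, "ใจ*"))
print(search(index, "*ใจ"))
```

The behavior I'm seeing on the site looks like what you would get if the leading-wildcard branch were missing and the query fell through to a single section.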

One idea would be to use a simple radio button for search queries, allowing you to search for the whole word (currently the default), or partial word (which would be equivalent to searching for both ใจ* and *ใจ at the same time). Of course, even then it would be nice to still have wildcards available, to be able to search only for one or the other.
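As a rough sketch of how that option could behave, here it is in Python. The function names and the `mode` values are my own invention, standing in for whatever the radio button would submit.

```python
from fnmatch import fnmatchcase


def expand_query(term, mode="whole"):
    """Turn the typed term into the wildcard patterns that should be tried."""
    if "*" in term:
        # Explicit wildcards win: search exactly what the user typed.
        return [term]
    if mode == "partial":
        # "Partial word" as described above: term at the start or at the end.
        return [term + "*", "*" + term]
    return [term]  # "whole" mode: exact headword only


def matches(word, term, mode="whole"):
    """True if the headword satisfies any of the expanded patterns."""
    return any(fnmatchcase(word, p) for p in expand_query(term, mode))


print(matches("ใจร้อน", "ใจ", mode="partial"))
print(matches("น้อยใจ", "ใจ", mode="partial"))
print(matches("น้อยใจ", "ใจ*", mode="partial"))
```

Typing a wildcard yourself overrides the radio button here, which keeps the one-sided searches available.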

Now, a few things about definition search, which, it should be noted, is not full-text search. It will only search the text of definitions. (Feature request #2: true full-text search.) Wildcards may be unnecessary in this sort of search, since it already looks for the term anywhere within the definition text, but I tried one anyway to see what happened. If you use a wildcard at the beginning of the search, it fails entirely:

Microsoft JScript runtime error '800a139a'

Unexpected quantifier

/new-search/meaning-search-all.asp, line 20234

They simply haven't programmed it to expect this sort of search, which would be a simple thing to fix.
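The error text reads like what a regular-expression engine says when a pattern starts with a bare quantifier, so my guess (and it is only a guess) is that the query is handed to a regex more or less as typed. Python's `re` module complains in the same situation, which makes it easy to sketch one possible fix: escape the literal parts and translate each * into "any run of characters". The helper names below are hypothetical.

```python
import re


def wildcard_to_regex(query):
    """Escape the literal pieces, then join them with '.*' in place of each *."""
    parts = [re.escape(part) for part in query.split("*")]
    return re.compile(".*".join(parts))


def definition_matches(definition, query):
    """True if the wildcard query occurs somewhere in the definition text."""
    return wildcard_to_regex(query).search(definition) is not None


# Passing the raw query to the regex engine fails, much like the site does.
try:
    re.compile("*ใจ")
except re.error as exc:
    print("raw pattern rejected:", exc)

# After translation the same query is a valid pattern.
print(definition_matches("xx ใจ yy", "*ใจ"))
```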

Next, I tried searching ใจ and ใจ* to see if the results are the same. They're not. Searching ใจ* returns false results (i.e. entries that don't have ใจ anywhere in them). Exactly how many is difficult to assess, because the dictionary is unfortunately programmed in a most unhelpful way. Definition search performs an independent search for each letter of the alphabet, but returns only the five most relevant results per letter. That's right: it doesn't give you results alphabetically, but rather by relevance rank (and the scores are displayed in all their sixteen-decimal-place, screen-real-estate-wasting glory). And there's no option to change or tweak this behavior, especially the 5-results-per-letter limit (talk about handicapping your own product).
Here's a screenshot:



So back to the false results when doing a definition search for ใจ*. Of the top five results for ก, only three actually contain ใจ, but both of the false results contain the vowel ใ. It appears that the asterisk causes other characters to be analyzed in a way that affects the relevance ranking, since * can stand for any character or string of characters. The search algorithm should be told either (a) to ignore * in definition searches and just search for ใจ, or (b) to disqualify any results that don't contain the full text string. The math behind relevance ranking is getting in the way of usability.
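Both suggestions are small enough to sketch in a few lines of Python. Everything here is hypothetical: the function name, the shape of a hit, and especially the sample entries and scores, which are made-up placeholders rather than real dictionary output.

```python
def filter_definition_hits(hits, query, limit=None):
    """Keep only hits whose definition really contains the search term."""
    term = query.replace("*", "")  # option (a): ignore the wildcard
    # Option (b): throw out anything that lacks the literal string.
    kept = [h for h in hits if term in h["definition"]]
    # Relevance is still used, but only to order genuine matches.
    kept.sort(key=lambda h: h["score"], reverse=True)
    return kept if limit is None else kept[:limit]


# Placeholder hits, invented purely for illustration.
sample_hits = [
    {"headword": "entry-1", "definition": "... ใจ ...", "score": 0.91},
    {"headword": "entry-2", "definition": "... ใช้ ...", "score": 0.87},
    {"headword": "entry-3", "definition": "... ใจ ...", "score": 0.42},
]

for hit in filter_definition_hits(sample_hits, "ใจ*"):
    print(hit["headword"], hit["score"])
```

The second placeholder contains ใ but not ใจ, the same pattern as the false results I saw, and it is dropped no matter how high it ranks.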

I'll be shooting off an email to the Royal Institute about this as soon as I get a second. In the meantime, drop a comment if you have any other additions (or subtractions) regarding my analysis.
