November 6, 2010

Project Gutenberg Thailand: Liberating public domain Thai literature

[Read the PGT 2014 Update!]

For a few years now it's been one of my goals--it's been so long I should probably say 'dreams'--to start Project Gutenberg Thailand, a repository for public domain literature in Thai and about Thailand. The founder of the original Project Gutenberg, Michael Hart, encourages such spin-off sites, and was enthusiastic when I contacted him about the idea back in 2007.

The closest thing that currently exists is Thai Wikisource, but it has little in the way of modern literature, instead having mostly selections of classical Thai verse and public domain government documents. As far as I know, there is no existing movement to identify and disseminate more recent public domain Thai works.

For a number of reasons, however, not least my own lack of free time, nothing has ever gotten off the ground.

Last year, frustrated with my inability to make any headway on this project, I began compiling a list of Thai authors whose works are in the public domain. To put it simply, under Thai law a book is copyrighted until 50 years after the author's death.

With such a relatively short copyright term, the works of many well-known 20th-century authors have entered the public domain. Unfortunately it too often seems to be those authors who died young. Yakob ยาขอบ, author of the immortal Conqueror of the Ten Directions ผู้ชนะสิบทิศ, died in 1956 at age 48. And two early novelists born in 1905 failed to reach middle age -- Prince Akartdamkoeng มจ.เจ้าอากาศดำเกิง penned such well-remembered tomes as The Circus of Life ละครแห่งชีวิต before killing himself at 25; while Mai Mueangdoem ไม้ เมืองเดิม, of The Old Wound แผลเก่า fame, met his fate at 37. Such authors are no less significant for modern Thai culture than the Fitzgeralds and Steinbecks of American culture, and while they can still be found in print, it is a shame that such works aren't yet available as free ebooks.
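For the programmers in the audience, the rule of thumb I described (50 years after the author's death) can be sketched in a few lines of Python. This is only an illustration of that simplified rule. The function name and the decision to count conservatively in whole calendar years are my own assumptions, and it is no substitute for reading the actual law.

```python
from datetime import date

COPYRIGHT_TERM_YEARS = 50  # the simplified term described in this post


def is_public_domain(death_year, current_year=None):
    """Return True once the 50-year term after death_year has fully passed.

    Assumption: we count conservatively in whole calendar years, so a work
    is treated as free only from the year after the 50th anniversary.
    """
    if current_year is None:
        current_year = date.today().year
    return current_year > death_year + COPYRIGHT_TERM_YEARS


# Yakob died in 1956, so under this rule his works are free from 2007 onward.
print(is_public_domain(1956, current_year=2010))  # True
print(is_public_domain(1956, current_year=2006))  # False
```

Run over a list of authors and death years, a check like this would give a first-pass candidate list, which would still need to be verified by hand.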

I hear you asking, "So what needs to be done to set these works free?" (I have excellent hearing.) "And how can I help?"

Well, we need to put in place the process for taking the paper books and turning them into electronic text. The basic steps are as follows:

1. Get the book and scan each page, to create images.
2a. Use OCR [optical character recognition] software to turn the images into digital text.
OR
2b. Have humans type out the text contained in the images.
3. Have humans proofread the digital text by comparing it side-by-side with the original image.
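To make the steps above a little more concrete, here is a rough sketch of how a proofreading website might keep track of each page as it moves through them. Everything in it (the `Page` class, the stage names, the file names) is hypothetical and invented for illustration. It is not how Distributed Proofreaders or any existing site actually works.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    SCANNED = 1      # step 1: a page image exists
    TRANSCRIBED = 2  # step 2a or 2b: text came from OCR or a typist
    PROOFREAD = 3    # step 3: a human checked the text against the image


@dataclass
class Page:
    number: int
    image_file: str
    text: str = ""
    source: str = ""  # "ocr" or "typed"
    stage: Stage = Stage.SCANNED

    def transcribe(self, text, source):
        """Record the first-pass text and where it came from."""
        if source not in ("ocr", "typed"):
            raise ValueError("source must be 'ocr' or 'typed'")
        self.text = text
        self.source = source
        self.stage = Stage.TRANSCRIBED

    def proofread(self, corrected_text):
        """Store the human-corrected text; requires a first-pass text."""
        if self.stage is Stage.SCANNED:
            raise ValueError("page must be transcribed before proofreading")
        self.text = corrected_text
        self.stage = Stage.PROOFREAD


def pages_needing_proofreading(pages):
    """Return the numbers of pages that have text but no human check yet."""
    return [p.number for p in pages if p.stage is Stage.TRANSCRIBED]


pages = [Page(1, "page001.png"), Page(2, "page002.png")]
pages[0].transcribe("ผู้ชนะสิบทิศ", source="ocr")
print(pages_needing_proofreading(pages))  # [1]
```

The point of tracking a stage per page rather than per book is that volunteers can each take a page or two at a time, which is what makes the crowdsourced approach practical.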

This process is well-established for English. Project Gutenberg has a sister website, Distributed Proofreaders (pgdp.net), that crowdsources this work for books in Latin-alphabet languages. OCR for English is extremely accurate. Until 2009, OCR for Thai was miserably poor, but nowadays it's rather good. In other words, the time is finally right to start digitizing books for Project Gutenberg Thailand.

With the original Project Gutenberg, anyone can sign up to help via the Distributed Proofreaders site. For Project Gutenberg Thailand, we must build a similar community of people willing to contribute a little bit of time here and there to help liberate public domain Thai books from their paper prisons.

So here I am, writing this blog post, hoping to drum up help to get the ball rolling.

The most immediate need is a sympathetic soul with web programming chops to help create the website for crowdsourced Thai proofreading and/or typing. I've put a fair amount of thought into the core features needed, but I lack the programming and design skills needed to make it a reality.

If you are interested in helping with this effort, come join the Google Group, let me know on Twitter at @thai101, or email me at rdockum [at] gmail [dot] com.
