June 18, 2008

My idea: Thai Video Transcripts

I've been in the U.S. for a month with my family. If you're clever enough to find my family blog, you can read about it (I'm making it marginally more difficult by not linking you in order to deter lazy stalkers). You'd think I'd have more time to post here, but that's not how it's turned out.

But I'd like to share an idea of mine with readers here.

Presenting my demo, sandbox, not-ready-for-primetime version of Thai Video Transcripts. I've been playing with it off and on for a little while, and sent the link to a number of people who I thought might be interested in the idea, to ask for feedback. Many responded with excellent suggestions and wishlists of features for a site like this. (I apologize for not properly replying to some of those responses.)

So what's it for? It's a place to collaboratively transcribe Thai videos found on sites like YouTube and kosanathai.com.

Maybe that description underwhelms you. So allow me to start at the beginning of my thought process.

In the last year or so, an incredible number of Thai-language videos have cropped up on YouTube (commercials, movie trailers, music videos, TV show clips, even full TV episodes and full movies). I'm sure this increase is in part due to the Great YouTube Drought of Ought-Seven (you know, due to the Streisand Effect and all). Nowadays, you can watch any of Thailand's big network TV shows within hours of their live airing (super-serialized into YouTube-size chunks).

For a second-language speaker, fast colloquial speech in videos is some of the hardest to understand. Scripted dialogue in sitcoms, talk show banter, and off-the-cuff wordplay on variety shows are different from speaking with acquaintances live and in person, who we know and who know we may not always catch every word. And you can't stop the flow to ask about a certain word or phrase.

I often rewatch a clip several times, trying to catch the parts that go over my head. Sometimes I figure it out, sometimes I don't.

So the idea is simple: create a site where anyone (that includes you) can help transcribe the video clips that are already out there. You type out the parts you know, and other people help fill in the gaps. If you can't type in Thai yet, or aren't up to the task of transcribing, you can still benefit from the work that others do. It's a wiki, so it's easy to edit and open to everyone. It's a way to take video clips aimed essentially at native speakers, and add value to them for language learners.

You can help even if you don't know Thai, for that matter. You can create new pages for videos you'd like transcribed, which you can then copy-and-paste into websites like thai2english.com to help make at least some sense of the content.

This brings me to another feature which several people suggested: collaborative translation. I think this is a natural extension of my idea, but I'm concerned it would be like trying to run before walking. But that's the great thing. It's a collaborative site, so anyone is welcome to try what they want.

Here's a sample of the type of thing I've done on the sandbox version of the site:

[00:01] เจ็บที่สุด ... คือการเป็นอีโง่
[00:28] สายเลือดยากูซ่า[1]
[00:33] กับผู้หญิงทรยศ
[00:38] ให้กำเนิดเด็กพิเศษ
[00:38] มาตรวจดูแล้วนะครับ คิดว่าเขาน่าจะมีปัญหาในเรื่องการพัฒนาการทางสมองครับ
[01:01] ออทิสติก[2] ความบกพร่อง หรือพรจากสวรรค์
[01:13] จดจำทุกทักษะการต่อสู้
[01:28] เรียนรู้อย่างอัจฉริยะ
[01:51] จับตา
[01:51] ทุกการเคลื่อนไหว
[02:10] เล่นจริง
[02:18] เจ็บจริง
[02:38] จีจ้า ญานิน
[02:43] ฮิโรชิ อาเบะ ซูเปอร์สตาร์จากญี่ปุ่น
[02:50] ช็อกโกแลต
[02:55] ภาพยนตร์ โดย ปรัชญา ปิ่นแก้ว[3] จาก องค์บาก และ ต้มยำกุ้ง
[02:58] ตรุษจีน[4] 7 กุมภาพันธ์นี้ ทั้งประเทศ

[1] Yakuza, Japanese organized crime.
[2] Transcription of the English word 'autistic'.
[3] Thai film director and producer Prachya Pinkaew is known for such films as Ong-Bak and Tom-Yum-Goong.
[4] /trut ciin/, the Thai word for Chinese New Year. This is the release date for the movie (February 7, 2008).

Additional links
Chocolate at Wikipedia.
Chocolate at IMDb.

Get the idea? I've played around with different notations to indicate various things: personal names in green, difficult or unusual words in red. I also toyed with italics for words shown onscreen (as opposed to spoken words; this is a common issue for movie trailers), but I think it's kind of distracting. I'm not married to any of it, and I'm sure there are better ways of doing things that I haven't thought of. They don't have to be my ideas or my standards. Rather, whatever the community comes up with and decides works the best is what we'll use.
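A transcript in this shape is also easy for a program to read. Here is a minimal sketch in Python, assuming every spoken or onscreen line starts with a `[mm:ss]` stamp as in the sample above. The names `Cue` and `parse_transcript` are made up for illustration and are not part of the wiki.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    seconds: int  # offset from the start of the clip
    text: str     # transcribed line, footnote markers left in place


# Matches lines shaped like "[01:13] some text".
LINE_RE = re.compile(r"^\[(\d{1,2}):(\d{2})\]\s*(.+)$")


def parse_transcript(raw: str) -> List[Cue]:
    """Collect the timestamped lines of a transcript, in document order."""
    cues = []
    for line in raw.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank lines, footnotes, and anything without a stamp
        minutes, secs, text = match.groups()
        cues.append(Cue(int(minutes) * 60 + int(secs), text))
    return cues


sample = """\
[00:28] สายเลือดยากูซ่า[1]
[01:13] จดจำทุกทักษะการต่อสู้
[1] Yakuza, Japanese organized crime.
"""
cues = parse_transcript(sample)
```

Footnote lines such as `[1] Yakuza...` have no colon inside the brackets, so they are skipped rather than mistaken for timestamps.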

As I've said, this is just my sandbox version of my idea, but I'd like to invite everyone to come play around with it, too. Please, make new pages. Start transcribing, if you're so inclined.

I registered a domain name that I plan to eventually move this blog to. And I'll make a subdomain of that domain for the Thai Video Transcripts wiki, using MediaWiki software, which will allow me to better customize the features of the wiki (like hypertext footnotes, a la Wikipedia).
Whatever is done in the sandbox site will be moved to the final site when the time comes.

I've also been looking at other MediaWiki plug-ins that would let me do some pretty cool things. For example, if I'm reading their website correctly, Kaltura can overlay dynamic subtitles on Flash videos. That would mean being able to turn a static transcript into live subtitles, which would be awesome (but I think that would require actually hosting the videos myself, instead of piggybacking on YouTube). Another problem for another day.

A few words about what I don't plan for this site to do: create videos. The whole point is to bring something to life as simply as possible that is self-sustaining and community based, built around a motivated group of Thai-language students (and, if we're lucky, some native speakers) who can collaborate to create a great learning resource for themselves and others.

So there you have it. My big idea (or, one of them). I think it's got a lot of potential. I'd be much obliged to receive any comments and suggestions you all may have.


  1. That's a most cool idea! I hope you will soon find some collaborators!

    Two notes:
    -- I couldn't find an RSS feed dedicated to announcing new content (to use the separate RSS feed for every interesting wiki page is cumbersome to say the least)

    -- also, it is difficult to follow the video and the Thai text at the same time, one gets lost easily. Maintaining a separate youtube channel where you upload the duplicated and hardsubbed videos can help that. I'm doing a similar project: I download the youtube videos in .flv format, write the subtitles, embed the text into the videofile itself, and upload the hardsubbed video back to youtube. For this I use Linux software, but I'm sure they have their Windows equivalents: PyTube, Subtitle Editor, Avidemux.
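For anyone trying the hardsub route, the "write the subtitles" step could start from an existing timestamped transcript. The sketch below turns (start second, text) pairs into SubRip (.srt) text. It assumes the subtitle tool in use can read SubRip files; the `to_srt` helper and the four-second default display time are invented for this example, not taken from any of the tools named above.

```python
from typing import List, Tuple


def srt_time(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm stamp used by SubRip."""
    millis = int(round(seconds * 1000))
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues: List[Tuple[int, str]], max_duration: float = 4.0) -> str:
    """Build SubRip text; each cue ends at the next cue or after max_duration."""
    blocks = []
    for index, (start, text) in enumerate(cues, start=1):
        end = start + max_duration
        # index is 1-based, so cues[index] is the cue that follows this one.
        if index < len(cues) and cues[index][0] > start:
            end = min(end, cues[index][0])
        blocks.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(blocks)


srt = to_srt([(28, "สายเลือดยากูซ่า"), (33, "กับผู้หญิงทรยศ"), (38, "ให้กำเนิดเด็กพิเศษ")])
```

Since the transcripts only record when a line starts, the end time here is a guess: a line stays up until the next one begins, or for `max_duration` seconds at most.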

  2. http://www.infoq.com/interviews/bryant-ruby-maglev-gemstone;jsessionid=42E4A7D0C69CC07C72E35EED2D5AFC89

    has a method for tracking transcripts as the video plays. I'm not sure if that requires locally hosted videos or not.

  3. I added one for M-Shop.
    Hope you like it ;-)

  4. @nyiti (gabor): Thanks, I hope I find a lot of collaborators. The nice thing is you don't even have to sign up to edit it (no spam problem yet), so anyone passing by can make a correction or an addition.

    About RSS feeds at the moment, this is one of the drawbacks of the sandbox version, which I have less control over. If you follow the "Recent Changes" link on the left sidebar, though, you should now see the RSS icon in your browser location bar. This feed tracks all changes, which means it's almost more information than you need, but it's still pretty handy. The direct URL is:


    And you're absolutely right about getting lost in the transcript. For now it works better as a reference, like 'what is that word they say at 1:35?' rather than trying to follow along with whole clips.

    I'm hoping we can figure out a way to add an option for dynamic subtitles at some point. Hardcoding and resubmitting them to YouTube is certainly another option, though. But that raises the problem of whether to add hard Thai subs, hard English subs, or both. So "soft subs" would be ideal for the needs of the largest potential group.

  5. @rburns: I'll look into that.

    @anonymous: Thanks! Keep up the good work. :)

  6. Ah, I was wondering when this post was coming! I'll try and contribute sometimes when I can. I mentioned in my email to you that the videos weren't showing up; it looks like this is because TOT blocks nearly all youtube videos. If the videos were hosted anywhere but youtube it'd be much easier for us long-suffering TOT customers, and I'd imagine that's a fairly large proportion of all internet users in Thailand.

  7. Still, really? Yikes. Our ISP is True, and I haven't had any problem that I've noticed since September last year when YouTube was unblocked. For example, my wife discovered a few months ago that she doesn't have to stay up until 10:30 to watch the primetime soaps--she can catch them the next day on YouTube.

    As I understand it, True and TOT aren't truly competitors. Rather, TOT was recently privatized but is still state-owned, and True owns or rents concessions on TOT's network. So their respective domains are geographically divided. When I signed up for hi-speed internet, at first I went to a TOT office, and they said my area was covered by True. Can someone confirm/refute/clarify this?

    Anyhow, I thought it was the MICT that did the blocking, so why would TOT block things that True doesn't? I never knew the Great YouTube Drought of Ought-Seven had extended into Ought-Eight.

    kosanathai.com has embeddable videos, so there's a start. I hope this situation will resolve itself soon. Life without YouTube! The horror! (Cue nasty Nam-style flashback.)

  8. Here's something funny: in this thread on Pantip just yesterday, a TOT user asked how to get around the YouTube block (which, by the way, is a crime enforceable by police raid and punishable by jailtime under current law--stay in school, kids).

    Besides the typical response about proxy sites, someone suggests that if you take the www. out of the address it works. So youtube.com works, while www.youtube.com doesn't. If that's true, that is truly hilarious.

    If only they could, say, block sites only when viewed in Internet Explorer. Or only when viewed while wearing a purple hat.

  9. Doesn't work for me unfortunately, nor does accessing by ip address, adding parameters to the url, using opendns or other similar tricks. Proxy server-ing is about the only thing that does, though it's pretty slow and inconvenient to do each time. Watching the videos on kosanathai.com is definitely a much more pleasant experience.

    I read a post on pantip saying that although ISP's aren't ordered to block youtube anymore, they're still requested to "ให้ความร่วมมือ" ("cooperate") and do so. I guess as a state-owned agency TOT is more likely to acquiesce to this request than the privately owned ISPs.

    As for TOT vs True, I don't know, but my experience was the opposite to yours as I initially went to True and was directed by them to try TOT instead. They didn't say it was a solely TOT area though, rather just that "ตอนนี้ยังไม่มีคู่สายว่างครับ" ("there are no free lines at the moment") and that I'd have to wait a few months before they would have one. My impression is that although there are some locations that TOT is dominant in and others that True is, they are still in direct competition with each other in some locations.

  10. I added one here, but I think I've gone wrong somewhere as the video display seems a bit screwed up.

    The original one is http://www.kosanathai.com/tvcupdated/view_tvc.asp?GID=1&TVCID=28839&ad=Muang%20Thai%20Life%20Assurance

  11. Hi Mike,
    Thanks for posting the video.
    I had trouble translating the last two lines.
    guy waiting - ก็คำว่า "แป๊ปหนึ่ง" ของพี่มันนาน จะให้ผมพูดดีๆ หรือจะให้โยนพี่ออกไป
    announcer - ความบันเทิงมันฝังใจ เพราะบัตรเมืองไทยสไมล์คลับ มีกิจกรรมความสุขให้ลูกค้าทั้งปี เมืองไทยประกันชีวิต
    Could you put a translation up if it's not too much trouble?

    Also, ก็รออีกแป๊ปหนึ่งไม่ได้หรือไง - Can't you wait just a little longer?
    What does ไง mean? Is there a difference in meaning if it's left out here?

  12. This is great. It's already sparking conversations. :)

    This demonstrates to me that I need to move things to the "real" site quick, so we can have these conversations in the discussion pages of the site itself.

    Since I'm not sure how best to present translations yet, I'll just give you a translation here:

    guy waiting: Your "one sec" is a long time... do you want me to ask politely, or just kick you out of the way?

    announcer: Entertainment that sticks with you--because Mueang Thai Smile Club Card has pleasant activities for its customers all year long. Mueang Thai Life Insurance.

    And finally, in the sentence ก็รออีกแป๊ปหนึ่งไม่ได้หรือไง - Can't you wait just a little longer?, ไง is shortened from ยังไง which is shortened from อย่างไร. Literally translated, "You can't wait another second, or what?" ไง corresponds to 'what'.

  13. @Anon,

    I did a translation myself while I was doing the transcribing, but didn't post it as I wasn't quite sure if that's in the spirit of what Rikker intends. It's basically the same as Rikker explained, but FWIW mine was...

    Guy waiting - Hey, are you going to be much longer ? I've got things to do too you know
    Guy at ATM - Can't you just wait a second longer or what ?
    Guy waiting - Your "second" is taking ages, is asking politely enough or am I going to have to push you out the way ?
    Announcer - Being entertaining sticks with you, because of all of Muang Thai Smile Card's year-round fun activities. Muang Thai Life Assurance.


    Thanks for starting this, it was quite fun doing the video and looks like it's going to be a useful learning tool.

    I had an idea for the subtitles btw, with a bit of javascript programming and using the Youtube API, I reckon it should be possible to show subtitles that change dynamically outside the video (so no need for video editing and re-uploading). That would also mean you could have, say, checkboxes for showing any combination of Thai script /transliteration / translation subtitles, so people could choose what is most appropriate for them. If you're interested and want me to help out on the programming side of things then let me know.
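The checkbox idea comes down to storing each subtitle line with several optional layers and showing only the ticked ones. A rough sketch of that selection step follows, written in Python for brevity even though the real thing would run as JavaScript in the browser. The layer names and the `render_cue` helper are hypothetical; the example line is the one discussed earlier in this thread.

```python
from typing import Dict, Iterable, List

# Display order for the layers; the keys are made-up names for this sketch.
LAYER_ORDER = ("thai", "transliteration", "translation")


def render_cue(cue: Dict[str, str], enabled: Iterable[str]) -> List[str]:
    """Return the lines to show for one cue, given the ticked checkboxes."""
    wanted = set(enabled)
    # A layer is shown only if it is ticked and someone has actually filled it in.
    return [cue[layer] for layer in LAYER_ORDER if layer in wanted and cue.get(layer)]


cue = {
    "thai": "ก็รออีกแป๊ปหนึ่งไม่ได้หรือไง",
    "translation": "Can't you wait just a little longer?",
}
lines = render_cue(cue, enabled={"thai", "transliteration", "translation"})
```

Here the transliteration box is ticked but nobody has written one yet, so only the Thai line and the translation come back. That suits a wiki where layers get filled in gradually.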

  14. Mike, I didn't even know there was a YouTube API. Shows what I know. But reading that page, sounds like in theory it's definitely possible to do just what we're looking for. If you could help trying it out in practice, that would be excellent, thanks.

  15. English translations for the adverts would be useful too :)

  16. Ahhh, when I made that previous comment I thought I'd found a reliable way round the youtube block, but now it seems to have stopped working again. I'm moving apartment in a month or so though, and will take that opportunity to change ISP, I'll try and do the API integration then as it's just too frustrating to do it with the TOT block in place.

    But I'm not quite sure what you mean by "definitely possible to do just what we're looking for", perhaps I'm misunderstanding you. My impression of the API is that it's ok, but not great - there's apparently no way to dynamically embed subtitles in the video, which would be best, but as it is possible to programmatically detect when a video is started/stopped and to detect the current play time of the video, you could use that information to sync up subtitles and dynamically change them just below the video. Wouldn't be perfect perhaps, but should work quite well.
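The syncing step itself is small: given the current play time, find the last transcript line that has already started. Below is a sketch of just that lookup, again in Python rather than browser JavaScript. It assumes the cues are sorted by start time and that the page can get the play time from the player somehow (for instance by polling it on a timer); no real player API is called here, and `active_cue` is a made-up name.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def active_cue(cues: List[Tuple[int, str]], play_time: float) -> Optional[str]:
    """Return the text of the latest cue that starts at or before play_time."""
    starts = [start for start, _ in cues]  # must already be in ascending order
    position = bisect_right(starts, play_time)
    if position == 0:
        return None  # playback has not reached the first cue yet
    return cues[position - 1][1]


# Two lines from the trailer sample, [02:10] and [02:18], as seconds.
cues = [(130, "เล่นจริง"), (138, "เจ็บจริง")]
shown = active_cue(cues, 135.2)
```

Calling this a few times per second and rewriting a text box under the video whenever the result changes would give the "below the video" subtitles described here.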

    @Anon, I just added a couple for the "waterfall" and "traffic jams" ads.

  17. Mike, I'm hearing good things about TT&T. They just doubled their speeds for the same price (but I'm not sure if this is a permanent upgrade or a promotion). I'm going to think seriously about switching when I get back. Of course, I need to ask one of their customers if YouTube is blocked first.

    As for the other comment, maybe I spoke too soon. It seems it's possible to do something along the lines of what we're looking for. Overlaid on the video would be better, but below the video could work, too.

    I'm encouraged by the response so far. Thanks, everyone, for your involvement.

  18. Thanks Mike.
    I try and translate/transcribe them myself first for practice and to improve my Thai, but sometimes they're just too difficult and then the translations are helpful.
    Also agree with Rikker that fast spoken Thai like this is the hardest to understand for a non-native speaker. If you're not brought up as a child listening to Thai every day, it's hard to hear the tones, and discern the sounds later on in life. At least I find so!

  19. Thanks Rikker, if you find out more about TT&T and their youtube policy can you post back here again ? I was thinking Maxnet as they seem to be getting positive comments on Thaivisa, but I'll wait and see when the time comes.

    Oh, and a suggestion too - I think it'd be good to have a way of identifying on the video index page which videos have an English translation, as that's obviously important to many people.

    @Anon, yep I agree with you, fast colloquial Thai is the hardest to understand. I think the TV comedy shows are generally the most difficult, with their fast speech, clever word play jokes and frequent distracting "boing" sounds that obscure what people are saying!

    I tried to transcribe a five minute clip from one of them at http://thai101.wikispaces.com/Pentor-SuayPrahan , but there's a few places where I couldn't quite catch what they were saying so any help is welcome there. I'll try a translation soon too if it'd be useful.

  20. @Mike: I think TT&T and Maxnet are the same--like, Maxnet is the service and TT&T is the company. No word on YouTube yet. I just got back into Bangkok last night, and I'll look into that here in the near future. I'm beginning to get scared about all things Google in Thailand, though, after hearing that Thailand apparently blocked all of Blogger for a couple days in an effort to block a single blog. I guess they need a lesson on subdomains.

    I totally agree about the video index page, marking which have subtitles. Any ideas on how to most easily do this? I was thinking a little flag icon, so it wouldn't be too intrusive or lengthy. Having translations hugely expands the potential audience for this website, and encourages native Thais to get into the mix.

    Thanks for taking the time to post the Pen Tor clip--I've been too busy to get around to anything lengthy myself, plus I'm not sure I could even catch as much as you have. But then, that's the point of the site, right?

    One drawback to this setup that I've noticed that affects longer clips (or slower internet connections) is that if you edit the page you have to reload the video when you're done, making it a bit cumbersome. I get around this by opening the video directly in YouTube. Not sure if there's any way to avoid this issue, though.

  21. A little flag graphic seems like a good way to me too.

    As for the Pen Tor clip, no problem. It took a while to do but I figure it's all good practice, so it's useful to me even if no-one else! Being able to get corrected on mistakes definitely makes it better than just practicing individually too.

    What do you think would be the best way to add a translation for that, add the English line-by-line under the Thai or keep the Thai and English in two separate blocks ?

  22. Having a site for translated Thai YouTube videos is a fantastic idea.

    I fell into translating a video (seemed like a good idea at the time).

    It's not a long video, but for a beginner who has to 1) first have a Thai write down the script, then 2) go word for word via thai2english.com while playing back the video, then 3) have my Thai teacher check my work, it's huge.

    I'm half way through at the moment, but if you'd like to see it before I post it on my site, please drop me a line.

    And if it's the quality you are going for, feel free to use it.