Embedding Oh No Robot transcripts

Oh No Robot is an excellent transcription service, but it has one drawback: the wonderful plain-text transcripts never actually show up alongside the comic. This means that search engines and vision-impaired readers never get to associate the comic's text with the comic itself. Here is my solution to that problem.

Getting the transcripts

First, I wrote a very simple Perl script which parses Oh No Robot's plain-text export file into little HTML snippets for inclusion in the comic's page. It simply loads ohnorobot.txt and, for each URL in the database, emits a file named (basename).ts, where (basename) is the part of the comic's URL between the last / and the first . after it. (So, for example, http://beesbuzz.biz/d/20081216.php becomes 20081216.ts.)
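The naming rule is easy to get subtly wrong, so here is a minimal sketch of it in PHP rather than the original Perl (the script itself isn't shown here). It assumes the rule is applied to the raw URL string, and ts_filename() is a hypothetical name:

```php
<?php
// Sketch of the snippet-naming rule: keep the text between the last "/" and the
// first "." after it, then append ".ts". Assumes the rule is applied to the raw URL.
// ts_filename() is a hypothetical helper, not part of the original Perl script.
function ts_filename(string $url): string
{
    $slash = strrpos($url, '/');
    // Everything after the last slash (or the whole string if there is no slash).
    $tail = ($slash === false) ? $url : substr($url, $slash + 1);
    $dot = strpos($tail, '.');
    // Everything before the first dot (or the whole tail if there is no dot).
    $base = ($dot === false) ? $tail : substr($tail, 0, $dot);
    return $base . '.ts';
}

echo ts_filename('http://beesbuzz.biz/d/20081216.php'), "\n"; // prints 20081216.ts
```

Because the cut happens at the first dot after the last slash, a query string such as ?id=3 on a URL like http://example.com/comic/strip.html?id=3 is discarded along with the extension, giving strip.ts.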

To use it, I occasionally go to Oh No Robot's "export" page, download the text version, and upload it to my web server in the same directory as the script (which lives in a subdirectory of my comic). From there, I can process it in a few different ways, depending on what's most convenient:

  • Since I have shell access, if I'm logged into the server already I can just run the script directly
  • Since Dreamhost allows CGI scripts, I can also run it as a CGI via my browser or curl
  • I could also set up a cron job to run it on a regular basis
  • If I didn't have any way of running Perl scripts on the server, I could just run it on my home system and then upload the resulting .ts files to the server (but that's a bit lame)
I also have an older version which parses the XML export (which is a lot slower but also a bit less fragile), but there is a long-standing bug in Oh No Robot's XML export which causes it not to export separate panels (something Ryan never has time to get around to fixing).

Using it in the comic

My comic template then contains the following bit of PHP code:

<?php
/* include parsed-out transcript file, if available;
   Movable Type fills in the date below when it builds the page, before PHP runs */
$tfile = "onr-parse/<$MTEntryDate format="%Y%m%d"$>.ts";
if (file_exists($tfile)) {
  echo '<div class="transcript">';
  readfile($tfile);
  echo '</div>';
}
?>
In this case, <$MTEntryDate$> is Movable Type's template tag, which I use to determine the transcript snippet's filename; if you're using some other CMS or URL scheme, you'll need to do something else. (Generally, basename($_SERVER['PHP_SELF'], ".php") will be enough if you use a separate PHP file per comic page, but if you use something like a ?date=xxx parameter, it will be trickier.) Of course, if the parsed files are kept in some other directory, you'll need to change the onr-parse/ part of $tfile.
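For the trickier ?date=xxx case, the main thing is to validate the parameter before turning it into a file path. Here is a minimal sketch, assuming the date arrives as ?date=YYYYMMDD and the snippets live in onr-parse/; transcript_path() is a hypothetical helper:

```php
<?php
// Sketch for a page that receives the comic date as ?date=YYYYMMDD (an assumption).
// transcript_path() is hypothetical; the default directory matches onr-parse/ above.
function transcript_path(?string $date, string $dir = 'onr-parse'): ?string
{
    // Accept exactly eight ASCII digits (\z also rejects a trailing newline), so a
    // crafted value like "../../etc/passwd" can never become a file path.
    if ($date === null || !preg_match('/^\d{8}\z/', $date)) {
        return null;
    }
    return $dir . '/' . $date . '.ts';
}

$tfile = transcript_path($_GET['date'] ?? null);
if ($tfile !== null && file_exists($tfile)) {
    echo '<div class="transcript">';
    readfile($tfile);
    echo '</div>';
}
```

Rejecting anything that isn't a bare date keeps readfile() from ever being pointed outside the snippet directory.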

Anyway, the final thing to do is to add a stylesheet rule like:

.transcript { visibility: hidden; }
On my site, the transcript block is actually kept inside the "transcribe" div, so my full CSS rules are:
.transcribe .transcript { visibility: hidden; position: absolute; }
.transcribe:hover .transcript {
 position: absolute;
 margin-top: 1em;
 visibility: visible;
 background: #ccc;
 border: solid black 1px;
 z-index: 20;
}
.transcribe .transcript div.panel { border: dashed #777 1px; margin: 4px; padding: 2px 4px; background: white; }
.transcribe .transcript div { display: block; text-align: left; }
.transcribe .transcript .line { margin: 2px 2em; text-indent: -2em; }
/* note: :first-word is not a real CSS pseudo-element, so browsers ignore this rule */
.transcribe .transcript .line:first-word { font-weight: bold; }
With these rules, when someone mouses over the "transcribe" link (in a browser that supports CSS's :hover selector on arbitrary elements, such as Firefox, Chrome, or Safari), the transcript pops up with some fancy formatting for the panels. (In other browsers it simply remains hidden.) The same effect could be achieved with JavaScript if compatibility with MSIE or the like is absolutely vital.
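The CSS rules expect each snippet to consist of div.panel blocks containing .line elements. A minimal sketch of producing that markup in PHP, assuming the transcript has already been split into panels and lines (that step depends on Oh No Robot's export format, which isn't shown here); render_snippet() is hypothetical, and the real Perl script's output may differ:

```php
<?php
// Render a transcript as panel/line markup matching the .transcript CSS rules.
// Assumed input: an array of panels, each an array of plain-text lines.
// render_snippet() is hypothetical; the original script's exact output isn't shown.
function render_snippet(array $panels): string
{
    $html = '';
    foreach ($panels as $lines) {
        $html .= '<div class="panel">';
        foreach ($lines as $line) {
            // Escape so transcript text like "<3" or "&" can't break the page markup.
            $html .= '<div class="line">' . htmlspecialchars($line, ENT_QUOTES) . '</div>';
        }
        $html .= "</div>\n";
    }
    return $html;
}

echo render_snippet([
    ['ALICE: Hello!', 'BOB: Hi & welcome.'],
    ['(Silence.)'],
]);
```

Escaping at generation time matters because the template includes the .ts file verbatim with readfile().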

Comments

04/10/2009 11:33 pm 
If anyone cares, I just rewrote the script to use the .txt export, which actually does delineate the separate panels. The parser seems a bit more fragile, but it should be fine as long as there's no [[[ or ]]] in the transcripts.