Embedding Oh No Robot transcripts
Oh No Robot is an excellent transcription service, but it has one disadvantage: its wonderful plain-text transcripts never actually show up alongside the comic. This means that search engines and vision-impaired readers never get to associate the comic text with the comic itself. Here is my solution to that problem.
Getting the transcripts
First, I wrote a very simple Perl script which parses Oh No Robot's textual export file into little HTML snippets for inclusion in the comic's page text. It loads ohnorobot.txt and, for each URL in the database, emits a file named (basename).ts, where (basename) is the part of the comic's URL between the last / and the first . after it. (So, for example, http://beesbuzz.biz/d/20081216.php becomes 20081216.ts.)
To use it, I occasionally go to Oh No Robot's "export" page, download the text version, then upload it to my webserver in the same directory as this script (which is in a subdirectory of my comic). From there, there are a few different ways I can process it, based on what's most convenient:
- Since I have shell access, if I'm logged into the server already I can just run the script directly
- Since Dreamhost allows CGI scripts, I can also run it as a CGI via my browser or curl
- I could also set up a cron job to do the first one on a regular basis
- If I didn't have any way of running Perl scripts on the server, I could just run it on my home system and then upload the resulting .ts files to the server (but that's a bit lame)
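The naming rule described above (the part of the URL between the last / and the first . after it) can be sketched in a few lines. This is not the Perl script itself, just an illustration of the rule in PHP; the function name transcript_filename is made up for this example, and it assumes the URL has no query string containing slashes.

```php
<?php
// Hypothetical helper, not the original Perl script: derive the snippet
// filename from a comic URL using the rule described in the text.
function transcript_filename(string $url): string
{
    // Everything after the last "/" in the URL.
    $slash = strrpos($url, '/');
    $tail = ($slash === false) ? $url : substr($url, $slash + 1);

    // Keep only the part before the first "." that follows.
    $dot = strpos($tail, '.');
    $base = ($dot === false) ? $tail : substr($tail, 0, $dot);

    return $base . '.ts';
}

echo transcript_filename('http://beesbuzz.biz/d/20081216.php'), "\n"; // 20081216.ts
```

Because the cut happens at the first dot rather than the last, a page named 0042.old.html would map to 0042.ts under this rule.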
Using it in the comic
Finally, my comics have the following bit of PHP code in the template:
<?php
/* include parsed-out transcript file, if available */
$tfile="onr-parse/<$MTEntryDate format="%Y%m%d"$>.ts";
if (file_exists($tfile)) {
    echo '<div class="transcript">';
    readfile($tfile);
    echo '</div>';
}
?>
In this case, <$MTEntryDate$> is the Movable Type template tag that I use to determine the transcript snippet's filename; if you're using some other CMS or URL scheme then you'll need to do something else. (Generally, basename($_SERVER['PHP_SELF'], ".php") will be enough if you use a separate PHP file per comic page, but if you use something like a ?date=xxx parameter, it will be trickier.) Of course, if the parsed files are kept in a different directory, you'll need to change the onr-parse/ part of $tfile.
Anyway, the final thing to do is to add a stylesheet rule like:
.transcript { visibility: hidden; }
In the case of my site, the transcript block is actually kept inside the "transcribe" div and so my full CSS rules are:
.transcribe .transcript { visibility: hidden; position: absolute; }
.transcribe:hover .transcript {
    position: absolute;
    margin-top: 1em;
    visibility: visible;
    background: #ccc;
    border: solid black 1px;
    z-index: 20;
}
.transcribe .transcript div.panel { border: dashed #777 1px; margin: 4px; padding: 2px 4px; background: white; }
.transcribe .transcript div { display: block; text-align: left; }
.transcribe .transcript .line { margin: 2px 2em; text-indent: -2em; }
/* :first-word is not a standard CSS pseudo-element, so browsers ignore this rule */
.transcribe .transcript .line:first-word { font-weight: bold; }
which makes it so that if someone mouses over the "transcribe" link (in a browser which supports CSS's :hover selector on arbitrary elements, such as Firefox, Chrome, or Safari), the transcript pops up with some fancy formatting for the panels. (In other browsers it will simply remain hidden.) The same could be accomplished with JavaScript if compatibility with MSIE or the like is absolutely vital.
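For sites that aren't on Movable Type, the filename lookup in the template could be sketched roughly as follows. PHP exposes the current script's path as $_SERVER['PHP_SELF'], which covers the one-file-per-comic case; the ?date= branch is only a guess at what the "trickier" case might need, and it assumes the parameter is an eight-digit YYYYMMDD date. The function name transcript_path is invented for this example.

```php
<?php
// Hypothetical helper: work out which .ts snippet belongs to the current page.
// $dir is wherever the parsed snippets live (onr-parse/ in this article).
function transcript_path(string $scriptName, array $query, string $dir = 'onr-parse/'): ?string
{
    if (isset($query['date']) && is_string($query['date'])) {
        // One script serving every comic via ?date=...: accept exactly eight
        // digits (an assumed format) so the value cannot point outside $dir.
        if (!preg_match('/\A\d{8}\z/', $query['date'])) {
            return null;
        }
        $base = $query['date'];
    } else {
        // One .php file per comic: /d/20081216.php -> 20081216
        $base = basename($scriptName, '.php');
    }
    return $dir . $base . '.ts';
}

// In a real template this would be:
//   $tfile = transcript_path($_SERVER['PHP_SELF'], $_GET);
$tfile = transcript_path('/d/20081216.php', []);
if ($tfile !== null && file_exists($tfile)) {
    echo '<div class="transcript">';
    readfile($tfile);
    echo '</div>';
}
```

Validating the parameter before building the path matters here, since anything taken from the query string would otherwise be handed straight to readfile().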





