Automatically rewriting relative links

Although relative hyperlinks usually keep a document portable, they cause trouble wherever the notion of "relative location" isn't well-defined. An RSS feed is one such place: the entry gets displayed outside the page it was written for, so the reader may have no reliable base URL to resolve the links against.

This site's RSS feed automatically gets relative HREF and SRC links rewritten to absolute links. Here is how I do it. (Adapting it to other weblogging/CMS software should be pretty simple.)

PHPized RSS feed

Since MovableType doesn't provide any built-in link rewriting capability, I had to write a simple PHP preprocessing layer for it. Since I already had people subscribing to the static file stuff.xml, I had to quietly redirect things over to a dynamic PHP-ified version. So I changed my site's RSS feed location to stuff-rss.php, and put the following into the root .htaccess file:

beesbuzz.biz/.htaccess

RewriteEngine On
RewriteRule stuff.xml /stuff-rss.php [L]
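(Actually I already had a RewriteRule for some other things, but that's the minimal file necessary for this particular purpose. Also note that the pattern is an unanchored regular expression, so it rewrites any request whose path contains stuff.xml, in any directory; write it as ^stuff\.xml$ to match only the file in the site root.)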
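
Since mod_rewrite patterns are regular expressions, PHP's preg_match gives a quick way to see how far the rule above reaches. This is only a sketch: the sample paths are made up, and it assumes the path is matched the way a root .htaccess sees it, without a leading slash.

```php
<?php
// The pattern from the RewriteRule, and an anchored, dot-escaped alternative.
$loose  = '@stuff.xml@';
$strict = '@^stuff\.xml$@';

// Hypothetical request paths, relative to the site root.
$paths = ['stuff.xml', 'archive/stuff.xml', 'stuffyxml.html', 'other.xml'];

foreach ($paths as $path) {
    printf("%-18s loose=%d strict=%d\n", $path,
        preg_match($loose, $path), preg_match($strict, $path));
}
```

Only the first path matches both patterns. The second and third match the loose one alone, the third because an unescaped dot stands for any character.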

URL rewriting filter

Next, I created a global sitefuncs.php which includes the following:

sitefuncs.php

<?php
// Rewrite all relative links in a chunk of HTML/XML/etc. to point to the appropriate place.
// Relative URLs are appended to $base directly, so $base should end in "/".
function rewriteRelative($html, $base) {

    // generate server-only replacement for root-relative URLs
    // (e.g. "http://example.com/blog/" becomes "http://example.com/")
    $server = preg_replace('@^([^:]*)://([^/]*)(/|$).*@', '\1://\2/', $base);

    // replace root-relative URLs
    $html = preg_replace('@<([^>]*) (href|src)="/([^"]*)"@i', '<\1 \2="' . $server . '\3"', $html);

    // replace base-relative URLs (kludgy, but I couldn't get ! to work);
    // [^"]* keeps the match inside the attribute value, so later links on the same line still get rewritten
    $html = preg_replace('@<([^>]*) (href|src)="([^:"]*|[^"]*:[^/"][^"]*)"@i', '<\1 \2="' . $base . '\3"', $html);

    return $html;
} ?>
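
To get a feel for what the filter does, here is a small check against three kinds of link. It is a sketch, not part of the site's code: the example.com base URL and the sample markup are invented, and a condensed copy of rewriteRelative is included so the snippet runs standalone.

```php
<?php
// Condensed copy of rewriteRelative so this snippet runs standalone.
function rewriteRelative($html, $base) {
    $server = preg_replace('@^([^:]*)://([^/]*)(/|$).*@', '\1://\2/', $base);
    $html = preg_replace('@<([^>]*) (href|src)="/([^"]*)"@i',
        '<\1 \2="' . $server . '\3"', $html);
    return preg_replace('@<([^>]*) (href|src)="([^:"]*|[^"]*:[^/"][^"]*)"@i',
        '<\1 \2="' . $base . '\3"', $html);
}

// Hypothetical base URL; note the trailing slash.
$base = 'http://example.com/blog/';

$samples = [
    '<a href="/about.html">About</a>',        // root-relative
    '<img src="pics/cat.jpg">',               // base-relative
    '<a href="http://other.example/x">x</a>', // already absolute
];

foreach ($samples as $sample) {
    echo rewriteRelative($sample, $base), "\n";
}
// <a href="http://example.com/about.html">About</a>
// <img src="http://example.com/blog/pics/cat.jpg">
// <a href="http://other.example/x">x</a>
```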

Filtered RSS content

Finally, I just needed to filter my entry text (the stuff between <description>...</description>) to use the right URLs. Since MovableType doesn't let you safely encode a chunk of text for both XML CDATA and for a PHP string, I just had to use a here document (in its single-quoted "nowdoc" form, available from PHP 5.3, so that a $ in the entry text isn't treated as a PHP variable). Here's what goes inside my description tags in my new RSS template:
<description><?php
echo rewriteRelative(<<<'ENTRYBODY_<$MTEntryID$>_XYZZY'
<$MTEntryBody encode_xml="1"$>
ENTRYBODY_<$MTEntryID$>_XYZZY
, "<$MTEntryLink encode_php="qq" archive_type="Category"$>");
?></description>
(The entry ID is appended to the heredoc name mainly so that I can post this code here: without it, this entry's own text would contain the terminator and end the heredoc early.)
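
How the entry body is quoted matters as soon as an entry contains something that looks like a PHP variable, as this one does. The following sketch shows the difference between a plain heredoc and a nowdoc; the entry ID 42 and the variable are made up for the demonstration.

```php
<?php
$base = 'SURPRISE';  // stands in for any variable that happens to be in scope

// Heredoc: PHP interpolates $base inside the body.
$viaHeredoc = <<<ENTRYBODY_42_XYZZY
<code>echo $base;</code>
ENTRYBODY_42_XYZZY;

// Nowdoc (PHP 5.3+): the body is taken literally.
$viaNowdoc = <<<'ENTRYBODY_42_XYZZY'
<code>echo $base;</code>
ENTRYBODY_42_XYZZY;

echo $viaHeredoc, "\n";  // <code>echo SURPRISE;</code>
echo $viaNowdoc, "\n";   // <code>echo $base;</code>
```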

So anyway, that's all there is to it — now URLs are transparently rewritten in a meaningful way.

Caveats

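preg_replace works on the whole string rather than line by line, but both patterns expect a literal space immediately before href or src, so the attribute can't be the first thing on its line; for example,
<img src="foo">
will work, whereas
<img
src="foo">
will not. Also, this won't rewrite a relative URL that contains :/ and doesn't start with /, for whatever that's worth. Going the other way, any URL with a colon that isn't followed by a slash, such as a mailto: link or a host with an explicit port, is treated as relative and gets the base URL prepended.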
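
The caveats above mostly come from squeezing the whole decision into the pattern itself. One possible way around them, sketched here rather than taken from this site's actual code, is to match every href/src value and let a callback decide what to do with it. The function name rewriteRelativeStrict and the example.com URLs are made up for illustration, and closures need PHP 5.3 or later.

```php
<?php
// Hypothetical variant of the filter: one pass, with a callback deciding per URL.
function rewriteRelativeStrict($html, $base) {
    // Scheme and host only, e.g. "http://example.com/"
    $server = preg_replace('@^([^:/]+)://([^/]*).*$@s', '\1://\2/', $base);

    // \s instead of a literal space lets the attribute start a new line.
    $pattern = '@(<[^>]*\s(?:href|src)=")([^"]*)(")@i';

    return preg_replace_callback($pattern, function ($m) use ($base, $server) {
        $url = $m[2];
        // Has a scheme (http:, mailto:, ...), is protocol-relative, or is a
        // bare fragment: leave it alone.
        if (preg_match('@^(?:[a-z][a-z0-9+.-]*:|//|#)@i', $url)) {
            return $m[0];
        }
        // Root-relative: resolve against the server root.
        if ($url !== '' && $url[0] === '/') {
            return $m[1] . $server . substr($url, 1) . $m[3];
        }
        // Everything else is relative to the base.
        return $m[1] . $base . $url . $m[3];
    }, $html);
}

$base = 'http://example.com/blog/';  // assumed to end in "/"
echo rewriteRelativeStrict("<img\nsrc=\"foo.png\">", $base), "\n";
echo rewriteRelativeStrict('<a href="mailto:me@example.com">Mail</a>', $base), "\n";
```

This sketch still only handles double-quoted attributes, and in a tag carrying both href and src it rewrites only the last of them.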