For the past few weeks I keep running into software that is really bad at handling character encodings.
Back in the day, encodings were an absolute nightmare. Every language had its own 8-bit encodings, spread across a bunch of different ISO standards. A very commonly used one is ISO-8859-1, aka Latin-1, which is basically the characters needed to render all of English and most of several Romance languages (although a bunch of stuff is missing), plus a little extra for math, scientific notation (µ), and German (ß), as well as a bunch of generally useful miscellany.
Unfortunately, a lot of Internet standards, including HTML, decided to default to Latin-1.
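To make that concrete, here is a rough Python sketch of what a Latin-1 default means in practice. The sample strings are my own picks for illustration, not taken from any particular standard: Latin-1 characters take exactly one byte each, anything outside the repertoire cannot be encoded at all, and UTF-8 bytes read under a Latin-1 assumption come out garbled.

```python
# Latin-1 maps each of its 256 code points to exactly one byte.
text = "µ and ß"
latin1_bytes = text.encode("latin-1")
print(len(text), len(latin1_bytes))  # same count: one byte per character

# Characters outside that repertoire (the euro sign, for one) cannot be encoded.
try:
    "€".encode("latin-1")
    euro_fits = True
except UnicodeEncodeError:
    euro_fits = False
print("euro fits in Latin-1:", euro_fits)

# Bytes written as UTF-8 but decoded under a Latin-1 default become mojibake:
# the two UTF-8 bytes of "µ" are read as two separate Latin-1 characters.
utf8_bytes = "µ".encode("utf-8")
misread = utf8_bytes.decode("latin-1")
print(misread)  # Âµ
```

That last case is the one that tends to bite: decoding as Latin-1 never fails, because every byte value is a valid Latin-1 character, so the mistake only shows up as wrong text.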
Note: There are some updates based on feedback at the very bottom.