More fun with encodings


On a Slack I’m on, someone wondered why so many websites disallow passwords with spaces, punctuation, “special” characters, and so on; shouldn’t they all be hashing the passwords rather than storing them in plain text anyway?

Yes, they should, but that’s not where the problem is. Once again, encodings become a problem.
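
To make that concrete, here is a minimal sketch of how a hash depends on the bytes it is fed, not on the characters the user typed. The password is a made-up example, and plain SHA-256 stands in for a real password hash purely to keep the illustration short; it is not a recommendation for storing passwords.

```python
import hashlib

# Hypothetical password containing one non-ASCII character.
password = "pässword"

# The same text becomes different bytes under different encodings.
utf8_bytes = password.encode("utf-8")
latin1_bytes = password.encode("latin-1")

# A hash function only ever sees bytes, so the digests differ too.
utf8_digest = hashlib.sha256(utf8_bytes).hexdigest()
latin1_digest = hashlib.sha256(latin1_bytes).hexdigest()

print(utf8_bytes)                    # b'p\xc3\xa4ssword'
print(latin1_bytes)                  # b'p\xe4ssword'
print(utf8_digest == latin1_digest)  # False
```

If the signup form and the login form ever disagree about the encoding, the stored hash will never match, even though the user typed the same thing both times.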


Encodings are the worst


These past few weeks I seem to keep running into software that is really bad at handling character encodings.

Back in the day, encodings were an absolute nightmare. You had different 8-bit encodings for every language, each with a bunch of different ISO standards. A very commonly used one is ISO-8859-1, aka Latin-1, which is basically the characters needed to render all of English and most of several Romance languages (although a bunch of stuff is missing), plus a little extra stuff for math, scientific notation (µ), and German (ß), as well as a miscellany of other generally useful symbols.

Unfortunately, a lot of Internet standards decided to default to that, including HTML.
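
As a rough illustration of what that default costs, the sketch below takes text written as UTF-8 and reads it back as Latin-1, which is what happens when a consumer falls back to the old default. The sample string is an assumption chosen for the demo.

```python
# Sample text using the micro sign, one of the Latin-1 extras.
text = "5 µm"

# In Latin-1 the micro sign is a single byte; in UTF-8 it takes two.
latin1_bytes = text.encode("latin-1")
utf8_bytes = text.encode("utf-8")
print(latin1_bytes)  # b'5 \xb5m'
print(utf8_bytes)    # b'5 \xc2\xb5m'

# Decoding UTF-8 bytes as Latin-1 never fails, it just yields garbage.
misread = utf8_bytes.decode("latin-1")
print(misread)       # 5 Âµm

# The damage is reversible only if you know exactly what went wrong.
repaired = misread.encode("latin-1").decode("utf-8")
print(repaired == text)  # True
```

Because every byte value maps to some Latin-1 character, the wrong decoding raises no error, which is a big part of why this kind of bug goes unnoticed.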

Note: There are some updates based on feedback at the very bottom.
