Base-10 File Sizes

So, Mac OS X 10.6 (Snow Leopard) will be using base-10 file sizes. There has been a lot of nerd outcry over this, but frankly, I think it’s about freaking time.

There is absolutely no reason to use base-2 file sizes. Yes, computers deal with things in terms of base 2, but nobody else does. When you look at a file that is 104768926 bytes in size, you think, "oh, 105 megabytes," not "100 megabytes." And the bigger the unit, the wider the gap between decimal and binary: a MiB is about 4.9% larger than an MB, a GiB about 7.4% larger than a GB, and a TiB about 10% larger than a TB.
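Here is a minimal Python sketch of the two conventions. The helper `format_size` and its `base` parameter are hypothetical names for illustration, not any OS's actual API. It formats the 104768926-byte example both ways and prints how much larger each binary unit is than its decimal counterpart.

```python
def format_size(num_bytes: int, base: int = 1000) -> str:
    """Format a byte count in decimal (base=1000) or binary (base=1024) units.

    Hypothetical helper for illustration only.
    """
    if base == 1000:
        units = ["bytes", "KB", "MB", "GB", "TB"]
    elif base == 1024:
        units = ["bytes", "KiB", "MiB", "GiB", "TiB"]
    else:
        raise ValueError("base must be 1000 or 1024")

    size = float(num_bytes)
    index = 0
    # Step up one prefix at a time while the value is at least one of the next unit.
    while size >= base and index < len(units) - 1:
        size /= base
        index += 1
    if index == 0:
        return f"{num_bytes} bytes"
    return f"{size:.1f} {units[index]}"


if __name__ == "__main__":
    print(format_size(104768926))        # 104.8 MB
    print(format_size(104768926, 1024))  # 99.9 MiB

    # How much larger each binary unit is than the decimal unit with the same prefix.
    pairs = zip(["KB", "MB", "GB", "TB"], ["KiB", "MiB", "GiB", "TiB"])
    for power, (dec_unit, bin_unit) in enumerate(pairs, start=1):
        gap = 1024**power / 1000**power - 1
        print(f"1 {bin_unit} is {gap * 100:.1f}% larger than 1 {dec_unit}")
```

The relative gap is (1024/1000)^n - 1 for the n-th prefix, which is why it keeps growing as files and drives get bigger.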

People have long accused hard drive manufacturers of "inflating" drive sizes by using base-10 instead of base-2, but really it's been the fault of OS makers for deflating them, based on some really ridiculous legacy that dates back to the 70s: it was a lot easier for an OS to just report how many 1K clusters were available, or to shift the byte count right by 10 bits (>>10) instead of dividing by 1000, or whatever.
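The shortcut is easy to check in Python. For a non-negative integer, shifting right by 10 bits gives exactly the same result as integer division by 1024. Dividing by 1000 has no such shortcut, because 1000 is not a power of two, so it needs a real division, which was typically slower than a shift on the simple CPUs of that era. The function names below are hypothetical.

```python
def kib_by_shift(num_bytes: int) -> int:
    # Shifting right by 10 bits drops the low 10 bits: floor division by 2**10 = 1024.
    return num_bytes >> 10


def kib_by_division(num_bytes: int) -> int:
    return num_bytes // 1024


def kb_by_division(num_bytes: int) -> int:
    # Decimal kilobytes: 1000 is not a power of two, so no bit-shift equivalent exists.
    return num_bytes // 1000


if __name__ == "__main__":
    free_bytes = 104768926  # example byte count
    print(kib_by_shift(free_bytes))     # 102313
    print(kib_by_division(free_bytes))  # 102313
    print(kb_by_division(free_bytes))   # 104768
```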

The practice of 1024-as-K has also led to all sorts of weirdness, like 1.44MB disks (which were 1440KiB, i.e. 1474560 bytes - neither 1.44MB nor 1.44MiB).
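The floppy arithmetic, worked out in Python. The "1.44" only comes out exactly if a "megabyte" is taken to be 1000 × 1024 bytes, a unit that is neither decimal nor binary.

```python
FLOPPY_BYTES = 1440 * 1024  # 1440 KiB, the capacity sold as "1.44MB"

decimal_mb = FLOPPY_BYTES / 1000**2        # true decimal megabytes, about 1.47
binary_mib = FLOPPY_BYTES / 1024**2        # true binary mebibytes, about 1.41
mixed_unit = FLOPPY_BYTES / (1000 * 1024)  # the hybrid "MB" that yields 1.44

print(FLOPPY_BYTES, decimal_mb, binary_mib, mixed_unit)
```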

"But computer parts are sold in terms of 1024 units!" is also crap. The only part that has ever been sold on that basis is RAM, which actually makes sense for various technological reasons not worth getting into. CPU speed is base-10. Network adapters are base-10. Bus speed is base-10. And hard drives are sold in base-10 units, but reported in base-2.

Okay, so RAM sizes will be reported on a different basis from hard disk sizes, but really, why does that matter? RAM sizes only matter to programmers, and to users as a ballpark figure for whether they have "enough" memory. Just because a file on disk takes 1200KB doesn't mean it will take 1200KB of RAM; chances are it will take much more. (Granted, there are a lot of spots where it makes sense for code to use power-of-2 sizes, for things like memory allocation and caches, but that doesn't need to be reported to the user.)

The only place where hard disk size has any real base-2 issue is that file systems tend to allocate space in power-of-two-sized chunks (usually somewhere from 512 to 4096 bytes). But that doesn't count the overhead of the file system itself, and the vast majority of files (the ones large enough for drive space to be an issue) are so big that the cluster size amounts to rounding error. Okay, so with 1024-byte clusters the "real" storage space taken by a 123456789-byte file is actually 123457536 bytes, but that's still a lot closer to 123.5MB than it is to the 117.7MB that base-2 reporting would show!
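A minimal Python sketch of the cluster rounding, assuming the file occupies a whole number of fixed-size clusters and ignoring file system metadata. The function name `allocated_size` is hypothetical. A 1024-byte cluster reproduces the 123457536-byte figure above.

```python
def allocated_size(file_bytes: int, cluster_bytes: int) -> int:
    """Round a file size up to a whole number of clusters (ignores metadata)."""
    clusters = -(-file_bytes // cluster_bytes)  # ceiling division using integers only
    return clusters * cluster_bytes


if __name__ == "__main__":
    size = 123456789
    on_disk = allocated_size(size, 1024)
    print(on_disk)  # 123457536
    print(f"{on_disk / 1000**2:.1f} MB vs {on_disk / 1024**2:.1f} MiB")
    # Slack from rounding up, as a fraction of the file size.
    print(f"overhead: {(on_disk - size) / size:.6%}")
```

The slack is at most one cluster per file, so for a file this size it is a few millionths of the total, far smaller than the roughly 4.9% gap between MB and MiB.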

In short: Apple is doing a good thing by finally freeing us of some ridiculous legacy which has no bearing on reality.

Okay, so it does mean there will be a mismatch between file sizes reported on Mac OS X 10.6 and on any other OS, but when does that actually matter?
