Radix sort revisited

Around a year and a half ago I wrote an article on the perils of relying on big-O notation, and in it I focused on a comparison between comparison-based sorting (via std::sort) and radix sort, based on the common bucketing approach.

Recently I came across a video on radix sort which presents an alternate counting-based implementation at the end, and claims that the tradeoff point between radix and comparison sort comes much sooner. My intuition said that even counting-based radix sort would still be slower than a comparison sort for any meaningful input size, but it’s always good to test one’s intuitions.

So, hey, it turns out I was wrong about something. (But my greater point still stands.)
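For readers unfamiliar with the counting-based variant, here is a minimal sketch of the general idea. It assumes 32-bit unsigned keys and one byte per pass; the function name `radix_sort_counting` is made up for illustration, and this is not necessarily the implementation shown in the video or benchmarked in the article.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort on 32-bit unsigned keys, one byte per pass (assumed key type).
// Each pass is a stable counting sort: histogram, prefix sum, scatter.
void radix_sort_counting(std::vector<std::uint32_t>& keys) {
    std::vector<std::uint32_t> scratch(keys.size());
    for (int shift = 0; shift < 32; shift += 8) {
        std::array<std::size_t, 256> count{};  // zero-initialized histogram
        for (std::uint32_t k : keys) {
            ++count[(k >> shift) & 0xFF];
        }
        // An exclusive prefix sum turns the counts into starting offsets.
        std::size_t offset = 0;
        for (std::size_t& c : count) {
            const std::size_t n = c;
            c = offset;
            offset += n;
        }
        // Scatter in input order so equal digits keep their relative order.
        for (std::uint32_t k : keys) {
            scratch[count[(k >> shift) & 0xFF]++] = k;
        }
        keys.swap(scratch);  // after four passes the result is back in `keys`
    }
}
```

Unlike the bucketing approach, this version allocates a single scratch buffer up front instead of growing 256 separate containers on every pass.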


The danger of big-O notation

A common pitfall I see programmers run into is putting way too much stock in big-O notation and using it as a rough analog for overall performance. It’s important to understand what big-O represents, and what it doesn’t, before deciding to optimize an algorithm based purely on its runtime complexity.


How not to shuffle a list

A frequent thing that people want to do in making games or interactive applications is to shuffle a list. One common and intuitive approach that people take is to simply sort the list, but use a random number generator as the comparison operation. (For example, this is what’s recommended in Fuzzball’s MPI documentation, and it is a common answer that comes up on programming forums as well.)

This way is very, very wrong.
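For contrast, here is a minimal sketch of a Fisher-Yates shuffle, one widely used unbiased alternative. A random comparator breaks the strict weak ordering that std::sort requires, whereas this approach makes exactly one uniform pick per position. The name `fisher_yates_shuffle` is illustrative, and the article itself may recommend something different.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Fisher-Yates shuffle: walk backwards through the list, swapping each
// element with a uniformly chosen element at or before its position.
template <typename T, typename Rng>
void fisher_yates_shuffle(std::vector<T>& items, Rng& rng) {
    if (items.size() < 2) {
        return;  // nothing to shuffle
    }
    for (std::size_t i = items.size() - 1; i > 0; --i) {
        // The range is inclusive, so an element may stay where it is.
        std::uniform_int_distribution<std::size_t> pick(0, i);
        std::swap(items[i], items[pick(rng)]);
    }
}
```

In practice, C++ code can usually just call std::shuffle from <algorithm> with a generator such as std::mt19937.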


Making a hash of data

When I was replacing peewee with PonyORM, I was evaluating a few options, including moving away from an ORM entirely and simply storing the metadata in indexed tables in memory. This would have also helped to solve a couple of minor annoying design issues (such as improper encapsulation of the actual content state into the application instance), but I ended up not doing this.

A big reason why is that there don’t actually seem to be any useful in-memory indexed table libraries for Python. Or many other languages.


The Trouble with PHP

This article was originally written for the Publ blog. I have reproduced a slightly modified version here so that it hopefully finds a wider audience.

Whenever I build a piece of software for the web, almost invariably somebody asks why I’m not using PHP to do it. While much has been written on this subject from a standpoint of what’s wrong with the language (and with which I agree quite a lot!), that isn’t, to me, the core of the problem with PHP on the web.

So, I want to talk a bit about some of the more fundamental issues with PHP, which actually go back well before PHP even existed and are inextricably linked with the way PHP applications themselves are installed and run.

(I will be glossing over a lot of details here.)



Publ: Like a static site generator, only dynamic.

(Also the software that powers this website.)

#pragma once vs. #ifndef/#define

After getting into an extended discussion about the supposed performance tradeoff between #pragma once and #ifndef guards, as opposed to the argument over which one is more correct (I was taking the side of #pragma once based on some relatively recent indoctrination to that end), I decided to finally test the theory that #pragma once is faster because the compiler doesn’t have to try to re-#include a file that has already been included.

For the test, I automatically generated 500 header files with complex interdependencies, and had a .c file that #includes them all. I ran the test three ways: once with just #ifndef, once with just #pragma once, and once with both. I performed the test on a fairly modern system (a 2014 MacBook Pro running OS X, using Xcode’s bundled Clang, with the internal SSD).


Embedding binary resources with CMake and C++11

The problem

Let’s say you want to make a single-binary application that has embedded resources (images, GLSL shaders, etc.). Let’s say you want to automatically wrap your resources in a storage container to make it easier to deal with stuff. Let’s also say that you might even be using CMake as your build system.

CMake doesn’t provide a way of making a generic, pattern-style custom build rule, and using extern data is a little unwieldy. So here’s an easy-ish way to do both parts, making use of C++11 language features (and a scary preprocessor hack).

The C++11 bit is also useful on its own, even if you aren’t using CMake, although things will have to be adapted to your build system of choice.
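To give a rough idea of what a storage container for embedded data might look like, here is a minimal sketch. The `Resource` class, `kShaderBytes`, and `load_example_shader` are all hypothetical names, and the static array merely stands in for bytes that a real build step would embed; the article’s actual wrapper and preprocessor hack may differ.

```cpp
#include <cstddef>
#include <string>

// Hypothetical read-only view over a block of embedded bytes.
class Resource {
public:
    Resource(const char* start, const char* end)
        : data_(start), size_(static_cast<std::size_t>(end - start)) {}

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

    // begin()/end() let the view work with range-based for loops.
    const char* begin() const { return data_; }
    const char* end() const { return data_ + size_; }

    // Copies the bytes into an owning string.
    std::string str() const { return std::string(data_, size_); }

private:
    const char* data_;
    std::size_t size_;
};

// Stand-in for embedded data (assumption: a real build would generate this
// symbol from a file rather than spelling it out in source).
static const char kShaderBytes[] = "void main() {}";

inline Resource load_example_shader() {
    // sizeof includes the trailing NUL of the string literal, so drop it.
    return Resource(kShaderBytes, kShaderBytes + sizeof(kShaderBytes) - 1);
}
```

Because the view only stores a pointer and a length, passing it around by value is cheap and never copies the embedded data.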

Note: Back when I wrote this the general compiler ecosystem was different, especially on macOS. If you just want a library that does all this stuff for you in a platform-independent manner, check out this resource embedding script. Or you might be interested in a CMake-only approach for the resource generation and using that in conjunction with the rest of this article.


The problem with select() vs. poll()

The UNIX select() API should have been deprecated years ago. Unsafe operations like sscanf(), sprintf(), gets(), and so forth all provide compile-time deprecation warnings; select() is also incredibly dangerous and has a more modern, safer replacement (poll()), yet people continue to use it.

The problem is that it doesn’t scale. In this case, “not scaling” doesn’t just mean it’s bad for performance; it means it can destroy your call stack, crash your process, and leave it in a state that is incredibly difficult to debug.


VCard phone number normalizer

These days, it’s no longer good enough to use local phone number formats for your address book; you might be trying to dial someone via SIP without any clear locale information, for example, and so trying to dial a 10-digit US number might end up routing the call to some other country, which can be quite embarrassing.

Further, in this day and age, you might actually be travelling between different countries, and so you can’t really predict what your outgoing call routing will be like!

So, here’s a really simple C++ program that wraps libphonenumber to normalize the phone numbers in a .vcf file (used by most modern address book systems) based on your current locale. It can be used on local .vcf stores as well as on ones stored on a CardDAV/CalDAV server.

See the code comments for usage details.

Exceptions vs. error returns, quantified

Today I got into a discussion regarding embedded programmers and their tendency to avoid exceptions in C++ (even after moving to new-school “embedded” systems such as smartphones, game consoles, and set-top boxes). The argument basically boils down to three things: code size, memory usage, and execution speed. So, rather than continue on without actually putting my beliefs to the test (the discussion basically centered around whether engineers are making assumptions based on how things were 15+ years ago), I decided to construct a minimal test which I think shows where the tradeoffs may be.
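To make the two styles being compared concrete, here is a minimal sketch of the same small task written both ways. All function names are illustrative, and this is not the test program from the article, just an assumption about the general shape such a comparison takes.

```cpp
#include <stdexcept>
#include <string>

// Error-return style: the status is the return value and the result comes
// back through an out-parameter, so every caller must check explicitly.
bool parse_digit_checked(char c, int& out) {
    if (c < '0' || c > '9') {
        return false;
    }
    out = c - '0';
    return true;
}

bool sum_digits_checked(const std::string& text, int& total) {
    int sum = 0;
    for (char c : text) {
        int digit = 0;
        if (!parse_digit_checked(c, digit)) {
            return false;  // propagate the failure by hand
        }
        sum += digit;
    }
    total = sum;
    return true;
}

// Exception style: the happy path has no checks, and a failure unwinds
// through the intermediate function automatically.
int parse_digit_throwing(char c) {
    if (c < '0' || c > '9') {
        throw std::invalid_argument("not a digit");
    }
    return c - '0';
}

int sum_digits_throwing(const std::string& text) {
    int sum = 0;
    for (char c : text) {
        sum += parse_digit_throwing(c);
    }
    return sum;
}
```

The code size, memory, and speed tradeoffs come from what the compiler has to emit for each version: explicit branches at every call site in the first, unwind tables and landing pads in the second.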

Note: The timing and memory analysis has been updated as of May 24, 2018.


Affine HSV color manipulation

Because the need for color manipulation comes up fairly often in computer graphics, particularly transformations of hue, saturation, and value, and because some of this math is a bit tricky, here’s how to do HSV color transforms on RGB data using simple matrix operations.

Note: This isn’t about converting between RGB and HSV; this is only about applying an HSV-space modification to an RGB value and getting another RGB value out. There is no affine transformation to convert between RGB and HSV, as there is not a linear mapping between the two.
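As a rough sketch of the idea, the code below builds a single 3x3 matrix that rotates hue, scales saturation, and scales value directly on RGB data. It makes some simplifying assumptions that may not match the article: hue is treated as a rotation about the gray axis (R = G = B), saturation as a scale toward the unweighted channel mean, and the names `hsv_transform` and `apply_transform` are invented for the example.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using Mat3 = std::array<std::array<double, 3>, 3>;
using Rgb = std::array<double, 3>;

// Builds one matrix combining a hue rotation, a saturation scale, and a
// value scale. Assumption: hue is a rotation about the axis (1,1,1).
Mat3 hsv_transform(double hue_radians, double sat_scale, double val_scale) {
    const double c = std::cos(hue_radians);
    const double s = std::sin(hue_radians);
    const double third = (1.0 - c) / 3.0;
    const double k = s / std::sqrt(3.0);

    // Rodrigues rotation about the normalized gray axis.
    const Mat3 rot = {{
        {{c + third, third - k, third + k}},
        {{third + k, c + third, third - k}},
        {{third - k, third + k, c + third}},
    }};

    // Saturation blends each channel with the mean; value scales everything.
    const double gray = (1.0 - sat_scale) / 3.0;
    Mat3 m{};
    for (std::size_t i = 0; i < 3; ++i) {
        for (std::size_t j = 0; j < 3; ++j) {
            m[i][j] = val_scale * (sat_scale * rot[i][j] + gray);
        }
    }
    return m;
}

// Plain matrix-vector product: out = m * in.
Rgb apply_transform(const Mat3& m, const Rgb& in) {
    Rgb out{};
    for (std::size_t i = 0; i < 3; ++i) {
        out[i] = m[i][0] * in[0] + m[i][1] * in[1] + m[i][2] * in[2];
    }
    return out;
}
```

Since the whole adjustment is one matrix, it can be computed once and applied to every pixel, or folded into other linear color operations.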



This is a simple program for moving the mouse cursor around from a script. Useful as something to bind to a key event in window managers which allow such a thing but don’t have built-in keyboard-mouse functionality (such as pwm).