Radix sort revisited

Comments

Around a year and a half ago I wrote an article on the perils of relying on big-O notation, and in it I focused on a comparison between comparison-based sorting (via std::sort) and radix sort, based on the common bucketing approach.

Recently I came across a video on radix sort which presents an alternate counting-based implementation at the end, and claims that the tradeoff point between radix and comparison sort comes much sooner. My intuition said that even counting-based radix sort would still be slower than a comparison sort for any meaningful input size, but it’s always good to test one’s intuitions.

So, hey, it turns out I was wrong about something. (But my greater point still stands.)

Read more…

The danger of big-O notation

Comments

A common pitfall I see programmers run into is putting way too much stock into Big O notation and using it as a rough analog for overall performance. It’s important to understand what the Big O represents, and what it doesn’t, before deciding to optimize an algorithm based purely on the runtime complexity.

Read more…

How not to shuffle a list

Comments

A frequent thing that people want to do in making games or interactive applications is to shuffle a list. One common and intuitive approach that people take is to simply sort the list, but use a random number generator as the comparison operation. (For example, this is what’s recommended in Fuzzball’s MPI documentation, and it is a common answer that comes up on programming forums as well.)

This way is very, very wrong.

Read more…

Making a hash of data

Comments

When I was replacing peewee with PonyORM, I was evaluating a few options, including moving away from an ORM entirely and simply storing the metadata in indexed tables in memory. This would have also helped to solve a couple of minor annoying design issues (such as improper encapsulation of the actual content state into the application instance), but I ended up not doing this.

A big reason why is that there don’t actually seem to be any useful in-memory indexed table libraries for Python. Or many other languages.

Read more…

The Trouble with PHP

Comments

This article was originally written for the Publ blog. I have reproduced a slightly modified version here so that it hopefully finds a wider audience.

Whenever I build a piece of software for the web, almost invariably somebody asks why I’m not using PHP to do it. While much has been written on this subject from a standpoint of what’s wrong with the language (and with which I agree quite a lot!), that isn’t, to me, the core of the problem with PHP on the web.

So, I want to talk a bit about some of the more fundamental issues with PHP, which actually goes back well before PHP even existed and is intractably linked with the way PHP applications themselves are installed and run.

(I will be glossing over a lot of details here.)

Read more…

Publ

Comments

Publ: Like a static site generator, only dynamic.

(Also the software that powers this website.)

#pragma once vs. #ifndef/#define

Comments

After getting in an extended discussion about the supposed performance tradeoff between #pragma once and #ifndef guards vs. the argument of correctness or not (I was taking the side of #pragma once based on some relatively recent indoctrination to that end), I decided to finally test the theory that #pragma once is faster because the compiler doesn’t have to try to re-#include a file that had already been included.

For the test, I automatically generated 500 header files with complex interdependencies, and had a .c file that #includes them all. I ran the test three ways, once with just #ifndef, once with just #pragma once, and once with both. I performed the test on a fairly modern system (a 2014 MacBook Pro running OSX, using XCode’s bundled Clang, with the internal SSD).

Read more…