select() API should have been deprecated years ago. While unsafe operations like
gets(), and so forth all provide compile-time deprecation warnings,
select() is also incredibly dangerous and has a more modern, safer replacement (
poll()), but yet people continue to use it.
The problem is that it doesn’t scale. In this case, “not scaling” doesn’t just mean it’s bad for performance, “not scaling” means it can destroy your call stack, crash your process, and leave it in a state that is incredibly difficult to debug.
There are actually two problems, and they are closely-related. The hint to both of them comes from the actual signature of the select function:
Do you see that
nfds parameter? Here is what the BSD libc documentation has to say about it:
nfdsdescriptors are checked in each set; i.e., the descriptors from 0 through nfds-1 in the descriptor sets are examined. (Example: If you have set two file descriptors “4” and “17”, nfds should not be “2”, but rather “17 + 1” or “18”.)
What this is hinting at is that fd_set’s implementation is not a container that contains a maximum number of sparsely-populated file descriptors; instead, it is a bit vector saying whether to check the fds at a given offset. That is, in pseudocode, people expect it to be something like:
but it’s really more like:
So this immediately shows what the two problems are, but I’ll still explain it further (since this wasn’t enough when I was helping diagnose a problem with some critical production code that was breaking). Both of them involve what happens when your process has a large number of file descriptors open (which is very common in large-scale services).
Let’s say you are trying to check the state of a single socket with fd of 907 — which happens to be within
FD_SETSIZE on a modern Linux. So your code looks something like this:
Here is what select() is doing internally (again, pseudocodeishly):
In this case, since only fd 907 is actually being checked, it’s looping futilely over 906 entries which don’t need to be checked. If you’re doing
select() a lot, this can add up (remember that the structures described above are just pseudocode, and in reality there’s a lot of bitfiddling/masking/etc. going on as well).
But this isn’t the dire problem.
So let’s look at the same thing as above, except now the fd in question is, say, 2048, which is quite a lot larger than
FD_SETSIZE. Theoretically this is not possible, but in practice it is — it’s pretty simple to raise the socket ulimit for circumstances where you need a lot of connections open. (For example, in a large-scale distributed cache. Again, speaking from experience on this.)
Let’s annotate the code that calls it, first:
Why does this blow away the stack? Well, keep in mind what
select() is doing internally:
That is, for all the values up to 2048 (which, remember, is outside of the
fd_set and well into your stack at this point), it’s checking a bit to see if a garbage fd needs to be checked (which is relatively harmless), checking that fd (which is relatively harmless but kills your performance), and then sets that value in memory based on whether the fd is in that particular state (which is, no matter what, essentially randomizing the stack).
Congratulations, now not only will you get a weird error, you won’t even be able to tell where it comes from because your debugger will give you complete garbage on the call trace.
Hindsight is 20/20, of course, but
poll() is the API that
select() should have been in the first place. When
select() was designed, UNIX had no asynchronous I/O mechanism at all, and the whole concept of a large-scale network server was unheard of. Everything was its own process, and you’d do something like
accept(8) to accept only 8 simultaneous connections at a time, and each of those connections would likely open up a handful of files to serve up static content or whatever. In the early days,
FD_SETSIZE was only 16; Winsock raised it to 64, and early Linux raised it to 256. That was still more than adequate for a reasonably-sized server at the time, and it was still generally the case that you’d be checking all of your fds at once in a single-threaded event loop (heck, back then, the whole concept of “thread” was just an abuse of
vfork() and not something that any reasonable programmer would expect to use) and that there wouldn’t be that many to check; the overhead of managing a variable-sized buffer would be greater than the savings of not having the conditional loop.
But nowadays, any given Internet server can be easily handling thousands, if not tens of thousands, of connections at once, and the
select() API is fundamentally broken in a way which can never, ever make this safe.
Fortunately, there’s another API,
poll(), which is much more like what people expect out of
select() in the first place.
Essentially, you tell it exactly how many fds you’re asking about, and for each fd, exactly which events you want to know about. So now the
poll() call only has to scan the actual specified fds, and only check the actual specified events — no checking readable, writable, and error conditions for ALL fds that are ≤ the highest one you’re asking about.
In the common case,
poll() is a simple drop-in replacement, and is generally simpler and easier to read; this code:
And, obviously, if you want to check multiple types of events on a socket, it becomes even simpler, as you only need a single set of pollfd structures and simple bit masking on the calling side.
I have a couple of theories.
select() is the legacy API that was provided in all of the classic textbooks; universities possibly still teach networking using it, and there’s a lot of sample code out there that uses
select and even the best programmers still have a tendency to copy-paste at times.
Additionally, the fact that gcc doesn’t issue a deprecation warning on it means that people haven’t had any reason to change their practices. The man pages tend to contain warnings, but people generally just read the signature and return codes.
Current versions of certain libc variants will return an error if nfds ≥
FD_SETSIZE (for example, the one on OSX 10.8 will, although there is apparently no warning to this effect on even current Linux glibc), but at that point the stack is already corrupted and if your server has gotten to the point that a subtle hard-to-understand glibc error is occurring, you’re already having a bad day and probably don’t realize where the issue is coming from.
For what it’s worth, one version of the Linux manpage for
select() has this to say:
fd_setis a fixed size buffer. Executing
FD_SET()with a value of
fdthat is negative or is equal to or larger than
FD_SETSIZEwill result in undefined behavior. Moreover, POSIX requires
fdto be a valid file descriptor.
“Undefined behavior” is putting it lightly. (And it makes no mention of how
select() itself is also a ticking time bomb in that situation.)
I purposefully did not discuss them in this article. While both are far superior to
poll, they aren’t simple drop-in replacements to
select, and at least in the basic case of an Internet server the more important thing is not crashing mysteriously. Obviously with a better asynchronous eventing mechanism you will scale better, but scaling better is often a longer-term goal than scaling at all.