Embedding binary resources with CMake and C++11

July 31, 2014 1:25 PM

The problem

Let’s say you want to make a single-binary application that has embedded resources (images, GLSL shaders, etc.). Let’s say you want to automatically wrap your resources in a storage container to make it easier to deal with stuff. Let’s also say that you might even be using CMake as your build system.

CMake doesn’t provide a way of making a generic custom build rule (something like Make’s pattern rules, applied automatically to every file of a given type), and using extern data is a little unwieldy. So here’s an easy-ish way to do both parts, making use of C++11 language features (and a scary preprocessor hack).

The C++11 bit is also useful on its own, even if you aren’t using CMake, although things will have to be adapted to your build system of choice.

Note: Back when I wrote this the general compiler ecosystem was different, especially on macOS. If you just want a library that does all this stuff for you in a platform-independent manner, check out this resource embedding script. Or you might be interested in a CMake-only approach for the resource generation and using that in conjunction with the rest of this article.

First step: building `resource.o`

Using GNU toolchains, the easiest and most efficient way to convert an arbitrary binary into a .o file is with ld -r -b binary -o outfile.o infile, which will generate a .o file with a few symbols whose names are based on the mangled path of infile, of the form _binary_[mangled path]_start and _binary_[mangled path]_end. The mangling rule is just that any character that’s not a letter or digit turns into _.

Note that the symbol name uses the path of infile exactly as you pass it on the command line, and ld doesn’t offer a way to set it to something else; so, you should run the ld command from the source root directory with a relative path so that you end up with predictable but unique symbol names.
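To make the mangling rule concrete, here is a minimal sketch of a helper that predicts the symbol prefix for a given path. The function name binarySymbolPrefix is made up for illustration and is not part of any toolchain; it only mirrors the rule described above, assuming the path is written exactly as it would be passed to ld.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: predicts the symbol prefix that `ld -r -b binary`
// derives from an input path, given exactly as written on the command line.
// Append "_start" or "_end" to get the full symbol name.
inline std::string binarySymbolPrefix(const std::string &path) {
    std::string name = "_binary_";
    for (unsigned char c : path) {
        // Letters and digits are kept; every other character becomes '_'.
        name += std::isalnum(c) ? static_cast<char>(c) : '_';
    }
    return name;
}
```

For example, binarySymbolPrefix("src/test.txt") gives _binary_src_test_txt, which is the stem you would later hand to a macro or an extern declaration.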

A CMake rule that does this for you

CMake doesn’t provide a facility for generic, pattern-style custom build rules; add_custom_command has to be invoked once per output file. They apparently have a good reason for this. Whatever it is, it’s inconvenient. Fortunately, I managed to find a mailing list post that explains how to fake it. So here’s my CMake snippet that simulates a custom build rule by wrapping that per-file command in a function:

# Compilation step for static resources
FUNCTION(ADD_RESOURCES out_var)
  SET(result)
  FOREACH(in_f ${ARGN})
    FILE(RELATIVE_PATH src_f ${CMAKE_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/${in_f})
    SET(out_f "${PROJECT_BINARY_DIR}/${in_f}.o")
    ADD_CUSTOM_COMMAND(OUTPUT ${out_f}
      COMMAND ld -r -b binary -o ${out_f} ${src_f}
      DEPENDS ${in_f}
      WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
      COMMENT "Building resource object ${out_f}"
      VERBATIM
      )
    LIST(APPEND result ${out_f})
  ENDFOREACH()
  SET(${out_var} "${result}" PARENT_SCOPE)
ENDFUNCTION()

# A simple program that uses resources
ADD_RESOURCES(someResources test.txt)

# The first argument is the target name; the sources follow it
ADD_EXECUTABLE(main main.cpp ${someResources})

So now you have a means of building and linking the symbols. Now you just have to make them usable, easily.

Using the symbols

The typical usage would be to do something annoying like:

#include <iostream>
#include <string>

extern const char _binary_test_txt_start, _binary_test_txt_end;

int main() {
    std::cout << std::string(&_binary_test_txt_start, &_binary_test_txt_end - &_binary_test_txt_start) << std::endl;
}

(There’s also a symbol _binary_test_txt_size, but it’s awkward to use: the size is encoded as the symbol’s address rather than stored in a variable, so you have to cast &_binary_test_txt_size to an integer. Doing pointer math on the start and end symbols is a lot more predictable.)

This is okay if you only have a single resource, but that gets really freaking annoying if you’re trying to do a whole bunch of resources. So let’s wrap it! This is where a C++11 feature comes in handy:

// Resource.h
#ifndef RESOURCE_H
#define RESOURCE_H

#include <cstddef>

class Resource {
public:
    Resource(const char *start, const char *end): mData(start),
                                                  mSize(end - start)
    {}

    const char * const &data() const { return mData; }
    const size_t &size() const { return mSize; }

    const char *begin() const { return mData; }
    const char *end() const { return mData + mSize; }

private:
    const char *mData;
    size_t mSize;
};

#define LOAD_RESOURCE(x) ([]() {                                    \
        extern const char _binary_##x##_start, _binary_##x##_end;   \
        return Resource(&_binary_##x##_start, &_binary_##x##_end);  \
    })()

#endif

Picking this apart: it creates a generic “Resource” blob object that takes a start and end pointer and provides a data pointer, a size accessor, and standard iterator accessors (so that you can still use STL algorithms and ranged for). It also defines a macro that declares the two extern symbols inside a lambda and immediately calls that lambda to instantiate the resource wrapper (## is the preprocessor token-pasting operator). Using it is simple; let’s say that the resource’s path (relative to the top-level CMakeLists.txt) is src/test.txt:

// main.cpp
#include <iostream>
#include <string>
#include "Resource.h"

int main(void) {
    Resource text = LOAD_RESOURCE(src_test_txt);
    std::cout << std::string(text.data(), text.size()) << std::endl;
}
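The iterator accessors are what make the wrapper play nicely with the standard library. Here is a rough sketch of that, with two assumptions: the Resource class is repeated in trimmed form so the snippet stands alone, and an in-memory array (kFakeShader, made up for this example) stands in for the linker-provided symbols so it can be tried without the ld step.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Trimmed copy of the Resource class, repeated so the sketch stands alone.
class Resource {
public:
    Resource(const char *start, const char *end)
        : mData(start), mSize(static_cast<std::size_t>(end - start)) {}

    std::size_t size() const { return mSize; }
    const char *begin() const { return mData; }
    const char *end() const { return mData + mSize; }

private:
    const char *mData;
    std::size_t mSize;
};

// Stand-in for the linker-provided bytes (hypothetical contents).
static const char kFakeShader[] = "#version 330\nvoid main() {}\n";

// Wraps the array, excluding the NUL terminator that a string literal adds.
inline Resource fakeResource() {
    return Resource(kFakeShader, kFakeShader + sizeof(kFakeShader) - 1);
}

// Counts lines with a standard algorithm over begin()/end().
inline std::size_t countLines(const Resource &r) {
    return static_cast<std::size_t>(std::count(r.begin(), r.end(), '\n'));
}

// Copies the bytes into a std::string using a ranged for loop.
inline std::string toString(const Resource &r) {
    std::string out;
    for (char c : r) {
        out += c;
    }
    return out;
}
```

With a real embedded file you would pass the result of LOAD_RESOURCE to countLines or toString instead of fakeResource().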

The astute may have wondered why the Resource::data and Resource::size accessors return references rather than values. The reason is that OpenGL doesn’t take a single pointer and length when specifying shader source; the API is:

void glShaderSource(GLuint shader, GLsizei count, const GLchar *const*string, const GLint *length);

i.e. it takes an array of pointers and an array of lengths. Having to copy the Resource values into local lvalues just so that you can take their addresses is annoying, and returning references means the data pointer can be passed straight through. The length is the one catch: the API wants a GLint, not a size_t, so it still needs a converted copy:

Resource shader = LOAD_RESOURCE(src_shaders_basicShader_vert);
const GLint length = static_cast<GLint>(shader.size());
glShaderSource(handle, 1, &shader.data(), &length);
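Since glShaderSource accepts arrays, several embedded pieces (say, a shared header plus a shader body) can be handed over in one call. The following is only a sketch of gathering those arrays. Blob, SourceArrays, makeSourceArrays and LengthType are names invented for this example, and LengthType is a plain int standing in for GLint so that the snippet compiles without OpenGL headers.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for GLint so the sketch compiles without OpenGL headers
// (assumption: GLint is a plain int, as it is on common platforms).
using LengthType = int;

// Hypothetical minimal view of one embedded resource.
struct Blob {
    const char *data;
    std::size_t size;
};

// Parallel arrays in the layout that glShaderSource expects.
struct SourceArrays {
    std::vector<const char *> strings;
    std::vector<LengthType> lengths;
};

// Gathers the pointer and length of each blob, preserving order.
inline SourceArrays makeSourceArrays(const std::vector<Blob> &parts) {
    SourceArrays out;
    for (const Blob &part : parts) {
        out.strings.push_back(part.data);
        out.lengths.push_back(static_cast<LengthType>(part.size));
    }
    return out;
}
```

In a real program the call would then look something like glShaderSource(handle, static_cast<GLsizei>(arrays.strings.size()), arrays.strings.data(), arrays.lengths.data()), with each Blob filled from a Resource’s data() and size().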

Can shaders be done more simply?

Okay, if you only want to load shaders like this and don’t care about supporting other resource types, you can simplify things a lot more (to the extent that you don’t even need C++11 anymore):

#define LOAD_SHADER_SOURCE(handle, x) do {                            \
        extern const char _binary_##x##_start, _binary_##x##_end;     \
        const char *start = &_binary_##x##_start;                     \
        /* glShaderSource wants GLint lengths, not size_t */          \
        GLint size = static_cast<GLint>(&_binary_##x##_end - start);  \
        glShaderSource(handle, 1, &start, &size);                     \
    } while (0)

So now your code to load the shader is just:

LOAD_SHADER_SOURCE(handle, src_shaders_torus_vert);

What if you want to use `#include` in your shaders?

Normally, to use #include you need to make use of the GL_ARB_shading_language_include extension, which requires either pre-registering all of your includes or providing a virtual filesystem driver. Creating a virtual filesystem driver is difficult because you can’t (easily) dynamically generate symbol names to pull the included resource from. (It is possible using dlsym and so on, but that’s annoying.)

Fortunately, you can run the C preprocessor on its own and generate the preprocessed vert/frag files at build time. (Obviously, any #included fragment will be repeated in your resulting binary every time it’s used. If this is a problem, use the dlsym method instead.) This requires some changes to the CMake rules. I haven’t done this work yet, but it should be fairly straightforward to make a custom command PREPROCESS_RESOURCE that runs cpp on the resource file and writes the result into the build directory (maintaining the appropriate subdirectory path); ADD_RESOURCES can then depend on PREPROCESS_RESOURCE’s output instead. Alternatively, you could make a separate command, ADD_PREPROCESSED_RESOURCE, that does this all in one step. If I ever decide to implement this I’ll add it to this writeup, or someone else could post it in the comments, I suppose.
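If you’d rather not touch the build at all, a different technique from running cpp is to expand includes yourself at load time against a table of pre-registered sources. This is only a naive sketch of that idea, not the cpp-based approach described above: expandIncludes and the registry map are hypothetical, only lines that start with #include "name" are recognised, and a fixed depth limit is the sole protection against include cycles.

```cpp
#include <map>
#include <sstream>
#include <string>

// Expands lines of the form  #include "name"  using sources registered in
// `registry` (hypothetical; e.g. filled from embedded resources at startup).
// Unknown names are left untouched; `depth` guards against include cycles.
inline std::string expandIncludes(const std::string &source,
                                  const std::map<std::string, std::string> &registry,
                                  int depth = 0) {
    const std::string prefix = "#include \"";
    std::istringstream in(source);
    std::string line;
    std::string out;
    while (std::getline(in, line)) {
        if (depth < 8 && line.compare(0, prefix.size(), prefix) == 0) {
            const std::size_t close = line.find('"', prefix.size());
            if (close != std::string::npos) {
                const std::string name =
                    line.substr(prefix.size(), close - prefix.size());
                const auto found = registry.find(name);
                if (found != registry.end()) {
                    // Splice in the included text, expanding it recursively.
                    out += expandIncludes(found->second, registry, depth + 1);
                    continue;
                }
            }
        }
        out += line;
        out += '\n';
    }
    return out;
}
```

The expanded string can then be passed to glShaderSource like any other source. Unlike the build-time approach, each included fragment is stored in the binary only once, at the cost of doing the text processing at startup.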