The mainstream real-time graphics APIs, OpenGL and Direct3D, are probably the most widespread way that programmers interact with heterogeneous hardware. But their brand of CPU–GPU integration is unconscionable. CPU-side code needs to coordinate closely with GPU-side shader programs for good performance, but the APIs we have today treat the two execution units as isolated universes. This mindset leads to stringly typed interfaces, a huge volume of boilerplate, and impoverished GPU-specific programming languages.
This post tours a few gritty realities in a trivial OpenGL application. You can follow along with a literate listing of the full source code.
Shaders are Strings
To define an object’s appearance in a 3D scene, real-time graphics applications use shaders: small programs that run on the GPU as part of the rendering pipeline. There are several kinds of shaders, but the two most common are the vertex shader, which determines the position of each vertex in an object’s mesh, and the fragment shader, which produces the color of each pixel on the object’s surface. You write shaders in special C-like programming languages: OpenGL uses GLSL.
This is where things go wrong. To set up a shader, the host program sends a string containing shader source code to the graphics card driver. The driver JITs the source to the GPU’s internal architecture and loads it onto the hardware.
Here’s a simplified pair of GLSL vertex and fragment shaders in C string constants:
const char *vertex_shader =
"in vec4 position;\n"
"out vec4 myPos;\n"
"void main() {\n"
" myPos = position;\n"
" gl_Position = position;\n"
"}\n";
const char *fragment_shader =
"uniform float phase;\n"
"in vec4 myPos;\n"
"void main() {\n"
" gl_FragColor = ...;\n"
"}\n";
(It’s also common to load shader code from text files at startup time.)
Those in
, out
, and uniform
qualifiers denote communication channels between the CPU and GPU and between the different stages of the GPU’s rendering pipeline.
That myPos
variable serves to shuffle data from through vertex shader into the fragment shader.
The vertex shader’s main
function assigns to the magic gl_Position
variable for its output, and the fragment shader assigns to gl_FragColor
.
Here’s roughly how you compile and load the shader program:
// Compile the vertex shader.
GLuint vshader = glCreateShader(GL_VERTEX_SHADER);
glShaderSource(vshader, 1, &vertex_shader, 0);
// Compile the fragment shader.
GLuint fshader = glCreateShader(GL_FRAGMENT_SHADER);
glShaderSource(fshader, 1, &fragment_shader, 0);
// Create a program that stitches the two shader stages together.
GLuint shader_program = glCreateProgram();
glAttachShader(shader_program, vshader);
glAttachShader(shader_program, fshader);
glLinkProgram(shader_program);
With that boilerplate, we’re ready to invoke shader_program
to draw objects.
The shaders-in-strings interface is the original sin of graphics programming.
It means that some parts of the complete program’s semantics are unknowable until run time—for no reason except that they target a different hardware unit.
It’s like eval
in JavaScript, but worse: every OpenGL program is required to cram some of its code into strings.
Direct3D and the next generation of graphics APIs—Mantle, Metal, and Vulkan—clean up some of the mess by using a bytecode to ship shaders instead of raw source code. But pre-compiling shader programs to an IR doesn’t solve the fundamental problem: the interface between the CPU and GPU code is needlessly dynamic, so you can’t reason statically about the whole, heterogeneous program.
Stringly Typed Binding Boilerplate
If string-wrapped shader code is OpenGL’s principal investment in pain, then it collects its pain dividends via the CPU–GPU communication interface.
Check out those variables position
and phase
in the vertex and fragment shaders, respectively.
The in
and uniform
qualifiers mean they’re parameters that come from the CPU.
To use those parameters, the host program’s first step is to look up location handles for each variable:
GLuint loc_phase = glGetUniformLocation(program, "phase");
GLuint loc_position = glGetAttribLocation(program, "position");
Yes, you look up the variable by passing its name as a string.
The phase
parameter is just a float
scalar, but position
is a dynamically sized array of position vectors, so it requires even more boilerplate to set up a backing buffer.
Next, we use these handles to pass new data to the shaders to draw each frame:
// The render loop.
while (1) {
// Bind our compiled shader program.
glUseProgram(shader_program);
// Set the scalar `phase` variable.
glUniform1f(loc_phase, sin(4 * t));
// Set the `location` array by copying data into the buffer.
glBindBuffer(GL_ARRAY_BUFFER, buffer);
glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(points), points);
// Use these parameters and our shader program to draw something.
glDrawArrays(GL_TRIANGLE_FAN, 0, NVERTICES);
}
The verbosity is distracting, but those glUniform1f
and glBufferSubData
calls are morally equivalent to
writing set("variable", value)
instead of let variable = value
.
The C and GLSL compilers can check and optimize the CPU and GPU code separately,
but the stringly typed CPU–GPU interface prevents either compiler from doing anything useful to the complete program.
The Age of Heterogeneity
OpenGL and its equivalents make miserable standard bearers for the age of hardware heterogeneity. Heterogeneity is rapidly becoming ubiquitous, and we need better ways to write software that spans hardware units with different capabilities. OpenGL’s programming model espouses the simplistic view that heterogeneous software should comprise multiple, loosely coupled, independent programs.
If pervasive heterogeneity is going to succeed, we need to bury this 20th-century notion. We need programming models that let us write one program that spans multiple execution contexts. This won’t erase heterogeneity’s essential complexity, but it will let us stop treating non-CPU code as a second-class citizen.