In our example so far we're randomly placing hundreds of trees, most of which end up off-screen a lot of the time. While our GPU is pretty quick to realise that faces are off-screen, there is still a lot of overhead in dealing with all those off-screen meshes.
Now I know of two techniques that we can use to minimise the number of objects we render based on what is visible. They are kind of siblings of each other and are often used in conjunction.
The one we'll be looking at today is checking whether an object's bounding volume is on screen. A bounding volume is a simplified mesh that is larger than the object we're drawing but has far simpler geometry. Take our tree with thousands of vertices and polygons as an example: its bounding volume is a simple 8-vertex, 12-face box that encompasses the tree. It takes far less effort to calculate whether our bounding volume is (partly) on screen than to do the same for our tree. If the bounding volume is on screen our tree may still be off-screen, but if the bounding volume is off-screen our tree definitely is, so we end up culling a lot of objects with very little wasted CPU time. And yes, we are doing this on the CPU.
You can use other bounding shapes or combine a number of boxes. Depending on the overall shape of the original mesh that may reduce the number of 'false positives', but at the expense of more expensive checks.
It will come as no surprise that the other technique I was hinting at is GPU bound. We'll keep that for later, but in a nutshell: as we are rendering our objects we have the GPU provide feedback on whether an object actually got rendered. If so, we render it again the next frame; if not, we only render its bounding volume without actually outputting any colors. Once the GPU reports that the bounding volume would have been rendered, we render our object again. This is especially effective for culling complex objects that are within view but hidden behind other objects. As mentioned, however, this is a topic for a later day.
Our CPU checking gets rid of the bulk while our GPU checking adds precision to what is left to render.
Finally, and this bit we'll have a look at as well, we can do our checks in bulk when loads of objects occupy the same space. Say you have a room inside a house: you could put a bounding volume around the space of the room and relate all the items within to that bounding volume. If the room as a whole is visible we check the objects within, but if the bounding volume of the room is off-screen, we know we can exclude all the objects held within.
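The idea is easiest to see as a recursive walk over a node tree. The sketch below uses hypothetical names (groupNode, isBoundsOnScreen, queueForRendering) purely to illustrate the principle; the real implementation comes later in meshNodeBuildRenderList and I've left model matrices out for brevity:

// hypothetical sketch, not the engine's actual API: if a group's bounding
// volume is off-screen we can skip the group and everything inside it
typedef struct groupNode {
  mesh3d *            bounds;       // bounding volume around the whole group
  struct groupNode ** children;     // objects (or sub-groups) inside it
  int                 numChildren;
} groupNode;

void collectVisible(groupNode * pNode, const mat4 * pViewProj) {
  // placeholder for the real bounding volume test we'll build further down
  if ((pNode->bounds != NULL) && (isBoundsOnScreen(pNode->bounds, pViewProj) == false)) {
    return;   // the room is off-screen, none of its contents can be visible
  };

  // the group may be visible, so queue it for rendering and check its contents
  // queueForRendering(pNode);   // placeholder
  for (int i = 0; i < pNode->numChildren; i++) {
    collectVisible(pNode->children[i], pViewProj);
  };
};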
In the end, before adding this bit of optimisation my engine comfortably rendered around 1000 trees with our LOD approach full screen on my trusty little MacBook Pro. After these changes that has increased 5- to 10-fold, unless of course we move our camera off the ground and have to render everything. If there is nothing to cull, there is no optimisation.
Adding bounding volume support to our meshNode library
We're adding our bounding volume logic to our meshNode library. Taking our tree as an example again, we have two meshes: one for the trunk/branches and one for our leaves. There is no point in having separate bounding boxes for both; they pretty much occupy the same space and it is far easier to just have a single bounding box. By creating this bounding box on the node that holds both meshes we achieve exactly that.
Also, if we want a bounding box that encompasses a whole bunch of objects, such as our room example, we can put this bounding box on the node that represents the room and make the contents of the room sub-nodes.
We're going to use our mesh3d object to store our bounding volume in. This may sound like overkill as we don't need normals, texture coordinates or material information, but that is little overhead, and why build something nearly identical? Also, being able to render the bounding volume as if it were a normal mesh comes in handy for debugging. The above screenshot was created by rendering our bounding volume with a transparent shader.
So the first thing we've changed is adding a variable called bounds to our meshNode structure which holds a pointer to a mesh3d object. We've also added a function called meshNodeSetBounds which assigns a mesh3d object as a node's bounding volume and performs the necessary retains and releases. Note that our copy function also copies the node's bounding object.
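Roughly sketched, and assuming meshRetain/meshRelease are the mesh3d reference counting helpers (the actual names in the library may differ), the setter could look like this:

// rough sketch of the setter; meshRetain/meshRelease are assumed names
// for the mesh3d reference counting helpers
void meshNodeSetBounds(meshNode * pNode, mesh3d * pBounds) {
  if (pNode == NULL) {
    return;
  };

  // retain the new volume first so assigning the same mesh twice is safe
  if (pBounds != NULL) {
    meshRetain(pBounds);
  };
  // then let go of whatever bounding volume we held before
  if (pNode->bounds != NULL) {
    meshRelease(pNode->bounds);
  };
  pNode->bounds = pBounds;
};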
Next we've added a function called meshNodeMakeBounds. This function loops through all the child nodes and evaluates all meshes held within to calculate a bounding box that fits around all those objects. The code is pretty straightforward and it's overkill to post it all here, so have a look at the original source code.
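For a rough idea of what such a function does, here is a minimal sketch that takes a set of positions, finds their min/max extents and turns them into an 8-vertex, 12-triangle box. The helpers meshAddVertex and meshAddFace are assumed names, and the real meshNodeMakeBounds also has to walk the node tree and apply each child's model matrix before taking the min/max:

// minimal sketch: find the extents of a set of positions and build a box mesh
// meshAddVertex / meshAddFace are assumed helper names
void makeBoundingBox(mesh3d * pBounds, const vec3 * pPoints, int pCount) {
  vec3 min, max, corners[8];
  int  i;

  if ((pPoints == NULL) || (pCount == 0)) {
    return;
  };

  // find the extents of all the points we were given
  min = pPoints[0];
  max = pPoints[0];
  for (i = 1; i < pCount; i++) {
    if (pPoints[i].x < min.x) min.x = pPoints[i].x;
    if (pPoints[i].y < min.y) min.y = pPoints[i].y;
    if (pPoints[i].z < min.z) min.z = pPoints[i].z;
    if (pPoints[i].x > max.x) max.x = pPoints[i].x;
    if (pPoints[i].y > max.y) max.y = pPoints[i].y;
    if (pPoints[i].z > max.z) max.z = pPoints[i].z;
  };

  // the 8 corners of our box, one per combination of min/max per axis
  for (i = 0; i < 8; i++) {
    corners[i].x = (i & 1) ? max.x : min.x;
    corners[i].y = (i & 2) ? max.y : min.y;
    corners[i].z = (i & 4) ? max.z : min.z;
    meshAddVertex(pBounds, &corners[i]);
  };

  // 12 triangles, two per side of the box
  meshAddFace(pBounds, 0, 1, 3); meshAddFace(pBounds, 0, 3, 2);   // front  (min z)
  meshAddFace(pBounds, 4, 6, 7); meshAddFace(pBounds, 4, 7, 5);   // back   (max z)
  meshAddFace(pBounds, 0, 2, 6); meshAddFace(pBounds, 0, 6, 4);   // left   (min x)
  meshAddFace(pBounds, 1, 5, 7); meshAddFace(pBounds, 1, 7, 3);   // right  (max x)
  meshAddFace(pBounds, 0, 4, 5); meshAddFace(pBounds, 0, 5, 1);   // bottom (min y)
  meshAddFace(pBounds, 2, 3, 7); meshAddFace(pBounds, 2, 7, 6);   // top    (max y)
};

Winding order doesn't really matter much here as we only use the box for our CPU test and for debug rendering.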
There are also two new helper functions for debugging purposes:
- meshNodeSetBoundsDebugMaterial sets the material that is assigned to bounding volumes when they are created, so we can render them. This must be called before we create any bounding volumes.
- meshNodeSetRenderBounds turns rendering the bounds on and off. I've added a keypress in our engine.c for the "b" key to toggle rendering the bounding volumes on and off (a rough sketch of that toggle follows below).
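Just to illustrate the toggle, here is a minimal sketch assuming a GLFW key callback and assuming meshNodeSetRenderBounds takes a bool; the actual input handling in engine.c may be wired up differently:

// minimal sketch of a "b" key toggle, assuming a GLFW key callback
// and a bool parameter for meshNodeSetRenderBounds
static bool renderBounds = false;

void keyPressed(GLFWwindow * pWindow, int pKey, int pScanCode, int pAction, int pMods) {
  if ((pKey == GLFW_KEY_B) && (pAction == GLFW_PRESS)) {
    renderBounds = !renderBounds;
    meshNodeSetRenderBounds(renderBounds);
  };
};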
The bounding volume check has been added to meshNodeBuildRenderList, which now gets a pointer to the full shaderMatrices object so we have access to our projection and view matrices.
As we're not yet loading our model matrix into this structure, we've added a new function called shdMatGetViewProjection to get a matrix that combines our view and projection matrices. We need this to project our bounding box to on-screen coordinates so we can check whether it will render.
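Something along these lines, assuming the shaderMatrices structure keeps its projection, view and combined matrices as members with these (assumed) names; the real implementation may also cache the result and only recompute it when one of the inputs changes:

// rough sketch: combine projection and view into one matrix
// the member names (projection, view, viewProjection) are assumptions
mat4 * shdMatGetViewProjection(shaderMatrices * pMatrices) {
  // viewProjection = projection * view, following the same multiply order
  // we use elsewhere (the model matrix is multiplied in afterwards)
  mat4Copy(&pMatrices->viewProjection, &pMatrices->projection);
  mat4Multiply(&pMatrices->viewProjection, &pMatrices->view);

  return &pMatrices->viewProjection;
};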
After our LOD test but before we check our mesh, we now check whether we have a bounding volume and, if so, run the following bit of code:
if (pNode->bounds != NULL) {
  mat4 mvp;

  mat4Copy(&mvp, shdMatGetViewProjection(pMatrices));
  mat4Multiply(&mvp, &model);

  if (meshTestVolume(pNode->bounds, &mvp) == false) {
    // yes we're not rendering but we did pass our LOD test so we're done...
    return true;
  };

  ...
};

Here we take our view-projection matrix, multiply in our model matrix and call meshTestVolume to check whether our bounding volume is on screen. If not, we return true as we don't want to check this node any further, but we do want to tell our calling method that we would otherwise have rendered our object (i.e. we passed our LOD check).
There is also some code there to add our bounding box to our render list but I've not included it above.
The meshTestVolume function has been added to our mesh3d library:
// assume our mesh is a bounding volume, do a CPU based test to see if any part of the volume would be onscreen
bool meshTestVolume(mesh3d * pMesh, const mat4 * pMVP) {
  vec3 * verts = NULL;
  int i;
  bool infront = false;

  if (pMesh == NULL) {
    return false;
  } else if (pMesh->vertices->numEntries == 0) {
    return false;
  } else if (pMesh->verticesPerFace != 3) {
    return false;
  };

  // first project our vertices to screen coordinates
  verts = (vec3 *) malloc(sizeof(vec3) * pMesh->vertices->numEntries);
  if (verts == NULL) {
    return false;
  };

  for (i = 0; i < pMesh->vertices->numEntries; i++) {
    mat4ApplyToVec3(&verts[i], (vec3 *)dynArrayDataAtIndex(pMesh->vertices, i), pMVP);

    if (verts[i].z > 0.0) {
      if ((verts[i].x >= -1.0) && (verts[i].x <= 1.0) && (verts[i].y >= -1.0) && (verts[i].y <= 1.0)) {
        // on screen, no need to test further ;)
        free(verts);
        return true;
      } else {
        // we found a vertex that's potentially in front of the camera
        infront = true;
      };
    };
  };

So in this first part we're applying our model-view-projection matrix to all of our vertices. As we're not actually rendering anything, we can exit as soon as one of the vertices is on screen. We also set a flag if any of the vertices lies in front of our camera by checking its z value. If none of them do, we're off-screen as well and need check no further.
  // if at least one vertex lies in front of our camera...
  if (infront) {
    // now test our faces
    for (i = 0; i < pMesh->indices->numEntries; i += 3) {
      int a = *((int *)dynArrayDataAtIndex(pMesh->indices, i));
      int b = *((int *)dynArrayDataAtIndex(pMesh->indices, i+1));
      int c = *((int *)dynArrayDataAtIndex(pMesh->indices, i+2));

      // we should check if we cross our near-Z-plane here and cut our triangle if so.

We now check each of our faces. At this point in time I'm ignoring two things that may improve the logic here:
- no backface culling; actually we should cull front faces here, but it may not be worth the extra work
- we're not doing an intersection test with our Z-plane; this one is important, especially for large bounding boxes, and a worthwhile enhancement for the future
      // sort them horizontally: a is left most, c is right most, b is in the middle
      if (verts[a].x > verts[b].x) swap(&a, &b);
      if (verts[a].x > verts[c].x) swap(&a, &c);
      if (verts[b].x > verts[c].x) swap(&b, &c);

      if (verts[a].x > 1.0) {
        // completely offscreen to the right
      } else if (verts[c].x < -1.0) {
        // completely offscreen to the left
      } else if ((verts[a].y < -1.0) && (verts[b].y < -1.0) && (verts[c].y < -1.0)) {
        // off the top of the screen
      } else if ((verts[a].y > 1.0) && (verts[b].y > 1.0) && (verts[c].y > 1.0)) {
        // off the bottom of the screen
      } else {

All the checks above are simple tests for faces that are completely off-screen. What is left are the edge cases: faces whose vertices are all off-screen but that may still be partially visible. Call me lazy, but I'm just going to assume part of them is on-screen. We could add extra code to check this properly, but I don't believe we would win a whole lot here. I'll leave this for a rainy day.
        // For now assume anything else is on screen, there are some edge cases that give a false positive
        // but checking those won't improve things a lot...
        free(verts);
        return true;
      };
    };
  };

  // if we get this far it's false..
  free(verts);
  return false;
};

And there we have it: a simple, relatively low-cost bounding volume check to cull all the off-screen trees from our rendering loop.
Using our new bounding volumes
Our next step is actually using this new logic and that has become really simple.
After loading our tie-bomber, we simply call meshNodeMakeBounds on our first instance and we have a bounding box around our tie-bomber. When we then instance our tie-bomber, the pointer to our bounding box is copied along.
We do the same for our trees: when we load LOD1 we create our bounds on our treeLod1 node, and the same for treeLod2, BUT we don't do so for treeLod3. Note that our 3rd LOD is just a square, two triangles. Our GPU will exclude that far faster than our CPU doing a bounds check on a complete square!
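In code this amounts to little more than a few calls after loading; the variable names here are placeholders for whatever your loading code actually uses:

// after loading our models (node names are placeholders)
meshNodeMakeBounds(tieBomber);    // bounding box around the whole tie-bomber
meshNodeMakeBounds(treeLod1);     // high detail tree
meshNodeMakeBounds(treeLod2);     // medium detail tree
// no bounds for treeLod3, rejecting two triangles on the GPU is cheaper
// than a CPU bounds check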
With this in place we already have a huge performance increase, but there is one more step I want to add. Instead of adding each treeNode directly to our scene as we did in the previous part, we're going to group them together in a grid of 5000x5000 unit cells. This allows us to discard a whole area of trees that is off-screen without checking each individual tree. I've created an array of meshNodes called treeGroups. This is a 100x100 array, which is way bigger than we currently need, but I was experimenting with larger fields. We initialise it with NULL pointers so we'll only end up using what we really need anyway.
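As a sketch, the declaration and initialisation could look like this; TREE_GROUP_DIM is a placeholder and the array in the real source may be declared slightly larger:

// our grid of tree group nodes; TREE_GROUP_DIM is a placeholder name
#define TREE_GROUP_DIM 100

meshNode * treeGroups[TREE_GROUP_DIM][TREE_GROUP_DIM];

void initTreeGroups() {
  int i, j;

  // a file scope array is already zero initialised in C, but being
  // explicit makes the intent clear (and works for local arrays too)
  for (j = 0; j < TREE_GROUP_DIM; j++) {
    for (i = 0; i < TREE_GROUP_DIM; i++) {
      treeGroups[i][j] = NULL;
    };
  };
};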
So near the end of our addTrees method we've replaced our meshNodeAddChild(scene, treeNode) call with the following code:
// add to our tree groups
i = (tmpvector.x + 50000.0) / 5000.0;
j = (tmpvector.z + 50000.0) / 5000.0;

if (treeGroups[i][j] == NULL) {
  sprintf(nodeName, "treeGroup_%d_%d", i, j);
  treeGroups[i][j] = newMeshNode(nodeName);

  // position our node
  tmpvector.x = (5000.0 * i) - 50000.0;
  tmpvector.z = (5000.0 * j) - 50000.0;
  tmpvector.y = getHeight(tmpvector.x, tmpvector.z) - 15.0;
  mat4Translate(&treeGroups[i][j]->position, &tmpvector);

  // set a maximum distance as we wouldn't be rendering any trees if it's this far away
  treeGroups[i][j]->maxDist = 35000.0;

  // and add to our scene
  meshNodeAddChild(scene, treeGroups[i][j]);
};

// and now reposition our tree and add to our group
treeNode->position.m[3][0] -= treeGroups[i][j]->position.m[3][0];
treeNode->position.m[3][1] -= treeGroups[i][j]->position.m[3][1];
treeNode->position.m[3][2] -= treeGroups[i][j]->position.m[3][2];
meshNodeAddChild(treeGroups[i][j], treeNode);

By adding our offset of 50000.0 units and then dividing by our 5000.0 grid size we find out in which cell our tree belongs; a tree at x = 12600.0, for example, ends up in column (12600.0 + 50000.0) / 5000.0 = 12 after truncating to an integer. We then check whether we need to create a new tree group and, if so, add that tree group to our scene. Finally we reposition our tree relative to its tree group and add the treeNode to it.
Finally we create bounding boxes for each tree group and release the tree groups (they are retained by our scene). Note that because we've already created bounding boxes for the trees themselves, we reuse most of that information and save a bunch of processing:
for (j = 0; j < 101; j++) {
  for (i = 0; i < 101; i++) {
    if (treeGroups[i][j] != NULL) {
      meshNodeMakeBounds(treeGroups[i][j]);
      meshNodeRelease(treeGroups[i][j]);
    };
  };
};

The end result is something like this:
Note that our bounding boxes for each treeGroup aren't perfectly aligned, nor should they be. Trees can be placed right at the edge and extend past the 5000x5000 border, or we may have no trees near a border and end up with a smaller bounding box. That's all good :)
And voila, lots more trees at 60fps:
Download the source here
So where from here?
So there are a few improvements we can make here. Our bounding volume test logic could use some work, such as clipping against the near-Z-plane and properly checking our edge cases, though for both I'm not sure how much we would gain.
Also worth thinking about for a scene like ours is changing our terrain to map to our tree group bounding boxes. So instead of having one large terrain mesh, we create a smaller one and render it as a child node of each tree group. This would result in much less terrain being handled by our GPU, although the exclusions we already make in the tessellation shader may already do enough.
Finally, there is our GPU based testing. We'll get back to that in due time, but if you can't wait, here is a great article from the GPU Gems book. It also provides more background info on what we've done so far with bounding volumes.
What's next?
I'll have to give this a little thought. I was planning to dive into changing the rendering to a deferred renderer, but I'm also playing around with adding a basic shadow map to our engine first. This is one of those things that may be useful to cover in a single pass renderer first and then examine the differences in a deferred renderer, but at the same time I wanted to leave the more complex lighting techniques until after we've switched to deferred rendering.