Saturday 26 March 2016

Using bounding volumes (part 25)

In this post we're going to look at bounding volumes, more specifically bounding boxes, and use them to limit the number of models we render.

In our example so far we're randomly placing hundreds of trees, most of which end up off-screen a lot of the time. While our GPU realises pretty quickly that faces are off screen, there is still a lot of overhead in dealing with all those off-screen meshes.

Now I know of two techniques that we can use to minimise the number of objects we render based on what is visible. They are kind of siblings of each other and often used in conjunction.

The one we'll be looking at today is checking whether an object's bounding volume is on screen. A bounding volume is a simplified mesh that is larger than the object we're drawing but has much simpler geometry than our mesh. Take our tree with thousands of vertices and polygons as an example: our bounding volume is a simple 8-vertex, 12-face mesh that encompasses our tree. It takes far less effort to calculate whether our bounding volume is (partly) on screen than it does for our tree. If our bounding volume is on screen our tree may still be off screen, but if our bounding volume is off screen, our tree definitely is too, and we end up culling a lot of objects with very little wasted CPU time. And yes, we are doing this on the CPU.
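To illustrate the idea (this is a sketch, not the engine's actual code, and vector3 is just a stand-in for our vec3 type): the eight corners of such a box follow directly from the minimum and maximum extents of the mesh, and the twelve faces are simply the two triangles on each of the six sides.

/* illustrative sketch only: derive the 8 corners of an axis-aligned
   bounding box from the min/max extents of a mesh */
typedef struct { float x, y, z; } vector3; /* stand-in for our vec3 type */

void boundingBoxCorners(vector3 min, vector3 max, vector3 corners[8]) {
  int i;
  for (i = 0; i < 8; i++) {
    corners[i].x = (i & 1) ? max.x : min.x; /* bit 0 selects left/right */
    corners[i].y = (i & 2) ? max.y : min.y; /* bit 1 selects bottom/top */
    corners[i].z = (i & 4) ? max.z : min.z; /* bit 2 selects front/back */
  };
};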

You can use other bounding shapes or combine a number of boxes. Depending on the overall shape of the original mesh that may reduce the number of 'false positives', but at the cost of more expensive checks.

It will come as no surprise that the other technique I was hinting at is GPU bound. We'll keep that for later, but in a nutshell: as we are rendering our objects we'll have the GPU provide feedback on whether an object actually got rendered. If so, we render it again the next frame; if not, we only render its bounding volume without actually outputting any colors. Once the GPU provides feedback that the bounding volume would have been rendered, we'll render our object again. This is especially effective for culling complex objects that are in view but hidden behind other objects. As mentioned however, this is a topic for a later day.

Our CPU checking gets rid of the bulk while our GPU checking adds precision to what is left to render.

Finally, and this bit we'll have a look at as well, we can do our checks in bulk when loads of objects occupy the same space. Say that you have a room inside of a house, you could put a bounding volume around the space of the room and relate all the items within to that bounding volume. If the room as a whole is visible we check the objects within, but if the bounding volume of the room is invisible, we know that we can exclude all objects held within.

In the end, before adding this bit of optimisation my engine comfortably rendered around 1000 trees with our LOD approach full screen on my trusty little MacBook Pro. After these changes that has increased 5- to 10-fold, unless of course we move our camera off the ground and we have to render everything. If there is nothing to cull then there is no optimisation.

Adding bounding volume support to our meshNode library


We're adding our bounding volume logic to our meshNode library. Taking our tree as an example again, we have two meshes, one for the trunk/branches and one for our leaves. There is no point in having separate bounding boxes for both; they pretty much occupy the same space and it is far easier to just have a single bounding box. We achieve this by creating the bounding box on the node that encompasses both.

Also if we want to have a bounding box that encompasses a whole bunch of objects, such as say our room example, we can put this bounding box on the node that represents the room and make the contents of the room sub-nodes.

We're going to use our mesh3d object to store our bounding volume in. This may sound like a bit of overkill as we don't need normals, texture coordinates or material information, but that is little overhead, and why build something nearly identical? Being able to render the bounding volume as if it were a normal mesh also comes in handy for debugging. The above screenshot was created by rendering our bounding volume with a transparent shader.

So the first thing we've changed is adding a variable called bounds to our meshnode structure which holds a pointer to a mesh3d object. We've also added a function called meshNodeSetBounds which assigns a mesh3d object as a node's bounding volume, doing the necessary retains and releases. Note that our copy function also copies the object's bounding volume.

Next we've added a function called meshNodeMakeBounds. This function loops through all the child nodes and evaluates all meshes held within to calculate a bounding box that fits around all those objects. The code is pretty straightforward and it's overkill to post it all here, so have a look at the original source code.
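The core of it is a classic min/max scan; as a sketch of the idea (growExtents is a hypothetical helper, and the real function also has to apply each child node's position matrix and then turn the resulting extents into the 8-vertex box):

// sketch of the idea behind meshNodeMakeBounds, not the actual implementation:
// grow a min/max extent over every vertex of a mesh
void growExtents(const mesh3d * pMesh, vec3 * pMin, vec3 * pMax) {
  int i;
  for (i = 0; i < pMesh->vertices->numEntries; i++) {
    vec3 * v = (vec3 *) dynArrayDataAtIndex(pMesh->vertices, i);
    if (v->x < pMin->x) pMin->x = v->x;
    if (v->y < pMin->y) pMin->y = v->y;
    if (v->z < pMin->z) pMin->z = v->z;
    if (v->x > pMax->x) pMax->x = v->x;
    if (v->y > pMax->y) pMax->y = v->y;
    if (v->z > pMax->z) pMax->z = v->z;
  };
};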

There are also two new helper functions for debugging purposes:

  • meshNodeSetBoundsDebugMaterial sets the material used when creating bounding volume objects so that we can render them. This must be called before we create any bounding volumes. 
  • meshNodeSetRenderBounds turns rendering the bounds on and off. I've added a keypress in our engine.c for the "b"-key to toggle rendering the bounding volumes on and off. 

Checking our bounding volume has been added to meshNodeBuildRenderList, which now gets a pointer to the full shaderMatrices object so we have access to our projection and view matrices.
As we're not yet loading our model matrix into this structure, we've added a new function called shdMatGetViewProjection to get a matrix that combines our view and projection matrices. We need this to project our bounding box to on-screen coordinates so we can check if it will render.
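In case you're wondering, such a helper boils down to little more than multiplying the two matrices together; a sketch, assuming shaderMatrices holds our projection and view matrices plus room for the combined result (the field names here are made up):

// sketch only: combine projection and view, the real function may well cache this
mat4 * shdMatGetViewProjection(shaderMatrices * pMatrices) {
  mat4Copy(&pMatrices->viewProjection, &pMatrices->projection);
  mat4Multiply(&pMatrices->viewProjection, &pMatrices->view);
  return &pMatrices->viewProjection;
};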

After our LOD test, but before we check our mesh, we now check whether we have a bounding volume, and if so the following bit of code is run:
  if (pNode->bounds != NULL) {
    mat4 mvp;

    mat4Copy(&mvp, shdMatGetViewProjection(pMatrices));
    mat4Multiply(&mvp, &model);
    if (meshTestVolume(pNode->bounds, &mvp) == false) {
      // yes we're not rendering but we did pass our LOD test so we're done...
      return true;
    };
    ...
  };
Here we are taking our view-projection matrix, multiplying in our model matrix and calling meshTestVolume to check if our bounding volume is on screen. If not, we return true: we don't want to check this node any further, but we do want to tell our calling method that we would otherwise have rendered our object (i.e. we passed our LOD check).
There is also some code there that adds our bounding box to our render list, but I've not included it above.

The meshTestVolume function has been added to our mesh3d library:
// assume our mesh is a bounding volume, do a CPU based test to see if any part of the volume would be onscreen
bool meshTestVolume(mesh3d * pMesh, const mat4 * pMVP) {
  vec3 *  verts = NULL;
  int     i;
  bool    infront = false;

  if (pMesh == NULL) {
    return false;
  } else if (pMesh->vertices->numEntries == 0) {
    return false;
  } else if (pMesh->verticesPerFace != 3) {
    return false;
  };

  // first project our vertices to screen coordinates
  verts = (vec3 *) malloc(sizeof(vec3) * pMesh->vertices->numEntries);
  if (verts == NULL) {
    return false;
  };
  for (i = 0; i < pMesh->vertices->numEntries; i++) {
    mat4ApplyToVec3(&verts[i], (vec3 *)dynArrayDataAtIndex(pMesh->vertices,i), pMVP);
    if (verts[i].z > 0.0) {
      if ((verts[i].x >= -1.0) && (verts[i].x <= 1.0) && (verts[i].y >= -1.0) && (verts[i].y <= 1.0)) {
        // on screen, no need to test further ;)
        free(verts);
        return true;
      } else {
        // we found a vertex that's potentially in front of the camera
        infront = true;
      };
    };
  };
So in this first part we're applying our model-view-projection matrix to all of our vertices. As we're not actually rendering anything, we can exit as soon as one of the vertices is on screen. We also set a flag if any of the vertices is in front of our camera by checking its z coordinate. If none of our vertices are, we're off-screen as well and need check no further.
  // if at least one vertex lies in front of our camera...
  if (infront) {
    // now test our faces
    for (i = 0; i < pMesh->indices->numEntries; i += 3) {
      int a = *((int *)dynArrayDataAtIndex(pMesh->indices, i));
      int b = *((int *)dynArrayDataAtIndex(pMesh->indices, i+1));
      int c = *((int *)dynArrayDataAtIndex(pMesh->indices, i+2));

      // we should check if we cross our near-Z-plane here and cut our triangle if so.
We now check each of our faces. At this point in time I'm ignoring two things that may improve the logic here:
- no backface culling; actually we should cull front faces here, but it may not be worth the extra work
- no intersection test with our Z-plane; this one is important, especially for large area bounding boxes, and a worthwhile enhancement for the future.
      // sort them horizontally a is left most, c is right most, b is in the middle
      if (verts[a].x > verts[b].x) swap(&a,&b);
      if (verts[a].x > verts[c].x) swap(&a,&c);
      if (verts[b].x > verts[c].x) swap(&b,&c);

      if (verts[a].x > 1.0) {
        // completely offscreen to the right
      } else if (verts[c].x < -1.0) {
        // completely offscreen to the left
      } else if ((verts[a].y < -1.0) && (verts[b].y < -1.0) && (verts[c].y < -1.0)) {
        // off the top of the screen
      } else if ((verts[a].y > 1.0) && (verts[b].y > 1.0) && (verts[c].y > 1.0)) {
        // off the bottom of the screen
      } else {
All the above checks were simple tests for whether our face is completely off screen. Now we are left with edge cases: faces whose vertices are all off-screen but that may still be partially visible. You can call me lazy but I'm just going to assume part of them is on-screen. We could add the extra code to properly check, but I believe we won't win a whole lot here. I'll leave this for a rainy day.
        // For now assume anything else is on screen, there are some edge cases that give a false positive
        // but checking those won't gain us a lot...
        free(verts);
        return true;
      };
    };
  };

  // if we get this far it's false..
  free(verts);
  return false;
};
And there we have it, a simple, relatively low-cost bounding volume check to cull all the off-screen trees from our rendering loop.

Using our new bounding volumes


Our next step is actually using this new logic and that has become really simple.

After loading our tie-bomber, we simply call meshNodeMakeBounds on our first instance and we have a bounding box around our tie-bomber. When we then instance our tie-bomber the pointer to our bounding box is copied along.

We do the same for our trees: when we load LOD1 we create our bounds on our treeLod1 node, and the same for treeLod2, BUT we don't do so for treeLod3. Note that our 3rd LOD is just a square, two triangles. Our GPU will exclude that far faster than our CPU doing a bounds check on a complete square!

With this in place we already have a huge performance increase, but there is one more step that I want to add. Instead of adding our treeNode directly to our scene as we did in the previous part, we're going to group the trees together in cells of 5000x5000 units. This will allow us to discard a whole area of trees that is off-screen without checking each individual tree. I've created an array of meshNodes called treeGroups. This is a 100x100 array, which is way bigger than we currently need, but I was experimenting with larger fields. We initialise it with NULL pointers so we'll only end up using what we really need anyway.

So near the end of our addTrees method we've replaced our meshNodeAddChild(scene, treeNode) call with the following code:
    // add to our tree groups
    i = (tmpvector.x + 50000.0) / 5000.0;
    j = (tmpvector.z + 50000.0) / 5000.0;
    if (treeGroups[i][j] == NULL) {
      sprintf(nodeName, "treeGroup_%d_%d", i, j);
      treeGroups[i][j] = newMeshNode(nodeName);

      // position our node
      tmpvector.x = (5000.0 * i) - 50000.0;
      tmpvector.z = (5000.0 * j) - 50000.0;
      tmpvector.y = getHeight(tmpvector.x, tmpvector.z) - 15.0;
      mat4Translate(&treeGroups[i][j]->position, &tmpvector);

      // set a maximum distance as we wouldn't be rendering any trees if it's this far away
      treeGroups[i][j]->maxDist = 35000.0;

      // and add to our scene
      meshNodeAddChild(scene, treeGroups[i][j]);
    };

    // and now reposition our tree and add to our group
    treeNode->position.m[3][0] -= treeGroups[i][j]->position.m[3][0];
    treeNode->position.m[3][1] -= treeGroups[i][j]->position.m[3][1];
    treeNode->position.m[3][2] -= treeGroups[i][j]->position.m[3][2];
    meshNodeAddChild(treeGroups[i][j], treeNode);
By adding our offset of 50000.0 units and then dividing by our 5000.0 grid size we find out in which cell our tree is placed. We then check if we need to create a new tree group, adding it to our scene if so. Finally we reposition our tree in relation to its tree group and add our treeNode to the group.
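To make that concrete with made-up numbers: a tree at x = -12345.0, z = 8000.0 lands in cell i = (-12345.0 + 50000.0) / 5000.0 = 7 and j = (8000.0 + 50000.0) / 5000.0 = 11 (both truncated to int), and that group's origin is placed back at x = (5000.0 * 7) - 50000.0 = -15000.0, z = (5000.0 * 11) - 50000.0 = 5000.0.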

Finally we create bounding boxes for each tree group and release the tree groups (they are retained by our scene). Note that because we've already created bounding boxes for our trees themselves we reuse most of that information and save a bunch of processing:
  for (j = 0; j < 100; j++) {
    for (i = 0; i < 100; i++) {
      if (treeGroups[i][j] != NULL) {
        meshNodeMakeBounds(treeGroups[i][j]);

        meshNodeRelease(treeGroups[i][j]);
      };
    };
  };
The end result is something like this:


Note that our bounding boxes for each treeGroup aren't perfectly aligned, nor should they be. Trees could be placed right at the edge and extend past the 5000x5000 border, or we could have no trees near our border and have a smaller bounding box. That's all good :)

And voila, lots more trees at 60fps:

Download the source here

So where from here?


So there are a few improvements we can make here. Our bounding volume test logic could use some work, such as clipping on the Z-near-plane and checking our edge cases, though for both I'm not sure how much they would improve things.

Also something that is worth thinking about for a scene like ours is to change our terrain to map to our tree group bounding boxes. So instead of having one large terrain mesh we create a smaller one and render it as a child node of each tree group. This would result in much less terrain being handled by our GPU, although the exclusions we already make in the tessellation shader may already do enough.

Finally there is our GPU based testing. We'll get back to that in due time, but if you can't wait, here is a great article from the GPU Gems book. It also provides more background info on what we've done so far with bounding volumes.

What's next?


I'll have to give this a little thought. I was planning to dive into changing the rendering to a deferred renderer but I'm also playing around with adding a basic shadow map to our engine first. This is one of those things that may be useful to cover first in a single pass renderer and then examine the differences in a deferred renderer but at the same time I wanted to leave the more complex lighting techniques till after we've switched to deferred rendering.

Sunday 20 March 2016

LOD3 - bill board (part 24 continued)

Okay, I've split this last bit into a separate post. The bill board part of it isn't so much the issue, but the way we're going to generate our texture is, and the previous post was simply getting too long.

So let's first have a look at what the bill-board technique is all about. We're actually already using its smaller cousin when rendering our trees. The idea with bill-boarding is that when objects are small enough, or far enough in the background, rendering an entire mesh is overkill; rendering a textured square with an image of the mesh is all that is needed.

For small objects our leaves are a good example: our biggest tree has 100 leaves, and that was only limited to 100 because the sapling plugin didn't allow me to add more. I guess the idea is that you would use a whole branch as a texture, but even for that the principle stays the same. Generally speaking you would never look closely enough at a leaf to see it's just a flat textured square; having a complex shaped leaf mesh just strains the GPU for something that you'd never look at, especially when rendering hundreds of leaves.

But bill boarding really comes into its own when we look at distant objects. When we look at our forest, we'll have many trees in the background, very often obscured by other trees. You'd never notice they are just textured squares; by the time you get close we're rendering the actual mesh.
Even at our lowest level of detail our trees have many hundreds of faces, so why render them all if a textured square will do?

Another good example where this is often used is in cities. Buildings close to the viewer are rendered in detail, buildings in the background are just textured squares.

Another example would be grass, where we often animate the squares for added realism.

When we look at grass, our leaves and buildings, the squares we render are often fixed in place. For buildings it is even common to use a cube, because buildings are large enough that they would look weird if they moved in unnatural ways.

But for our trees we're going one step further: we are going to alter our shader to make our square always face our camera on the horizontal plane. This might seem strange because it causes our tree to rotate along with our camera, and trees don't do this, but when the tree is far enough in the distance the rotation isn't noticeable unless you go looking for it.

A fun anecdote here is playing Far Cry 4 on my PS3, which uses a version of this technique. With the added helicopter (boy that thing is fun!) you can actually fly high enough above the trees and, because you're looking down on them, see they are bill boards that adjust to the camera. On better hardware you can see they keep rendering proper meshes much further into the distance. Having your engine adapt the distances for your LOD switches depending on the hardware you're running on is thus not a bad thing to do.

Ok, to make a long story short, the bill-board itself for our third level of detail isn't hard at all. We simply generate a mesh which contains a textured square or rectangle:
    // create our texture
    tmap = newTextureMap("treeLod3");

    ...

    // create our material
    mat = newMaterial("treeLod3");
    matSetShader(mat, billboardShader);
    matSetDiffuseMap(mat, tmap);
    mat->shininess = 0.0;

    // create our mesh
    mesh = newMesh(4,2);
    meshSetMaterial(mesh, mat);
    vec3Set(&normal, 0.0, 0.0, 1.0);
    meshAddVNT(mesh, vec3Set(&tmpvector, -500.0, 1000.0, 0.0), &normal, vec2Set(&t, 0.0, 0.0));
    meshAddVNT(mesh, vec3Set(&tmpvector,  500.0, 1000.0, 0.0), &normal, vec2Set(&t, 1.0, 0.0));
    meshAddVNT(mesh, vec3Set(&tmpvector,  500.0,    0.0, 0.0), &normal, vec2Set(&t, 1.0, 1.0));
    meshAddVNT(mesh, vec3Set(&tmpvector, -500.0,    0.0, 0.0), &normal, vec2Set(&t, 0.0, 1.0));
    meshAddFace(mesh, 0, 1, 2);
    meshAddFace(mesh, 0, 2, 3);
    meshCopyToGL(mesh, true);

    treeLod3 = newMeshNode("treeLod3");
    meshNodeSetMesh(treeLod3, mesh);

    // cleanup
    matRelease(mat);
    meshRelease(mesh);
    tmapRelease(tmap);
Here we create a new material using our texture map, then create a new mesh with 4 vertices and two faces, and set up our 3rd LOD node.

This would however create a mesh that isn't oriented towards the camera but just placed in a plane oriented by our instance node. For this trick we've used a separate shader. This shader uses our texture fragment shader but has a slightly simplified vertex shader:
#version 330

layout (location=0) in vec3 positions;
layout (location=1) in vec3 normals;
layout (location=2) in vec2 texcoords;

uniform mat4      projection;     // our projection matrix
uniform mat4      modelView;      // our model-view matrix
uniform mat3      normalView;     // our normalView matrix

// these are in view
out vec4          V;              // position of fragment after modelView matrix was applied
out vec3          Nv;             // normal for our fragment with our normalView matrix applied
out vec2          T;              // coordinates for this fragment within our texture map

void main(void) {
  // load up our values
  V = vec4(positions, 1.0);
  vec3 N = normals;
  T = texcoords;

  // we reset part of our rotation in our modelView and normalView
  mat4 adjModelView = modelView;
  adjModelView[0][0] = 1.0;
  adjModelView[0][1] = 0.0;
  adjModelView[0][2] = 0.0;
  adjModelView[2][0] = 0.0;
  adjModelView[2][1] = 0.0;
  adjModelView[2][2] = 1.0;

  mat3 adjNormalView = normalView;
  adjNormalView[0][0] = 1.0;
  adjNormalView[0][1] = 0.0;
  adjNormalView[0][2] = 0.0;
  adjNormalView[2][0] = 0.0;
  adjNormalView[2][1] = 0.0;
  adjNormalView[2][2] = 1.0;
  
  // our on screen position by applying our model-view-projection matrix
  gl_Position = projection * adjModelView * V;
  
  // V after our model-view matrix is applied
  V = adjModelView * V;
  
  // N after our normalView matrix is applied
  Nv = normalize(adjNormalView * N);  
}
We're basically just doing our normal vertex shader logic here, concentrating only on the outputs actually used by our fragment shader, but with one weird little trick: we're resetting the X and Z vectors in our model view and normal view matrices. This cancels out our entire rotation except for that around the Y axis. If we'd reset that too, we'd always be looking straight at our square.
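The same reset could also be done once on the CPU before the matrix is sent to the shader. A sketch (mat4MakeBillboard is a hypothetical helper, and I'm assuming the m[column][row] layout we use elsewhere when reading translation out of our position matrices):

// sketch: overwrite the X and Z basis columns of a model-view matrix with
// identity, leaving the Y column and the translation untouched
void mat4MakeBillboard(mat4 * pModelView) {
  pModelView->m[0][0] = 1.0; pModelView->m[0][1] = 0.0; pModelView->m[0][2] = 0.0;
  pModelView->m[2][0] = 0.0; pModelView->m[2][1] = 0.0; pModelView->m[2][2] = 1.0;
};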

That was the easy part. Now for the hard part.

Render to texture


At this point we could go into Blender, render our tree so it fills the entire screen, write it out to a texture and use that. For the most part that would be the way to go and I would recommend it. But I thought this would be a good opportunity to look at another neat trick and generate the texture inside our engine. We are going to render to texture.

Rendering to texture can be a really useful tool. It will be our main technique when we switch to deferred lighting, but it is also often used to animate things in textures or for doing things like reflections. In those cases we're continuously rendering the texture each frame before rendering our real scene; in this example we're only using it once to create our texture.

In our texture map library I've added two new functions:
  • tmapRenderToTexture initialises our texture as the destination to render to, allocating or reusing the things we need for this.
  • tmapFreeFrameBuffers frees up our frame buffers and depth buffer created when tmapRenderToTexture is called. It is automatically called when the texture map is being destructed but you can call it early if you don't want to hang on to the resources.
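To give an idea of the cleanup side (a sketch, not the actual source), freeing comes down to deleting the depth texture and the frame buffer and resetting their ids:

// sketch of what tmapFreeFrameBuffers amounts to, using the same id fields
// we see in tmapRenderToTexture below
void tmapFreeFrameBuffers(texturemap * pTMap) {
  if (pTMap->depthBufferId != 0) {
    glDeleteTextures(1, &pTMap->depthBufferId);
    pTMap->depthBufferId = 0;
  };
  if (pTMap->frameBufferId != 0) {
    glDeleteFramebuffers(1, &pTMap->frameBufferId);
    pTMap->frameBufferId = 0;
  };
};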
We'll have a look at tmapRenderToTexture in more detail:
// Prepare our texture so we can render to it. 
// if needed a new framebuffer will be created and made current.
// the calling routine will be responsible for unbinding the framebuffer.
bool tmapRenderToTexture(texturemap * pTMap, bool pNeedDepthBuffer) {
  if (pTMap == NULL) {
    return false;
  };

  // create our frame buffer if we haven't already
  if (pTMap->frameBufferId == 0) {
So the first time we call this for our texture we're going to set up what is called a frame buffer. This is a container in OpenGL that holds all the state that allows us to render to one or more textures.
    GLenum drawBuffers[] = { GL_COLOR_ATTACHMENT0 };
    GLenum status;

    glGenFramebuffers(1, &pTMap->frameBufferId);
    glBindFramebuffer(GL_FRAMEBUFFER, pTMap->frameBufferId);

    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, pTMap->textureId, 0);
So here we've generated our frame buffer and then bound it by calling glBindFramebuffer. Binding it makes it active and we can start using it. Once we're done we simply unbind it by binding frame buffer 0 and we're back to rendering to screen. We also call glFramebufferTexture2D to bind our texture to a color attachment. Color attachments map to outputs in our fragment shader so we can output colors to multiple textures; for now we just use GL_COLOR_ATTACHMENT0.
    // init and bind our depth buffer if needed
    if ((pTMap->depthBufferId == 0) && pNeedDepthBuffer) {
      glGenTextures(1, &pTMap->depthBufferId);
      glActiveTexture(GL_TEXTURE0);
      glBindTexture(GL_TEXTURE_2D, pTMap->depthBufferId);
      glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT32F, pTMap->width, pTMap->height, 0, GL_DEPTH_COMPONENT, GL_FLOAT, NULL);
      glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
      glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
      glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, pTMap->depthBufferId, 0);
    };
This part is optional: here we generate and bind a depth texture so we can do depth checks while rendering to our texture.
    // enable our draw buffers...
    glDrawBuffers(1, drawBuffers);
This last step tells OpenGL which of the attachments we just bound will actually be used. Note that we've defined our array up top and are just activating our one texture.
    status = glCheckFramebufferStatus(GL_FRAMEBUFFER);
    if (status != GL_FRAMEBUFFER_COMPLETE) {
      errorlog(status, "Couldn't init framebuffer (errno = %i)", status);
      tmapFreeFrameBuffers(pTMap);
      return false;
    };
Finally we check if all this was successful!
  } else {
    // reactivate our framebuffer
    glBindFramebuffer(GL_FRAMEBUFFER, pTMap->frameBufferId);
  };

  return true;
};
Last but not least, if we had already created our frame buffer, we simply rebind it.

Now we're going to use our frame buffer just once to render an image of our tree, and will thus free it up once we're finished. That code can be found right after we've loaded our two tree meshes. Here is the code:
    // create our texture
    tmap = newTextureMap("treeLod3");
    tmapLoadData(tmap, NULL, 1024, 1024, GL_NEAREST, GL_CLAMP_TO_EDGE, GL_RGBA8, GL_RGBA, GL_UNSIGNED_BYTE);
    if (tmapRenderToTexture(tmap, true)) {
Here we've done three things: created our texture map object, initialized an empty 1024x1024 RGBA texture, and called tmapRenderToTexture so we're ready to render to our texture.
      shaderMatrices  matrices;
      lightSource     light;
      mat4            tmpmatrix;

      // set our viewport
      glViewport(0,0,1024,1024);

      // clear our texture
      glClearColor(0.0, 0.0, 0.0, 0.0);
      glClear(GL_COLOR_BUFFER_BIT|GL_DEPTH_BUFFER_BIT);
This code should be pretty familiar as it is little different than rendering to screen: we set our viewport to cover our entire texture and clear it. Notice we set our background to completely transparent!
      // setup our lightsource
      vec3Set(&light.position, 0.0, 1000000.0, 0.0);
      vec3Set(&light.adjPosition, 0.0, 1000000.0, 0.0);
      light.ambient = 1.0; // all ambient = no lighting
Here I'm cheating: I've set my ambient lighting factor to 1.0 so we basically have no lighting. This is where a lot of improvement can be made by setting up some proper lighting for rendering our texture.
      // setup matrices
      mat4Identity(&tmpmatrix);
      mat4Ortho(&tmpmatrix, -500.0, 500.0, 1000.0, 0.0, 1000.0f, -1000.0f);
      shdMatSetProjection(&matrices, &tmpmatrix);

      mat4Identity(&tmpmatrix);
      mat4LookAt(&tmpmatrix, vec3Set(&eye, 0.0, 500.0, 1000.0), vec3Set(&lookat, 0.0, 500.0, 0.0), vec3Set(&upvector, 0.0, 1.0, 0.0));  
      shdMatSetView(&matrices, &tmpmatrix);
I'm using an orthographic projection here; I thought that would give a nicer result. We also set up a very simple view matrix to ensure we're looking dead center at our tree mesh.
      // and render our tree to our texture
      meshNodeRender(treeLod2, &matrices, (material *) materials->first->data, &light);
And finally we render our tree.
      glBindFramebuffer(GL_FRAMEBUFFER, 0);
      tmapFreeFrameBuffers(tmap);
    };
And lastly we make sure we unbind our frame buffer so our output goes to screen again, free our frame buffer resources, and voila, we have a texture!

Note that we're doing this all during our loading stage.

And here is our end result:



Download the source here

So where from here?


There is a lot of room for improvement, but we've got all the basic ingredients now. For our render to texture we can experiment more with lighting, and it would help to evaluate our shader a bit more, as the flatness of the mesh makes it light up when you look towards the trees with the sun at your back.

Obviously using more detailed meshes, maybe adding a few more levels of detail, and having different trees will result in a much nicer forest.

Finally, rendering our leaves needs some shader enhancements to deal with the double sidedness of the leaves. On that same subject, adding some randomness so each leaf is colored slightly differently would be a nice addition to the shader as well. Worth thinking about.

What's next?


Next time around we'll start looking at bounding boxes to skip rendering trees in our forest that aren't on screen, so we can add more trees :)

Saturday 19 March 2016

Level of detail on meshes (part 24)

When we look at the terrain we're adding our level of detail on the GPU.

I applied a similar technique in the space colonization write-up I did last year to generate tree models. There we start with a basic cube based mesh and increase the LOD on the GPU as well. The result is a nicely subdivided mesh, but it is a technique only really suitable for organic shapes as the end result is a nicely rounded model. Also, when we look at models with a lot of small details, you may only wish to have those details rendered when the model is close enough to the viewer.

For our example today we're again going to use a tree, but this time one that is pre-built at two different levels of detail, and we'll introduce a third level of detail through a technique called billboarding. We'll then use these to render a whole forest. Well, a bunch of trees anyway, as we still have a way to go in removing overhead in rendering things we shouldn't render. Still, I went from 5 fps rendering all 1000 trees at full LOD back to 60 fps rendering all the trees with our LOD checks in place. The goal however is to render loads more trees.

The basis of the LOD system is to have multiple versions of the same mesh at different levels of detail and choose which one to render based on the distance between the object and the camera. We're going to implement our LOD on the nodes we introduced in part 21. This allows us to do the LOD switching on a node and render multiple related meshes; in our case there will be two, one for the tree and one for the leaves. The system would also allow us to make further LOD choices within the highest detail version, bringing in more detail as the object comes closer. Think of a high level mesh for a car but having the interior only defined when we're really close to the camera.

Our structure for a single instance of a tree will be something like this:
- Tree_[n], firstVisOnly => true
  - TreeLod1, maxDist = 5000.0
    - Hidef tree trunk mesh
    - Hidef leaves mesh
  - TreeLod2, maxDist = 15000.0
    - Lodef tree trunk mesh
    - Lodef leaves mesh
  - TreeLod3, maxDist = 0 (0 = unlimited)
    - Billboard

On our container node we set a new property "firstVisOnly" to true, which means it will only render the first child of this node deemed visible.
Then each child node has a maxDist: TreeLod1 will be rendered if we're less than 5000.0 units from the camera, TreeLod2 if we're less than 15000.0 units from the camera, else TreeLod3 is rendered.

Note that the first two levels of detail are loaded as meshes but the third we're going to generate in our engine, so yes, I'm sneaking a render to texture into this write-up!

As always however, we've got a few things to do before we can get into this.

Small changes to our materials library


There are a few small changes to our materials library.

The first is that our matSelectProgram now returns a boolean. If we can't set up our shader, it's likely we won't be able to in subsequent renders either, so I wanted to return false so we can stop rendering meshes that fail due to an issue with their material, and not fill the logs with copies of the same error message.

The second is the introduction of a new variable called twoSided which basically turns off our backface culling. This does need some more work as the shaders need to be adjusted but it will do for now.

The third is a tiny change: we now communicate our ambient value and store it in our lightSource structure. Note that for this we've also added retrieving the uniform id for our ambient factor in our shader code.

And the last is one that has a lot of impact for such a simple thing: we turn our diffuse maps into mipmaps when loading our materials file. Again, this could use a bit more intelligence, as we could have textures with mipmaps already baked in. We'll discuss mipmaps in a bit more detail later on.
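For reference, with a texture already uploaded, having OpenGL build the mipmap chain is a one-liner plus a filter change; a sketch of the general approach, not our material loader verbatim (textureId is a placeholder):

// generate mipmaps for the currently bound diffuse map and switch to a
// minification filter that actually uses them
glBindTexture(GL_TEXTURE_2D, textureId);
glGenerateMipmap(GL_TEXTURE_2D);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);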

Small changes to our mesh library


Our mesh library has a few small changes as well.

Deceptively small: I've enhanced the solution we added last part for rendering either triangles or patches to also support rendering lines. We won't be using that yet in this part but it'll be very handy in our next part. As a result some of the methods that generate meshes now have a parameter to state whether we want lines, triangles or quads. We also have a meshAddLine function for adding lines to our mesh.

For this part however I've added meshAddVNT, a method that works the same way as meshAddVertex but instead of taking a vertex structure, it takes its individual components, i.e. a position vector, a normal and a texture coordinate. We'll be using it to create our bill board mesh.


Adding the LOD functionality to our mesh library


Okay, now it's time to get into the thick of things. As per our introduction up above we've added two new variables to our meshnode structure:
- firstVisOnly, defines that we're only going to render the first child node that is deemed visible
- maxDist, defines the maximum distance from the camera beyond which the node is no longer visible

Our main logic all happens in meshNodeBuildRenderList, which builds our arrays of meshes to be rendered from our nodes. Other than some of the parameters now being defined as constant for safety, we have one new parameter which passes the coordinates of the camera.

We also now return true or false depending on whether the node was added to our render list.

Let's have a closer look at this method:
// build our no-alpha and alpha render lists based on the contents of our node
bool meshNodeBuildRenderList(const meshNode * pNode, const mat4 * pModel, const vec3 * pEye, dynarray * pNoAlpha, dynarray * pAlpha) {
  mat4 model;
  
  // is there anything to do?
  if (pNode == NULL) {
    return false;
  } else if (pNode->visible == false) {
    return false;
  };
  
  // make our model matrix
  mat4Copy(&model, pModel);
  mat4Multiply(&model, &pNode->position);
So far nothing much has changed: we return false if our node is not initialized or is marked as invisible, and we multiply our node's position matrix into our parent's model matrix.
  // check our distance
  if (pNode->maxDist > 0) {
    float distance;
    vec3  pos;

    // first get our position from our model matrix
    vec3Set(&pos, model.m[3][0], model.m[3][1], model.m[3][2]);

    // subtract our camera position to get our relative position
    vec3Sub(&pos, pEye);

    // and get our distance
    distance = vec3Lenght(&pos);

    if (distance > pNode->maxDist) {
      return false;
    };
  };
So this is our main distance check: we grab our position from our model matrix and subtract the position of our camera. We then calculate the distance and compare it to our maximum distance. We skip the whole check if no maximum distance is set.
  if (pNode->mesh != NULL) {
    if (pNode->mesh->visible == false) {
      return false;
    };

    // add our mesh
    renderMesh render;
    render.mesh = pNode->mesh;
    mat4Copy(&render.model, &model);
    render.z = 0.0; // not yet used, need to apply view matrix to calculate
    
    if (pNode->mesh->material == NULL) {
      dynArrayPush(pNoAlpha, &render); // this copies our structure
    } else if (pNode->mesh->material->alpha != 1.0) {
      dynArrayPush(pAlpha, &render); // this copies our structure      
    } else {
      dynArrayPush(pNoAlpha, &render); // this copies our structure      
    };
  };
This bit remains the same, we just add our mesh to either our non-alpha or alpha array.
  if (pNode->children != NULL) {
    llistNode * node = pNode->children->first;
    
    while (node != NULL) {
      bool visible = meshNodeBuildRenderList((meshNode *) node->data, &model, pEye, pNoAlpha, pAlpha);

      if (pNode->firstVisOnly && visible) {
        // we've rendered our first visible child, ignore the rest!
        node = NULL;
      } else {
        node = node->next;
      };
    };
  };

  return true;
};
Finally, in this last part we recursively call our function for our child nodes, but we now have the added check that if a node has resulted in meshes being added and our firstVisOnly variable is set, we do not evaluate the rest of the children and exit.


Loading our trees



I'm going to leave the render to texture and bill-board till last (I'm tempted to split this article in two to keep the size down) so let's have a look at our changes in engine.c.

Just to keep the logic more readable I've moved the code loading and positioning our tie-bombers into a function and created a new function called addTrees to add our trees into our scene.

Before we get to our trees however, you'll notice another small change: I no longer discard our height field but keep a pointer to it. I've added a method called getHeight that allows me to retrieve a height value from our map. For this a color lookup has been added to our texture library, which for now assumes our data is filled with standard 32bit RGBA values.

I'll be using this function for placing our trees but I'm also using it in our engine_update to ensure that we don't move our camera through the ground.
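As a sketch of how such a lookup could work (this is not the engine's actual getHeight; the constants and the getPixel signature are assumptions for illustration, and we're reading the red channel of our 225x225 RGBA map):

#define MAP_SIZE     225       /* our height map is 225x225 texels        */
#define WORLD_SIZE   100000.0  /* assumed world extent covered by the map */
#define HEIGHT_SCALE 2500.0    /* assumed vertical scale                  */

// sketch: map a world x/z position to a texel and scale the red channel into
// a height; getPixel is assumed to return the 32bit RGBA value at that texel
float getHeightSketch(texturemap * pMap, float pX, float pZ) {
  int col = (int) ((pX / WORLD_SIZE + 0.5) * (MAP_SIZE - 1));
  int row = (int) ((pZ / WORLD_SIZE + 0.5) * (MAP_SIZE - 1));
  unsigned int rgba = getPixel(pMap, col, row);
  float red = (float) ((rgba >> 24) & 0xFF); // assuming red in the top byte
  return (red / 255.0) * HEIGHT_SCALE;
};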

Another small change is that I'm checking for an 'i' keypress to switch the text we're rendering on/off. A nice little touch as it's just debug info.

So in our addTrees function we're loading our two tree meshes. These btw were generated using the sapling blender plugin, so they aren't particularly great meshes but they will do. For a real forest you'd probably have models of a number of different trees to create some diversity.

The loading code for the two meshes is pretty much the same so I'll just show the code for the first one below:
  // load our tree obj files
  text = loadFile(pModelPath, "TreeLOD1.obj");
  if (text != NULL) {
    llist *       meshes = newMeshList();

    // just scale it up a bit
    mat4Identity(&adjust);
    mat4Scale(&adjust, vec3Set(&tmpvector, 40.0, 40.0, 40.0));

    // parse our object file
    meshParseObj(text, meshes, materials, &adjust);

    // and package as a tree node
    treeLod1 = newMeshNode("treeLod1");
    treeLod1->maxDist = 5000.0;
    meshNodeAddChildren(treeLod1, meshes); 

    // and free up what we no longer need
    llistFree(meshes);
    free(text);
  };
So nothing much different here from how we loaded our tie-bomber, with the exception that we're setting our maxDist on our node. The loaded tree looks a little like this:


Our second level of detail is the same, we just load a different mesh and change our distance. This tree has much less detail and only 1/4th the number of leaves:

Finally our third level of detail we generate by rendering to a texture and presenting it as a billboard. We'll skip that right now and come back to this later but the end result looks like this:

There is definitely room for improvement here, but as it's only rendered when very far away we don't care as much, and it is a really cheap way to get lots of assets rendered in the background.

All that is left now is adding all the trees, and we do that in a loop where we randomly place our trees. I'm not doing any checks to rule out trees placed too close to others, just keeping it simple:
  // add some trees
  for (i = 0; i < 1000; i++) {
    meshNode * tree;
    char       nodeName[100];

    // create our node
    sprintf(nodeName, "tree_%d", i);
    tree = newMeshNode(nodeName);
    tree->firstVisOnly = true; // only render the highest LOD

    // add our subnodes
    if (treeLod1 != NULL) {
      meshNodeAddChild(tree, treeLod1);
    };
    if (treeLod2 != NULL) {
      meshNodeAddChild(tree, treeLod2);
    };
    if (treeLod3 != NULL) {
      meshNodeAddChild(tree, treeLod3);
    };

    // position our node
    tmpvector.x = randomF(-30000.0, 30000.0);
    tmpvector.z = randomF(-30000.0, 30000.0);
    tmpvector.y = getHeight(tmpvector.x, tmpvector.z) - 15.0;

    mat4Translate(&tree->position, &tmpvector);

    // and add to our scene
    meshNodeAddChild(scene, tree);

    meshNodeRelease(tree);
  };
So we loop 1000 times to create 1000 instances of our trees, inside our loop we:
  • create a new meshnode, 
  • set firstVisOnly to true
  • add our 3 levels of details nodes as child nodes
  • randomly position our tree and obtain the Y by looking up the height of our heightmap
  • and add the new node to our scene after which we can release it as our scene retains it
Finally at the end of this method we release our three LOD nodes as they are now retained by all our instance nodes.

And here is our end result:





Okay, I think that is already way too much info, so I am going to split this writeup in two. All the code including creating the bill boards is already on my GitHub page, but I'll be continuing with the second half of this write-up later in the weekend.

So, to be continued...



Thursday 17 March 2016

Some small fixes

I've checked in a few small bugfixes.

A rather dumb mistake had snuck into the standard vertex shader that caused the normal matrix to be applied twice and that had some interesting effects on the lighting.
I've also checked in a few tweaks on the height map shaders.
Finally there is a small fix in the getPixel method in my texture map library and I've added a function for calculating mipmaps. We're not using these yet but we will soon.

Sorry for the lack of updates but I've been very busy behind the scenes. I'm slowly building out the platformer but I want to be a little further along before I start writing up tutorials as I'm unsure about a few things at this point in time.
Also there are a few more enhancements I wish to make to the engine that are easier to demonstrate in my little 3D scene we've got right now.

Don't hold me to it but I'm planning the following sections in the near future:
- a way to switch between meshes with different LOD based on the distance of an object
- adding bounding boxes to limit how much we draw
- switching to a deferred renderer
- shadows

By then I think we'll be ready to jump back into the platformer.

Thursday 3 March 2016

Adding a height field #2 (part 23)

Okay, let's continue with our height field. In this post I'll look at using OpenGL's tessellation shaders to automatically vary the level of detail of our height field.

Now this is OpenGL 4.0 functionality only supported on relatively new hardware, so we'll keep our shader from last session as a backup in case we lack support.

But before we dive into the subject matter, as always, we have some housekeeping to do.

Doing a backflip...


Unfortunately we start with a backflip; I hate doing these, but in my previous post I made a dumb mistake. I decided to take our skybox out of our scene, make our height field a separate mesh object, and render both separately after our scene has rendered. The idea was to force these to be the last things rendered, as they fill large parts of the screen and more often than not are drawn behind other objects.

What I forgot is that doing so also means we're drawing them after we draw any transparent objects. That would be pretty disastrous in the long run.

So both are added to our scene and rendered as part of our scene. Once we start optimizing our render loop we'll have to keep in mind that we'll want to render these at the right moment.


Isolating a few things


If you look at the source you'll see I also made two small but significant changes to two headers.
The first is that I've renamed my errorlog.h header to system.h and added the code for loading a file into a buffer in there. The idea is to start using this library to gather some support functions that I may want to be able to swap out easily when compiling for other platforms.

Similarly I've added a little helper header file called incgl.h which contains our includes for GLEW and GLFW. Again the same idea: I want to have a file I can easily change to swap out our framework without having to redo my rendering libraries. For instance I may leave GLEW and GLFW behind when compiling for iOS and instead include iOS's OpenGL headers.

That is for the future though.

Enhancing the shader library


With the introduction of 3 new shaders (yes, 3) it was also long overdue to enhance our shader library. This library now works along the same lines as a number of the others: we no longer use the shaderStdInfo struct directly but always use pointers obtained by calling the newShader() function. We're using a retain count, so we also have shaderRetain() and shaderRelease() functions.

I've renamed the structure to shaderInfo. I wanted to simply call it shader but alas, that's such a common word it's already used in one of the 3rd party support libraries, and since we're sticking with plain vanilla C, no namespacing.... rats.

I'm still using my shaderCompile and shaderLink functions as they were before, but am now calling them from our newShader function. I've also added a shaderLoad function that loads a given file and calls shaderCompile, basically making it loads easier to load stuff.
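As a sketch of what shaderLoad amounts to (the exact signatures in the library may differ; NO_SHADER and shaderPath are assumed names here):

// sketch only: load a shader source file and compile it; loadFile comes from
// our system library, the shaderCompile signature is assumed for illustration
GLuint shaderLoad(GLenum pShaderType, const char * pName) {
  GLuint shader = NO_SHADER; // assuming NO_SHADER as our failure value
  char * text = loadFile(shaderPath, pName);
  if (text != NULL) {
    shader = shaderCompile(pShaderType, text);
    free(text);
  };
  return shader;
};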

In the end, in our engine we can simply use newShader, give the shader a name, and provide the file names of the 5 individual shaders that make up a shader program. These are:
  • Our vertex shader
  • Our tessellation control shader
  • Our tessellation evaluation shader
  • Our geometry shader
  • Our fragment shader
Don't worry, we'll look into them in more detail later.

In the end to compile our 5 current shaders our load_shaders() function becomes very simple:
void load_shaders() {
  // init some paths
  #ifdef __APPLE__
    shaderSetPath("Shaders/");
  #else
    shaderSetPath("Resources\\Shaders\\");
  #endif

  skyboxShader = newShader("skybox", "skybox.vs", NULL, NULL, NULL, "skybox.fs");
  hmapShader = newShader("hmap", "hmap.vs", NULL, NULL, NULL, "hmap.fs");
  colorShader = newShader("flatcolor", "standard.vs", NULL, NULL, NULL, "flatcolor.fs");
  texturedShader = newShader("textured", "standard.vs", NULL, NULL, NULL, "textured.fs");
  reflectShader = newShader("reflect", "standard.vs", NULL, NULL, NULL, "reflect.fs");
};
If anything goes wrong during compilation of a shader we'll deal with the error and return a NULL pointer, and as we specifically check for NULLs in the code that uses our shaders, we're pretty safe.

We'll get back to our shaders in a minute but first....

Quads!!


When we did our height field in our previous post we created a mesh just like all the others, built out of triangles. OpenGL can only render triangles, very simple.

For tessellation, triangles aren't really the best geometry; quads (4 sided polygons) are much better suited. Tessellation, just in case you haven't realized yet, is a process where we take a polygon and subdivide it into smaller polygons so we can increase our detail. The squares in our height field, before we apply our height, are 1000.0 x 1000.0 units (made up of 2 triangles). When it's far enough from the camera that's a fine size, but close to our camera that's an awfully big square. In our previous example we could see that our terrain got very blocky. Here is a wireframe that shows the effect more clearly:

We could use much smaller squares for our height field, which would work really well close to our camera, but it would greatly increase our polygon count, and polygons further away from our camera would soon be smaller than a pixel, which is incredibly inefficient.

I suggested in my previous post that you could change the mesh we're creating to bake in additional detail nearer to the camera, but this static approach is ultimately very limited (though still a worthwhile enhancement).

It would be great if OpenGL could add in the required detail and, tada, it can. That is exactly what the tessellation shader does. Our shaders, I should say, because there are two. The first allows us to take a quad as input and decide how much detail we wish to add. The second, our evaluation shader, is pretty much a vertex shader that works on the newly added vertices. While we input quads, the output of our tessellation shaders is triangles, which are subsequently rendered just like before.

While we're introducing a pretty linear implementation of a tessellation shader here, you can use these for much more interesting processes such as implementing NURBS. For this reason we talk about our quads as being patches.

Anyway, to make a very long story short, we need to do two things before we can start implementing our new shader.

The first is detecting if we have support for tessellation and we do this by requesting the number of patches and the maximum tessellation level that is supported like this:
  // get info about our tesselation capabilities
  glGetIntegerv(GL_MAX_PATCH_VERTICES, &maxPatches);
  errorlog(0, "Supported patches: %d", maxPatches);
  if (maxPatches >= 4) {
    // setup using quads
    glPatchParameteri(GL_PATCH_VERTICES, 4);

    glGetIntegerv(GL_MAX_TESS_GEN_LEVEL, &maxTessLevel);
    errorlog(0, "Maximum supported tesselation level: %d", maxTessLevel);
  };
Here we've requested our maxPatches, which we check is at least 4 so we can make a quad (yes, some OpenGL hardware supports patches with many more control points), and then figure out our maximum tessellation level, which defines how far we can subdivide a single quad.

Now that we know whether we have support for quads, we need to enhance our mesh object to be able to store indices for quads instead of triangles. So when you look at mesh3d you'll see a new variable has been added to our structure called verticesPerFace. This defaults to 3.

Now we can't mix triangles and quads, so I've simply added a function called meshAddQuad which adds a quad instead of a triangle and changes our variable to 4. Doing so for the first time will also reset our indices buffer just in case; it should be empty. Our meshAddFace method has been changed similarly but sets verticesPerFace to 3.

I've also enhanced my meshMakePlane function so we can specify whether we want triangles or quads; it is currently the only code that generates a mesh that supports quads.

Finally, our meshRender function has been enhanced to use the constant GL_PATCHES instead of GL_TRIANGLES when calling glDrawElements for a quad based mesh.
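In other words, the draw call now picks its primitive type based on our verticesPerFace variable; roughly (a sketch, not meshRender verbatim):

// sketch: pick the primitive type based on how the mesh was built
// (verticesPerFace is 3 for triangles, 4 for quads)
GLenum mode = (pMesh->verticesPerFace == 4) ? GL_PATCHES : GL_TRIANGLES;
glDrawElements(mode, pMesh->indices->numEntries, GL_UNSIGNED_INT, NULL);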

When we initialize our height field we simply send true to our meshMakePlane pAddQuads parameter if we support quads.

Our new vertex shader

So now it's time to have a look at our shaders. Note that I've named the height map shaders that support tessellation "hmap_ts", so we can load either those shaders or "hmap" depending on whether tessellation is supported.

When we look at our vertex shader we see that we mostly gutted it. Note that we assign gl_Position BEFORE applying our view matrix and without adding our height. This is very important if we want even tessellation.

I've also tweaked some of our scaling factors. First off, I've gone to a 2000.0x2000.0 starting size for our squares, as we'll be adding enough detail to warrant this, and I've also enlarged the space our height map covers. It depends a little on the quality of our height map, but having steep inclines can result in some nasty visual artifacts as the level of detail is adjusted. Remember we're using a pretty crappy 225x225 map. 

I do calculate the height of our vertex, then apply our view and projection matrices, and output our screen coordinate for use in our tessellation shader. This is because we use the size at which our polygon is rendered on screen to determine how much we'll tessellate it.

Other than that, I've pretty much explained everything this shader does in our previous parts.

Our tessellation control shader

This is where our magic starts. It's a bit funny how this shader is invoked: it is called for every vertex in our quad, but with all the quad's information available. Note that this means it is also called multiple times for the same vertex if that vertex is used by multiple quads.

For a quad it is called with gl_InvocationID set from 0 to n-1, where n is the number of vertices of the quad, and as you can see from our mesh3d code a quad has its vertices ordered as follows:
 2------3
 |      |
 |      |
 0------1
We do the bulk of our work when gl_InvocationID is 0, and for the other 3 calls we simply pass through our control points.

Let's look at our inner function in more detail:
void main(void) {
  if (gl_InvocationID == 0) {
    // get our screen coords
    vec3 V0 = Vp[0];
    vec3 V1 = Vp[1];
    vec3 V2 = Vp[2];
    vec3 V3 = Vp[3];
Maybe a bit of overkill, but we copy our 4 points into local variables.
    // check if we're off screen and if so, no tessellation => nothing rendered
    if (
      ((V0.z <= 0.0) && (V1.z <= 0.0) && (V2.z <= 0.0) && (V3.z <= 0.0))              // behind camera
      || ((V0.x <= -falloff) && (V1.x <= -falloff) && (V2.x <= -falloff) && (V3.x <= -falloff)) // to the left
      || ((V0.x >=  falloff) && (V1.x >=  falloff) && (V2.x >=  falloff) && (V3.x >=  falloff)) // to the right
      || ((V0.y <= -falloff) && (V1.y <= -falloff) && (V2.y <= -falloff) && (V3.y <= -falloff))   // to the top
      || ((V0.y >=  falloff) && (V1.y >=  falloff) && (V2.y >=  falloff) && (V3.y >=  falloff)) // to the bottom
    ) {
      gl_TessLevelOuter[0] = 0.0;
      gl_TessLevelOuter[1] = 0.0;
      gl_TessLevelOuter[2] = 0.0; 
      gl_TessLevelOuter[3] = 0.0; 
      gl_TessLevelInner[0] = 0.0;
      gl_TessLevelInner[1] = 0.0;
Here we're checking whether all 4 points are behind the camera, to the left of the screen, to the right of the screen, to the top of the screen or to the bottom of the screen; in other words, whether our whole quad is off screen. If it is, why bother rendering it? We simply set all our levels to 0 and the quad won't be rendered.
    } else {
      float level0 = maxTessGenLevel;
      float level1 = maxTessGenLevel;
      float level2 = maxTessGenLevel;
      float level3 = maxTessGenLevel;
Our tessellation works by specifying how many subdivisions we want for each edge. We default to the maximum level possible.
      // We look at the length of each edge, the longer it is, the more detail we want to add
      // If any edge goes through our Camera plane we set maximum level
      
      if ((V0.z>0.0) && (V2.z>0.0)) {
        level0 = min(maxTessGenLevel, max(length(V0.xy - V2.xy) * precision, 1.0));
      }
      if ((V0.z>0.0) && (V1.z>0.0)) {
        level1 = min(maxTessGenLevel, max(length(V0.xy - V1.xy) * precision, 1.0));
      }
      if ((V1.z>0.0) && (V3.z>0.0)) {
        level2 = min(maxTessGenLevel, max(length(V1.xy - V3.xy) * precision, 1.0));
      }
      if ((V3.z>0.0) && (V2.z>0.0)) {
        level3 = min(maxTessGenLevel, max(length(V3.xy - V2.xy) * precision, 1.0));
      }
So above we've determined our actual levels by taking the length of each edge of our quad and multiplying that by our precision. The longer the edge, the more we subdivide it.
      gl_TessLevelOuter[0] = level0;
      gl_TessLevelOuter[1] = level1;
      gl_TessLevelOuter[2] = level2;  
      gl_TessLevelOuter[3] = level3;  
      gl_TessLevelInner[0] = min(level1, level3);
      gl_TessLevelInner[1] = min(level0, level2);
So the levels we've calculated are copied into our gl_TessLevelOuter[n] output variables, which define how much the outer edges of our quad get subdivided. But we also have two inner levels, one for our 'horizontal' spacing and one for our 'vertical' one; for each we take the lower of the two related edges. This is how much the quad gets subdivided internally.
    }
  };
  
  // just copy our vertices as control points
  gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;
}
And at the end we simply copy our position. Again, we're doing mostly linear interpolation here, no fancy NURBS stuff. If we were, we would possibly be adjusting our control points here.

Our tessellation evaluation shader


As a result of our control shader our mesh is now being subdivided into many more quads, and in doing so we're introducing many more vertices. Our evaluation shader is run for each of those vertices (including our original ones) so that we can determine their final position. Note that OpenGL hasn't positioned our new vertices at all; instead it calls our evaluation shader with all adjoining vertices and two weights with which we can determine the new position of our vertex.

Again we're doing a simple linear interpolation here so:
  // Interpolate along bottom edge using x component of the
  // tessellation coordinate
  vec4 V1 = mix(gl_in[0].gl_Position,
                gl_in[1].gl_Position,
                gl_TessCoord.x);
  // Interpolate along top edge using x component of the
  // tessellation coordinate
  vec4 V2 = mix(gl_in[2].gl_Position,
                gl_in[3].gl_Position,
                gl_TessCoord.x);
  // Now interpolate those two results using the y component
  // of tessellation coordinate
  vec4 V = mix(V1, V2, gl_TessCoord.y);
We can see that we have our 4 input vertices, which are the four belonging to the quad we've just subdivided and inserted our vertex into, plus our gl_TessCoord weights. The code above does a simple interpolation of our points to end up with vertex V.

From this point onwards we're basically doing what we did in our original vertex shader in our previous post: calculate our height, calculate our texture coordinate, calculate our normals and apply our matrices.

If you want an example of a much more complex tessellation shader look back a bit on this blog to my posts about my procedurally generated tree or look up its source code on my github page. It contains a sub-divisor that also rounds our resulting mesh.

Our geometry shader


Okay, so our geometry shader actually does nothing; you could take it out and simply change our fragment shader to take the inputs of our tessellation evaluation shader.

I only added it in so I could introduce it. The geometry shader allows you to take something that we're rendering, in our case our triangle, and output additional geometry. You could implement a tessellation shader with it if you wanted to, but be aware that it wouldn't perform as well as our tessellation shader does.

But using our geometry shader can be very useful for a height field as we could introduce things like simple ground vegetation. Something for another day however.

Our fragment shader


And finally we've ended up at our fragment shader, which is nearly identical to the one in our previous post. Its only difference is that it takes its inputs from our geometry shader instead of our vertex shader.

On a side note, instead of defining each output and input individually as we did before, we've now started using interface blocks such as this:
in GS_OUT {
  vec2  T;
  vec3  N;
  vec4  V;
} fs_in;
I'm not aware of any performance benefits; it's mostly about readability of the shader.

The final result

So here is our final result:

I'll see about adding a movie in due time as it's interesting to see the LOD change as you move around.

Note that for our source code I'm now branching; we'll see if that is easier than maintaining the archive subfolder.

Download the source here

So where from here?


Well, the same things from our last post really still stand. It makes a lot of sense to add some basic level of detail to our initial mesh. Especially since our quads further away are now so small we're wasting rendering time, it would be good to have some very large quads at the edges of our mesh for far away geometry.

Having a way to load in additional texture maps to create an ever expanding world would be good too.

Also actually creating an interface that allows us to edit the height maps from within our engine would be great. I actually have this up and running in a previous experiment so who knows I may come back to that some day.

What's next?

I honestly don't know yet. I've been working on the 3D version of the platformer a bit during the week and I might return to it for awhile, but I'm also tempted to extend my height field write up. I guess you'll see soon which one won my attention :)