Monday 29 December 2014

Space Colonization Algorithm Part 2

One issue that I ran into a few times after finishing part one is that sometimes a branch would end up at an equal distance between two attraction points. When adding a new node I call a method named growBranch that adds the new vertex and node to the tree.
This logic has been enhanced in 3 small ways:
- I look up the "parent" node to make further processing easier
- I update a counter for the number of child nodes; this will allow us to vary the thickness of the tree later on
- I calculate the dot product of the vectors of the parent node and the new node to see if the branch is backtracking on itself (which is what happens in the aforementioned situation) and, if so, branch out in a separate direction using the cross product of the two vectors

Calculating the dot product of two normalised vectors gives us the cosine of the angle between them, a very handy and oft-used little calculation. A value of 1.0 means the vectors point in the same direction, a value of 0.0 means they are at right angles, and a value of -1.0 means they point in opposite directions (which is what we're checking for).

The cross product of two vectors gives us a vector that is at right angles to both. Very handy.
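Putting this together, the enhanced growBranch looks roughly like the sketch below. This is only an outline, not the exact code from the repository: the node fields (a, b, parent, childcount), the % operator for the dot product and the normalized()/length() helpers match the code shown further down, but the type name "node", the thresholds and the cross product fallback are my own guesses here.

void treelogic::growBranch(unsigned int pFromVertex, vec3 pTo) {
  int parentNode = -1;

  // 1. find the node that ends in our starting vertex, that is our "parent" node
  for (unsigned int n = 0; n < mNodes.size(); n++) {
    if (mNodes[n].b == pFromVertex) {
      parentNode = n;
      break;
    };
  };

  if (parentNode != -1) {
    // 2. count one more child on our parent, we'll use this later on for the thickness of the branch
    mNodes[parentNode].childcount++;

    // 3. check if the new branch is backtracking on its parent
    vec3 parentVector = (mVertices[mNodes[parentNode].b] - mVertices[mNodes[parentNode].a]).normalized();
    vec3 newVector = (pTo - mVertices[pFromVertex]).normalized();

    if ((parentVector % newVector) < -0.9) { // dot product, a cosine near -1.0 means opposite directions
      // branch out sideways instead, using the cross product of the two vectors
      float branchLength = (pTo - mVertices[pFromVertex]).length();
      vec3 sideways;
      sideways.x = parentVector.y * newVector.z - parentVector.z * newVector.y;
      sideways.y = parentVector.z * newVector.x - parentVector.x * newVector.z;
      sideways.z = parentVector.x * newVector.y - parentVector.y * newVector.x;
      pTo = mVertices[pFromVertex] + sideways.normalized() * branchLength;
    };
  };

  // and add our new vertex and node to the tree
  mVertices.push_back(pTo);

  node newNode;
  newNode.a = pFromVertex;
  newNode.b = mVertices.size() - 1;
  newNode.parent = parentNode; // -1 if we grew straight from the root vertex
  newNode.childcount = 0;
  mNodes.push_back(newNode);
};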


Interface changes

Another small change is that I no longer rotate the model automatically, but rotate it based on horizontal scrolling and move closer/further away using vertical scrolling. This works really well on a MacBook Pro, as these actions are controlled with two fingers on the touchpad, but it may be unwanted with a traditional mouse, so you may want to stay away from it.

I'm also drawing larger dots for the vertices of the tree to better show the output of the algorithm. 

Optimising the nodes

The real enhancement and the subject of this blog post is the added optimisation step.
After generating our tree (and zooming in a bit), we can see how many segments we've created:

This level of detail is overkill for what we're doing. Again our trusty dot product comes to the rescue. If we have two nodes whose vectors point in (roughly) the same direction, we can merge them into a single node spanning the two end points and remove the vertex in the middle. We can only do this if a node has exactly one child node; if it has multiple branches the detail remains. We thus need to test for that situation.

void treelogic::optimiseNodes() {
  unsigned int node = 0;
  std::vector<unsigned int> children;
  
  while (node < mNodes.size()) {
    unsigned int mergeWith = 0;
    
    // we need to find out how many children we have, we can only optimise if just one is found
    children.clear();
    for (unsigned int n = node+1; n < mNodes.size(); n++) {
      if (mNodes[n].parent == node) {
        children.push_back(n);
      };
    };
    
    // only one child? check if we need to merge
    if (children.size() == 1) {
      vec3  parentVector = mVertices[mNodes[node].b] - mVertices[mNodes[node].a];
      parentVector = parentVector.normalized();      

      vec3  childVector = mVertices[mNodes[children[0]].b] - mVertices[mNodes[children[0]].a];
      childVector = childVector.normalized();
      
      // use dot product, this gives our cosine, the closer to 1.0 the more the vectors match direction
      if ((parentVector % childVector) > 0.9) {
        mergeWith = children[0];
      };
    };
    
    // and merge
    if (mergeWith!=0) {
      unsigned int eraseVertice = mNodes[node].b; // should be same as mNodes[mergeWith].a, this we'll erase..
      
      // merge our nodes...
      mNodes[mergeWith].a = mNodes[node].a;
      mNodes[mergeWith].childcount = mNodes[node].childcount;
      mNodes[mergeWith].parent = mNodes[node].parent;
      mNodes.erase(mNodes.begin() + node);
      
      // erase our vertice
      mVertices.erase(mVertices.begin() + eraseVertice);
      
      // adjust our nodes
      for (unsigned int n = 0; n < mNodes.size(); n++) {
        if (mNodes[n].parent > node)mNodes[n].parent--;
        if (mNodes[n].a > eraseVertice) mNodes[n].a--;
        if (mNodes[n].b > eraseVertice) mNodes[n].b--;
      };      
    } else {
      node++;
    };
  };
};

This removes most of our in-between nodes:


At this point we can see that we've removed a little too much detail, because some of our curves are gone. This is because when there are a lot of segments the angles between consecutive nodes are small, so we end up removing too much. I'm still playing around to see if this can be improved.

Next up however: taking our nodes and turning them into a mesh that we can then process further.

Space Colonization Algorithm Part 1

Hitting a roadblock with the model I was using to test my engine, I found myself working on a few related projects.

The first was implementing a texture blending technique that was described here: http://www.nongnu.org/techne/research/blending/blending.pdf. I'll write some more about that at a later stage, but it was fun to do and works amazingly well.

The second was that trees and vegetation were going to be needed at some point. Now I could simply import models of trees and other things, but I wanted to find out whether it would be possible to generate these, and to do so in a way that would allow me to create multi-LOD meshes for the same tree. Searching the web you find a lot of neat tools and small tutorials.

Most seem to favour a technique called L-system fractals. Google the term if you wish, but it is basically a way to notationally describe a fractal system that can be used to generate a tree (and many other things). However, there is a lot of trial and error involved, and the more complex and fully grown a tree becomes, the more roadblocks you seem to hit.

Eventually, buried deep down in the search results, someone mentioned a technique called "Space Colonization", which I promptly ignored because the name suggested it had more to do with space combat games than with procedurally generating vegetation. But after it came back a few times, most notably on this amazingly cool blog, I started looking into it. Eventually it led me to the very well written paper "Modeling Trees with a Space Colonization Algorithm" by Adam Runions, Brendan Lane, and Przemyslaw Prusinkiewicz.

I decided to try my hand at implementing it and see how far I'd get. The basic generation of the tree turned out to be very easy to do. As I've not found many people showing how they've implemented the algorithm, I decided to dedicate a few blog posts to it.
This first one is just about the node generation.
The next step will be to optimise the output by merging branches, then we'll need to generate a model that will allow us to render the tree, and finally I need to add in leaves and such.
Beyond that, I need to add more algorithms for creating differently shaped point clouds, or maybe even allow for drawing out the shape.
Not all necessarily in that order :)

Source code, GitHub and framework

I've uploaded the project to my GitHub page and will be maintaining it there. I'm working on a MacBook Pro at the moment, so for now everything will only be tested on that platform. Obviously you'll need Xcode, but as I'm using makefiles (I'm a little old fashioned) you'll also need to install the command line tools and possibly make (I can't remember, I set it all up years ago). Or of course you can start a new project in your favourite IDE and just drag the source files in.

As I'm using OpenGL, and GLUT is pretty useless once you go beyond OpenGL 2 (on a Mac at least), I've started using GLFW and GLEW. The GitHub repository contains the header files and static universal libraries compiled for Mac OS X. I added those to make it easier to clone the repository and get started, but you'd do well to get the original source code and documentation for both. I won't waste any words here on setting them up and compiling them (I might be tempted to do so at some point) as the documentation for both is pretty good.

Right now I've not added a math library but just added those classes and those methods I've needed so far. This may change as the project progresses.

OpenGL 1.0

At this stage the project is using OpenGL 1.0 and, even worse, glBegin/glEnd calls. This was simply done because I needed to test the algorithm and it is the quickest way to get something on screen. It won't stay that way for much longer though. I haven't decided yet, but it is likely I'll use tessellation shaders (only available in OpenGL 4) to generate the detail for the mesh. Also, as my main engine runs OpenGL 3 code, I'll eventually change the code to use proper vertex arrays and buffers.

Attraction Point Cloud

This is the heart of the algorithm. The basic idea is that you have a cloud of points towards which the tree grows. The shape of the point cloud very much dictates the overall look of the tree, but the branching you would expect from vegetation is inherent in the algorithm.

The code I currently use to test the algorithm isn't very imaginative about the cloud it creates: I simply generate random point positions within a hemisphere and stretch that to roughly represent the shape of a tree. How it's stretched and how the points are organised can be set through parameters of the method that generates the point cloud. The samples in the aforementioned paper use much better algorithms producing better shapes, and I plan to enhance this eventually and take it even further than the paper suggests.

Note that for this reason the sample code also animates the algorithm; visually it works very well for seeing how the algorithm behaves and how changes to the parameters affect the outcome. When you use it for real you'd obviously just run the algorithm and work with the finished mesh.

The code to generate the attraction points is very simple:
void treelogic::generateAttractionPoints(unsigned int pNumOfPoints, float pOuterRadius, float pInnerRadius, float pAspect, float pOffsetY, bool pClear) {
  // Seed our randomiser
  srand (time(NULL));

  if (pClear) {
    // Clear any existing points (shouldn't be any..)
    mAttractionPoints.clear();
  };

  // Add random attraction points until we reached our goal
  for (unsigned int i = 0; i < pNumOfPoints; i++) {
    vec3 point;

    // random normalized vector for half a hemisphere
    point.x = randf();
    point.y = randf(0.0f, 1.0f);
    point.z = randf();
    point = point.normalize();

    // Scale it up to a random radius and stretch if needed
    point *= ((pOuterRadius - pInnerRadius) * randf(0.0f, 1.0f)) + pInnerRadius;
    point.y *= pAspect;
    point.y += pOffsetY;

    // and add it to our buffer
    mAttractionPoints.push_back(attractionPoint(point));
  };
};

You can call it a few times consecutively to create slightly more interesting clouds.
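For example (the parameter values below are just made up for illustration, pick whatever gives a nice shape), a tall stretched cloud for the main crown plus a wider, flatter cloud on top could look like this:

treelogic tree;

// a dense cloud for the main crown, stretched upwards (pAspect) and lifted off the ground (pOffsetY)
tree.generateAttractionPoints(3000, 15.0f, 3.0f, 3.0f, 5.0f, true);

// a second, wider but flatter cloud added on top of the first (pClear = false keeps the existing points)
tree.generateAttractionPoints(1500, 25.0f, 10.0f, 1.5f, 10.0f, false);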
And it gives a point cloud like so:
All the green dots are the points that were generated.

Growing the tree

The magic happens in the iterative logic that grows the tree. Notice the brown line in the previous screenshot: that is the start of our tree. The idea is that you can start off with a basic shape to further control the end result. All the algorithm needs is at least one starting vertex at (0,0,0), which is taken care of in the constructor of the class.

For every iteration we take each point in our attraction point cloud and find the vertex of our tree that lies closest to it. If the distance is within a certain threshold (dk in the paper, pMaxDistance in the code below) we take it into account, else we ignore the attraction point for now. In this manner we find, for each vertex of the tree, the set of attraction points closest to it. For each such vertex we calculate the average of the normalised vectors towards these points and then branch out in that direction by a specified distance (D in the paper, pBranchSize in the code below).
Finally, any attraction point that is closer than a specified distance (di in the paper, pCutOffDistance in the code below) to its nearest vertex is discarded.

We repeat this process until all attraction points have been discarded.

The code for our iteration looks like this:
bool treelogic::doIteration(float pMaxDistance, float pBranchSize, float pCutOffDistance, vec3 pBias) {
  unsigned int numVerts = mVertices.size(); // need to know the number of vertices at the start of our process
  unsigned int i, v;
  std::vector<float> numOfAPoints;
  std::vector<vec3> directions;
  
  // init our temporary buffers
  for (v = 0; v < numVerts; v++) {
    numOfAPoints.push_back(0.0);
    directions.push_back(vec3(0.0f, 0.0f, 0.0f));
  };
  
  // find out what our closest vertice to each attraction points is:
  i = 0;
  while (i < mAttractionPoints.size()) { // use a while loop, we'll be removing points..
    attractionPoint point = mAttractionPoints[i];
    
    // start with our current distance for our attraction point
    vec3 delta = mVertices[point.closestVertice] - point.position;
    float currentDistance = delta.length();
    
    // as our vertices haven't moved we only need to check any new vertices
    for (v = mLastNumOfVerts; v < mVertices.size(); v++) {
      delta = mVertices[v] - point.position;
      float distance = delta.length();
      if (distance < currentDistance) {
        // this one is now our closest
        point.closestVertice = v;
        currentDistance = distance;
      };
    };
    
    if (currentDistance < pCutOffDistance) {
      // we're done with this one...
      mAttractionPoints.erase(mAttractionPoints.begin() + i);
    } else {
      // copy back our new closest vertice and advance...
      mAttractionPoints[i].closestVertice = point.closestVertice;
      
      if (currentDistance < pMaxDistance) {
        // count our vertice
        numOfAPoints[point.closestVertice] += 1.0;
        vec3 norm = mAttractionPoints[i].position - mVertices[point.closestVertice];
        directions[point.closestVertice] += norm.normalize();        
      };
      
      // and advance
      i++;
    };
  };
  
  // Update our last number of vertices
  mLastNumOfVerts = numVerts;
  
  // Now check which vertices need to branch out...
  for (v = 0; v < numVerts; v++) {    
    if (numOfAPoints[v] > 0.0) {
      vec3  vert = mVertices[v];
      directions[v] /= numOfAPoints[v];
      directions[v] = directions[v].normalize() * pBranchSize;        
      vert += directions[v] + pBias;
      
      growBranch(v, vert);      
    };
  };
  
  // as long as we still have attraction points left we must still be growing our tree
  return mAttractionPoints.size() > 0; 
};

I've added a few optimisations to this code.
First of all, I only check vertices added since the last iteration and remember the closest vertex for the next round. As vertices do not move once generated, only new vertices can be closer than the current closest vertex.
Second, instead of building a set of attraction points that are closest to a particular vertex, I simply count the number of attraction points and add their normalised vectors together. Once I'm finished I divide the sum of the vectors by the number of points to get the direction in which to grow the branch.

The end result is a nice stick figure tree:


It's a lot of fun experimenting with the parameters and seeing what different shapes come out. One of the things I added in the sample code is generating a second point cloud after the tree has been built, to create roots.

More next time....



Tuesday 23 December 2014

Bounding boxes update

So I've been playing around with the city model, with mixed results. I've added an algorithm to the model loading that splits the model into smaller parts where faces are clustered together, but it is dreadfully slow and unreliable.

 


That said, it wasn't a total loss: it has proven that the changes to my code have had the desired effect, but it does feel like this is the end of the line for using this model for testing things. As the second screenshot shows, the bounding boxes are very erratic; some objects have been correctly isolated (like the sand bags, or whatever they are), some are included in much larger boxes (like the car), and some bounding boxes seem unrelated altogether.

When rendering the entire scene the frame rate suffers greatly because the number of objects being rendered is so much higher, but I'm not too worried about that, because I hope to end up in a situation where, once we move far enough away, we start rendering lower level of detail models and exclude unnecessary detail. Moving to areas where far fewer objects are on screen, we can suddenly see the frame rate jump up, even to a comfortable 60fps (to which GLFW seems to be limited).

It sounds like I'll have to bring forward my plans for creating a scene editor of sorts. This will allow me to load individual objects such as buildings, place them, and relate them to other objects.



Monday 22 December 2014

Add bounding boxes and use those to exclude things from rendering

I've got my initial bounding box logic working. As I wrote before, the bounding boxes are stored at the node level of my BSP tree. For now I only calculate boxes for those nodes that actually refer to a model that needs to be rendered, but the logic is set up so that branches in the tree can have a bounding box that encapsulates all the leaves inside, allowing a whole bundle of objects to be skipped without evaluating the individual objects held within.
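For completeness, the 8 corner vertices used by the test further below are simply derived from the minimum and maximum extents of the geometry contained in the node. A minimal sketch of that (the class and method names here are mine, not the engine's; the real code lives in the repository):

void boundingbox::calculateFrom(const std::vector<vec3>& pVertices) {
  if (pVertices.size() == 0) return;

  // find the extents of our geometry
  vec3 min = pVertices[0];
  vec3 max = pVertices[0];
  for (unsigned int i = 1; i < pVertices.size(); i++) {
    if (pVertices[i].x < min.x) min.x = pVertices[i].x;
    if (pVertices[i].y < min.y) min.y = pVertices[i].y;
    if (pVertices[i].z < min.z) min.z = pVertices[i].z;
    if (pVertices[i].x > max.x) max.x = pVertices[i].x;
    if (pVertices[i].y > max.y) max.y = pVertices[i].y;
    if (pVertices[i].z > max.z) max.z = pVertices[i].z;
  };

  // and build our 8 corners from them
  mVertices[0] = vec3(min.x, min.y, min.z);
  mVertices[1] = vec3(max.x, min.y, min.z);
  mVertices[2] = vec3(min.x, max.y, min.z);
  mVertices[3] = vec3(max.x, max.y, min.z);
  mVertices[4] = vec3(min.x, min.y, max.z);
  mVertices[5] = vec3(max.x, min.y, max.z);
  mVertices[6] = vec3(min.x, max.y, max.z);
  mVertices[7] = vec3(max.x, max.y, max.z);
};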

Now the bounding boxes are used in two stages.

The first, which I've just implemented, is a simple and straightforward on-screen test of the bounding box in software.
At this point in time I'm taking a shortcut by applying the following rules:
  1. As soon as one of the 8 vertices of the bounding box is on screen I stop checking and render my model.
  2. If all of the 8 vertices are behind the camera, skip.
  3. If all of the 8 vertices are above, below, to the left or to the right of the screen, skip.
  4. Else render.
#4 is where I've really cut a corner, and I could end up rendering an object that is just off screen. What I should be doing here is check the faces of the bounding box and skip rendering only if all of them are off screen. I figured however that the one or two objects that get included when they shouldn't be don't justify the extra expense, as the first three rules are very easy to check with the following code:
bool isVisible(matrix pModelViewProj) {
  vec3  verts[8];
  vec3  min;
  vec3  max;

  // Apply our matrix to our vertices
  for (int i = 0; i < 8; i++) {
    verts[i] = pModelViewProj * mVertices[i];

    // get the minimum and maximum values.. 
    if (i==0) {
      min = verts[i];
      max = verts[i];
    } else {
      if (min.x > verts[i].x) min.x = verts[i].x;
      if (min.y > verts[i].y) min.y = verts[i].y;
      if (min.z > verts[i].z) min.z = verts[i].z;

      if (max.x < verts[i].x) max.x = verts[i].x;
      if (max.y < verts[i].y) max.y = verts[i].y;
      if (max.z < verts[i].z) max.z = verts[i].z;
    }

    if ((verts[i].x >= -1.0) && (verts[i].x <= 1.0) && (verts[i].y >= -1.0) && (verts[i].y <= 1.0) && (verts[i].z < 1.0)) {
      // no need to check any further, this vertice is on screen!!
      return true;
    };
  };

  if ((max.z < 0.0) || (min.z > 1.0)) {
    // all vertices are behind the camera, no need to check any further
    return false;
  } else if ((max.x < -1.0) || (min.x > 1.0)) {
    // all vertices are to the left or right of the screen, no need to check any further
    return false;
  } else if ((max.y < -1.0) || (min.y > 1.0)) {
    // all vertices are to the top or bottom of the screen, no need to check any further
    return false;
  };

  // We assume any other situation is visible, if our object is somewhere just off a corner
  // we have a false positive
  return true;
};

Later on we'll also use our bounding box to do occlusion testing. Here we will actually attempt to render our bounding box to test if it is behind things already rendered to the screen. But that is for another day.

Unfortunately I won't know how much impact this change has, as I've found out how badly suited my sample city is for the purpose. I wrongly assumed that buildings would be separated out into individual models, but they are not. The breakup is such that each group within the Wavefront obj file covers the entire model, and 95% of the model is still being rendered even with most of it off screen.

I'll either have to find a better way to split the model up or abandon this model for testing. I'll have to think some more about this.

Friday 19 December 2014

The hunt for more frames per second starts..

No boring screenshot this time because there is very little to show so far.

As per my earlier post, I got to a point where I could do basic rendering of a model using a deferred lighting technique. This means I'm rendering everything to a set of off-screen buffers first and then applying my lighting in a final set of stages.

Currently I am only doing lighting with a single light source (the sun) and applying an atmospheric ray-caster for those pixels on screen that are not covered by any of the polys being rendered. That may not seem very advantageous yet, but with complex scenes it potentially means doing a lot fewer lighting calculations. Eventually I'll be adding more light sources, but that will come when I'm further along, and I'll blog more about the technique I'm using then.

The model I'm using for testing has nearly 4 million polygons, which the hardware I'm using should have no problem with. My starting point however is a measly 10 to 11 fps, and it drops to 3 fps if I recalculate my shadows every frame. Nothing to write home about. In my defence, I'm rendering everything straight up: nothing is sorted, nothing is excluded, everything is rendered, on or off screen, occluded or not, at way too high detail.

Scratchpad

This is my little scratchpad of steps I plan to follow in the coming days (xmas holidays yeah so hopefully I'll have time to work on this):
1) break up the model into smaller objects
2) add bounding boxes and use those to exclude things from rendering
3) sort those objects that will be rendered by distance to the camera
4) use our bounding boxes to perform occlusion checks for high poly objects that cover a relatively small area
5) use the BSP tree implementation to group together objects to allow excluding objects where the containing node is deemed off screen
6) use LOD to render a low poly version of a high poly object that is far enough away from the camera

Overview

In my engine a single model has a list of vertices (at the moment each consisting of a vector, normal and texture coordinate) and one or more index lists, one per material used. The index lists basically hold 3 indices into the list of vertices per polygon. A single model can be reused multiple times; it is referenced from a node in a BSP tree. That node holds the model matrix that determines where in the 3D world the model should be rendered. Each node can hold any number of sub nodes whose model matrices position other models in relation to their parent node. When rendering a house one could think of a structure like this:
- node "house" renders model "house" somewhere down the street
  - subnode "chair 1" renders model "chair" in the living room
  - subnode "chair 2" renders model "chair" in the dining room

Here we can see that the single model "chair" is rendered twice. We can also see a prelude to how our BSP tree can optimise rendering: if our house is off screen, there is no point in checking any of the furniture. Similarly with the LOD system, which is another branch of the tree. It is the high poly version of the house that contains the subnodes; the low poly node only references the low poly model, so there is no need to traverse the tree any further, again skipping the contents of the house.
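Sketched out as data structures it looks something like the snippet below; note that these declarations are purely illustrative, the names and exact types in the engine differ:

// a vertex as described above (illustrative only)
struct vertex {
  vec3 position;
  vec3 normal;
  vec2 texcoord;
};

// a model holds the geometry and can be referenced by any number of nodes
class model {
public:
  std::vector<vertex>                mVertices; // the shared vertex list
  std::map<int, std::vector<GLuint>> mIndices;  // one index list per material, 3 indices per polygon
};

// a node places a model in the world and can contain sub nodes
class node {
public:
  matrix              mModelMatrix; // positions this node relative to its parent
  model*              mModel;       // the (possibly shared) model rendered at this node
  std::vector<node*>  mChildren;    // sub nodes such as the furniture inside our house
};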

None of this is very relevant at this time because the model we're using isn't set up to use this hierarchy. At best we'll have a collection of nodes each rendering a model at a certain location, but it is a good starting point for the first couple of steps. It isn't until step 5 that we really start needing to group things together in the right way, and by the time we reach it we'll dive into this deeper.

Break up the model into smaller objects

The model I'm using is split into three files, each counting around the 1 million poly mark (2 below, 1 above if memory serves me). Each model is further divided into smaller chunks by material. As a result we end up with 3 models in memory, each having a large number of vertices and a large number of index lists, one for each "chunk".

As there is no point in trying to do our optimisations on those 3 big models, our model is initially pretty useless, but if we break it up by chunk we get something we can use.

However, these chunks may not be the size we'd like. One chunk may be too small and represent only the roof of a single house. A chunk could also be too large, such as the chunk that represents the streets.

Still, breaking up our model into a separate model for each chunk gives us a really nice base to work from. So I added functionality to my wavefront object loader to output a model per "material" instead of one big model. I added this as an option because eventually I want to load models properly and not rely on this mostly arbitrary division.

The output of the loader is actually a node with a subnode for each model that has been created. Each model is also centered, and the matrix of its subnode is initialised to move the model into the right spot. This will help greatly later on in building our bounding boxes.
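The centering itself boils down to something like the sketch below. Again this is only an approximation, reusing the illustrative structures from the overview above and assuming the matrix class has some way to set a translation; chunkModel and chunkNode are hypothetical names.

// average all vertex positions to find the center of this chunk
vec3 center(0.0f, 0.0f, 0.0f);
for (unsigned int i = 0; i < chunkModel->mVertices.size(); i++) {
  center += chunkModel->mVertices[i].position;
};
center /= (float) chunkModel->mVertices.size();

// move the geometry so it sits around the origin...
for (unsigned int i = 0; i < chunkModel->mVertices.size(); i++) {
  chunkModel->mVertices[i].position -= center;
};

// ...and let the subnode's matrix put it back in its original spot
chunkNode->mModelMatrix.translate(center); // assuming a helper that sets the translation part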

I wasn't expecting any difference as I'm still rendering everything pretty much at random, but surprise surprise: it turns out OpenGL doesn't like the large buffers either, and splitting up the model like this brought the frame rate up to 14 to 15 fps.

We've won our first 3 to 4 fps speed increase with a relatively simple change :)

Next time, we'll start adding bounding boxes.

Monday 15 December 2014

Stereoscopic rendering with a geometry shader

**Edit 16 Dec 2014 - Unfortunately doing this late at night makes you miss a beat or two.. it turns out that rendering stereo in a single pass instead of two doesn't make as much of a difference as I hoped for. So little, in fact, that it is not worth the exercise. Still, in case someone finds it useful, I've updated this article with the relevant info. **

While my engine is still running at a slow frame rate, mostly because I'm sending way more than I need to the rendering pipeline, I wanted to run a little experiment.

Something I added a little while ago was support for stereoscopic rendering using a split screen approach, because that is what works on my 3D TV.

My engine is running at roughly 10fps with the cityscape model (see my previous post); once I turn on the stereoscopic rendering this drops to below 6fps:

This isn't surprising, as all the geometry is rendered twice, once for the left eye and once for the right eye, nearly halving the frame rate. It's not exactly half because I'm using a deferred renderer.

The idea that I wanted to try out is offloading the stereoscopic logic into a geometry shader. This would allow me to render everything in one pass and cut out a lot of the overhead of parsing all the models.

The first step I took was to insert a simple pass-through geometry shader in the middle. I was surprised to find that with normal rendering this actually dropped the frame rate to just over 9fps. I later found that when I adjusted the geometry shader to support both mono and stereo rendering, the frame rate dropped further to 8fps.
It is thus worth having a separate shader that you only use when rendering stereoscopically.

In the code below I'm only showing the relevant parts of my shaders, and they form a very simple single pass renderer without lighting. I haven't tested this code; it's purely meant as an example.

First up is our vertex shader. This becomes nearly a pass through shader as we're doing our projection in the geometry shader:

#version 330

layout (location=0) in vec3 vertices;
layout (location=1) in vec2 texcoords;

out VS_OUT {
  vec2 T;
} vs_out;

void main(void) {
  gl_Position = vec4(vertices, 1.0);
  vs_out.T = texcoords.st;
}

Then the magic happens in our geometry shader, note that we provide two model-view-projection matrices, one for our left eye and one for the right:

#version 330

layout(triangles) in;
layout(triangle_strip, max_vertices=6) out;

uniform mat4 mvpMat[2];

in VS_OUT {
  vec2 T;
} gs_in[];

out GS_OUT {
  vec2 S;
  vec2 T;
  float eye;
} gs_out;

void main(void) {
  // Emit our left eye triangle
  for (int i=0; i < 3; i++) {
    vec4 P = mvpMat[0] * gl_in[i].gl_Position;
    P.x = (P.x * 0.5) - (P.w * 0.5);

    gl_Position = P;
    gs_out.S = P.xy / P.w;
    gs_out.T = gs_in[i].T;
    gs_out.eye = -1.0;
    EmitVertex();
  }
  EndPrimitive();


  // Emit our right eye triangle
  for (int i=0; i < 3; i++) {
    vec4 P = mvpMat[1] * gl_in[i].gl_Position;
    P.x = (P.x * 0.5) + (P.w * 0.5);

    gl_Position = P;
    gs_out.S = P.xy / P.w;
    gs_out.T = gs_in[i].T;
    gs_out.eye = 1.0;
    EmitVertex();
  }
  EndPrimitive();
}

Note, I had to make three changes here:
- set max_vertices=6; this is not the number of vertices for a single primitive but the total number of vertices you potentially emit
- for the left eye I had to subtract 0.5; note also that I multiply this by P.w, as the hardware will divide our X by W
- output S instead of relying on gl_FragCoord, which looks like it's deprecated in our fragment shader

Now we have two triangles being output, one for the left side using the left eye matrix and one for the right side using the right eye matrix. The problem we're still stuck with is triangles for the left eye that cross the border into the right half of the screen, and vice versa. The clipping done by OpenGL only takes care of what is off screen. We're going to have to add a little extra sauce to our fragment shader:

#version 330

uniform float viewportWidth;
uniform sampler2D texturemap;

in GS_OUT {
  vec2 S;
  vec2 T;
  float eye;
} fs_in;

out vec4 color;

void main(void) {
  // discard our pixel if we're rendering for the wrong eye...
  if ((fs_in.eye < 0.0) && (fs_in.S.x > 0.0)) {
    discard;
  } else if ((fs_in.eye > 0.0) && (fs_in.S.x < 0.0)) {
    discard;
  }

  color = texture(texturemap, fs_in.T.st);
}

And our result?

We are up to 6.0 fps. Well, that was hardly worth the effort. One could argue it is a nicer way to do the stereo rendering, but still.

There is room for improvement however, and therein may lie some merit in continuing with this approach.

At the moment we are sending many triangles for the left eye to the right side of the screen, only to discard all of their fragments. We could add a relatively simple check to see whether any of the vertices of the triangle are "on screen" for that eye. Only if at least one is on screen do we output the triangle; otherwise we skip it.

I've also been thinking about using a similar trick to speed up building my shadow maps. In theory we could render all required shadow maps in a single pass. More on that later.

The next step for me is changing the way I've set up my VAOs and starting to look into occlusion checks, to speed things up by rendering a lot less of the scene than I am right now.

Saturday 13 December 2014

Slowly putting a 3D engine together

For the past couple of weeks I've been happily tinkering away at my little 3D engine in what little spare time I have. It is getting to the point where all the basic ingredients are there. It is now a matter of expanding and improving it; it'll still take months before it becomes something useful, but that's not the point. If I wanted something useful I'd just download Unity or Unreal, this is all about learning and having fun.

Now I'm not going to write a tutorial on how to build an engine, or how to get started with OpenGL, or anything like that. There are hundreds of such tutorials out there. I may try to find some time to write about some of the pitfalls I've encountered so far so people can avoid them. The biggest issue I've found so far is that OpenGL has evolved immensely over the years (and the same probably applies to DirectX) and many tutorials out there have become obsolete as a result. You can't see the wood for the trees.

The biggest challenge was switching over to OpenGL 3/4, where the old fixed pipeline has basically been removed. While many tutorials already make heavy use of programmable shaders, they often still rely on parts that are now deprecated or non-functional, such as the built-in projection and view matrices, which you need to take care of yourself in OpenGL 3.

The other one I'll mention, just in case it helps someone, is that code simply will not function unless you use Vertex Array Objects, and as these were optional before OpenGL 3.0 many tutorials skip over that step. It literally took me a few nights of staring at a black screen before figuring that one out. Yes, I was using VBOs, but without the context provided by a VAO...
VAOs are nothing special by the way: they are simply containers of state that you can quickly switch between. So instead of selecting the correct vertex buffer, array buffer and a few other bits before rendering each model, you set up that state once for a VAO associated with your model and then simply make it the active VAO before rendering. Without a VAO there is no state and thus there is nothing to render.
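To make that a bit more concrete, the one-time setup and later use of a VAO boils down to something like this. The buffer contents and attribute layout here are made up (assume positions is a float array, indices a GLuint array and indexCount the number of indices); it is the pattern that matters, not the specifics.

// one time setup, typically while loading the model
GLuint vao, vbo, ebo;

glGenVertexArrays(1, &vao);
glBindVertexArray(vao);                      // all state set below is recorded in this VAO

glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(positions), positions, GL_STATIC_DRAW);

glGenBuffers(1, &ebo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);  // the element array binding is part of the VAO state
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STATIC_DRAW);

glEnableVertexAttribArray(0);                // attribute 0: position, 3 floats per vertex
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float), (void*) 0);

glBindVertexArray(0);

// at render time all we need to do is bind the VAO and draw
glBindVertexArray(vao);
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0);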

For those who wish to start doing things in OpenGL the hard way, I'm using:
GLFW as a cross platform framework (now that GLUT is defunct)
GLEW as an extension manager
I'm using my own math and 3D primitives classes that I've slowly put together over years of tinkering, but I don't recommend doing so. A great math library that suits OpenGL rather well is GLM; I'm tempted to switch as it is far more complete than my own hobby stuff.

The engine itself uses a deferred lighting technique. Together with shadow mapping this means rendering a scene goes through 3 phases:
- rendering the shadow maps
- rendering the geometry buffers
- final output with lighting
I'm not using the deferred lighting technique to its fullest yet, but that will come soon.

I've also added stereoscopic support, currently geared to what my 3D TV is capable of, but I hope to get my hands on a Rift some day (soon).

Basically there are three parts of my engine that I'm working on in parallel.

The first is a terrain system that now runs fully on the GPU based on a height map. I'll spend a separate blog post on that some time in the near future once it's a little further along. I recently learned about Dual Contouring for rendering more complex terrain and am tempted to explore that avenue, though for my purposes my current approach may be more than sufficient and the best bang for buck performance-wise.

The second is an atmosphere renderer that takes care of the background. At this point in time it only does Mie and Rayleigh scattering, mostly based on this article about Sky Rendering. The plan is to mix in a night sky, moon and clouds.

The third allows for managing and rendering 3D models. At the heart of the system is a BSP tree with a built-in LOD system. I'm not using either yet, but they are an essential part of creating larger worlds. Again, I'll talk about those in more detail when I'm further along.

My focus at the moment is on what happens once a model at a certain LOD is visible and needs to be rendered. I recently started using a very high poly model of a city which I found here: Damaged Downtown on tf3dm.com
This model is the worst possible fit for what I want to do, but as a result it's perfect for improving the engine.
First off, this is a model (split into 3 sub models) of an entire city. That means 90% of what is being rendered is invisible.
Second, everything is at full detail regardless of how far away it is.
Third, the texturing on this thing is insane; nearly every face has its own texture coordinates. That isn't a big problem for the format the model is stored in, but for OpenGL it means that each face needs its own unique vertices and no vertices are shared. For nearly 4 million faces that means 12 million vertices. That's a LOT.

If I were to redo this city the way my engine is designed to work, it would be set up very differently, and eventually I'll get there. But for now it serves its purpose, and this is where I am right now:

So yeah, that is 8.4 frames per second on a 2.6 GHz i7 Retina MacBook Pro. Nothing to write home about. As it is a static scene I also don't need to redo the shadow maps every frame, or the frame rate would be closer to 2 frames per second.

The challenge for the weeks to come is to get this down as close as I can get to 60 fps.