Friday, 19 December 2014

The hunt for more frames per second starts...

No boring screenshot this time because there is very little to show so far.

As per my earlier post, I got to a point where I could do basic rendering of a model using a deferred lighting technique. This means I'm rendering everything to a set of offscreen buffers first and then applying my lighting in a final set of stages.
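To make that a bit more concrete, here's a minimal sketch of the kind of offscreen buffer setup (often called a G-buffer) a deferred renderer uses: one colour target for albedo, one for normals, and a depth texture to reconstruct positions from in the lighting stage. Take the exact attachments and formats here as illustrative assumptions, not as my engine's actual configuration.

#include <GL/glew.h>

struct GBuffer {
  GLuint fbo, albedo, normal, depth;
};

// illustrative G-buffer setup; the attachment layout and formats are
// assumptions for the sake of the example
GBuffer createGBuffer(int width, int height) {
  GBuffer gb;
  glGenFramebuffers(1, &gb.fbo);
  glBindFramebuffer(GL_FRAMEBUFFER, gb.fbo);

  // colour attachment 0: albedo
  glGenTextures(1, &gb.albedo);
  glBindTexture(GL_TEXTURE_2D, gb.albedo);
  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
               GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
  glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                         GL_TEXTURE_2D, gb.albedo, 0);

  // colour attachment 1: normals, in a higher precision format
  glGenTextures(1, &gb.normal);
  glBindTexture(GL_TEXTURE_2D, gb.normal);
  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB16F, width, height, 0,
               GL_RGB, GL_FLOAT, nullptr);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
  glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1,
                         GL_TEXTURE_2D, gb.normal, 0);

  // depth texture, sampled in the lighting stage to reconstruct position
  glGenTextures(1, &gb.depth);
  glBindTexture(GL_TEXTURE_2D, gb.depth);
  glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, width, height, 0,
               GL_DEPTH_COMPONENT, GL_FLOAT, nullptr);
  glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                         GL_TEXTURE_2D, gb.depth, 0);

  GLenum buffers[2] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
  glDrawBuffers(2, buffers);
  glBindFramebuffer(GL_FRAMEBUFFER, 0);
  return gb;
}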

Currently I am only doing lighting with a single light source (the sun) and applying an atmospheric ray-caster for those pixels on screen that are not covered by any of the polys being rendered. That may not seem too advantageous, though with complex scenes it potentially means doing far fewer lighting calculations. Eventually I'll be adding more light sources, but that will come when I'm further along, and I'll blog more about the technique I'm using then.
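A quick aside on how that "only the uncovered pixels" part can work: after the geometry pass you can draw a single full-screen triangle at the far plane with depth writes off. With the depth test set to GL_LEQUAL it only passes where the depth buffer still holds its clear value of 1.0, i.e. where no poly was drawn. This is just a sketch of the general idea, with skyProgram and fullScreenVAO standing in for whatever the engine actually uses, and a vertex shader that is assumed to output its positions at clip-space z = w (depth 1.0).

#include <GL/glew.h>

void drawAtmosphere(GLuint skyProgram, GLuint fullScreenVAO) {
  glUseProgram(skyProgram);          // fragment shader performs the ray-cast
  glDepthMask(GL_FALSE);             // keep the depth buffer intact
  glDepthFunc(GL_LEQUAL);            // passes only where depth is still 1.0
  glBindVertexArray(fullScreenVAO);
  glDrawArrays(GL_TRIANGLES, 0, 3);  // one full-screen triangle
  glDepthFunc(GL_LESS);              // restore defaults
  glDepthMask(GL_TRUE);
}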

The model I'm using for testing has nearly 4 million polygons, which the hardware I'm using should have no problem with. My starting point however is a measly 10 to 11fps, and it drops to 3fps if I recalculate my shadows every frame. Nothing to write home about. In my defence, I'm rendering everything straight up: nothing is sorted, nothing is excluded, everything is rendered, on or off screen, occluded or not, at way too high a level of detail.

Scratchpad

This is my little scratchpad of steps I plan to follow in the coming days (Christmas holidays, so hopefully I'll have time to work on this); there's a sketch of the visibility test behind steps 2 and 5 right after the list:
1) break up the model into smaller objects
2) add bounding boxes and use those to exclude things from rendering
3) sort those objects that will be rendered by distance to the camera
4) use our bounding boxes to perform occlusion checks for high poly objects that cover a relatively small area
5) use the BSP tree implementation to group objects together, so whole branches can be excluded when the containing node is deemed off screen
6) use LOD to render a low poly version of a high poly object that is far enough away from the camera
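As a little preview of steps 2 and 5: the visibility check at their core usually boils down to testing a bounding box against the six planes of the view frustum, and a box that sits entirely behind any one plane can be skipped, subnodes and all. This is a generic sketch of that test, not code from the engine:

#include <array>

struct Plane { float a, b, c, d; };  // ax + by + cz + d = 0, normal points into the frustum
struct AABB  { float minX, minY, minZ, maxX, maxY, maxZ; };

bool outsideFrustum(const AABB& box, const std::array<Plane, 6>& frustum) {
  for (const Plane& p : frustum) {
    // pick the box corner furthest along the plane normal
    float x = p.a >= 0 ? box.maxX : box.minX;
    float y = p.b >= 0 ? box.maxY : box.minY;
    float z = p.c >= 0 ? box.maxZ : box.minZ;
    if (p.a * x + p.b * y + p.c * z + p.d < 0)
      return true;  // even the closest corner is behind this plane
  }
  return false;  // intersecting or fully inside; render it
}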

Overview

In my engine a single model has a list of vertices (at the moment each consisting of a vector, normal and texture coord) and 1 or more index lists, one per material used. Each index list basically holds 3 indexes into the list of vertices per polygon. A single model can be reused multiple times; it is referenced from a node in a BSP tree. That node holds the model matrix that determines where in the 3d world this model should be rendered. Each node can hold any number of sub nodes whose model matrix positions another model in relation to its parent node. When rendering a house one could think of a structure like this:
- node "house" renders model "house" somewhere down the street
  - subnode "chair 1" renders model "chair" in the living room
  - subnode "chair 2" renders model "chair" in the dining room

Here we can see that a single model "chair" is rendered twice. We can also see a prelude to how our BSP tree can optimise rendering: if our house is off screen, there is no point in checking any of the furniture. The same goes for the LOD system, which is another branch of the tree. It is the high poly version of the house that contains the subnodes; the low poly node only references the low poly model, so there is no need to traverse the tree any further, again skipping the contents of the house.
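Roughly, the data described above has this shape. This is a sketch with assumed names to illustrate the idea, not the engine's actual declarations:

#include <string>
#include <vector>

struct Vertex {
  float position[3];
  float normal[3];
  float texCoord[2];
};

// one index list per material: three indexes into `vertices` per polygon
struct IndexList {
  std::string material;
  std::vector<unsigned int> indices;
};

struct Model {
  std::vector<Vertex> vertices;
  std::vector<IndexList> chunks;
};

// a BSP tree node positions a (possibly shared) model in the world
struct Node {
  float modelMatrix[16];        // places the model relative to the parent
  Model* model = nullptr;       // the same model can be referenced by many nodes
  std::vector<Node*> children;  // e.g. the chairs inside the house
};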

None of this is very relevant at this time because the model we're using isn't set up to use this hierarchy. At best we'll have a collection of nodes, each rendering a model at a certain location, but it is a good starting point for the first couple of steps. It isn't until step 5 that we really start needing to group things together in the right way, and by the time we reach it we'll dive into this more deeply.

Break up the model into smaller objects

The model I'm using is split into three files, each counting around the 1 million poly mark (2 below, 1 above if memory serves me). Each model is further divided into smaller chunks by material. As a result we end up with 3 models in memory, each having a large number of vertices and a large number of index lists, one for each "chunk".

As there is no point in trying to apply our optimisations to those 3 big models, our model is initially pretty useless, but if we break the model up by chunk we'll get something that we can use.

However, these chunks may not be sized the way we'd like. One chunk may be too small, say representing just the roof of a single house, while another may be too large, such as the chunk that represents the streets.

Still, breaking up our model into a model per chunk gives us a really nice base to work from. So I added functionality to my wavefront object loader to output a model per "material" instead of one big model. I added this as an option because eventually I want to load the models properly and not rely on this mostly arbitrary division.

The output of the loader is actually a node with a subnode for each model that has been created. Each model is also centered, and the matrix of the subnode is initialised to move the model into the right spot. This will help greatly later on when building our bounding boxes.
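The centering step in essence: find the bounding box of a chunk, shift its vertices so the box's centre sits at the origin, and bake the offset into the subnode's matrix. A sketch building on the Model and Node structs from the overview above, still with assumed names:

#include <algorithm>

void centreModel(Model& model, Node& subnode) {
  if (model.vertices.empty()) return;

  // find the axis-aligned bounds of the chunk
  float mins[3], maxs[3];
  for (int i = 0; i < 3; i++) {
    mins[i] = maxs[i] = model.vertices[0].position[i];
  }
  for (const Vertex& v : model.vertices) {
    for (int i = 0; i < 3; i++) {
      mins[i] = std::min(mins[i], v.position[i]);
      maxs[i] = std::max(maxs[i], v.position[i]);
    }
  }

  // shift the vertices so the centre of the box becomes the origin...
  float centre[3];
  for (int i = 0; i < 3; i++) centre[i] = (mins[i] + maxs[i]) * 0.5f;
  for (Vertex& v : model.vertices) {
    for (int i = 0; i < 3; i++) v.position[i] -= centre[i];
  }

  // ...and put the offset back as the subnode's translation
  // (column-major 4x4, identity with translation in the last column)
  for (int i = 0; i < 16; i++) subnode.modelMatrix[i] = (i % 5 == 0) ? 1.0f : 0.0f;
  subnode.modelMatrix[12] = centre[0];
  subnode.modelMatrix[13] = centre[1];
  subnode.modelMatrix[14] = centre[2];
}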

I wasn't expecting any difference, as I'm still rendering everything pretty much at random, but surprise surprise: it turns out OpenGL doesn't like the large buffers either, and splitting up the model like this brought the frame rate up to 14 to 15 fps.

We've won our first 3 to 4 fps speed increase with a relatively simple change. :)

Next time, we'll start adding bounding boxes.
