March 6, 2011 / racoonacoon

Engine Optimization Pt2

In my last post (waaay back on the 25th of February) I was hard at work trying to get Claudius to be as smart and efficient as possible. I noted the addition of per-object culling as well as the ill-fated debacle that was dynamic batching. It wasn’t all that terrible, however, as working on dynamic batching allowed me to discover some inefficiencies at the core of Claudius, the ObjectRegister. In this post I will mention some of the improvements I made to the Register to get it to run a bit better, and visit the concept of dynamic batching once again.

On to the fun stuff!

In case you’ve forgotten, the performance of Claudius last time was about this:

When looking thoroughly at the ObjectRegister, one aspect in particular caught my eye: a map was being used to store our internal database of items. This wasn’t immediately a red flag, as maps in C++ are decently fast at retrieving objects (O(log n)), but it was the only thing I could see that might be slowing the Register down. To understand why, we have to take a closer look at how Claudius works internally. First, the user creates some objects, be they RenderableObjects to draw to the screen or UpdatableObjects (such as Timers) that only need to update. They then go to Claudius at scene setup and say, “Yo, here is an object.” Claudius now goes to the ObjectRegister and asks it to store the object internally. The ObjectRegister gives the object a UniqueID and uses its map to map the ID to the object.
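To make the lookup cost concrete, here is a minimal sketch of what a map-backed register along these lines might look like. The names `Object`, `ObjectID`, and `MapObjectRegister` are hypothetical stand-ins for illustration, not the actual Claudius types.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-in for the engine's object base type.
struct Object {
    virtual ~Object() = default;
    virtual void update(float /*dt*/) {}
};

using ObjectID = std::size_t;

// Sketch of a map-backed register (assumed design, not the real one).
// std::map is a balanced tree, so every get() costs O(log n).
class MapObjectRegister {
public:
    // Stores a non-owning pointer and hands back a fresh unique ID.
    ObjectID add(Object* object) {
        ObjectID id = nextID_++;
        objects_[id] = object;
        return id;
    }

    // Returns the object registered under id, or nullptr if the ID is unknown.
    Object* get(ObjectID id) const {
        auto it = objects_.find(id);
        return it != objects_.end() ? it->second : nullptr;
    }

private:
    std::map<ObjectID, Object*> objects_;
    ObjectID nextID_ = 0;
};
```

Each `get` call walks the tree, which is cheap once but is paid again for every object on every update and every render.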

Now let’s imagine it comes time to update the scene. A manager known as the ObjectUpdater handles all the basic update logic for us. Internally it stores a vector of object IDs that tell it which objects it needs to update. Note that it doesn’t store the objects themselves, only their identifiers as assigned by the ObjectRegister. To actually access the objects, then, the ObjectUpdater must ask the ObjectRegister for the object with that specific ID. This same basic procedure occurs for the ObjectRenderer as well.

The problem with this is the cost paid when the ObjectRegister looks up the actual object associated with a particular ID. The overhead of doing this once is negligible, but doing it many times per frame (say 700 or so) could really add up. So I opted for a different solution. Instead of a map, I would use an array, where the ID of an object is its slot in the array. (An object with ID 23 would be at slot 23 in the array, for example.) This would provide constant-time access to items in the register and potentially improve performance.
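A rough sketch of the array idea, together with the kind of loop the ObjectUpdater runs over its list of IDs. Everything here (`UpdatableObject`, `Timer`, `SlotID`, `ArrayObjectRegister`, `updateAll`) is a hypothetical illustration under the assumptions above, not the engine's real code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the engine's types.
struct UpdatableObject {
    virtual ~UpdatableObject() = default;
    virtual void update(float dt) = 0;
};

// Toy updatable object that just accumulates elapsed time.
struct Timer : UpdatableObject {
    float elapsed = 0.0f;
    void update(float dt) override { elapsed += dt; }
};

using SlotID = std::size_t;

// Sketch of an array-backed register: the ID is the slot index,
// so get() is a single indexed read, O(1).
class ArrayObjectRegister {
public:
    // Appends a non-owning pointer; its index becomes its ID.
    SlotID add(UpdatableObject* object) {
        slots_.push_back(object);
        return slots_.size() - 1;
    }

    UpdatableObject* get(SlotID id) const {
        assert(id < slots_.size());  // IDs only ever come from add()
        return slots_[id];
    }

private:
    std::vector<UpdatableObject*> slots_;
};

// What an updater might do each frame: resolve each stored ID, then update.
void updateAll(const ArrayObjectRegister& reg,
               const std::vector<SlotID>& ids, float dt) {
    for (SlotID id : ids) {
        if (UpdatableObject* object = reg.get(id)) {
            object->update(dt);
        }
    }
}
```

This sketch never removes objects. A real register would need some policy for freed slots (for example, nulling the slot and reusing it later), which is why `updateAll` checks for a null pointer.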

That small change did wonders for our framerate:

Huzzah! A 20FPS boost just from changing from an O(log n) map to an O(1) array!

Dynamic Batching

Batching might not be as bad as I originally thought…

On very large levels it seems that dynamic batching actually improves our framerate quite a bit (around 20fps). It’s only on small levels that the penalty outweighs the savings. One idea I have been throwing around in my head to improve efficiency on all counts would be to multithread the thing. It could be worth doing, but I am worried about the amount of time something like that would take. Thread creation and destruction are incredibly expensive operations (by the standards of real-time environments such as video games), so I would first have to spend the time creating some sort of halfway decent thread pool in order to thread efficiently (C#, I miss thee!). What I will most likely end up doing is leaving it as is and only worrying about threads if I have a bunch of time to burn.

Future Engine Improvements

I have been thinking of ways to further improve performance. One of the ideas that came to mind would be to sort all objects from front to back before rendering them. This would cause distant objects that are covered by nearer objects (in screen space) to fail the Z-test, which could be helpful on large levels with complex effects and many overlapping objects. It would be best to do this on a separate thread sometime before rendering (such as at the beginning of update) so that the results would already be ready when render comes. This is obviously more complicated, however, so I don’t know if I want to do this or not.
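The sorting step itself is simple enough. Below is a single-threaded sketch that orders draw items by squared distance from the camera; `Vec3`, `DrawItem`, `distanceSquared`, and `sortFrontToBack` are made-up names for illustration, and a real engine might sort by view-space depth instead.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical minimal types for illustration.
struct Vec3 {
    float x, y, z;
};

struct DrawItem {
    int id;         // identifier of the object to draw
    Vec3 position;  // world-space position used as the sort key
};

// Squared distance avoids a sqrt and preserves ordering.
inline float distanceSquared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Orders items nearest-first so nearer geometry fills the depth buffer
// before the farther geometry behind it is drawn.
void sortFrontToBack(std::vector<DrawItem>& items, const Vec3& cameraPos) {
    std::sort(items.begin(), items.end(),
              [&cameraPos](const DrawItem& a, const DrawItem& b) {
                  return distanceSquared(a.position, cameraPos) <
                         distanceSquared(b.position, cameraPos);
              });
}
```

Sorting by object position is only an approximation for large or interpenetrating objects, but for an opaque pass the depth buffer still guarantees a correct image; the ordering only affects how much hidden work gets rejected.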

The next major thing I will likely be adding is deferred shading. In our latest level we are using the maximum number of lights Claudius supports (16 point lights). Besides that being somewhat slow, larger levels are going to need even more lights, so I have no real choice but to implement this deferred shading stuff. Should be fun 🙂



Leave a Comment
  1. jh / Mar 24 2011 4:23 am

    Hash maps would also be a possible way of increasing performance. With a good hash function, hash map lookups are O(1).

    Also, maps would benefit from optimization and inlining being enabled; most debug builds don’t do that, so that will also artificially depress the frame rates compared to a release build.

