Author Topic: Procedural content generation and unit testing (Read 13945 times)

NON · « **on:** November 07, 2013, 09:58:13 AM »

Hello

I'm in the process of adding unit tests to my game. I'm somewhat new to writing unit tests, and I'm doing it partly for the learning.

At first I started thinking that this will be damn handy for testing the map generation code (by far the hardest and least predictable part I'd say).

One example: generate a big number of levels and check that there's a path from the player to the stairs in each one (this is probably outside the scope of what a unit test should do, but... I want to test this anyway

).

The problem with this is that the tests will not be deterministic. Results will vary between runs. A successful test will not be reliable, and a failed test will be hard to recreate.

To solve this, I suppose you could specify the seed of the RNG (random number generator). Something like this pseudo code:

Code: [Select]

for(int i = 0; i < 1000; i++) {
  rng.seed(i);
  level = makeDungeonLevel();
  runTestsOnLevel(level);
}

This should at least make the tests deterministic.

However I see two problems with this:
(1) You will only test some cases - although maybe a thousand levels is more than enough...
(2) You may set yourself up for a lot of confusion. Suppose your level generation process makes 20 calls to the RNG. So with a fixed seed, those 20 calls will always be the same - all your tests succeed every time, everything seems to work fine. Now suppose after the 5th call to the RNG, you insert another RNG call for a tiny change to the level generation. Then wouldn't RNG call number 6, 7, 8 etc get new values? So you have significantly altered the way the level is created, even with this tiny change. Suppose now that this new level, which was never tested before, breaks the tests. Wouldn't you assume that your change broke the level generation code somehow? But in reality, it was already broken, it just happened to never generate a broken version until now. Perhaps like in (1), this is not a problem if you generate a sufficiently large number of maps.

I also have a question about the code example above, where I use 0 to 999 as seed. Is there any harm in using this series of numbers (the "first" 1000 integers, in a straight row) ? Would I cover more varied scenarios if I use bigger steps in the seed? Or use more "wild" numbers - not a straight series, but something like 39903 28495 391, 394874, 3233, 10824, etc...? I'm thinking that with a good RNG, just using 0 to 999 should be fine, and create varied results. I'm using Mersenne twister, which is supposed to be very high quality.

TheCreator · « **Reply #1 on:** November 07, 2013, 12:33:13 PM »

I'm afraid that unit tests are not suitable for map validation. You're probably right that (1) is not a big problem, but (2) is. I don't think it can be solved. Unit tests must be deterministic.

By the way, for unit tests it is good to have a completely predictable RNG, not only use a constant seed. I mean whenever you ask your RNG to produce a number from 1 to 10, you must be absolutely sure that it will return 1 (or some other number, but you should always know what it will be).

NON · « **Reply #2 on:** November 07, 2013, 03:39:02 PM »

Quote

By the way, for unit tests it is good to have a completely predictable RNG, not only use a constant seed. I mean whenever you ask your RNG to produce a number from 1 to 10, you must be absolutely sure that it will return 1 (or some other number, but you should always know what it will be).

Hm, I can't grasp how one would achieve this. If I have a method I want to test, which does 4 RNG calls, how would I predetermine those values? I suppose I could do some sort of mock-RNG, and provide it with a list of values, which it will just return the first value from on the first call, the second value on the next call, etc. Is this typically how you do this? It seems very fragile - if the order of the RNG calls change, or if calls are added or removed, the tests will break.

I suppose you could separate deterministic code from non-deterministic as much as possible, and only unit test the former.

If I have a function which should transform a room (make it plus-shaped, or put pillars inside etc), then I could split it in two functions - one which picks random parameters, and another which takes those parameters as arguments and transforms the room in a deterministic way. Then I need to run tests on the deterministic function that covers the necessary cases.
Verification of valid wall placement etc should probably be as much as possible in the deterministic function. The other function should just provide random parameters without considering anything (for example it could give how wide the plus shape should be, in percent of the total room width). Agreed?

Kevin Granade · « **Reply #3 on:** November 07, 2013, 05:00:11 PM »

You absolutely should test your map generation, but that's a system test, which you should ALSO be running. Unit tests by their nature cannot test everything about your program. For this scenario, just add some "system tests" that you run at the same time as your unit tests that exercise the larger-scale, nondeterministic parts of the system. It's sufficient to generate a random seed* for your system test, and store that in case it fails. Yes you can never get total coverage with this, but that's not what testing does, it just increases confidence in the code, and tries to catch breakage as soon as possible.

So for example you could do something like:

Code: [Select]

long test_map_generation(long seed) {
    // Generate a map
    generate_map(seed);
    // Run some tests on the map
    if( test_map() ) {
         // negative value, assuming our seed is positive as it would be if it's from random()
        return -1;
    }
    return seed;
}

run_tests() {
    // Unit tests and such
    // Now for a system test
   srandom( time() );
   long map_generation_result = test_map_generation( random() );
   if( map_generation_result >= 0 ) {
      //Print seed and failure message.
   }
}

A failing test would report the seed used, and you could manually pass that seed to your map generation method in the real game, another test harness or with a debugger to try and figure out why it's failing.

*An aside about rngs. What we generally call a rng, specifically the posix random() call, is not in fact an rng but a prng. When you seed it with srandom(), it initializes the state with the number you provided. The sequence of numbers returned by random() is in fact deterministic relative to the initial seed, so as long as there is no additional source of randomness in your test code, the test code should be deterministic relative to whatever random() was seeded with when you started. Traditionally that's the output of time(), "real" randomness is another thing entirely, and IMO is unnecessary for games.

jlund3 · « **Reply #4 on:** November 07, 2013, 05:11:39 PM »

Personally, I think your method of testing is fine. I have a suite of tests which I use to ensure that maps are fully connected, have the proper ratio of floor to wall, have roughly the correct features, etc. Were it up to me, I would just run the tests with a random seed, rather than a preset. This over time as I run the test thousands upon thousands of times, each run with a different seed, I can become more and more certain that the code does what I expect it to (this is called amplification of stochastic advantage). Furthermore, I would record the state of the rng at the start of each test so that if a test happens to fail, I can replicate the failure case, and figure out why after thousands of successful runs, I finally found the odd corner case in my generation algorithm.

As far as your question about whether using seeds in the range [0, 999] is okay, the answer is almost certainly. To illustrate, lets suppose you and a crappy rng with a really small state space, say in the range [0,9]. If we seed it with a 0, we might generate the sequence (and I'm just picking this out of the air) <0,2,6,5,3,1,7,8,4,9>. Once we reach the end of that sequence, it would start over at 0. Now suppose we seeded with a 1. With our hypothetical rng, we would get the sequence <1,7,8,4,9,0,2,6,5,3>. With this small of a state space, it is easy to see where the second sequence matches the first. However, your mersenne twister has an extremely large state space. For example, the most common implementation of the twister (MT19937) has a state space of 2^19937 - 1, which is roughly 4.3x10^66207. Note that the number of atoms in the observable universe is estimated at around 10^80. In other words, you are almost certainly fine using sequential seeds. The only question remaining is whether you want to trust your non-deterministic tests to just 1000 pre-set cases, or if you would rather be able to have increasing confidence over time as you run on more and more test cases.

NON · « **Reply #5 on:** November 07, 2013, 05:50:13 PM »

@Kevin Granade & jlund3:
Storing the seed for later verification of a failed test definitely sounds better than using a fixed list of seeds! I'll cover far more cases, and I won't have the trouble like I described in (2).

I'll go ahead and make map generation tests in this way.

Quote from: jlund3 on November 07, 2013, 05:11:39 PM

For example, the most common implementation of the twister (MT19937) has a state space of 2^19937 - 1, which is roughly 4.3x10^66207. Note that the number of atoms in the observable universe is estimated at around 10^80.

This made me laugh

Krice · « **Reply #6 on:** November 09, 2013, 11:38:13 AM »

You should be able to check your generation routines without random numbers. After all random numbers are just a selection of static data and the real issue is how good your level generation routines are. If the rules allow strange things happen, they will happen at some point.

News: Read the RULES before posting.

Author Topic: Procedural content generation and unit testing (Read 13945 times)

NON

Procedural content generation and unit testing

TheCreator

Re: Procedural content generation and unit testing

NON

Re: Procedural content generation and unit testing

Kevin Granade

Re: Procedural content generation and unit testing

jlund3

Re: Procedural content generation and unit testing

NON

Re: Procedural content generation and unit testing

Krice

Re: Procedural content generation and unit testing