Farkle AI Part 2: Parallelization

My Farkle AI core logic is not fast. I can’t say I really know how much of an impact modern OOP patterns (such as Inversion of Control) have on performance. When I took a graduate class on parallel programming, I distinctly remember completing a project (a system of linear equations solver using the conjugate gradient method) using basic OOP principles. The result was orders of magnitude slower than the instructor’s C-style solution. While this taught me to always remain aware of performance issues within my code, there are many instances when performance doesn’t matter (or when I don’t care). My Farkle AI is a case of the latter.

Despite this fact, I still didn’t want to wait all day when running AI simulations. Therefore, I decided to investigate parallelizing my simulation runner. Plus, this meant I got to use some skills I learned from the aforementioned class. Writing parallel code in OpenCL or CUDA can get quite complicated. This is where the power of .Net really shows. My serial for loop was something like the following:

void SimulateAIGames(int numberOfGames)
{
    for (var n = 0; n < numberOfGames; n++)
    {
        var winner = PlayGame();

        if (winner == 0)
        {
            _player1Wins++;
        }
        else if (winner == 1)
        {
            _player2Wins++;
        }
    }
}

Parallelization in .Net literally took me 2 minutes. (Who knows how long a comparable parallelization would take in OpenMP or OpenCL…)

SimulateAIGamesParallel(int numberOfGames)
{
    Parallel.For(0, numberOfGames, n => RunSimulation(n));
}

void RunSimulation(int n)
{
    var winner = PlayGame();

    if (winner == 0)
    {
        _player1Wins++;
    }
    else if (winner == 1)
    {
        _player2Wins++;
    }    
}

However, all was not well. I immediately noticed that when I ran a simulation with a large number of games, the total number of wins did not equal the number of games played. For a 100,00 game simulation, I was missing results for almost 1000 games! After a quick think, I realized I was having access issues. While I correctly created all my objects for a game within the PlayGame() method (thus making local variables for each thread), I failed to realize that _player1Wins and _player2Wins could have instances when multiple threads would need access. Thus, I would occasionally lose the increment.

My solution was to allocate an int[] of length numberOfGames and assign the winner to each game index. This would eliminate any conflicting memory requests.

void RunSimulation(int n)
{
    var winner = PlayGame();

    _winnerOfGames[n] = winner;   
}

The results of my parallelizations were comforting. I saw the runtime of simulations decrease by roughly 40% (gotta love my dual-core laptop). These results are consistent with Amdahl’s Law. Now it only takes me an hour to run a 1 million game simulation between 2 AIs…