September 2013

Volume 28 Number 9

Test Run - Multi-Swarm Optimization

By James McCaffrey

Multi-swarm optimization (MSO) is a technique for estimating the solution to difficult or impossible numerical problems. It's a variation of particle swarm optimization (see my article on the subject at msdn.microsoft.com/magazine/hh335067). Regular particle swarm optimization models flocking behavior, such as that seen in groups of birds and schools of fish. MSO extends particle swarm optimization by using several swarms of simulated particles rather than a single swarm.

MSO can be applied to several machine-learning scenarios, such as estimating the weights and bias values for an artificial neural network or estimating the weights of weak learners in ensemble classification and prediction. MSO is a meta-heuristic, meaning that the technique is really a set of design principles and guidelines that can be used to construct a specific algorithm to solve a specific optimization problem.

A good way to understand multi-swarm optimization is to examine the demo program in Figure 1 and the graph in Figure 2. The demo program is estimating the solution to a standard benchmark numerical optimization problem called Rastrigin’s function. The goal is to find the values of x0 and x1 that minimize the function:

f(x0, x1) = (x0^2 - 10 * cos(2 * pi * x0) + 10) + (x1^2 - 10 * cos(2 * pi * x1) + 10)

Figure 1 Multi-Swarm Optimization Demo

Figure 2 Rastrigin’s Function

The function has a known solution of f = 0.0 when x0 = 0.0 and x1 = 0.0, so the use of MSO isn’t really necessary in this situation. The graph in Figure 2 shows Rastrigin’s function. Though the function has many peaks and valleys representing false solutions, there’s only one global minimum at the center of the image. The multiple close-but-not-quite solutions of Rastrigin’s function are deliberate and are designed to cause trouble for optimization algorithms.

The demo program in Figure 1 begins by setting up some input parameters for the MSO algorithm. There are three swarms, and each swarm has four particles. In most numerical optimization problems, the range of possible values is limited in some way. Here, the search space is initially limited to x values between -100.0 and +100.0.

The demo program initializes each of the 12 particles to random (x0, x1) values. Each pair of values represents a particle position, which can also be interpreted as a possible solution. The value of Rastrigin's function at each position is called the cost of the position, to suggest that the goal is to minimize the function. After initialization, the best particle (the one with the smallest cost) is particle 0 in swarm 2, using zero-based indexing. That particle has position [-40.57, 28.54] and associated cost 2498.93.

Multi-swarm optimization is an iterative process. The demo program sets an arbitrary maximum loop value of 150. The MSO algorithm then searches for better solutions, keeping track of the best solution found by any of the 12 particles. At the end of 150 iterations, the best solution found was f = 0.000043 at x0 = -0.0003, x1 = 0.0004, which is very close to but not quite the actual solution. The demo program can in fact find the actual solution by setting the maxLoop variable to 500, but it’s important to remember that in most realistic problem scenarios you won’t know if MSO has found the optimal solution.

This article assumes you have at least intermediate-level programming skills but doesn’t assume you know anything about multi-swarm optimization. The demo program is coded in C# but you shouldn’t have too much trouble refactoring to another language. In order to keep the size of the code small and the main ideas clear, I’ve removed most of the normal error checking from the demo code. The demo is too long to present in its entirety in this article, but the entire source code is available at msdn.microsoft.com/magazine/msdnmag0913.aspx.

Overall Program Structure

The structure of the demo program, with some minor edits and most of the WriteLine statements removed, is presented in Figure 3.

Figure 3 Multi-Swarm Demo Program Structure

using System;
namespace MultiSwarm
{
  class MultiSwarmProgram
  {
    static void Main(string[] args)
    {
      try
      {
        Console.WriteLine(
          "\nBegin Multiple Particle Swarm optimization demo\n");
        int dim = 2;
        double minX = -100.0;
        double maxX = 100.0;
        int numParticles = 4; // Particles in each swarm
        int numSwarms = 3; // Swarms in multi-swarm
        Multiswarm ms = new Multiswarm(numSwarms, numParticles, 
          dim, minX, maxX);
        Console.WriteLine("\nInitial multiswarm:");
        Console.WriteLine(ms.ToString());
        int maxLoop = 150;
        ms.Solve(maxLoop);
        Console.WriteLine("\nFinal multiswarm:");
        Console.WriteLine(ms.ToString());
        Console.WriteLine("\nBest solution found = " +
          ms.bestMultiCost.ToString("F6"));
        Console.Write("at x0 = " + ms.bestMultiPos[0].ToString("F4"));
        Console.WriteLine(", x1 = " 
          + ms.bestMultiPos[1].ToString("F4"));
        Console.WriteLine("\nEnd demo\n");
        Console.ReadLine();
      }
      catch (Exception ex)
      {
        Console.WriteLine(ex.Message);
        Console.ReadLine();
      }
    }
    public static double Cost(double[] position) { . . }
  } // Program
  public class Particle { . . }
  public class Swarm { . . }
  public class Multiswarm { . . }
} // ns

To create the demo program I used Visual Studio 2012. The demo has no significant dependencies, and any version of Visual Studio should work. I selected the C# console application template and named the project MultiSwarm. After Visual Studio loaded the code template, in the Solution Explorer window I renamed the Program.cs file to MultiSwarmProgram.cs and Visual Studio automatically renamed the program class for me. At the top of the source code I deleted all unneeded namespace references, leaving just the reference to the System namespace.

The demo program defines a public-scope Cost method, which is Rastrigin’s function:

public static double Cost(double[] position)
{
  double result = 0.0;
  for (int i = 0; i < position.Length; ++i) {
    double xi = position[i];
    result += (xi * xi) - (10 * Math.Cos(2 * Math.PI * xi)) + 10;
  }
  return result;
}

The Cost method accepts a single array parameter that represents a particle’s position, which is a possible solution. In most machine-learning scenarios, the Cost function you’re trying to minimize represents some form of error and will require additional parameters, such as a training data source. The Cost function is usually the performance bottleneck for MSO, so you should code it to be as efficient as possible.
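
For example, a minimal sketch of a cost function in a machine-learning setting might compute mean squared error over training data. The simple linear model and the data layout (features first, target value last) used here are illustrative assumptions, not part of the demo:

// A hypothetical cost function for a machine-learning scenario:
// mean squared error of a simple linear model over training data.
// The linear model and the data layout are assumptions for illustration.
public static double Cost(double[] weights, double[][] trainData)
{
  double sumSquaredError = 0.0;
  for (int i = 0; i < trainData.Length; ++i)
  {
    int n = trainData[i].Length - 1;
    double target = trainData[i][n]; // Last value is the desired output
    double output = 0.0;             // Weighted sum of the input values
    for (int k = 0; k < n; ++k)
      output += weights[k] * trainData[i][k];
    sumSquaredError += (output - target) * (output - target);
  }
  return sumSquaredError / trainData.Length;
}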

The Multiswarm class encapsulates the MSO algorithm. Classes Particle and Swarm are helper classes for class Multiswarm. An alternative design in languages that support nested class definitions is to define classes Particle and Swarm inside class Multiswarm.

The demo program begins by setting up five input parameters:

int dim = 2;
double minX = -100.0;
double maxX = 100.0;
int numParticles = 4;
int numSwarms = 3;

Variable dim represents the number of dimensions in the problem to be solved. Because Rastrigin’s function accepts x0 and x1, the value of dim is set to 2. Multi-swarm optimization is well-suited for problems with any number of dimensions. Variables minX and maxX constrain the search to a limited range of values. The values for minX and maxX will vary from problem to problem. Variable numParticles defines the number of particles that are in each swarm. Variable numSwarms defines the total number of swarms. In general, larger values of numParticles and numSwarms produce more accurate solutions at the expense of performance.

Next, the primary multi-swarm object is instantiated and the MSO solving algorithm is called:

Multiswarm ms = new Multiswarm(numSwarms, numParticles, 
  dim, minX, maxX);
Console.WriteLine("\nInitial multiswarm:");
Console.WriteLine(ms.ToString());
int maxLoop = 150;
ms.Solve(maxLoop);

The demo calls a collection of particles a “swarm,” and a collection of swarms a “multi-swarm.” Instead of this naming scheme, some research literature calls a collection of particles a “sub-swarm” and a collection of sub-swarms a “swarm.” The Multiswarm object is instantiated using the previously defined input parameters. Variable maxLoop holds the maximum number of times the main solving algorithm loop will iterate. In general, larger values of maxLoop will produce more accurate solutions at the expense of performance.

The Solve method iteratively searches for the best solution to Rastrigin's function. Notice that the Cost function is defined externally to the Multiswarm object. In most situations, MSO code is integrated directly into a software system rather than implemented as a class-library DLL, which would require you to pass the Cost function into the Multiswarm object (via a delegate, for example) or to use an interface-design approach that defines a programming contract.
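
For instance, one possible delegate-based refactoring (a sketch of an alternative design, not how the demo is actually structured) might look like this:

// A sketch of one way to decouple the cost function from the solver,
// using a delegate. This is a possible refactoring, not the demo's design.
public class Multiswarm
{
  private readonly Func<double[], double> costFn; // Supplied by the caller

  public Multiswarm(Func<double[], double> costFn, int numSwarms,
    int numParticles, int dim, double minX, double maxX)
  {
    this.costFn = costFn;
    // Remaining initialization as before, calling costFn(position)
    // wherever the demo code calls MultiSwarmProgram.Cost(position)
  }
}

The caller would then instantiate the solver with something like new Multiswarm(MultiSwarmProgram.Cost, numSwarms, numParticles, dim, minX, maxX).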

After the Solve method finishes executing, the final state of the Multiswarm object is displayed, and the best solution found by any particle in any swarm in the multi-swarm is explicitly displayed:

Console.WriteLine("\nFinal multiswarm:");
Console.WriteLine(ms.ToString());
Console.WriteLine("\nBest solution found = " +
 ms.bestMultiCost.ToString("F6"));
Console.Write("at x0 = " + ms.bestMultiPos[0].ToString("F4"));
Console.WriteLine(", x1 = " + ms.bestMultiPos[1].ToString("F4"));

Particles

The Particle class has six members:

static Random ran = new Random(0);
public double[] position;
public double[] velocity;
public double cost;
public double[] bestPartPos;
public double bestPartCost;

I prefer to use public scope for simplicity but you may want to use private scope along with get and set property methods. The Random object is used by the constructor to initialize a Particle object to a random position. The array named position represents the position of a particle. Array velocity represents the speed and direction for a particle. For example, suppose that a particle is at position [12.0, 24.0] and the velocity is [5.0, 0.0]. This can be interpreted to mean that during the next time increment the particle will move 5.0 units along the x0 dimension and 0.0 units along the x1 dimension. After the particle moves, its new position will be [17.0, 24.0].

Variable cost holds the value of the Cost function at the current position. Variable bestPartCost holds the best (smallest) cost value that a particle ever encountered, and variable bestPartPos is the position where the best-known cost was found.

The Particle constructor is defined in Figure 4.

Figure 4 The Particle Constructor

public Particle(int dim, double minX, double maxX)
{
  position = new double[dim];
  velocity = new double[dim];
  bestPartPos = new double[dim];
  for (int i = 0; i < dim; ++i) {
    position[i] = (maxX - minX) * ran.NextDouble() + minX;
    velocity[i] = (maxX - minX) * ran.NextDouble() + minX;
  }
  cost = MultiSwarmProgram.Cost(position);
  bestPartCost = cost;
  Array.Copy(position, bestPartPos, dim);
}

The constructor allocates space for the position, velocity and bestPartPos arrays based on the number of problem dimensions. Each position and velocity cell is assigned a random value between minX and maxX. The cost of the initial position is computed. The particle’s best-known position and cost are set to the initial position and cost.

A significant alternative approach is to assign non-random initial positions to each particle. Some MSO research literature suggests that assigning particles in different swarms to different regions of the search space is superior to a random-assignment approach. For example, if you had two swarms with 10 particles each, the 10 particles in the first swarm could be assigned to positions with x-values between -100.0 and 0.0, and the 10 particles in the second swarm to positions with x-values between 0.0 and +100.0. I’m not entirely convinced by these research results, however, and simple random particle position assignment has worked well for me in practice.
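
If you want to experiment with region-based assignment, one minimal sketch is a Particle constructor overload that initializes the position within a per-swarm sub-range. This overload is hypothetical, not part of the demo code:

// Hypothetical constructor overload: initialize position within a
// per-swarm sub-range [subMinX, subMaxX] instead of the full range.
public Particle(int dim, double minX, double maxX,
  double subMinX, double subMaxX)
{
  position = new double[dim];
  velocity = new double[dim];
  bestPartPos = new double[dim];
  for (int i = 0; i < dim; ++i) {
    position[i] = (subMaxX - subMinX) * ran.NextDouble() + subMinX;
    velocity[i] = (maxX - minX) * ran.NextDouble() + minX; // Velocity still uses the full range
  }
  cost = MultiSwarmProgram.Cost(position);
  bestPartCost = cost;
  Array.Copy(position, bestPartPos, dim);
}

The Swarm constructor would then compute subMinX and subMaxX from a swarm index, for example by slicing [minX, maxX] into numSwarms equal intervals.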

Swarms

A swarm is a collection of particles. The Swarm class has three members:

public Particle[] particles;
public double[] bestSwarmPos;
public double bestSwarmCost;

Naming can be a bit tricky with MSO. I name the array of Particle objects in the Swarm class "particles," but you might want to use the name "swarm" instead. Member variable bestSwarmCost holds the best (smallest) cost found by any of the particles in the swarm during algorithm execution. Array bestSwarmPos holds the position where this best swarm-member cost was found.

The Swarm constructor is shown in Figure 5.

Figure 5 The Swarm Constructor

public Swarm(int numParticles, int dim, double minX, double maxX)
{
  bestSwarmCost = double.MaxValue;
  bestSwarmPos = new double[dim];
  particles = new Particle[numParticles];
  for (int i = 0; i < particles.Length; ++i) {
    particles[i] = new Particle(dim, minX, maxX);
    if (particles[i].cost < bestSwarmCost) {
      bestSwarmCost = particles[i].cost;
      Array.Copy(particles[i].position, bestSwarmPos, dim);
    }
  }
}

The Swarm constructor allocates space, then calls the Particle constructor numParticles times to generate random-position particles. As each particle is created, it’s checked to see if it has the best cost of any of the particles in the swarm.

The Multiswarm Class

A multi-swarm is a collection of swarms. The top-level Multiswarm class has seven members:

public Swarm[] swarms;
public double[] bestMultiPos;
public double bestMultiCost;
public int dim;
public double minX;
public double maxX;
static Random ran = new Random(0);

Array swarms holds each Swarm object, each of which is a collection of Particle objects. So swarms[2].particles[3].position[0] represents the x0-value for particle 3 in swarm 2.

Member variable bestMultiCost holds the best cost found by any particle in any swarm during algorithm execution. Array bestMultiPos holds the associated position where the best global cost was found. Member variables dim, minX and maxX are stored for convenience so their values can be used by class methods without being passed as parameters.

Recall that class Particle has a Random object named ran that’s used to generate initial random positions. Class Multiswarm has a different Random object that’s used by the MSO algorithm to insert pseudo-random behavior during the solve process.

The Multiswarm constructor is listed in Figure 6.

Figure 6 The Multiswarm Constructor

public Multiswarm(int numSwarms, int numParticles, int dim,
  double minX, double maxX)
{
  swarms = new Swarm[numSwarms];
  bestMultiPos = new double[dim];
  bestMultiCost = double.MaxValue;
  this.dim = dim;
  this.minX = minX;
  this.maxX = maxX;
  for (int i = 0; i < numSwarms; ++i)
  {
    swarms[i] = new Swarm(numParticles, dim, minX, maxX);
    if (swarms[i].bestSwarmCost < bestMultiCost)
    {
      bestMultiCost = swarms[i].bestSwarmCost;
      Array.Copy(swarms[i].bestSwarmPos, bestMultiPos, dim);
    }
  }
}

After allocating arrays and saving input parameter values, the Multiswarm constructor calls the Swarm constructor numSwarms times. As each swarm is created, the best cost of any particle within that swarm is checked to see if it’s a global best cost. If so, that cost and its associated position are stored into bestMultiCost and bestMultiPos, respectively.

The MSO Algorithm

In very high-level pseudo-code, the basic MSO algorithm is:

loop maxLoop times
  for each swarm
    for each particle
      compute new velocity
      use velocity to update position
      check if a new best cost has been found
    end for
  end for
end loop

The key operation in MSO is computing a new velocity for a particle. A new velocity for a given particle is influenced by the current velocity, the current position, the best-known position of the particle, the best-known position of any particle in the same swarm as the particle, and the best-known position of any particle in any swarm. In math terms, the new velocity is:

v(t+1) = w * v(t) +
         (c1 * r1) * (p(t) - x(t)) +
         (c2 * r2) * (s(t) - x(t)) +
         (c3 * r3) * (m(t) - x(t))

After a new velocity has been computed, a particle’s new position is:

x(t+1) = x(t) + v(t+1)

Bear with me for a moment. The computation is much simpler than it first appears. The term v(t+1) means the velocity at time t+1, in other words, the new velocity. Term v(t) is the current velocity. Term x(t) is the current position. Note that x and v are vectors, such as [12.0, 25.0], rather than single values.

Term p(t) is a particle’s best-known position. Term s(t) is the best position of any particle in the particle’s swarm. Term m(t) is the best position of any particle in any swarm.

Term w is a constant called the inertia factor. Terms c1, c2 and c3 are constants that establish a maximum change for each component of the new velocity. Terms r1, r2, and r3 are random values between 0 and 1 that provide a randomization effect to each velocity update.

The new velocity computation is probably best explained by example. Suppose a particle is currently at [12.0, 24.0] and its current velocity is [-1.0, -3.0]. Also, the best-known position of the particle is [8.0, 10.0], the best-known position of any particle in the swarm is [7.0, 9.0], and the best-known position of any particle in any swarm is [5.0, 6.0]. And suppose that constant w has value 0.7, constants c1 and c2 are both 1.4, and constant c3 is 0.4. Finally, suppose random values r1, r2 and r3 are all 0.2.

The new velocity of the particle is shown in Figure 7.

Figure 7 Computing the New Velocity of a Particle

v(t+1) = 0.7         * [-1.0, -3.0] +
         (1.4 * 0.2) * ([8.0, 10.0] - [12.0, 24.0]) +
         (1.4 * 0.2) * ([7.0, 9.0] - [12.0, 24.0]) +
         (0.4 * 0.2) * ([5.0, 6.0] - [12.0, 24.0])
       ≈ 0.7 * [-1.0, -3.0] +
         0.3 * [-4.0, -14.0] +
         0.3 * [-5.0, -15.0] +
         0.1 * [-7.0, -18.0]
       = [-0.7, -2.1] +
         [-1.2, -4.2] +
         [-1.5, -4.5] +
         [-0.7, -1.8]
       = [-4.1, -12.6]

And so, according to Figure 7 (in which the products 1.4 * 0.2 = 0.28 and 0.4 * 0.2 = 0.08 are rounded to 0.3 and 0.1 to keep the arithmetic readable), the particle's new position is approximately:

x(t+1) = [12.0, 24.0] + [-4.1, -12.6]
       = [7.9, 11.4]

Assuming the optimal solution is [0.0, 0.0], as is the case with Rastrigin’s function, notice that the particle has moved from its original position to a new position that’s closer to the optimal solution.

The inertia term in v(t+1) encourages a particle to continue moving in its current direction. The p(t) term encourages a particle to move toward its historical best-known position. The s(t) term encourages a particle to move toward the best-known position found by any of the particle’s swarm-mates. The m(t) term encourages a particle to move toward the best-known position found by any particle in any swarm.

Constants c1, c2, and c3 are sometimes called the cognitive, social, and global weights. Those constants, along with random values r1, r2, and r3, and the inertia weight w, determine how much each term influences the motion of a particle. Some research in regular particle swarm optimization suggests reasonable values for w, c1, and c2 are 0.729, 1.49445, and 1.49445, respectively. There’s little research on the c3 constant in MSO, but I typically use 0.3645 (half of the inertia weight), and this has worked well for me in practice.
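
To make the update concrete, here's a minimal sketch of the per-particle update logic inside Solve, assuming w, c1, c2 and c3 have been assigned the values just described. The demo's complete Solve method is in the code download; the variable name curr and the strategy of clamping out-of-range positions to [minX, maxX] are my assumptions for illustration:

// Sketch: update particle j in swarm i during one iteration.
Particle curr = swarms[i].particles[j];
for (int k = 0; k < dim; ++k)
{
  double r1 = ran.NextDouble();
  double r2 = ran.NextDouble();
  double r3 = ran.NextDouble();
  // New velocity: inertia + cognitive + social + global terms
  curr.velocity[k] = (w * curr.velocity[k]) +
    (c1 * r1 * (curr.bestPartPos[k] - curr.position[k])) +
    (c2 * r2 * (swarms[i].bestSwarmPos[k] - curr.position[k])) +
    (c3 * r3 * (bestMultiPos[k] - curr.position[k]));
  // New position, clamped to the search space (an assumption)
  curr.position[k] += curr.velocity[k];
  if (curr.position[k] < minX) curr.position[k] = minX;
  else if (curr.position[k] > maxX) curr.position[k] = maxX;
}
// Re-evaluate the cost, then update particle, swarm and global bests
curr.cost = MultiSwarmProgram.Cost(curr.position);
if (curr.cost < curr.bestPartCost) {
  curr.bestPartCost = curr.cost;
  Array.Copy(curr.position, curr.bestPartPos, dim);
}
if (curr.cost < swarms[i].bestSwarmCost) {
  swarms[i].bestSwarmCost = curr.cost;
  Array.Copy(curr.position, swarms[i].bestSwarmPos, dim);
}
if (curr.cost < bestMultiCost) {
  bestMultiCost = curr.cost;
  Array.Copy(curr.position, bestMultiPos, dim);
}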

Death and Immigration

There are several fascinating ways to modify the basic MSO algorithm. One possibility is to essentially kill a randomly selected particle every now and then, and then give birth to a new particle. The demo program uses this death-birth modification. The first few lines of the Solve method are shown in Figure 8.

Figure 8 The First Lines of the Solve Method

public void Solve(int maxLoop)
{
  // Assign values to ct, w, c1, c2, c3
  double death = 0.005; // Probability of death
  while (ct < maxLoop)
  {
    ++ct;
    for (int i = 0; i < swarms.Length; ++i) {
      for (int j = 0; j < swarms[i].particles.Length; ++j) {
        double p = ran.NextDouble();
        if (p < death)
          swarms[i].particles[j] = new Particle(dim, minX, maxX);
        for (int k = 0; k < dim; ++k) {
...

The method generates a random value between 0 and 1 and stores it into p. If the random value is less than 0.005, the current particle is re-instantiated by calling the Particle constructor, effectively killing the current particle and giving birth to a new particle at a random location.

Another MSO option is to model immigration by periodically taking two particles in different swarms and exchanging them. One particle effectively immigrates into the current swarm and another particle emigrates out of the swarm. The demo program contains this option. The key method is:

private void Immigration(int i, int j)
{
  // Swap particle j in swarm i
  // with a random particle in a random swarm
  int otheri = ran.Next(0, swarms.Length);
  int otherj = ran.Next(0, swarms[0].particles.Length);
  Particle tmp = swarms[i].particles[j];
  swarms[i].particles[j] = swarms[otheri].particles[otherj];
  swarms[otheri].particles[otherj] = tmp;
}

The method is simple and allows the undesirable possibility that a particle might be exchanged with itself. The immigration feature is called inside method Solve like so:

double immigrate = 0.005;
...
double q = ran.NextDouble();
if (q < immigrate)
  Immigration(i, j);
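
If the possibility of a self-swap bothers you, one simple variant (my modification, not the demo's code) reselects until two distinct particles are chosen. It also indexes into the selected swarm rather than swarms[0], which would matter if swarms ever had different sizes:

// A variant of Immigration that guarantees two distinct particles.
// This is a modification for illustration, not the demo's code.
private void Immigration(int i, int j)
{
  int otheri, otherj;
  do {
    otheri = ran.Next(0, swarms.Length);
    otherj = ran.Next(0, swarms[otheri].particles.Length);
  } while (otheri == i && otherj == j);
  Particle tmp = swarms[i].particles[j];
  swarms[i].particles[j] = swarms[otheri].particles[otherj];
  swarms[otheri].particles[otherj] = tmp;
}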

Wrapping Up

The explanation presented in this article and the accompanying code download should give you a solid foundation for experimenting with multi-swarm optimization. Compared to regular particle swarm optimization, multi-swarm optimization is only slightly more complex and tends to produce better-quality results, though it's slower. My experience with MSO suggests that it tends to handle difficult optimization problems (those with many local minima) better than regular particle swarm optimization.

MSO is a general-purpose numerical optimization meta-heuristic and is typically used as part of a larger machine-learning scenario in order to find a set of weights that minimize some sort of error function. For example, MSO can be used to find the best set of weights and bias values for a neural network by minimizing classification errors on a set of training data. There are many alternatives to MSO, including evolutionary optimization algorithms (also called real-valued genetic algorithms), bacterial foraging optimization and amoeba method optimization.

Compared to most alternative general-purpose numerical optimization approaches, in my opinion MSO is simpler to implement and easier to customize. A disadvantage of MSO compared to some alternatives is that, in general, there are few guidelines for selecting the values of MSO's free parameters, including the inertia weight and the cognitive, social and global weight constants. That said, I'm a big fan of MSO, and it has performed extremely well for me on the occasions when I've needed to solve a numerical optimization problem in my software systems.


Dr. James McCaffrey works for Microsoft at the Redmond, Wash., campus. He has worked on several Microsoft products including Internet Explorer and MSN Search. He’s the author of “.NET Test Automation Recipes” (Apress, 2006), and can be reached at jammc@microsoft.com.

Thanks to the following technical expert for reviewing this article: Kirk Olynyk (Microsoft)