
OneMax problem introduction

The OneMax problem is a simple optimization task that often serves as the "Hello World" of genetic algorithms. The task is to find the binary string of a given length that maximizes the sum of its digits. For example, for a OneMax problem of length 5, the digits of 10010 sum to 2, and the digits of 01110 sum to 3. Obviously, the optimal solution is the binary string whose every digit is 1. A genetic algorithm, however, has no such knowledge, and must search for the optimum using only its genetic operators. The goal of the algorithm is to find the optimal solution, or at least a good approximation of it, in a reasonable time.
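The scoring rule described above can be shown in a couple of lines (the helper name `onemax` is just for illustration):

```python
# A minimal illustration of OneMax scoring: the value of a candidate
# bit string is simply the number of 1s it contains.
def onemax(bits):
    return sum(bits)

print(onemax([1, 0, 0, 1, 0]))   # 2
print(onemax([0, 1, 1, 1, 0]))   # 3
print(onemax([1, 1, 1, 1, 1]))   # 5, the optimum for length 5
```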

Genetic algorithm practice

Before putting the algorithm into practice, the elements it uses need to be defined.

  • Chromosome encoding: Since the OneMax problem involves binary strings, the natural choice is to represent each individual's chromosome directly as a binary string encoding a candidate solution. In Python, this is implemented as a list containing only the integer values 0 and 1. The length of the chromosome matches the size of the OneMax problem. For example, for a OneMax problem of size 5, the individual 10010 is represented by the list [1,0,0,1,0].

  • Fitness calculation: Since the goal is to maximize the sum of the digits in the binary string, and each individual is represented as a list of 0/1 integer values, fitness can be defined as the sum of the elements in the list, for example: sum([1,0,0,1,0]) = 2.

  • Choice of genetic operators: There is no universal rule for choosing genetic operators; usually several schemes are tried to find the most suitable one. Selection operators can usually handle any chromosome type, but crossover and mutation operators generally need to match the chromosome type used, otherwise invalid chromosomes may be produced.

    • Selection operator: tournament selection is used here.
    • Crossover operator: single-point crossover is used here.
    • Mutation operator: flip-bit mutation is used here.
  • Stop condition: Limiting the number of generations is usually a good stop condition, as it guarantees that the algorithm does not run forever. Also, since we know the optimal solution to the OneMax problem (a binary string of all ones, i.e. one whose fitness equals the length of the list representing the individual), reaching it can serve as a second stop condition. For most real-world problems, however, no such a priori knowledge is available.
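The three operators chosen above can be sketched in plain Python. DEAP's tools module provides production versions (selTournament, cxOnePoint, mutFlipBit); the function names below are illustrative stand-ins that only show the mechanics:

```python
import random

def tournament_select(population, k=3):
    # Pick k random contenders; the one with the highest digit sum wins.
    return max(random.sample(population, k), key=sum)

def one_point_crossover(a, b):
    # Cut both parents at the same random point and swap the tails.
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def flip_bit_mutation(bits, indpb):
    # Flip each bit independently with probability indpb.
    return [1 - b if random.random() < indpb else b for b in bits]
```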

Genetic algorithm element allocation

Before starting the actual genetic algorithm flow, the design of the elements above needs to be implemented in code:

  1. First, import the required packages:
from deap import base
from deap import creator
from deap import tools
import random
import matplotlib.pyplot as plt
  2. Next, declare the constants that set the parameters of the OneMax problem and control the behavior of the genetic algorithm:
# hyperparameter declaration
ONE_MAX_LENGTH = 100    #length of bit string to be optimized
POPULATION_SIZE = 200   #number of individuals in population
P_CROSSOVER = 0.9       #probability for crossover
P_MUTATION = 0.1        #probability for mutating an individual
MAX_GENERATION = 50     #max number of generations for stopping condition
  3. Next, use the Toolbox class to create the zeroOrOne operator, which customizes the random.randint(a, b) function. By fixing the arguments a and b to the values 0 and 1, the zeroOrOne operator randomly returns either 0 or 1 whenever it is called. Register it with the following code snippet:
toolbox = base.Toolbox()  # define toolbox variable
toolbox.register("zeroOrOne", random.randint, 0, 1)  # register the zeroOrOne operator
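What register does here is essentially argument binding; the behavior of the resulting operator can be mimicked with functools.partial (the name zero_or_one below is an illustrative stand-in, not part of DEAP):

```python
import random
from functools import partial

# Binding 0 and 1 as fixed arguments of random.randint, which is
# essentially what registering "zeroOrOne" with those arguments does.
zero_or_one = partial(random.randint, 0, 1)

samples = [zero_or_one() for _ in range(10)]
print(samples)  # a list of ten random 0/1 values
```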
  4. Next, create the Fitness class. Since there is only one objective here, maximizing the sum of digits, choose the FitnessMax strategy and use a weights tuple containing a single positive weight:
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
  5. In deap, the Individual class is usually used to represent each individual in the population. Create it with the creator tool, using list as the base class to hold the individual's chromosome, and add a fitness attribute to the class, initialized to the FitnessMax class defined in the previous step:
creator.create("Individual", list, fitness=creator.FitnessMax)
  6. Next, register the individualCreator operation, which creates an instance of the Individual class and fills it with random 0/1 values using the custom zeroOrOne operation defined earlier. Register individualCreator using the base initRepeat operation, instantiated with the following arguments:

(1) the creator.Individual class as the container type for the resulting objects, (2) the zeroOrOne operation as the function that generates the objects, and (3) the constant ONE_MAX_LENGTH as the number of objects to generate. Because the zeroOrOne operator generates random 0/1 values, the resulting individualCreator operator fills an Individual instance with 100 randomly generated zeros and ones:

toolbox.register("individualCreator", tools.initRepeat, creator.Individual, toolbox.zeroOrOne, ONE_MAX_LENGTH)
  7. Finally, register the populationCreator operation for creating a list of individuals. This definition uses the initRepeat base operation with the following arguments:

(1) the list class as the container type, and (2) the individualCreator operator as the function used to generate the objects in the list. The last argument of initRepeat, the number of objects to generate, is not passed here. This means that when populationCreator is invoked, this argument must be supplied to specify the number of individuals to create:

toolbox.register("populationCreator", tools.initRepeat, list, toolbox.individualCreator)
  8. To simplify the fitness calculation (called evaluation in DEAP), first define a standalone function that accepts an instance of the Individual class and returns its fitness.

Here the oneMaxFitness function counts the number of 1s in an individual:

def oneMaxFitness(individual):
    return sum(individual),  # fitness in DEAP is represented as a tuple, so a single returned value needs a trailing comma
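The trailing comma matters, as a quick check shows:

```python
def oneMaxFitness(individual):
    # The trailing comma turns the single value into a 1-tuple,
    # matching DEAP's convention of tuple-valued fitness.
    return sum(individual),

result = oneMaxFitness([1, 0, 0, 1, 0])
print(result)  # (2,)
```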
  9. Next, define the evaluate operator as an alias for the oneMaxFitness() function. Using the evaluate alias to calculate fitness is a DEAP convention:
toolbox.register("evaluate",oneMaxFitness)
  10. Genetic operators are typically created by aliasing existing functions in the tools module and setting argument values as needed. Create the genetic operators according to the design in the previous section:
toolbox.register("select",tools.selTournament,tournsize=3)
toolbox.register("mate",tools.cxOnePoint)
# mutFlipBit iterates over all the attributes of the individual and, for each attribute value,
# uses the indpb argument as the probability of flipping that value (applying the NOT operator).
# This value is independent of the mutation probability, which is set by the P_MUTATION constant
# and determines whether mutFlipBit is called for a given individual in the population.
toolbox.register("mutate",tools.mutFlipBit,indpb=1.0/ONE_MAX_LENGTH)

Evolution of solutions for genetic algorithms

The genetic process is as follows:

  1. Create the initial population using the populationCreator operation defined earlier, passing the constant POPULATION_SIZE as its argument, and initialize the generationCounter variable used to track the generation number:
population = toolbox.populationCreator(n=POPULATION_SIZE)
generationCounter = 0
  2. To calculate the fitness of each individual in the initial population, use the map() function to apply the evaluate operation to every individual. Since evaluate is an alias for the oneMaxFitness() function, the iterator yields a computed fitness tuple for each individual. Convert the result to a list of tuples:
fitnessValues = list(map(toolbox.evaluate, population))
  3. Since the items of fitnessValues match the items of population (the list of individuals) one to one, use the zip() function to combine them and assign each individual its fitness tuple:
for individual,fitnessValue in zip(population,fitnessValues):
	individual.fitness.values = fitnessValue
  4. Next, since the fitness tuple holds a single value, extract the first value from each individual's fitness in order to gather statistics:
fitnessValues = [individual.fitness.values[0] for individual in population]
  5. The maximum and average fitness are computed for each generation. Create two lists to store these statistics:
maxFitnessValues = []
meanFitnessValues = []
  6. The main preparations for the genetic flow are now complete. Stop conditions must be set for the loop: one limits the number of generations, and the other detects whether the optimal solution (a binary string of all 1s) has been reached:
while max(fitnessValues) < ONE_MAX_LENGTH and generationCounter < MAX_GENERATION:
  7. Next, update the generation counter:
	generationCounter = generationCounter + 1
  8. The core of a genetic algorithm is its genetic operators. The first is the selection operator, applied via the tournament selection defined earlier as toolbox.select. Since the tournament size was already set when the operator was defined, only the population and its length need to be passed as arguments:
	offspring = toolbox.select(population,len(population))
  9. The selected individuals are assigned to the offspring variable and then cloned, so that the genetic operators can be applied without affecting the original population:

Note: despite the name offspring, these are still clones of the previous generation; they still need to be paired by the crossover operator to create the actual offspring.

	offspring = list(map(toolbox.clone, offspring))
  10. The next genetic operator is crossover, already defined in the previous section as the toolbox.mate operator, which is simply an alias for single-point crossover. Use Python slicing to pair every even-indexed item of the offspring list with the following odd-indexed item as parents. Crossover is then applied with the probability set by the P_CROSSOVER constant, which determines whether a pair of individuals is crossed or left unchanged. Finally, delete the fitness values of the children, since their existing fitness is no longer valid:
	for child1,child2 in zip(offspring[::2],offspring[1::2]):
		if random.random() < P_CROSSOVER:
			toolbox.mate(child1,child2)
			del child1.fitness.values
			del child2.fitness.values
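The even/odd slice pairing used above can be seen in isolation (the letter strings are just placeholders):

```python
offspring = ["A", "B", "C", "D", "E"]
# Even-indexed items pair with the following odd-indexed items;
# zip stops at the shorter slice, so a trailing unpaired item is skipped.
pairs = list(zip(offspring[::2], offspring[1::2]))
print(pairs)  # [('A', 'B'), ('C', 'D')]
```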
  11. The final genetic operator is mutation, registered earlier as the toolbox.mutate operator and set up as a flip-bit mutation. Iterate over all offspring items and apply the mutation operator with the probability set by the mutation probability constant P_MUTATION. If an individual is mutated, be sure to delete its fitness value: it may have been inherited from the previous generation and is no longer correct after mutation, so it must be recalculated:
	for mutant in offspring:
		if random.random() < P_MUTATION:
			toolbox.mutate(mutant)
			del mutant.fitness.values
  12. Individuals that were neither crossed nor mutated remain unchanged, so their existing fitness values (already calculated in the previous generation) do not need to be recomputed, while the fitness of the remaining individuals is empty. Find these new individuals via the valid property of the Fitness class and calculate their fitness the same way the original values were calculated:
	freshIndividuals = [ind for ind in offspring if not ind.fitness.valid]
	freshFitnessValues = list(map(toolbox.evaluate,freshIndividuals))
	for individual,fitnessValue in zip(freshIndividuals,freshFitnessValues):
		individual.fitness.values = fitnessValue
  13. Once the genetic operators are done, the old population can be replaced with the new one:
	population[:] = offspring
  14. Before continuing to the next iteration, gather the current fitness values to update the statistics, using the same method as above:
	fitnessValues = [ind.fitness.values[0] for ind in population]
  15. Obtain the maximum and average fitness values and append them to the statistics lists:
	maxFitness = max(fitnessValues)
	meanFitness = sum(fitnessValues) / len(population)
	maxFitnessValues.append(maxFitness)
	meanFitnessValues.append(meanFitness)
	print("- Generation {}: Max Fitness = {}, Avg Fitness = {}".format(generationCounter,
		maxFitness, meanFitness))
  16. In addition, use the maximum fitness value to find the index of the best individual and print it:
	best_index = fitnessValues.index(max(fitnessValues))
	print("Best Individual = ", *population[best_index], "\n")
  17. After the stop conditions are met and the genetic algorithm process ends, the collected statistics can be visualized with the Matplotlib library, showing how the best and average fitness values of each generation changed during the run:
plt.plot(maxFitnessValues, color="red")
plt.plot(meanFitnessValues, color="green")
plt.xlabel("Generation")
plt.ylabel("Max / Average Fitness")
plt.title("Max and Average Fitness over Generations")
plt.show()

The complete code for this part is as follows:

def main():
    population = toolbox.populationCreator(n=POPULATION_SIZE)
    generationCounter = 0
    fitnessValues = list(map(toolbox.evaluate,population))

    for individual,fitnessValue in zip(population,fitnessValues):
        individual.fitness.values = fitnessValue

    fitnessValues = [individual.fitness.values[0] for individual in population]

    maxFitnessValues = []
    meanFitnessValues = []

    while max(fitnessValues) < ONE_MAX_LENGTH and generationCounter < MAX_GENERATION:
        generationCounter = generationCounter + 1
        
        offspring = toolbox.select(population,len(population))
        offspring = list(map(toolbox.clone,offspring))

        for child1,child2 in zip(offspring[::2],offspring[1::2]):
            if random.random() < P_CROSSOVER:
                toolbox.mate(child1,child2)
                del child1.fitness.values
                del child2.fitness.values
        
        for mutant in offspring:
            if random.random() < P_MUTATION:
                toolbox.mutate(mutant)
                del mutant.fitness.values
        
        freshIndividuals = [ind for ind in offspring if not ind.fitness.valid]
        freshFitnessValues = list(map(toolbox.evaluate,freshIndividuals))
        for individual,fitnessValue in zip(freshIndividuals,freshFitnessValues):
            individual.fitness.values = fitnessValue
        
        population[:] = offspring

        fitnessValues = [ind.fitness.values[0] for ind in population]

        maxFitnessValue = max(fitnessValues)
        meanFitnessValue = sum(fitnessValues) / len(population)
        maxFitnessValues.append(maxFitnessValue)
        meanFitnessValues.append(meanFitnessValue)
        print("- Generation {}: Max Fitness = {}, Avg Fitness = {}".format(generationCounter,maxFitnessValue,meanFitnessValue))

        best_index = fitnessValues.index(max(fitnessValues))
        print("Best Individual = ", *population[best_index],"\n")

    plt.plot(maxFitnessValues,color="red")
    plt.plot(meanFitnessValues,color="green")
    plt.xlabel("Generation")
    plt.ylabel("Max / Average Fitness")
    plt.title("Max and Average fitness over Generation")
    plt.show()
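For comparison, the same loop can be condensed into a self-contained, stdlib-only sketch that mirrors the DEAP flow above (tournament selection, one-point crossover, flip-bit mutation). The function run_onemax and its default parameters are illustrative choices, not part of the DEAP code:

```python
import random

def run_onemax(length=20, pop_size=40, p_cross=0.9, p_mut=0.1,
               max_generations=100, seed=42):
    rng = random.Random(seed)
    # Random initial population of 0/1 lists.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(max_generations):
        # Stop early once the all-ones optimum appears.
        if max(sum(ind) for ind in population) == length:
            break
        # Tournament selection (size 3), then clone the winners.
        selected = [max(rng.sample(population, 3), key=sum)
                    for _ in range(pop_size)]
        offspring = [ind[:] for ind in selected]
        # One-point crossover on consecutive pairs.
        for i in range(0, pop_size - 1, 2):
            if rng.random() < p_cross:
                pt = rng.randint(1, length - 1)
                offspring[i][pt:], offspring[i + 1][pt:] = \
                    offspring[i + 1][pt:], offspring[i][pt:]
        # Flip-bit mutation: per-individual probability p_mut,
        # then per-bit probability 1/length, as in the DEAP setup.
        for ind in offspring:
            if rng.random() < p_mut:
                for j in range(length):
                    if rng.random() < 1.0 / length:
                        ind[j] = 1 - ind[j]
        population = offspring
    return max(population, key=sum)

best = run_onemax()
print(sum(best))
```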

Now it's time to test our genetic algorithm by running the code and verifying that it finds the optimal solution to the OneMax problem.

Running the algorithm

if __name__ == "__main__":
    main()

Running the program produces output like the following:

- Generation 27: Max Fitness = 99.0, Avg Fitness = 96.805
- Generation 28: Max Fitness = 99.0, Avg Fitness = 97.235
- Generation 29: Max Fitness = 99.0, Avg Fitness = 97.625
- Generation 30: Max Fitness = 100.0, Avg Fitness = 98.1
Best Individual = 1 1 1 1 1 ... 1 1 1 1 1 (all 100 bits are 1)

It can be seen that after 30 generations the algorithm found the all-ones solution, reaching a fitness of 100, and the genetic process stopped. The mean fitness was only about 53 at the beginning and was close to 100 by the end. The resulting plot is as follows:

The figure shows how the maximum and average fitness increase with the number of generations.