TestsTested | ✓ |
LangLanguage | Obj-CObjective C |
License | MIT |
ReleasedLast Release | Apr 2015 |
Maintained by Mike Amaral.
Evolve is a customizable evolutionary algorithm engine with the following features:
0123456789ABCDEF
and the length to be 6
, so you can easily translate a gene sequence like FF0000
to the color red. You might want your domain to be a binary string, 01
with the length 6
, like 110100
, where the first two digits represent the speed of your organism, the next two represent the strength of your organism, and the last two represent how well they're camouflaged. You define what the genome is and how it manifests itself in the wild, the possibilities are endless. Be creative!Population
of Organisms
with either of two methods:
Population
, defining the number of Organisms
, and the characteristics of their Genome
, and the engine creates the initial Population
for you.Organisms
for your initial population, defining the characteristics of their Genome
.EvolutionManager
to handle running the underlying simulation, and set your appropriate delegate
conforming to the <EvolutionDelegate>
protocol. (Currently one required method.)EvolutionManager
the Population
you just created.Organism
in your Population
.EvolutionManager
to proceed with the selection and breeding process.EvolutionManager
has completed the selection process, it will pass back to you information about the generation it just processed, as well as an updated Population
for the next generation.Available via CocoaPods
pod ‘Evolve’
"I don't know who it was first pointed out that, given enough time, a monkey bashing away at random on a typewriter could produce all the works of Shakespeare. The operative phrase is, of course, given enough time. Let us limit the task facing our monkey somewhat. Suppose that he has to produce, not the complete works of Shakespeare but just the short sentence 'Methinks it is like a weasel', and we shall make it relatively easy by giving him a typewriter with a restricted keyboard, one with just the 26 (capital) letters, and a space bar. How long will he take to write this one little sentence?" - Richard Dawkins
*Check out the included demo program for the full implementation of this.
The EvolutionManager
handles all of the simulation mechanics, and is initialized with a starting Population
object. A Population
can be seeded with a group of Organism
objects, or as is the case with this example, randomly generated given specific parameters:
size
is the number of organisms that are alive in any given generation, and will remain constant throughout the simulation.geneSequenceLength
is the length of each organism's gene sequence, or chromosome, represented as an NSString
.genomeDomain
is an NSString
that represents all possible characters that can be used in the organism's gene sequence.Example: If you want an organism's genetic code to be represented as a 5-digit binary string, you would set the domain to
@"01"
and the length to5
. A randomly generated gene sequence from this type of organism might look like01001
, or11100
.
In this example, we create our population with 50 organisms, a gene sequence length equal to the length of the target string, METHINKS IT IS LIKE A WEASEL
, and the domain includes all 26 capital letters as well as the space character. We then create our evolution manager with this population, and set ourself as the delegate, ensuring we're conforming to the <EvolutionDelegate>
protocol.
static NSString * const kTargetString = @"METHINKS IT IS LIKE A WEASEL";
static NSString * const kTargetDomain = @"ABCDEFGHIJKLMNOPQRSTUVWXYZ ";
Population *startingPopulation = [[Population alloc] initRandomPopulationWithSize:50 geneSequenceLength:kTargetString.length genomeDomain:kTargetDomain];
self.evolutionManager = [[EvolutionManager alloc] initWithPopulation:startingPopulation];
self.evolutionManager.delegate = self;
Now that the initial population has been created, the next step is to evaluate the fitness of each organism. In the case of this engine, which is ultimately agnostic to the organism's phenotype and environment, it is the implementer's job to determine the fitness of each organism every generation.
In the case of the Weasel Program, the fitness of each organism is represented simply as the number of characters that match the target string in the appropriate position:
Genetic Sequence | Fitness |
---|---|
HFUR FHERYDKE DPZUNF DJQJEHA |
0 |
MFUR FHE YDKE DPZUNFD JQJSHL |
5 |
METHINKE FHEIS DPZUDD JQJSEL |
15 |
METHINKS IT IS LIKE A WEASEL |
28 |
This could be implemented as such:
- (void)evaluateFitnessForPopulation {
NSArray *organisms = self.evolutionManager.population.organisms;
for (Organism *organism in organisms) {
NSString *genomeString = organism.genome.sequence;
NSInteger geneSequenceLength = genomeString.length;
NSInteger correctCharacters = 0;
for (NSInteger charIndex = 0; charIndex < geneSequenceLength; charIndex++) {
if ([genomeString characterAtIndex:charIndex] == [kTargetString characterAtIndex:charIndex]) {
correctCharacters++;
}
}
organism.fitness = correctCharacters;
}
}
Selection is the process by which organisms are chosen for breeding the next generation. In general, the more fit an organism is, the more likely it is to be selected. Once you have established the fitness for all organisms in the population, the selection process is initiated as such:
[self.evolutionManager proceedWithSelectionAndBreeding];
This algorithm utilizes a tournament selection method for choosing organisms to breed. Subsets of the population are chosen at random to compete for the chance to breed, with the fittest organism of this subset being selected as a parent. The "winners" are not removed from the pool of competitors, and thus it is possible for an organism to parent multiple offspring.
The size of each tournament can be customized by setting the tournamentSize
property on the EvolutionManager
. Experiment with different values here to see how this effects your population. Having a tournament size of 1
is essentially random selection, which would likely lead to slow improvements in overall population fitness generation-over-generation as fitter organisms are given no preference during selection, whereas having a high tournament size increases the mating chances of the most fit organisms and thus decreases mating chances for less-fit organisms, resulting in less diversity and potentially premature convergence.
Separate from the parent selection process, the population is sorted based on each individual's fitness level and some number of the most fit organisms are selected to live on to the next generation unchanged. This process is known as elitism. By setting the elitismPercentage
property on the EvolutionManager
you're able to customize the percentage of organisms from the current generation that will survive and continue into the next. This property can be set to any CGFloat
value between 0.0 (inclusive) to 1.0 (non-inclusive), and indicates the percentage of the total population n
that survives through to generation n + 1
.
Example: If you have a population of 100 organisms, and you set the
elitismPercentage
to0.1
, this will result in 10 organisms from generation0
being included in the population for the generation1
, as well as leave room for 90 offspring to be generated, resulting in the constant population size of 100 organisms. If theelitismPercentage
value is not set, the default value of0.10
is used, resulting in approximately 10% of the population being selected to live on for another generation. This ensures some level of protection against degredation in overall population fitness, but can lead to premature convergence if set too high.
This algorithm simulates sexual reproduction, as opposed to asexual reproduction, which means that offspring are always generated from two distinct parents, where the resulting child has the same genetic characteristics as both parents, meaning the length
and domain
of the genome remain the same. When the two parents are selected from the pool, their genes are combined using either the One-point crossover or Two-point crossover techniques. Single-point crossover results in a random index being chosen between 1
and length - 2
of the gene sequence, both genomes are copied, "split" at this index, and recombined to form the child's genetic code. The range of this index ensures that at least one gene from each parent makes it into the offspring's initial genome. Two-point crossover results in a random range being selected with the same restrictions as above, and that range is used to generate a "slice" from the genome where one parent contributes their genes from before and after the range, and the other contributes their genes from the range itself.
One-point Example: Consider a domain defined as
@"AB"
, with the first parent's gene sequence asAAAAA
and the second's asBBBBB
. If the randomly chosen index was2
, the first parent's contribution would beAA
, and the second parent's contribution would beBBB
, resulting in the child's genetic sequence beingAABBB
.Two-point example Consider a domain defined as
@"ABCD1234"
, with the first parent's gene sequence asAABBCCDD
and the second's as11223344
. If the randomly generate range was(2,3)
, that range would be used to select223
from the second parent's genes, as well asAA
andCDD
from the first parent's genes, resulting in the child's genetic sequence beingAA223CDD
.
After the crossover process is complete, every gene in the offspring's genetic sequence has a chance to mutate. This algorithm utilizes a uniform mutation strategy, meaning that if by chance the gene does mutate, it will mutate into one of the characters defined by the domain of their genome, not including the current character, with an equal chance to mutate into each.
Example: If the domain is defined as
@"1234"
, and the gene at a given point in the sequence is1
, that gene has an equal chance (1/3) of mutating into2
,3
, or4
. Note: This is not to be confused with themutationRate
explained below! This doesn't mean that every gene has a 1/3 chance to mutate into another, but rather IF a gene mutates, it has a 1 / (n - 1) chance to end up any other gene in the domain,n
being the length of the genome's domain.
The probability that a given gene will mutate is determined by the mutationRate
property of the EvolutionManager
. This property can be assigned any CGFloat greater than 0.0 and less than 1.0. Too high a mutation rate and you'll increase the randomness of the overall genetics of the population - making the preservation of any beneficial genetic traits less likely. Too low of a mutation rate and you won't introduce enough diversity into the population, leading to stagnation. Experiment with this number for some interesting results! If the mutationRate
value is not set, the default value of 0.05
is used.
Example: With a
mutationRate
of 0.05 (5%), and a genome domain of length5
, the probability that any given gene mutates is 5%, whereas the probability that every gene in an organism's genome mutates is .00003125% (5% * 5% * 5% * 5% * 5%).
After the selection and generation of offspring, the EvolutionManager
calls its required delegate method evolutionManager:didCompetedGeneration:fittestOrganism:offspring:nextGeneration:
, providing details about the results of the previous generation, and the updated organisms for the next generation. You can use this information for debugging, logging, updating your UI, etc.
The nextGeneration
parameter represents just that, the entire next generation of your population, including offspring and elites. To continue the simulation through another generation, first re-evaluate the fitness of this new population and call proceedWithSelectionAndBreeding
again. If you've determined that you don't want to continue with the simulation, because some maximal fitness was reached, your population has become stagnant, etc., simply don't call proceedWithSelectionAndBreeding
again.
If you want to contribute, open a pull request. Please do your best to keep the style/formatting consistent with the current project and MAKE SURE TO ADD TESTS!
If you have questions, feature requests, etc., I would prefer if you created an issue rather than email me directly.
This project is made available under the MIT license. See LICENSE.txt for details.