COS 323 - Computing for the Physical and Social Sciences
Assignment 4: Simulating Population Genetics
Assignment by Aniket Kittur, adapted from Dannie Durand, modified by Ken Steiglitz
Due Tuesday, Dec. 11
Some introductory genetics: Genes are DNA sequences whose code
determines which proteins are produced, and are grouped
together in chromosomes.
Higher organisms have two copies of each chromosome, one
from the male and one from the female; such organisms are referred to
as diploid. Thus each organism has two, possibly
different, copies of each gene (these copies of a gene are called alleles).
To mate, diploid cells divide to produce sex cells, sperm or eggs.
Each sex cell is haploid; that is, it contains only one set of chromosomes
from the parent instead of two. If we consider a diploid organism, say
a mouse, with two possible alleles, $a$ and $A$, this usually means
that fifty percent of the sperm from a male $Aa$ mouse will contain
the $a$ allele and the other fifty percent will contain the $A$ allele.
The offspring of two $Aa$ mice would thus have a 25% chance of being $aa$,
a 50% chance of being $Aa$, and a 25% chance of being $AA$. The
combination of genes in a particular mouse is referred to as its genotype.
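The 25%/50%/25% split above can be checked by enumerating the four equally likely pairings of one allele from each parent. The following is a minimal sketch, not part of the required deliverables; the class name `PunnettSquare` and the method `cross` are hypothetical names chosen for illustration, and genotypes are assumed to be two-character strings such as `"Aa"`.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: enumerates the four equally likely gamete pairings of a cross. */
class PunnettSquare {
    /** Returns genotype -> probability for the offspring of two diploid parents. */
    static Map<String, Double> cross(String mother, String father) {
        Map<String, Double> probs = new TreeMap<>();
        for (char fromMother : mother.toCharArray()) {
            for (char fromFather : father.toCharArray()) {
                // Write the uppercase allele first so that "aA" and "Aa" count as one genotype.
                char first = (char) Math.min(fromMother, fromFather);
                char second = (char) Math.max(fromMother, fromFather);
                // Each of the four pairings has probability 1/4.
                probs.merge("" + first + second, 0.25, Double::sum);
            }
        }
        return probs;
    }

    public static void main(String[] args) {
        System.out.println(cross("Aa", "Aa")); // prints {AA=0.25, Aa=0.5, aa=0.25}
    }
}
```

The same helper applied to `cross("AA", "Aa")` gives only $AA$ and $Aa$ offspring, each with probability 0.5.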
If we know the initial distribution of alleles in a population we can
calculate a number of useful probabilities. Assuming we know the frequency
of each allele in the population, we know the probabilities of each allele
are $P(A) = p$ and $P(a) = (1-p)=q$. From this we can calculate the
probability of finding a mouse in the population with a particular
genotype ($AA$, $Aa$, or $aa$), how these probabilities vary over time
and what the steady state probabilities will be. G. H. Hardy and
W. Weinberg independently solved this problem in 1908 under a set of assumptions:
- The population is infinite.
- All male, female pairs are equally likely to mate.
- Alleles do not spontaneously appear in or disappear from the population
(i.e., no migration or mutation).
- All alleles are equally fit.
Under these conditions, the following steady-state genotype frequencies are reached:
$$\begin{array}{ccc}
P(AA) & P(Aa) & P(aa) \\
p^2 & 2pq & q^2
\end{array}$$
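As a short worked check of these frequencies: under the four assumptions, each offspring receives one allele from each parent, drawn independently with $P(A) = p$ and $P(a) = q$. The symbol $p'$ is introduced here (it is not used elsewhere in this handout) for the frequency of $A$ in the offspring generation.

```latex
\begin{align*}
P(AA) &= p \cdot p = p^2, \\
P(Aa) &= p \cdot q + q \cdot p = 2pq, \\
P(aa) &= q \cdot q = q^2, \\
p'    &= P(AA) + \tfrac{1}{2} P(Aa) = p^2 + pq = p(p + q) = p.
\end{align*}
```

Since $p' = p$, the allele frequencies do not change from one generation to the next, and the genotype frequencies settle at the values above after a single round of random mating. For example, $p = 0.9$ gives $P(AA) = 0.81$, $P(Aa) = 0.18$, and $P(aa) = 0.01$.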
The first two conditions never hold; the remaining
two conditions hold some of the time, at best. We will examine these
assumptions of the model and see if and how they affect the predictions
and results, by conducting a series of simulations.
Implement the following in Java, C, or C++:
- Simulate a finite population of 5000 mice, randomly assigned to be male
or female, with two alleles at a single locus (e.g. $A$ and $a$). Assume
that mice mate randomly and that the population size is fixed; because of
chance not all the mice will necessarily mate and some may mate more than
once. Also assume that the generations are synchronized: all the mice in
generation $i$ are offspring of mice in generation $i-1$. It is not
necessary to get fancy with implementation (e.g. don't use linked lists
when arrays will suffice). Track how the frequency of each genotype
changes over time.
How do the allele frequencies vary over time?
(Include a few example graphs.)
- Implement a heuristic for recognizing whether an equilibrium condition is
reached. Is equilibrium reached in the above simulation? If so,
how quickly? Does it match the steady-state allele frequencies predicted
by Hardy and Weinberg?
- How does population size affect the answers to the above questions?
(For concreteness, try populations of 50 and 5000.)
- How do the initial allele frequencies affect the answers
to the above questions? (Try $A$:$a$ ratios of 100%/0%, 90%/10%, and 50%/50%.)
- Relax the assumption that all alleles are equally fit. Choose
the $a$ allele to be lethal recessive; that is,
$aa$ mice die at birth but $Aa$ and $AA$ mice
don't. How does this change the equilibrium? Can any starting conditions
change the final equilibrium? In some inherited lethal
diseases, such as Huntington's chorea, alleles continue to propagate
in populations in accordance with the frequencies predicted by Hardy and Weinberg.
How might this be possible?
- Now we will model a real-life example of the forces working in evolution,
using the t-haplotype condition found in mice. Mice having two copies
of a mutant t-haplotype gene (call it $t$) die at birth. Mice with one
normal gene and one mutant gene ($+/t$) seem almost exactly the same as mice
with two normal genes ($+/+$), but have one major difference: a $+/t$ male
mouse passes $t$ to more than 90% of his offspring, and thus passes the
normal allele to fewer than 10%. The t-haplotype has no effect on $+/t$
females. Can a stable equilibrium be reached with these conditions?
- Current studies suggest that there might be another force involved in the
t-haplotype condition. Some researchers believe that there is sexual
selection at work as well: females slightly prefer males who have two
normal genes over those who have one normal and one mutant gene (of course,
males with two mutant genes never get to mate). This difference is small
but detectable. This type of selection is evolutionarily plausible, since
it would lead to a greater number of viable offspring. What are the
effects of this kind of sexual selection on t-haplotype frequencies?
- How many times should you run each simulation
(i.e. each set of conditions) to have confidence in your results?
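As a starting point for the first two parts, the sketch below shows one way to organize the baseline simulation with plain arrays. It is a hedged illustration under stated assumptions, not a reference solution. The class name `MatingSketch`, its methods, the genotype encoding (number of $a$ copies per mouse), the choice to draw each founder's two alleles independently, and the `looksStable` heuristic are all choices made for this example. Parents are sampled with replacement, so some mice mate more than once and others not at all. Fitness differences, transmission bias, and mate preference are deliberately left out.

```java
import java.util.Random;

/** Hypothetical sketch: one locus, two alleles, fixed population size, synchronized generations. */
class MatingSketch {
    private final Random rng;
    private int[] aCount;      // per mouse: copies of the 'a' allele (0 = AA, 1 = Aa, 2 = aa)
    private boolean[] isMale;  // per mouse: sex, assigned at random

    /** Assumption: founders draw both alleles independently, with P(A) = freqA. */
    MatingSketch(int size, double freqA, long seed) {
        rng = new Random(seed);
        aCount = new int[size];
        isMale = new boolean[size];
        for (int i = 0; i < size; i++) {
            aCount[i] = (rng.nextDouble() < freqA ? 0 : 1) + (rng.nextDouble() < freqA ? 0 : 1);
            isMale[i] = rng.nextBoolean();
        }
    }

    /** Picks a uniformly random mouse of the requested sex (rejection sampling). */
    private int pickParent(boolean wantMale) {
        int i;
        do {
            i = rng.nextInt(aCount.length);
        } while (isMale[i] != wantMale);
        return i;
    }

    /** Allele passed on by a parent: 1 for 'a', 0 for 'A'; heterozygotes pass either with probability 1/2. */
    private int gamete(int parent) {
        int copies = aCount[parent];
        return copies == 1 ? rng.nextInt(2) : copies / 2;
    }

    /** Replaces the population with an equally large generation of offspring. */
    void step() {
        int n = aCount.length;
        int males = 0;
        for (boolean m : isMale) {
            if (m) males++;
        }
        if (males == 0 || males == n) {
            throw new IllegalStateException("one sex is missing; no mating is possible");
        }
        int[] nextCount = new int[n];
        boolean[] nextMale = new boolean[n];
        for (int i = 0; i < n; i++) {
            nextCount[i] = gamete(pickParent(false)) + gamete(pickParent(true));
            nextMale[i] = rng.nextBoolean();
        }
        aCount = nextCount;
        isMale = nextMale;
    }

    /** Fraction of all allele copies in the population that are 'A'. */
    double freqA() {
        long copiesOfA = 0;
        for (int c : aCount) copiesOfA += 2 - c;
        return copiesOfA / (2.0 * aCount.length);
    }

    /** Genotype frequencies in the order {P(AA), P(Aa), P(aa)}. */
    double[] genotypeFreqs() {
        int[] counts = new int[3];
        for (int c : aCount) counts[c]++;
        double n = aCount.length;
        return new double[] {counts[0] / n, counts[1] / n, counts[2] / n};
    }

    /** One possible heuristic: the last `window` values all lie within `tol` of each other. */
    static boolean looksStable(double[] history, int window, double tol) {
        if (window < 1 || history.length < window) return false;
        double lo = Double.MAX_VALUE, hi = -Double.MAX_VALUE;
        for (int i = history.length - window; i < history.length; i++) {
            lo = Math.min(lo, history[i]);
            hi = Math.max(hi, history[i]);
        }
        return hi - lo <= tol;
    }

    public static void main(String[] args) {
        MatingSketch sim = new MatingSketch(5000, 0.5, 42L);
        for (int gen = 0; gen <= 20; gen++) {
            double[] f = sim.genotypeFreqs();
            // Columns: generation, freq(A), P(AA), P(Aa), P(aa); suitable for plotting.
            System.out.printf("%d %.4f %.4f %.4f %.4f%n", gen, sim.freqA(), f[0], f[1], f[2]);
            sim.step();
        }
    }
}
```

The window and tolerance passed to `looksStable` are free parameters. In a finite population the frequencies keep drifting, so whichever heuristic you choose should be justified in your writeup rather than taken from this sketch.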
This assignment is due Tuesday, December 11, 2012 at 11:59 PM.
Please see the course policies
regarding assignment submission, late assignments, and collaboration.
- One or more well-commented source files containing all the code
written for the assignment.
- A writeup in plain-text or PDF format, containing:
  - A description of your code
  - Instructions for running the code to conduct the different simulations
    and compile the different statistics called for in parts 1-8
  - Answers to the boldfaced questions posed in parts 1-8.
Also include any plots you create to assist you in answering the questions.
The Dropbox link to submit your assignment is
smr at princeton edu