COS 323 - Computing for the Physical and Social Sciences

Fall 2012

Assignment 4: Simulating Population Genetics

Assignment by Aniket Kittur, adapted from Dannie Durand, modified by Ken Steiglitz

Due Tuesday, Dec. 11

Some introductory genetics: Genes are DNA sequences whose code determines which proteins are produced, and they are grouped together in chromosomes. Higher organisms have two copies of each chromosome, one inherited from the father and one from the mother; such organisms are called diploid. Each organism therefore carries two, possibly different, copies of each gene (the different forms a gene can take are called alleles).

To reproduce, diploid organisms produce sex cells (sperm or eggs) through cell division. Each sex cell is haploid; that is, it contains only one set of chromosomes from the parent instead of two. Consider a diploid organism, say a mouse, and a gene with two possible alleles, $A$ and $a$. Usually, fifty percent of the sperm from a male $Aa$ mouse will carry the $a$ allele and the other fifty percent will carry the $A$ allele. The offspring of two $Aa$ mice therefore have a 25% chance of being $aa$, a 50% chance of being $Aa$, and a 25% chance of being $AA$. The combination of alleles in a particular mouse is called its genotype.
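
The 25%/50%/25% split comes from each parent independently passing on one of its two alleles, each with probability 1/2. The C++ sketch below shows that rule. It assumes a genotype is encoded as the number of $a$ alleles it carries (0 = $AA$, 1 = $Aa$, 2 = $aa$). The function names are illustrative and not part of the assignment.

```cpp
#include <array>
#include <cassert>

// Genotype encoding (an assumption for illustration): number of a alleles,
// so 0 = AA, 1 = Aa, 2 = aa.

// Probability that a gamete from a parent with this genotype carries a:
// each of the parent's two alleles is passed on with probability 1/2.
double aGameteProb(int genotype) {
    assert(genotype >= 0 && genotype <= 2);
    return genotype / 2.0;
}

// Offspring genotype probabilities {P(AA), P(Aa), P(aa)}, assuming the two
// parents contribute their alleles independently.
std::array<double, 3> offspringDistribution(int mother, int father) {
    double m = aGameteProb(mother);
    double f = aGameteProb(father);
    std::array<double, 3> dist{};
    dist[0] = (1.0 - m) * (1.0 - f);          // A from both parents
    dist[1] = m * (1.0 - f) + (1.0 - m) * f;  // a from exactly one parent
    dist[2] = m * f;                          // a from both parents
    return dist;
}
```

For example, `offspringDistribution(1, 1)` gives {0.25, 0.5, 0.25}, which is the $Aa \times Aa$ case described above.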

If we know the initial distribution of alleles in a population, we can calculate a number of useful probabilities. Let $p$ be the frequency of allele $A$ in the population and $q = 1 - p$ the frequency of allele $a$, so that $P(A) = p$ and $P(a) = 1 - p = q$. From these we can calculate three things: the probability of finding a mouse in the population with a particular genotype ($AA$, $Aa$, or $aa$), how these probabilities vary over time, and what the steady-state probabilities will be. G. H. Hardy and W. Weinberg independently solved this problem in 1908 under a set of idealized assumptions:

- The population is infinite.
- All male-female pairs are equally likely to mate.
- Alleles do not spontaneously appear or disappear from the population (i.e., no migration or mutation).
- All alleles are equally fit.

Under these conditions, the following steady-state genotype frequencies are established: $$ \begin{array}{ccc} P(AA) & P(Aa) & P(aa) \\ p^2 & 2pq & q^2 \end{array} $$
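
A simulation needs these proportions in both directions. It must compute the expected frequencies for a given $p$, and it must estimate $p$ from observed genotype counts. For the estimate, each $AA$ mouse carries two copies of $A$ and each $Aa$ mouse carries one, out of $2N$ alleles in $N$ mice. The C++ sketch below covers both; the function names are hypothetical.

```cpp
#include <array>
#include <cassert>

// Expected Hardy-Weinberg genotype frequencies {P(AA), P(Aa), P(aa)} for P(A) = p.
std::array<double, 3> hardyWeinbergFrequencies(double p) {
    assert(p >= 0.0 && p <= 1.0);
    double q = 1.0 - p;
    std::array<double, 3> freq{};
    freq[0] = p * p;        // AA
    freq[1] = 2.0 * p * q;  // Aa
    freq[2] = q * q;        // aa
    return freq;
}

// Allele frequency p = P(A) from genotype counts:
// (2 * #AA + #Aa) copies of A out of 2N alleles in N diploid mice.
double alleleFrequencyA(long nAA, long nAa, long naa) {
    long n = nAA + nAa + naa;
    assert(n > 0);
    return (2.0 * nAA + nAa) / (2.0 * n);
}
```

For instance, `hardyWeinbergFrequencies(0.9)` gives approximately {0.81, 0.18, 0.01}. Comparing a simulated generation's genotype counts against these expected values is one way to check the Hardy-Weinberg prediction.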

Your assignment:

In practice, the first two conditions never hold, and the remaining two hold only some of the time, at best. Through a series of simulations, we will examine these assumptions and see whether, and how, they affect the model's predictions.

Implement the following in Java, C, or C++:

1. Simulate a finite population of 5000 mice, each randomly assigned to be male or female, with two alleles at a single locus (e.g., $A$ and $a$). Assume that mice mate randomly and that the population size is fixed; because of chance, not all the mice will necessarily mate, and some may mate more than once. Also assume that the generations are synchronized: all the mice in generation $i$ are offspring of mice in generation $i-1$. It is not necessary to get fancy with the implementation (e.g., don't use linked lists when arrays will suffice). Track how the frequency of each genotype changes over time. How do the allele frequencies vary over time? (Include a few example graphs.)
2. Implement a heuristic for recognizing whether equilibrium has been reached. Is equilibrium reached in the above simulation? If so, how quickly? Do the resulting frequencies match the steady-state frequencies predicted by Hardy and Weinberg?
3. How does population size affect the answers to the above questions? (For concreteness, try populations of 50 and 5000.)
4. How do the initial allele frequencies affect the answers to the above questions? (Try $A$:$a$ ratios of 100%/0%, 90%/10%, and 50%/50%.)
5. Relax the assumption that all alleles are equally fit. Make the $a$ allele lethal recessive; that is, $aa$ mice die at birth but $Aa$ and $AA$ mice do not. How does this change the equilibrium? Can any starting conditions change the final equilibrium? In some inherited lethal diseases, such as Huntington's chorea, the disease allele continues to propagate in populations in accordance with the frequencies predicted by Hardy and Weinberg. How might this be possible?
6. Now we will model a real-life example of the forces at work in evolution, using the t-haplotype condition found in mice. Mice with two copies of a mutant t-haplotype gene (call it $t$) die at birth. Mice with one normal gene and one mutant gene ($+/t$) appear almost exactly the same as mice with two normal genes ($+/+$), with one major difference: a $+/t$ male mouse passes $t$ to more than 90% of his offspring, and thus passes the normal allele to fewer than 10%. The t-haplotype has no effect on $+/t$ females. Can a stable equilibrium be reached under these conditions?
7. Current studies suggest that another force might be involved in the t-haplotype condition. Some researchers believe that sexual selection is at work as well: females slightly prefer males with two normal genes over those with one normal and one mutant gene (of course, males with two mutant genes never get to mate). This difference is small but detectable. This type of selection is evolutionarily plausible, since it would lead to a greater number of viable offspring. What are the effects of this kind of sexual selection on t-haplotype frequencies?
8. How many times should you run each simulation (i.e., each set of conditions) to have confidence in your results?
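
The C++ sketch below shows one possible structure for the simulation in parts 1 and 3-7. It is not the required design, and it rests on several assumptions:

- Each offspring independently draws a random mother and a random father from the previous generation. Some mice therefore mate several times and others never do.
- Each generation fully replaces the previous one, and the population size stays fixed.
- An offspring that dies at birth is replaced by another draw.

All type and function names are hypothetical. The `Params` fields (`lethalRecessive`, `maleDrive`, `hetMaleAcceptance`) are illustrative settings for parts 5-7:

- `maleDrive` is the probability that a heterozygous male transmits the mutant allele: 0.5 for Mendelian inheritance, above 0.9 for the t-haplotype.
- `hetMaleAcceptance` is the probability that a female accepts a heterozygous male. Values slightly below 1 model the weak preference in part 7.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Genotype = number of minor/mutant alleles (a or t): 0 = AA (+/+), 1 = Aa (+/t), 2 = aa (t/t).
struct Mouse {
    int genotype;
    bool male;
};

// Hypothetical knobs for parts 5-7; the defaults give plain Mendelian random mating.
struct Params {
    bool lethalRecessive = false;    // genotype 2 dies at birth
    double maleDrive = 0.5;          // P(heterozygous male transmits the minor allele)
    double hetMaleAcceptance = 1.0;  // P(a female accepts a heterozygous male), in (0, 1]
};

// Returns 1 if this parent's gamete carries the minor allele, else 0.
int gamete(const Mouse& m, const Params& P, std::mt19937& rng) {
    if (m.genotype != 1) return m.genotype / 2;  // homozygotes: 0 -> 0, 2 -> 1
    double pMinor = m.male ? P.maleDrive : 0.5;  // drive acts only in males
    return std::bernoulli_distribution(pMinor)(rng) ? 1 : 0;
}

// Generation 0: each allele is independently A with probability pA.
std::vector<Mouse> initialPopulation(std::size_t n, double pA, std::mt19937& rng) {
    std::bernoulli_distribution drawMinor(1.0 - pA), isMale(0.5);
    std::vector<Mouse> pop(n);
    for (Mouse& m : pop) {
        int g = drawMinor(rng) ? 1 : 0;
        g += drawMinor(rng) ? 1 : 0;
        m = Mouse{g, isMale(rng)};
    }
    return pop;
}

// Builds the next synchronized generation, the same size as the parents.
// Assumes at least one possible pairing can yield a viable offspring;
// otherwise the rejection loop below never terminates.
std::vector<Mouse> nextGeneration(const std::vector<Mouse>& parents,
                                  const Params& P, std::mt19937& rng) {
    assert(P.hetMaleAcceptance > 0.0 && P.hetMaleAcceptance <= 1.0);
    std::vector<std::size_t> males, females;
    for (std::size_t i = 0; i < parents.size(); ++i)
        (parents[i].male ? males : females).push_back(i);
    assert(!males.empty() && !females.empty());

    std::uniform_int_distribution<std::size_t> pickMale(0, males.size() - 1);
    std::uniform_int_distribution<std::size_t> pickFemale(0, females.size() - 1);
    std::bernoulli_distribution accept(P.hetMaleAcceptance), isMale(0.5);

    std::vector<Mouse> kids;
    kids.reserve(parents.size());
    while (kids.size() < parents.size()) {
        const Mouse& mom = parents[females[pickFemale(rng)]];
        const Mouse& dad = parents[males[pickMale(rng)]];
        if (dad.genotype == 1 && !accept(rng)) continue;  // female rejects a +/t male
        int g = gamete(mom, P, rng) + gamete(dad, P, rng);
        if (P.lethalRecessive && g == 2) continue;        // offspring dies at birth
        kids.push_back(Mouse{g, isMale(rng)});
    }
    return kids;
}

// Counts of genotypes {0, 1, 2} in a population.
std::array<long, 3> genotypeCounts(const std::vector<Mouse>& pop) {
    std::array<long, 3> counts{};
    for (const Mouse& m : pop) ++counts[m.genotype];
    return counts;
}
```

To run a t-haplotype experiment for part 6, set `lethalRecessive = true` and `maleDrive = 0.9`. Then call `nextGeneration` repeatedly and record `genotypeCounts` after each generation. Sampling parents with replacement is what lets some mice mate more than once while others never mate, as part 1 requires.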
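
Part 2 asks for an equilibrium heuristic and part 8 asks how many replicate runs are needed. One simple approach, offered as an assumption rather than the assignment's prescribed method, is a sliding-window test. It declares equilibrium once a tracked frequency stays within a tolerance `tol` for `window` consecutive generations. Replicate runs can then be summarized by their mean and standard error. Random drift makes a finite population's frequencies fluctuate from generation to generation, so the choice of `window` and `tol` matters. The function names below are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the first generation g such that freq[g], ..., freq[g + window - 1]
// all lie within tol of one another (max - min <= tol), or -1 if none does.
long equilibriumStart(const std::vector<double>& freq, std::size_t window, double tol) {
    assert(window >= 1);
    for (std::size_t start = 0; start + window <= freq.size(); ++start) {
        auto first = freq.begin() + static_cast<std::ptrdiff_t>(start);
        auto range = std::minmax_element(first, first + static_cast<std::ptrdiff_t>(window));
        if (*range.second - *range.first <= tol) return static_cast<long>(start);
    }
    return -1;
}

// Mean and standard error of the mean of one statistic across replicate runs.
struct ReplicateSummary {
    double mean;
    double standardError;
};

ReplicateSummary summarize(const std::vector<double>& values) {
    assert(values.size() >= 2);
    double n = static_cast<double>(values.size());
    double sum = 0.0;
    for (double v : values) sum += v;
    double mean = sum / n;
    double ss = 0.0;
    for (double v : values) ss += (v - mean) * (v - mean);
    double sampleSd = std::sqrt(ss / (n - 1.0));  // sample standard deviation
    return ReplicateSummary{mean, sampleSd / std::sqrt(n)};
}
```

The standard error shrinks like $1/\sqrt{n}$ in the number of runs $n$, so halving it takes roughly four times as many replicates. One reasonable stopping rule is to add runs until the standard error is small compared with the differences between the conditions you want to distinguish.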
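
As a check on part 5, the infinite-population case can be worked out by hand. Let $q_n$ be the frequency of $a$ among the surviving adults of generation $n$, and assume random mating produces Hardy-Weinberg proportions before the $aa$ offspring die. Then only the $Aa$ survivors carry $a$, which gives:

$$ q_{n+1} = \frac{2 p_n q_n}{2\,(p_n^2 + 2 p_n q_n)} = \frac{q_n}{p_n + 2 q_n} = \frac{q_n}{1 + q_n}, \qquad \frac{1}{q_{n+1}} = \frac{1}{q_n} + 1 \;\Rightarrow\; q_n = \frac{q_0}{1 + n\,q_0}. $$

In this idealized model, $q_n$ decreases toward 0 but only slowly, roughly like $1/n$ once $n q_0$ is large. A finite simulation adds random drift on top of this trend.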

Submitting

This assignment is due Tuesday, December 11, 2012 at 11:59 PM. Please see the course policies regarding assignment submission, late assignments, and collaboration.

Please submit:

- One or more well-commented source files containing all the code written for the assignment.
- A writeup in plain-text or PDF format, containing:
  - A description of your code
  - Instructions for running the code to conduct the different simulations and compute the different statistics called for in parts 1-8
  - Answers to the boldfaced questions posed in parts 1-8, along with any plots you create to help answer the questions

Submit your assignment through the course Dropbox.

Last update 20-Nov-2012 13:52:28

smr at princeton edu