COS 323 - Computing for the Physical and Social Sciences

Fall 2010



Assignment 5: Simulating Population Genetics

Assignment by Aniket Kittur, adapted from Dannie Durand, modified by Ken Steiglitz

Due Friday, Dec. 17


Some introductory genetics: Genes are DNA sequences whose code determines which proteins are produced; they are grouped together in chromosomes. Higher organisms have two copies of each chromosome, one inherited from the male parent and one from the female parent; such organisms are referred to as diploid. Thus each organism has two, possibly different, copies of each gene (the different versions a gene can take are called alleles).

To reproduce, diploid organisms produce sex cells (sperm or eggs) by a special kind of cell division (meiosis). Each sex cell is haploid; that is, it contains only one set of chromosomes from the parent instead of two. Consider a diploid organism, say a mouse, and a gene with two possible alleles, $a$ and $A$. Usually fifty percent of the sperm from a male $Aa$ mouse will contain the $a$ allele and the other fifty percent will contain the $A$ allele, and likewise for the eggs of a female $Aa$ mouse. Since each parent independently contributes $A$ or $a$ with probability $1/2$, the offspring of two $Aa$ mice have a 25% chance of being $aa$ ($\tfrac12 \cdot \tfrac12$), a 50% chance of being $Aa$ (either parent may supply the $A$), and a 25% chance of being $AA$. The combination of alleles in a particular mouse is referred to as its genotype.

If we know the initial distribution of alleles in a population, we can calculate a number of useful probabilities. Suppose the frequency of allele $A$ in the population is $p$, so that a randomly chosen allele is $A$ with probability $P(A) = p$ and $a$ with probability $P(a) = 1 - p = q$. From these we can calculate the probability of finding a mouse in the population with a particular genotype ($AA$, $Aa$, or $aa$), how these probabilities change over time, and what the steady-state probabilities will be. G. H. Hardy and W. Weinberg independently solved this problem in 1908 under a set of idealized assumptions:

Under these conditions, the following steady-state genotype frequencies are established: $$ \begin{array}{ccc} P(AA) & P(Aa) & P(aa) \\ p^2 & 2pq & q^2 \end{array} $$


Your assignment:

The first two conditions never hold in real populations; the remaining two hold only some of the time, at best. We will examine these assumptions by conducting a series of simulations to see whether, and how, violating them changes the model's predictions.

Implement the following in Java, C, or C++:

Extra Credit:


Submitting

This assignment is due Friday, December 17, 2010 at 11:59 PM. Please see the general notes on submitting your assignments, as well as the late policy and the collaboration policy.

Please submit:

The Dropbox link to submit your assignment is here.


Last update 7-Dec-2010 13:41:14
smr at princeton edu