Problem Set 1 - Alignments.
  1. Find the optimal global alignment(s) between CGATTC and GAATTC using a scoring system which assigns
    1. 1 for a match

    2. -1 for a mismatch
      -2 for a gap
    3. 1 for a match

    4. -1 for a mismatch
      0 for a gap
  2. Find the best local alignment between CGCAGCATTCTTT and TCCATGCTTATGCG using the standard scoring system used in the textbook.  Also turn in the scoring matrix produced by the algorithm.

  3. Theoretical Exercise (11.41 from Dan Gusfield;'s book)

  4. The tRNA folding problem:
    The following is a crude version of a problem that arises in predicting the secondary structure of transfer RNA molecules.  Let S be a string of n characters over the RNA alphabet A, C, U, G.  We define a pairing as a set of disjoint pairs of characters in S.  A pairing is called proper if it only contains (A, U) or (C, G) pairs.  This constraint arises because in RNA the (A, U) and (C, G) are pairs of complementary nucleotides.  If we draw S as a circular string, we define a nested pairing as a proper pairing where each pair in the pairing is connected by a line inside the circle, and where the lines do not cross each other. (A figure will be attached).  The problem is to find a nested pairing of largest cardinality.  Show how to solve this version of the tRNA folding problem in O(n^3) using dynamic programming.
    We now modify the problem by adding weights to the objective function so that the weight of an (A, U) pair is different than the weight of a (C, G) pair.  The goal is to find a nested pairing of maximum total weight.  Give an efficient algorithm for this weighted problem. 

  5. Affine gap penalty functions.

  6. For biology majors:
    Follow the link to the Smith-Waterman Alignment Engine.
    Usage Help is available.

    Experiment with various gap penalty functions and try to derive some relationships between the funcitons used and the resulting alignment.

    For computer science majors:
    Write a C-function to perform a global alignment between two sequences using an affine gap penalty function.  Print the resulting matrix as well as all the optimal alignments to stdout.

         void align(char *seq1,
                        char *seq2,
                        char *matrix_file,
                        int gap_open_penalty,
                        int gap_extension_penalty);
     
     

  7. What changes would you need to make to the local alignment algorithms if the gaps in s were penalized differently from gaps in t?

  8. Suppose
    1. gaps in s are penalized by -2 and in t by -1
    2. gaps in t are penalized by -2 and in s by -1

  9. Multiple Alignments:

  10. For this problem, use the sum-of-pairs metric and follow the ``once a
    gap, always a gap'' rule. Consider the following three sequences.

    (1) ACGTC
    (2) TCCT
    (3) ACGTCCT
     

    1. Compute all three optimal pairwise alignments assuming a cost of 2 for each deletion and 3 for each substitution. Give the cost of each alignment.
    2. Compute a progressive multiple alignment starting with the pairwise alignment (1,3). Now use the pairwise alignment (2,3) to merge sequence 2 into the multiple alignment. Show the resulting alignment and give its cost.
    3. Repeat problem 2, but this time use the pairwise alignment (1,2) to merge sequence 2 into the multiple alignment. Show the resulting alignment and give its cost. Are the two alignments the same? Which has a lower cost?
    4. What is the optimal multiple alignment?
    5. Suppose you charge a cost of 1 for each deletion and 1 for each substitution. What is the optimal alignment? Is it unique?