COS597A: Structural Bioinformatics

Structural bioinformatics books

[Bourne03]	P.E. Bourne, H. Weissig, "Structural Bioinformatics," Wiley-Liss, Hoboken, NJ, 2003. This is a good introductory book on structural bioinformatics. It is practical rather than theoretical: it reviews the main sources of structural data (e.g., PDB, NDB) and surveys the most popular methods for predicting structure, aligning structures, and predicting function.
[Orengo04]	C.A. Orengo, D.T. Jones, J.M. Thornton, "Bioinformatics: Genes, Proteins, & Computers," BIOS Scientific Publishers, Abingdon, UK, 2004. This is a good book on bioinformatics, with particular emphasis on structural bioinformatics.

Cheminformatics books

[Gasteiger]	J. Gasteiger, T. Engel, "Cheminformatics," Wiley-VCH, Weinheim, Germany, 2003.
[Leach03]	A. Leach, V. Gillet, "An Introduction to Cheminformatics," Springer, 2003.
[Bajorath04]	J. Bajorath, "Chemoinformatics: Concepts, Methods, and Tools for Drug Discovery (Methods in Molecular Biology)," Humana Press, 2004.

Molecular modeling books

[Leach97]	A. Leach, "Molecular Modelling: Principles and Applications," Longman Pub Group, 1997.
[Schlick02]	T. Schlick, "Molecular Modeling and Simulation," Springer, 2002.
[Holtje03]	Holtje, Sippl, Rognan, Folkers, "Molecular Modeling: Basic Principles and Applications," Wiley-VCH, 2003.

Structural bioinformatics overviews

[Goldsmith-Fischman03]	S. Goldsmith-Fischman, B. Honig, "Structural genomics: Computational methods for structure analysis," Protein Science, 12, 2003, pp. 1813-1821.
[Blundell00]	T.L. Blundell, K. Mizuguchi, "Structural genomics: an overview," Progress in Biophysics & Molecular Biology, 73, 2000, pp. 289-295.

Sequence databases

[Apweiler04]	R. Apweiler, A. Bairoch, C.H. Wu, "Protein sequence databases," Current Opinion in Chemical Biology, 8, 1, 2004, pp. 76-80. This is the main reference for the UniProt database.
[Bairoch05]	A. Bairoch, R. Apweiler, C.H. Wu, W.C. Barker, B. Boeckmann, S. Ferro, E. Gasteiger, H. Huang, R. Lopez, M. Magrane, M.J. Martin, D.A. Natale, C. O'Donovan, N. Redaschi, L.S. Yeh, "The Universal Protein Resource (UniProt)," Nucleic Acids Res., 33, 2005, pp. D154-D159. This is a paper about the UniProt database.

Structure databases

[Berman00]	H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne, "The Protein Data Bank," Nucleic Acids Research, 28, 2000, pp. 235-242. This is the main reference for the PDB.
[Berman92]	H.M. Berman, W.K. Olson, D.L. Beveridge, J. Westbrook, A. Gelbin, T. Demeny, S.H. Hsieh, A.R. Srinivasan, B. Schneider, "The Nucleic Acid Database: A Comprehensive Relational Database of Three-Dimensional Structures of Nucleic Acids," Biophys. J., 63, 1992, pp. 751-759. This is the main reference for the NDB.

Structure database annotations

[Laskowski97]	R.A. Laskowski, E.G. Hutchinson, A.D. Michie, A.C. Wallace, M.L. Jones, J.M. Thornton, "PDBsum: A Web-based database of summaries and analyses of all PDB structures," Trends Biochem. Sci., 22, 1997, pp. 488-490. This is the original paper about PDBsum, a very useful web-based service that summarizes the known information about every PDB file (http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/).
[Laskowski01]	R.A. Laskowski, "PDBsum: summaries and analyses of PDB structures," Nucleic Acids Res, 29, 2001, pp. 221-222. This is an update to the PDBsum paper.
[Laskowski05b]	R.A. Laskowski, V.V. Chistyakov, J.M. Thornton, "PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids," Nucleic Acids Res (Database Issue), 33, 2005, pp. D266-D268. This is another update to the PDBsum paper.
[Velankar05]	S. Velankar, P. McNeil, V. Mittard-Runte, A. Suarez, D. Barrell, R. Apweiler, K. Henrick, "E-MSD: an integrated data resource for bioinformatics," Nucleic Acids Res (Database Issue), 33, 2005, pp. D262-D265. This is the main reference for the MSD, the macromolecular structural database (http://www.ebi.ac.uk/msd/). It provides a relational database with structures (e.g., PDB), quaternary structure predictions (PQS), classifications (e.g., SCOP, CATH, EC, etc.) and results of analyses (e.g., protein-ligand contacts).
[Schomburg00]	I. Schomburg, O. Hofmann, C. Bansch, A. Chang, D. Schomburg, "Enzyme data and metabolic information: BRENDA, a resource for research in biology, biochemistry, and medicine," Gene Funct. Dis., 3, 4, 2000, pp. 109-118. This is the original reference for BRENDA, a database with information about enzymes (http://www.brenda.uni-koeln.de/).
[Bairoch00]	A. Bairoch, "The ENZYME database in 2000," Nucleic Acids Res, 28, 2000, pp. 304-305. This is the main reference for the ENZYME database, which contains information about binding sites in enzymes (contacts, cofactors, etc.) (http://www.expasy.org/enzyme/).
[Hobohm92]	U. Hobohm, M. Scharf, R. Schneider, C. Sander, "Selection of a representative set of structures from the Brookhaven Protein Data Bank," Protein Science, 1, 1992, pp. 409-417. This is the original reference for PDBSelect.
[Hobohm94]	U. Hobohm, C. Sander, "Enlarged representative set of protein structures," Protein Science, 3, 1994, pp. 522. This is an update for PDBSelect.
[Henrick98]	K. Henrick, J.M. Thornton, "PQS: a protein quaternary structure file server," Trends in Biochemical Sciences, 23, 9, 1998, pp. 358-361.

Databases of small molecules

[Irwin05]	J.J. Irwin, B.K. Shoichet, "ZINC - A Free Database of Commercially Available Compounds for Virtual Screening," J. Chem. Inf. Model, 45, 1, 2005, pp. 177-182. This is the main reference for ZINC (http://blaster.docking.org/zinc/).

Protein-ligand complex databases

[Chalk04]	A.J. Chalk, C.L. Worth, J.P. Overington, A.W.E Chan, "PDBLIG: Classification of Small Molecular Protein Binding in the Protein Data Bank," J. Med. Chem., 47, 15, 2004, pp. 3807-3816.
[Feng04]	Z. Feng, L. Chen, H. Maddula, O. Akcan, R. Oughtred, H.M. Berman, J. Westbrook, "Ligand Depot: a data warehouse for ligands bound to macromolecules," Bioinformatics, 20, 13, 2004, pp. 2153-2155.
[Puvanendrampillai03]	D. Puvanendrampillai, J. Mitchell, "Protein Ligand Database (PLD): additional understanding of the nature and specificity of protein-ligand complexes," Bioinformatics, 19, 14, 2003, pp. 1856-1857. This is the main reference for the Protein Ligand Database (PLD).
[Golovin05]	A. Golovin, D. Dimitropoulos, T. Oldfield, A. Rachedi, K. Henrick, "MSDsite: A Database Search and Retrieval System for the Analysis and Viewing of Bound Ligands and Active Sites," PROTEINS: Structure, Function, and Bioinformatics, 58, 1, 2005, pp. 190-199.
[Bergner02]	A. Bergner, J. Gunther, M. Hendlich, G. Klebe, M. Verdonk, "Use of Relibase for Retrieving Complex 3D Interaction Patterns Including Crystallographic Packing Effects," Biopolymers (Nucleic Acid Sci.), 61, 2002, pp. 99-110.
[Hendlich98]	M. Hendlich, "Databases for Protein-Ligand Complexes," Acta Crystallographica, D54, 1998, pp. 1178-1182. This is the main reference for Relibase, a database of protein-ligand interactions (http://relibase.ebi.ac.uk/).
[Sheu05]	S.H. Sheu, D.R. Lancia Jr., K.H. Clodfelter, M.R. Landon, S. Vajda, "PRECISE: a Database of Predicted and Consensus Interaction Sites in Enzymes," Nucleic Acids Research, 33 (Database issue), 2005, pp. D206-D211.

Protein structure fundamentals

[Branden99]	Carl-Ivar Branden, John Tooze, "Introduction to Protein Structure," Garland Publishing; 2nd edition, 1999. This is a classic book on protein structure.
[Lesk01]	Arthur M. Lesk, "Introduction to Protein Architecture: The Structural Biology of Proteins," Oxford University Press, 2001. This is a good book on protein structure.
[Lehninger04]	David L. Nelson, Michael M. Cox, "Lehninger Principles of Biochemistry," W.H. Freeman; 4th edition, 2004. This is a classic book on biochemistry.
[Hunter93]	L. Hunter, "Molecular Biology for Computer Scientists," Artificial Intelligence and Molecular Biology, AAAI Press, 1993. This is a high-level review article covering all of molecular biology.

Protein structure characterization

[Varrazzo05]	D. Varrazzo, A. Bernini, O. Spiga, A. Ciutti, S. Chiellini, V. Venditti, L. Bracci, N. Niccolai, "Three-dimensional computation of atom depth in complex molecular structures," Bioinformatics, 21, 12, 2005, pp. 2856-2860.
[Gerstein00]	M. Gerstein, F.M. Richards, "Protein Geometry: Volumes, Areas, and Distances," International Tables for Crystallography (Molecular Geometry and Features in Macromolecular Crystallography), Chapter 22, Volume F, 2000.
[Tsai99]	J. Tsai, R. Taylor, C. Chothia, M. Gerstein, "The Packing Density in Proteins: Standard Radii and Volumes," J. Mol. Biol., 290, 1999, pp. 253-266.
[Singh92]	Juswinder Singh, J.M. Thornton, "Protein Side-Chain Interactions," Oxford University Press, 1992.
[Sobolev99]	V. Sobolev, A. Sorokine, J. Prilusky, E.E. Abola, M. Edelman, "Automated analysis of interatomic contacts in proteins," Bioinformatics, 15, 4, 1999, pp. 327-332.

Protein fold classification

[Murzin95]	A.G. Murzin, S.E. Brenner, T. Hubbard, C. Chothia, "SCOP: a structural classification of proteins database for the investigation of sequences and structures," J. Mol. Biol, 247, 1995, pp. 536-540. This is the main reference for the SCOP hierarchy.
[Andreeva04]	A. Andreeva, D. Howorth, S.E. Brenner, T.J.P. Hubbard, C. Chothia, A.G. Murzin, "SCOP database in 2004: refinements integrate structure and sequence family data," Nucleic Acids Research, 32, 2004, pp. D226-D229. This is a more recent reference for the SCOP hierarchy.
[Orengo97]	C.A. Orengo, A.D. Michie, S. Jones, D.T. Jones, M.B. Swindells, J.M. Thornton, "CATH - A Hierarchic Classification of Protein Domain Structures," Structure, 5, 8, 1997, pp. 1093-1108. This is the original reference for the CATH hierarchy.
[Pearl05]	F. Pearl, A. Todd, I. Sillitoe, M. Dibley, O. Redfern, T. Lewis, C. Bennett, R. Marsden, et al., "The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis," Nucleic Acids Research, 33, 2005, pp. D247-D251. This is a more recent reference for the CATH hierarchy.
[Taylor02a]	W.R. Taylor, "A 'periodic table' for protein structures," Nature, 416, 6881, 2002, pp. 657-660. This paper formalizes both secondary and tertiary links to allow the rigorous and automatic definition of protein topology.
[Holm96]	L. Holm, C. Sander, "The FSSP database: fold classification based on structure-structure alignment of proteins," Nucleic Acids Research, 24, 1, 1996, pp. 206-209. This is the main reference for the FSSP classification, which is based on DALI structural alignments.

Protein fold space

[Chothia92]	C. Chothia, "One thousand families for the molecular biologist," Nature, 357, 1992, pp. 543-544. This is the classic paper in which Chothia predicted that the number of folds observed in nature is quite small compared to the number of proteins.
[Chothia86]	C. Chothia, A.M. Lesk, "The relation between the divergence of sequence and structure in proteins," The EMBO Journal, 5, 1986, pp. 823-826.
[Orengo94]	C.A. Orengo, D.T. Jones, J.M. Thornton, "Protein superfamilies and domain superfolds," Nature, 372, 1994, pp. 631-634.
[Sander91]	C. Sander, R. Schneider, "Database of homology-derived protein structures and the structural meaning of sequence alignment," Proteins, 9, 1, 1991, pp. 56-68.
[Wang96]	Z-X. Wang, "How many fold types of protein are there in nature?," Proteins, 26, 1996, pp. 186-191.
[Zhang97]	C-T. Zhang, "Relations of the numbers of protein sequences, families and folds," Protein Engineering, 10, 7, 1997, pp. 757-761.

Pairwise sequence alignment

[Needleman71]	S.B. Needleman, C.D. Wunsch, "A general method applicable to the search for similarities in the amino acid sequence of two proteins," J. Mol. Biol., 48, 1971, pp. 443-453. This is the original paper on global sequence alignment.
[Smith81]	T.F. Smith, M.S. Waterman, "Identification of common molecular subsequences," J. Mol. Biol., 147, 1981, pp. 195-197. This is the original paper on local sequence alignment. It provides the main reference for the Smith-Waterman alignment score.
[McGinnis04]	S. McGinnis, T.L. Madden, "BLAST: at the core of a powerful and diverse set of sequence analysis tools," Nucleic Acids Res, 32, 2004, pp. W20-W25. This is the main reference for BLAST.
[Pearson90]	W.R. Pearson, "Rapid and sensitive sequence comparison with FASTP and FASTA," Methods Enzymol, 183, 1990, pp. 63-98. This is the main reference for FASTA.
[Altschul94]	S.F. Altschul, M.S. Boguski, W. Gish, J.C. Wootton, "Issues in searching molecular sequence databases," Nature Genetics, 6, 2, 1994, pp. 119-129. This is an overview of sequence alignment issues and methods. It provides a good reference for sequence alignment methods as a whole.

Multiple sequence alignment

[Higgins96]	D.G. Higgins, J.D. Thompson, T.J. Gibson, "Using CLUSTAL for multiple sequence alignments," Methods Enzymol, 266, 1996, pp. 383-402.

Sequence motifs

[Altshul97]	S.F. Altschul, T.L. Madden, A.A. Schaffer, J. Zhang, Z. Zhang, W. Miller, D.J. Lipman, "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, 25, 17, 1997, pp. 3389-3402. This is the main reference for PSI-BLAST.
[Bateman02]	A. Bateman, E. Birney, L. Cerruti, R. Durbin, L. Etwiller, S.R. Eddy, S. Griffiths-Jones, K.L. Howe, M. Marshall, E.L.L. Sonnhammer, "The Pfam protein families database," Nucleic Acids Res, 30, 2002, pp. 276-280. This is the main reference for Pfam.
[Soding04]	J. Soding, "Protein homology detection by HMM-HMM comparison," Bioinformatics, 21, 2004, pp. 951-960.
[Falquet02]	L. Falquet, M. Pagni, P. Bucher, N. Hulo, C.J. Sigrist, K. Hofmann, A. Bairoch, "The PROSITE database, its status in 2002," Nucleic Acids Res, 30, 2002, pp. 235-238. Describes the PROSITE database, which contains HMM profiles.
[Jonassen97]	I. Jonassen, "Efficient discovery of conserved patterns using a pattern graph," Comput Appl Biosci, 13, 1997, pp. 509-522.

Pairwise structure alignment (overviews)

[Brown96]	N. Brown, C.A. Orengo, "A protein structure comparison methodology," Computers Chem, 20, 1996, pp. 359-380. This provides a nice review of structural alignment issues and methods.
[Sierk04a]	M.L. Sierk, G.J. Kleywegt, "Deja vu all over again: finding and analyzing protein structure similarities," Structure (Camb), 12, 2004, pp. 2103-2111. "This article is meant to guide the structural biologist in the basics of structural alignment, and to provide an overview of the available software tools. The main purpose is to encourage users to gain some understanding of the strengths and limitations of structural alignment, and to take these factors into account when interpreting the results of different programs."
[Sierk04b]	M.L. Sierk, W.R. Pearson, "Sensitivity and selectivity in protein structure comparison," Protein Science, 13, 2004, pp. 773-785. This paper compares alignment methods with ROC curves on the CATH database. "Seven protein structure comparison methods and two sequence comparison programs were evaluated on their ability to detect either protein homologs or domains with the same topology (fold) as defined by the CATH structure database. The structure alignment programs Dali, Structal, Combinatorial Extension (CE), VAST, and Matras were tested along with SGM and PRIDE, which calculate a structural distance between two domains without aligning them. We also tested two sequence alignment programs, SSEARCH and PSI-BLAST. Depending upon the level of selectivity and error model, structure alignment programs can detect roughly twice as many homologous domains in CATH as sequence alignment programs. ... These results help quantify the statistical distinction between analogous and homologous structures, and provide a benchmark for structure comparison statistics."
[Godzik96]	A. Godzik, "The structural alignment between two proteins: is there a unique answer?," Protein Sci, 5, 1996, pp. 1325-1338. This paper studies "the problem of uniqueness and stability of structural alignments with the help of visualization of the suboptimal alignments. It is shown that alignments are often degenerate and whole families of alignments can be generated with almost the same score as the optimal alignment."
[Holm94]	L. Holm, C. Sander, "Searching protein structure databases has come of age," Proteins, 19, 1994, pp. 165-173.
[Lemmen00]	C. Lemmen, T. Lengauer, "Computational methods for the structural alignment of molecules," J Comput Aided Mol Des, 14, 3, 2000, pp. 215-232. This paper reviews "the past six years of scientific publishing on molecular superposition. Our focus lies on automatic procedures to be performed on arbitrary molecular structures. Methodical aspects are our main concern here ... providing pointers to the recent literature providing important contributions to computational methods for the structural alignment of molecules. Finally we provide a perspective on how superposition methods can effectively be used for the purpose of virtual database screening."
[Eidhammer00]	I. Eidhammer, I. Jonassen, W.R. Taylor, "Structure comparison and structure patterns," J. Comput. Biol., 7, 2000, pp. 658-716. "This article investigates aspects of pairwise and multiple structure comparison, and the problem of automatically discover common patterns in a set of structures. Descriptions and representation of structures and patterns are described, as well as scoring and algorithms for comparison and discovery. A framework and nomenclature is developed for classifying different methods, and many of these are reviewed and placed into this framework."
[Kolodny04]	R. Kolodny, N. Linial, "Approximate protein structural alignment in polynomial time," PNAS, 101, 33, 2004, pp. 12201-12206. "Here, we study the structural alignment problem as a family of optimization problems and develop an approximate polynomial-time algorithm to solve them. We argue that such approximate solutions are, in fact, of greater interest than exact ones because of the noisy nature of experimentally determined protein coordinates."

Pairwise structure alignment (methods)

[Holm93]	Lisa Holm, Chris Sander, "Protein Structure Comparison by Alignment of Distance Matrices," J. Mol. Biol, 233, 1993, pp. 123-138. This is the main reference for the DALI alignment algorithm.
[Holm95]	L. Holm, C. Sander, "Dali: a network tool for protein structure comparison," Trends Biochem Sci, 20, 1995, pp. 478-480. This is a reference for the DALI website (http://www.ebi.ac.uk/dali/).
[Holm00]	L. Holm, J. Park, "DaliLite workbench for protein structure comparison," Bioinformatics, 16, 6, 2000, pp. 566-567. This is the reference for DaliLite (http://ekhidna.biocenter.helsinki.fi:9801/dali_lite/start).
[Subbiah93]	S. Subbiah, D.V. Laurents, M. Levitt, "Structural Similarity of DNA-binding Domains of Bacteriophage Repressors and the Globin Core," Current Biol, 3, 1993, pp. 141-148. This is the original reference for STRUCTAL, which uses an EM algorithm that alternates between solving for the best superposition (least squares) and the best correspondences (dynamic programming).
[Gerstein98]	M. Gerstein, M. Levitt, "Comprehensive Assessment of Automatic Structural Alignment against a Manual Standard, the SCOP Classification of Proteins," Protein Science, 7, 1998, pp. 445-456. This is the second reference for STRUCTAL.
[Krissinel04]	E. Krissinel, K. Henrick, "Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions," Acta Crystallogr D Biol Crystallogr, D60, 2004, pp. 2256-2268. This is the main reference for SSM (http://www.ebi.ac.uk/msd-srv/ssm/), which aligns protein structures in two phases. The first phase aligns the main alpha-helix and beta-sheet secondary structure elements. The second phase aligns the alpha-carbon atoms of residues more precisely.
[Harrison03]	A. Harrison, F. Pearl, I. Sillitoe, T. Slidel, R. Mott, J.M. Thornton, C. Orengo, "Recognising the fold of a protein structure," Bioinformatics, 19, 2003, pp. 1748-1759. This is the main reference for GRATH.
[Taylor89]	W.R. Taylor, C.A. Orengo, "Protein Structure Alignment," J. Mol. Biol., 208, 1, 1989. This is the original reference for SSAP, which employs double dynamic programming.
[Orengo90]	C.A. Orengo, W.R. Taylor, "A Rapid Method for Protein Structure Alignment," J. Theor Biol, 147, 1990, pp. 517-551.
[Orengo92]	C.A. Orengo, N.P. Brown, W.R. Taylor, "Fast structure alignment for protein databank searching," Proteins, 14, 1992, pp. 139-167. This describes a fast version of SSAP suitable for database searching. It is used to build the 2nd (A) and 3rd (T) levels of the CATH hierarchy.
[Orengo96]	C.A. Orengo, W.R. Taylor, "SSAP: sequential structure alignment program for protein structure comparison," Methods Enzymol, 266, 1996, pp. 617-635. This is a reference for SSAP (http://www.biochem.ucl.ac.uk/~orengo/ssap.html).
[Madej95]	T. Madej, J.F. Gibrat, S.H. Bryant, "Threading a database of protein cores," Proteins, 23, 1995, pp. 356-369. This is the main reference for VAST.
[Gibrat96]	J.F. Gibrat, T. Madej, S.H. Bryant, "Surprising similarities in structure comparison," Curr Opin Struct Biol, 6, 3, 1996, pp. 377-385. Describes results achieved with VAST.
[Shindyalov98]	I.N. Shindyalov, P.E. Bourne, "Protein structure alignment by incremental combinatorial extension (CE) of the optimal path," Protein Eng, 11, 1998, pp. 739-747. This is the main reference for CE (http://cl.sdsc.edu/ce.html).
[Zhu05]	J. Zhu, Z. Weng, "FAST: a novel protein structure alignment algorithm," Proteins, 58, 2005, pp. 618-627. This is the main reference for FAST (http://biowulf.bu.edu/FAST/).
[Maiti04]	R. Maiti, G.H. Van Domselaar, H. Zhang, D.S. Wishart, "SuperPose: a simple server for sophisticated structural superposition," Nucleic Acids Res, 1, 32, 2004, pp. W590-W594. This is the main reference for SuperPose (http://wishart.biology.ualberta.ca/SuperPose/).
[Lessel94]	U. Lessel, D. Schomburg, "Similarities between protein 3-D structures," Protein Engineering, 7, 10, 1994, pp. 1175-1187. This is the reference for Protein3Dfit (http://biotool.uni-koeln.de:8080/3dalign_neu/cgi-bin/3daligner.py).
[Szustakowski00]	J.D. Szustakowski, Z. Weng, "Protein structure alignment using a genetic algorithm," Proteins, 38, 4, 2000, pp. 428-440. This is the main reference for K2/K2SA (http://zlab.bu.edu/k2sa/).
[Chen05]	L. Chen, T. Zhou, Y. Tang, "Protein structure alignment by deterministic annealing," Bioinformatics, 21, 2005, pp. 51-62.
[Ilyin04]	V. A. Ilyin, A. Abyzov, C. M. Leslin, "Structural alignment of proteins by a novel TOPOFIT method, as a superimposition of common volumes at a topomax point," Protein Sci, 13, 7, 2004, pp. 1865-1874.
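
The [Subbiah93] annotation above describes an alternation between a least-squares superposition and a dynamic-programming correspondence step. The following is a minimal sketch of that loop, not STRUCTAL itself. To stay dependency-free it works on planar (2D) toy coordinates, where the optimal rotation has a closed form, and its similarity score m / (1 + d^2 / d0_sq), the constants m, d0_sq and gap, and the helper names are all illustrative assumptions.

```python
import math


def superpose_2d(pairs):
    """Least-squares rotation + translation taking each p onto its paired q."""
    n = len(pairs)
    cpx = sum(p[0] for p, _ in pairs) / n
    cpy = sum(p[1] for p, _ in pairs) / n
    cqx = sum(q[0] for _, q in pairs) / n
    cqy = sum(q[1] for _, q in pairs) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in pairs:
        ax, ay, bx, by = px - cpx, py - cpy, qx - cqx, qy - cqy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)  # closed-form optimum in the plane
    c, s = math.cos(theta), math.sin(theta)

    def transform(p):
        x, y = p[0] - cpx, p[1] - cpy
        return (c * x - s * y + cqx, s * x + c * y + cqy)

    return transform


def align_dp(a, b, gap=10.0, m=20.0, d0_sq=5.0):
    """Global dynamic programming; returns matched index pairs (i, j)."""
    na, nb = len(a), len(b)
    # Assumed similarity: high when two points are close, decaying with d^2.
    sim = [[m / (1.0 + ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) / d0_sq)
            for q in b] for p in a]
    best = [[0.0] * (nb + 1) for _ in range(na + 1)]
    for i in range(1, na + 1):
        best[i][0] = -gap * i
    for j in range(1, nb + 1):
        best[0][j] = -gap * j
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            best[i][j] = max(best[i - 1][j - 1] + sim[i - 1][j - 1],
                             best[i - 1][j] - gap,
                             best[i][j - 1] - gap)
    # Trace back from the corner to recover the matched pairs.
    pairs, i, j = [], na, nb
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + sim[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def alternate(a, b, max_iter=20):
    """Alternate superposition and correspondence until the pairs stop changing."""
    pairs = [(k, k) for k in range(min(len(a), len(b)))]  # naive starting guess
    moved = list(a)
    for _ in range(max_iter):
        transform = superpose_2d([(a[i], b[j]) for i, j in pairs])
        moved = [transform(p) for p in a]
        new_pairs = align_dp(moved, b)
        if new_pairs == pairs:
            break
        pairs = new_pairs
    return pairs, moved


def rigid_copy(points, theta, shift):
    """Rotate by theta (radians) and translate; used only to build toy data."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in points]


# Toy data: chain_b is a rotated, shifted copy of chain_a with one point deleted.
chain_a = [(0, 0), (1.5, 1), (3, 0), (4.5, 1.5), (6, 0), (7.5, 2), (9, 0), (10.5, 2.5)]
chain_b = rigid_copy(chain_a, 0.7, (3.0, -2.0))
del chain_b[3]
pairs, moved = alternate(chain_a, chain_b)
print(pairs)
```

Like any alternating scheme of this kind, the loop can settle in a local optimum that depends on the starting correspondence, so the naive diagonal start used here is only one possible choice.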

Pairwise structure alignment (comparisons)

[Kolodny05]	Rachel Kolodny, Patrice Koehl, Michael Levitt, "Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures," J Mol Biol, 346, 2005, pp. 1173-1188.
[Novotny04]	M. Novotny, D. Madsen, G.J. Kleywegt, "Evaluation of protein fold comparison servers," Proteins, 54, 2004, pp. 260-270. Watson05: The authors perform a wide-ranging evaluation of 11 publicly available fold comparison servers. They use the CATH database as a reference for their tests. The results show that no one server provides 100% accuracy and therefore multiple methods should be used to assess similarities to known structures.
[Leplae02]	R. Leplae, T.J.P. Hubbard, "MaxBench: evaluation of sequence and structure comparison methods," Bioinformatics, 18, 3, 2002, pp. 494-495. Compares alignment methods with ROC curves on SCOP database.

Multiple structure alignment

[Russell92]	R.B. Russell, G.J. Barton, "Multiple protein sequence alignment from tertiary structure comparison: assignment of global and residue confidence levels," Proteins, 14, 1992, pp. 309-323. This is the main reference for STAMP (bioinfo.ucr.edu/pise/stamp.html).
[Ye05]	Y. Ye, A. Godzik, "Multiple flexible structure alignment using partial order graphs," Bioinformatics, 21, 10, 2005, pp. 2362-2369.
[Shatsky04]	M. Shatsky, R. Nussinov, H.J. Wolfson, "A method for simultaneous alignment of multiple protein structures," Proteins, 56, 1, 2004, pp. 143-156. This is the main reference for MultiProt (http://bioinfo3d.cs.tau.ac.il/MultiProt/).
[Dror03]	O. Dror, H. Benyamini, R. Nussinov, H.J. Wolfson, "Multiple structural alignment by secondary structures: Algorithm and applications," Protein Sci, 12, 11, 2003, pp. 2492-2507.
[Gud04]	C. Guda, S. Lu, E.D. Scheeff, P.E. Bourne, I.N. Shindyalov, "CE-MC: a multiple protein structure alignment server," Nucleic Acids Res, 32, 2004, pp. W100-W103. This is the multiple alignment version of CE (http://cemc.sdsc.edu/).
[Lupyan05]	D. Lupyan, A. Leo-Macias, A.R. Ortiz, "A new progressive-iterative algorithm for multiple structure alignment," Bioinformatics, 21, 15, 2005, pp. 3255-3263.
[Leibowitz01]	N. Leibowitz, R. Nussinov, H.J. Wolfson, "MUSTA - a general, efficient, automated method for multiple structure alignment and detection of common motifs: application to proteins," J Comput Biol, 8, 2, 2001, pp. 93-121.
[Taylor94]	W.R. Taylor, T.P. Flores, C.A. Orengo, "Multiple protein structure alignment," Protein Science, 3, 10, 1994, pp. 1858-1870.

Protein-ligand binding site representation overviews

[Campbell03]	S.J. Campbell, N.D. Gold, R.M. Jackson, D.R. Westhead, "Ligand binding functional site location, similarity and docking," Curr Opin Struct Biol, 13, 2003, pp. 389-395. Overview of ways to find and compare protein-ligand binding sites.
[Sotriffer02]	C. Sotriffer, G. Klebe, "Identification and mapping of small-molecule binding sites in proteins: computational tools for structure-based drug design," Il Farmaco, 57, 2002, pp. 243-251.
[Via00]	A. Via, F. Ferre, B. Brannetti, M. Helmer-Citterich, "Protein surface similarities: a survey of methods to describe and compare protein surfaces," Cellular and Molecular Life Sciences, 57, 2000, pp. 1970-1977.

Protein-ligand binding site analysis

[Stockwell05]	Gareth Stockwell, "Structural Diversity of Biological Ligands and their Binding Sites in Proteins," 2005.
[Bartlett02]	G.J. Bartlett, C.T. Porter, N. Borkakoti, J.M. Thornton, "Analysis of catalytic residues in enzyme active sites," J. Mol. Biol, 324, 1, 2002, pp. 105-121.
[Puvanendrampillai03]	D. Puvanendrampillai, J. Mitchell, "Protein Ligand Database (PLD): additional understanding of the nature and specificity of protein-ligand complexes," Bioinformatics, 19, 14, 2003, pp. 1856-1857. This is the main reference for the Protein Ligand Database (PLD).
[Ringe95]	D. Ringe, "What makes a binding site a binding site?," Curr Opin Struct Biol, 5, 6, 1995, pp. 825-829.
[Vajda06]	S. Vajda, F. Guarnieri, "Characterization of protein-ligand interaction sites using experimental and computational methods," Curr Opin Drug Discov Devel, 9, 3, 2006, pp. 354-362.
[Kelly05]	M.S. Kelly, R.L. Mancera, "A new method for estimating the importance of hydrophobic groups in the binding site of a protein," J Med Chem, 48, 4, 2005, pp. 1069-1078.
[Lian94]	L.Y. Lian, I.L. Barsukov, M.J. Sutcliffe, K.H. Sze, G.C. Roberts, "Protein-ligand interactions: exchange processes and determination of ligand conformation and protein-ligand contacts," Methods Enzymol, 239, 1994, pp. 657-700.

Protein-ligand binding site detection from geometry

[Weisel07]	M. Weisel, E. Proschak, G. Schneider, "PocketPicker: analysis of ligand binding-sites with shape descriptors," Chemistry Central Journal, 1, 7, 2007.
[Brady00]	G.P. Brady, Jr., P.F.W. Stouten, "Fast prediction and visualization of protein binding pockets with PASS," Journal of Computer-Aided Molecular Design, 14, 4, 2000, pp. 383-401. This is the main reference for PASS, a system for detecting pockets in proteins that successively constructs layers of points starting at the surface of the protein and working towards the middle of voids. Points are rejected if they are "too" solvent accessible, thus leaving points only inside pockets.
[Peters96]	K.P. Peters, J. Fauck, C. Frommel, "The automatic search for ligand binding sites in proteins of known three-dimensional structure using only geometric criteria," J Mol Biol, 256, 1996, pp. 201-213. This is the main reference for APROPOS, a system for detecting protein pockets with alpha shapes.
[Hendlich97]	M. Hendlich, F. Rippman, G. Barnickel, "LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins," J. Mol. Graph., 15, 1997, pp. 359-363. This paper is the main reference for LIGSITE, a method to detect binding site pockets. Following POCKET [Levitt92], it fills a grid with values representing the number of angles from which every point is visible to the outside (sampling only 7 angles), thereby providing a measure of how deeply a point is embedded in a concave pocket.
[Levitt92]	D. Levitt, L. Banaszak, "POCKET: A computer graphics method for identifying and displaying protein cavities and their surrounding amino acids," J. Mol. Graphics, 10, 1992, pp. 229-234. This is the main reference for POCKET, a system for identifying free-space points deeply buried in pockets by counting the number of axial directions for which the point is occluded from both directions. This method is followed up by LIGSITE, which considers more than just 3 axial directions.
[Nayal06]	M. Nayal, B. Honig, "On the nature of cavities on protein surfaces: Application to the identification of drug-binding sites," Proteins: Structure, Function, and Bioinformatics, 63, 4, 2006, pp. 892-906.
[Ho90]	C.M.W. Ho, G.R. Marshall, "Cavity Search: an algorithm for the isolation and display of cavity-like binding regions," J Comput-Aided Mol Des, 1990, pp. 337-354. This is the main reference for Cavity Search.
[Coleman06]	R.G. Coleman, K.A. Sharp, "Travel depth, a new shape descriptor for macromolecules: application to ligand binding," J Mol Biol, 362, 2006, pp. 441-458. This is the main reference for Travel Depth.
[Kim06]	D. Kim, C. Cho, D. Kim, Y. Cho, "Recognition of docking sites on a protein using β-shape based on Voronoi diagram of atoms," Computer-Aided Design, 38, 5, 2006, pp. 431-443.
[Frommel96]	C. Frommel, K.P. Peters, J. Fauck, "The automatic search for ligand binding sites in proteins of known three-dimensional structure using only geometric criteria," J. Mol. Biol., 256, 1996, pp. 201-213.
[Pettit99]	F.K. Pettit, J.U. Bowie, "Protein surface roughness and small molecular binding sites," J. Mol. Biol., 285, 1999, pp. 1377-1382.
[Laskowski96a]	R.A. Laskowski, N.M. Luscombe, M.B. Swindells, J.M. Thornton, "Protein clefts in molecular recognition and function," Prot. Sci, 5, 12, 1996, pp. 2438-2452. This paper analyzes the properties of binding sites predicted with Surfnet.
[Laskowski95]	R.A. Laskowski, "Surfnet: a program for visualizing molecular surfaces, cavities, and intermolecular interactions," J Mol Graph, 13, 1995, pp. 323-330. This is the main reference for Surfnet, a program that detects binding site pockets by constructing spheres whose diameters are chords between solvent accessible residues of the protein; spheres are rejected if the center of the chord lies within a certain distance (4 angstroms) of the protein surface or if the chord is more than a certain length (10 angstroms). The pocket is predicted to be the volume covered by the union of the spheres.
[Masuya95]	M. Masuya, J. Doi, "Detection and geometric modeling of molecular surfaces and cavities using digital mathematical morphological operations," J Mol Graph, 13, 1995, pp. 331-336. Uses mathematical morphology operations (erode, dilate, close) to detect cavities in a protein surface as the difference between the closure of the protein surface using a certain radius and the molecule itself. The method is demonstrated for two proteins.
[DelCarpio93]	C.A. Del Carpio, Y. Takahashi, S. Sasaki, "A New Approach to the Automatic Identification of Candidates for Ligand Receptor Sites in Proteins: (I) Search for Pocket Regions," J. Mol. Graph., 11, 1993, pp. 23-29.
[Chang04]	D.T. Chang, C.Y. Chen, W.C. Chung, Y.J. Oyang, H.F. Juan, H.C. Huang, "ProteMiner-SSM: a web server for efficient analysis of similar protein tertiary substructures," Nucleic Acids Res, 32, 2004, pp. W76-W82. Uses probability distributions derived from splats at atoms to detect binding sites.
[Halperin03]	I. Halperin, H. Wolfson, R. Nussinov, "SiteLight: Binding-site prediction using phage display libraries," Protein Science, 12, 2003, pp. 1344-1359.
[BenShimon05]	A. Ben-Shimon, M. Eisenstein, "Looking at Enzymes from the Inside out: The Proximity of Catalytic Residues to the Molecular Centroid can be used for Detection of Active Sites and Enzyme-Ligand Interfaces," J. Mol. Biol., 351, 2005, pp. 309-326.
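The sphere construction described in the [Laskowski95] annotation can be sketched as follows. This is a loose illustration of that description, not the Surfnet program itself: atoms are treated as points, the distance to the protein surface is approximated by the distance to the nearest third atom, and the function name gap_spheres and its parameter names are hypothetical.

```python
import math
from itertools import combinations

def gap_spheres(atoms, max_chord=10.0, min_clearance=4.0):
    """Return (center, radius) spheres spanning pairs of atoms.

    atoms: list of (x, y, z) coordinates in angstroms.
    A sphere is kept only if its chord is at most max_chord long and no
    third atom lies within min_clearance of the chord midpoint.
    """
    spheres = []
    for i, j in combinations(range(len(atoms)), 2):
        a, b = atoms[i], atoms[j]
        chord = math.dist(a, b)
        if chord > max_chord:
            continue  # chord too long to bound a pocket
        center = tuple((p + q) / 2.0 for p, q in zip(a, b))
        # Assumption: "too close to the protein" means a third atom
        # lies within min_clearance of the chord midpoint.
        blocked = any(
            math.dist(center, atoms[k]) < min_clearance
            for k in range(len(atoms)) if k not in (i, j)
        )
        if not blocked:
            spheres.append((center, chord / 2.0))
    return spheres
```

The predicted pocket would then be the union of the returned spheres; clustering them into separate pockets is left out of this sketch.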

Protein-ligand binding site detection from conservation

[Pils06]	B. Pils, R.R. Copley, J. Schultz, "Variation in structural location and amino acid conservation of functional sites in protein domain families," BMC Bioinformatics, 6, 2005.
[Cheng05]	G. Cheng, B. Qian, R. Samudrala, D. Baker, "Improvement in protein functional site prediction by distinguishing structural and functional constraints on protein family evolution using computational design," Nucleic Acids Res, 33, 18, 2005, pp. 5861-5867.
[Huang06]	B. Huang, M. Schroeder, "LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation," BMC Struct Biol, 6, 2006, pp. 19-29.
[Glaser06]	F. Glaser, R. Morris, R. Najmanovich, R. Laskowski, J. Thornton, "A method for localizing ligand binding pockets in protein structures," Proteins, 62, 2006, pp. 479-488.
[Nimrod05]	G. Nimrod, F. Glaser, D. Steinberg, N. Ben-Tal, T. Pupko, "In silico identification of functional regions in proteins," Bioinformatics, 21 Suppl., 2005, pp. i328-i337.
[Chelliah04]	V. Chelliah, L. Chen, T. Blundell, S. Lovell, "Distinguishing structural and functional restraints in evolution in order to identify interaction sites," J Mol Biol, 342, 2004, pp. 1487-1504. Watson05: This method distinguishes residues conserved for functional reasons from those that are highly conserved because they are constrained by the structure. By comparing the observed sequence conservation with the predicted conservation (based on amino acid type and environmental constraints), the authors construct environment-specific substitution tables for use in identifying functionally conserved residues.
[Innis04]	C.A. Innis, A.P. Anand, R. Sowdhamini, "Prediction of functional sites in proteins using conserved functional group analysis," J Mol Biol, 337, 2004, pp. 1053-1068. Watson05: This new method describes the conservation of a protein surface using chemical groups rather than the amino acids. A multiple sequence alignment is used to identify conserved functional group clusters, the size of which is determined by the number of proteins contributing to it. These are mapped onto the surface to identify active sites.
[Lichtarge03]	O. Lichtarge, H. Yao, D.M. Kristensen, S. Madabushi, I. Mihalek, "Accurate and scalable identification of functional sites by evolutionary tracing," J Struct Funct Genomics, 4, 2003, pp. 159-166.
[Lichtarge02]	O. Lichtarge, M.E. Sowa, "Evolutionary predictions of binding surfaces and interactions," Curr Opin Struct Biol, 12, 2002, pp. 21-27.
[Joachimiak02]	M.P. Joachimiak, F.E. Cohen, "JEvTrace: refinement and variations of the evolutionary trace in JAVA," Genome Biol, 3, 2002, pp. RESEARCH0077.
[DelSolMesa03]	A. Del Sol Mesa, F. Pazos, A. Valencia, "Automatic methods for predicting functionally important residues," J Mol Biol, 326, 2003, pp. 1289-1302.
[Yao03]	H. Yao, D.M. Kristensen, I. Mihalek, M.E. Sowa, C. Shaw, M. Kimmel, L. Kavraki, O. Lichtarge, "An accurate, sensitive and scalable method to identify functional sites in protein structures," J Mol Biol, 334, 2003, pp. 387-401.
[Sjolander04]	K. Sjolander, "Phylogenomic inference of protein molecular function: advances and challenges," Bioinformatics, 20, 2004, pp. 170-179.
[La05]	D. La, B. Sutch, D.R. Livesay, "Predicting protein functional sites with phylogenetic motifs," Proteins, 58, 2005, pp. 309-320.
[Abhiman05]	S. Abhiman, E.L.L. Sonnhammer, "FunShift: a database of function shift analysis on protein subfamilies," Nucleic Acids Res, 33, 2005, pp. D197-D200.
[Zhang99]	B. Zhang, L. Rychlewski, K. Pawlowski, J.S. Fetrow, J. Skolnick, A. Godzik, "From fold predictions to function predictions: automation of functional site conservation analysis for functional genome predictions," Protein Sci, 8, 5, 1999, pp. 1104-1115. SITE/Site Match

Protein-ligand binding site detection from probe energetics

[Laurie05]	A.T.R. Laurie, R.M. Jackson, "Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites," Bioinformatics, 2005. Predicts binding site location by filling grid with van der Waals interaction potentials for a methyl probe. Evaluated with data set of 134 proteins.
[An05]	J. An, M. Totrov, R. Abagyan, "Pocketome via comprehensive identification and classification of ligand binding envelopes," Mol Cell Proteomics, 4, 6, 2005, pp. 752-761. This is a reference for PocketFinder, a system for detecting binding site pockets by filling a grid with values representing a van der Waals force field (according to the Lennard-Jones formula). A suitable threshold is chosen, and the grid points with value above the threshold are considered ``inside'' the binding pocket. The paper contains an evaluation of the method for a large number of PDB files, both with and without bound ligands.
[An04]	J. An, M. Totrov, R. Abagyan, "Comprehensive Identification of ``Druggable'' Protein Ligand Binding Sites," Genome Informatics, 15, 2, 2004, pp. 31-41. This is very similar to [An05].
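The grid-and-threshold idea in the [An05] and [Laurie05] annotations can be illustrated with a minimal sketch. It assumes a single atom type, a standard 12-6 Lennard-Jones term, and made-up values for epsilon, sigma, and the threshold; it scores each grid point by the negated energy, so that "above the threshold" means a favorable probe position. None of these choices are taken from the papers, and the function names are hypothetical.

```python
import math

def lj_energy(point, atoms, epsilon=0.2, sigma=3.5):
    """Sum of 12-6 Lennard-Jones terms between a probe at point and all atoms."""
    total = 0.0
    for atom in atoms:
        r = max(math.dist(point, atom), 0.5)  # clamp to avoid division by zero
        s6 = (sigma / r) ** 6
        total += 4.0 * epsilon * (s6 * s6 - s6)
    return total

def pocket_points(atoms, lo, hi, spacing=1.0, threshold=0.3):
    """Grid points in the box lo..hi whose attraction exceeds threshold.

    Attraction is the negated LJ energy (an assumption of this sketch).
    """
    steps = [int(round((b - a) / spacing)) + 1 for a, b in zip(lo, hi)]
    kept = []
    for ix in range(steps[0]):
        for iy in range(steps[1]):
            for iz in range(steps[2]):
                p = (lo[0] + ix * spacing,
                     lo[1] + iy * spacing,
                     lo[2] + iz * spacing)
                if -lj_energy(p, atoms) > threshold:
                    kept.append(p)
    return kept
```

A probe surrounded by several atoms at roughly the optimal contact distance accumulates attraction from each of them, which is why buried pockets score higher than flat, exposed surface.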

Protein-ligand binding site detection from electrostatic potential

[Bate04]	P. Bate, J. Warwicker, "Enzyme/non-enzyme discrimination and prediction of enzyme active site location using charge-based methods," J Mol Biol, 340, 2, 2004, pp. 263-276.
[Shanahan04]	H.P. Shanahan, M.A. Garcia, S. Jones, J.M. Thornton, "Identifying DNA-binding proteins using structural motifs and the electrostatic potential," Nucleic Acids Res, 32, 2004, pp. 4732-4741.

Protein-ligand binding site detection from residue instability

[Elcock01]	A.H. Elcock, "Prediction of functionally important residues based solely on the computed energetics of protein structure," J. Mol. Biol., 312, 4, 2001, pp. 885-896. Uses the instability of residues to predict which ones are in the active site: among conserved residues, stable ones are predicted to lie in the core and unstable ones in the active site.

Protein-ligand binding site detection from residue packing

[Amitai04]	G. Amitai, A. Shemesh, E. Sitbon, M. Shklar, D. Netanely, I. Venger, S. Pietrokovski, "Network analysis of protein structures identifies functional residues," J Mol Biol, 344, 2004, pp. 1135-1146. ``We transformed protein structures into residue interaction graphs (RIGs), where amino acid residues are graph nodes and their interactions with each other are the graph edges. We found that active site, ligand-binding and evolutionary conserved residues, typically have high closeness values. Residues with high closeness values interact directly or by a few intermediates with all other residues of the protein. Combining closeness and surface accessibility identified active site residues in 70\% of 178 representative structures.''
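The closeness measure used in [Amitai04] is the standard graph-theoretic closeness centrality. A minimal sketch is given below, assuming the residue interaction graph has already been built as an adjacency dictionary; how interactions are defined, for example by a distance cutoff, is not specified here and the function name is hypothetical.

```python
from collections import deque

def closeness(adjacency):
    """Closeness centrality for each node of a connected, unweighted graph.

    adjacency: dict mapping node -> iterable of neighbouring nodes.
    The score is (number of other reachable nodes) / (sum of shortest
    path lengths to them), so nodes near everything score close to 1.
    """
    scores = {}
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives shortest path lengths
            node = queue.popleft()
            for nxt in adjacency[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores
```

For a three-residue chain A-B-C, the middle residue B reaches both others in one step and gets the highest score, which mirrors the observation quoted above that central residues interact with all others directly or through few intermediates.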

Protein-ligand binding site detection from microscopic titration curves

[Ko05]	J. Ko, J.L.F. Murga, Y. Wei, M.J. Ondrechen, "Prediction of active sites for protein structures from computed chemical properties," Bioinformatics, 21, 1, 2005, pp. i258-i265. Uses microscopic titration curves to detect functional residues.
[Ondrechen01]	M.J. Ondrechen, J.G. Clifton, D. Ringe, "Thematics: a simple computational predictor of enzyme function from structure," Proc. Natl Acad. Sci., 98, 2001, pp. 12473-12478.
[Ringe04]	D. Ringe, Y. Wei, K.R. Boino, M.J. Ondrechen, "Protein structure to function: insights from computation," Cell Mol Life Sci, 61, 2004, pp. 387-392. Finds binding sites using theoretical microscopic titration curves.

Protein-ligand binding site detection from docking analyses

[Silberstein03]	Michael Silberstein, Sheldon Dennis, Lawrence Brown III, Tamas Kortvelyesi, Karl Clodfelter, Sandor Vajda, "Identification of Substrate Binding Sites in Enzymes by Computational Solvent Mapping," J. Mol. Biol., 332, 2003, pp. 1095-1113. Docks many small molecule fragments and then predicts that active residues are the ones closest to the docked positions of the fragments.
[Dennis02]	Sheldon Dennis, Tamas Kortvelyesi, Sandor Vajda, "Computational mapping identifies the binding sites of organic solvents on proteins," PNAS, 99, 7, 2002, pp. 4290-4295.
[Bliznyuk99]	A. Bliznyuk, J. Gready, "Simple method for locating possible ligand binding sites on protein surfaces," J. Comput. Chem., 9, 1999, pp. 983-988. Uses FFT to dock rigid ligand using a simple shape correlation function in order to find the correct binding site, which will later be analyzed by more detailed (energetic) docking methods.
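The final step described in the [Silberstein03] annotation, picking the residues closest to the docked fragment positions, can be sketched as a simple contact count. The docking itself is not shown; the residue representation as a single point, the 4 angstrom cutoff, and the function name rank_residues are assumptions made for illustration only.

```python
import math
from collections import Counter

def rank_residues(residues, probes, cutoff=4.0):
    """Rank residues by how many docked probe positions lie within cutoff.

    residues: dict mapping residue name -> (x, y, z) representative point.
    probes: list of (x, y, z) docked fragment positions.
    Residues with no probe contacts are omitted from the result.
    """
    hits = Counter()
    for probe in probes:
        for name, pos in residues.items():
            if math.dist(probe, pos) <= cutoff:
                hits[name] += 1
    return [name for name, _ in hits.most_common()]
```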

Protein-ligand binding site analysis and prediction from multiple properties

[Guo05]	T. Guo, Y. Shi, Z. Sun, "A novel statistical ligand-binding site predictor: application to ATP-binding sites," Protein Engineering, Design and Selection, 18, 2, 2005, pp. 65-70.
[Rossi06]	A. Rossi, M.A. Marti-Renom, A. Sali, "Localization of binding sites in protein structures by optimization of a composite scoring function," Protein Sci, 15, 10, 2006, pp. 2366-2380.
[Zvelebil88]	M.J.J.M. Zvelebil, M.J.E. Sternberg, "Analysis and prediction of the location of catalytic residues in enzymes," Protein Engineering, 2, 2, 1988, pp. 127-138.
[Huan-Xiang01]	H.-X. Zhou, Y. Shan, "Prediction of protein interaction sites from sequence profile and residue neighbor list," Proteins, 44, 2001, pp. 336-343.
[Cilia07]	Elisa Cilia, "Protein Active Site Detection using SVMs and Kernel Methods," Contribution to the Learning and Intelligent Optimization Workshop (LION), Andalo (TN), Italy, 2007. This looks like a report to the author's research group.
[Cilia06]	Elisa Cilia, Alessandro Moschitti, Sergio Ammendola, Roberto Basili, "Structured Kernels for Automatic Detection of Protein Active Sites," Mining and Learning with Graphs Workshop (MLG), 2006.
[Petrova06]	N.V. Petrova, C.H. Wu, "Prediction of catalytic residues using support vector machine with selected protein sequence and structural properties," BMC Bioinformatics, 7, 2006, pp. 312-324.
[Keil04]	M. Keil, T.E. Exner, J. Brickmann, "Pattern recognition strategies for molecular surfaces: III. Binding site prediction with a neural network," J Comput Chem, 25, 6, 2004, pp. 779-789.
[Gutteridge03]	A. Gutteridge, G.J. Bartlett, J.M. Thornton, "Using a neural network and spatial clustering to predict the location of active sites in enzymes," J Mol Biol, 330, 2003, pp. 719-734.
[Bradford04]	J.R. Bradford, D.R. Westhead, "Improved prediction of protein-protein binding sites using a support vector machines approach," Bioinformatics, 2004.
[Chen04]	S-C. Chen, I. Bahar, "Mining frequent patterns in protein structures: a study of protease families," Bioinformatics, 20, 1, 2004, pp. i1-i9.
[Alexandrov94]	N.N. Alexandrov, N. Go, "Biological meaning, statistical significance, and classification of local spatial similarities in nonhomologous proteins," Protein Sci, 3, 1994, pp. 866-875.
[Bagley95a]	S.C. Bagley, R.B. Altman, "Characterizing the microenvironment surrounding protein sites," Protein Sci, 4, 4, 1995, pp. 622-635. This is the main reference for Feature. ``Sites are microenvironments within a biomolecular structure, distinguished by their structural or functional role. A site can be defined by a three-dimensional location and a local neighborhood around this location in which the structure or function exists. We have developed a computer system to facilitate structural analysis (both qualitative and quantitative) of biomolecular sites. Our system automatically examines the spatial distributions of biophysical and biochemical properties, and reports those regions within a site where the distribution of these properties differs significantly from control nonsites. The properties range from simple atom-based characteristics such as charge to polypeptide-based characteristics such as type of secondary structure. Our analysis of sites uses non-sites as controls, providing a baseline for the quantitative assessment of the significance of the features that are uncovered. In this paper, we use radial distributions of properties to study three well-known sites (the binding sites for calcium, the milieu of disulfide bridges, and the serine protease active site). We demonstrate that the system automatically finds many of the previously described features of these sites and augments these features with some new details. In some cases, we cannot confirm the statistical significance of previously reported features. Our results demonstrate that analysis of protein structure is sensitive to assumptions about background distributions, and that these distributions should be considered explicitly during structural analyses.''
[Bagley95b]	S.C. Bagley, L. Wei, C. Cheng, R.B. Altman, "Characterizing oriented protein structural sites using biochemical properties," Proc Int Conf Intell Syst Mol Biol, 3, 1995, pp. 12-20. ``A protein site is a region of a three-dimensional protein structure with a distinguishing functional or structural role. Certain sites recur in different protein structures (for example catalytic sites, calcium binding sites, and some types of turns), but maintain critical shared features. To facilitate the analysis of such protein sites, we have developed a computer system for analyzing the spatial distributions of biochemical properties around a site. The system takes a set of similar sites and a set of control nonsites, and finds differences between them. Specifically, it compares distributions of the properties surrounding the sites with those surrounding the nonsites, and reports statistically significant differences. In this paper, we use our method to analyze the features in the active site of the serine protease enzymes. We compare the use of radial distributions (shells) with 3-D grids (blocks) in the analysis of the active site. We demonstrate three different strategies for focusing attention on significant findings, based on properties of interest, spatial volumes of interest, and on the level of statistical significance. Finally, we show that the program automatically identifies conserved sequential, secondary structural and biophysical features of the serine protease active site, using noncatalytic histidine residues as a control environment.''
[Bagley96]	S.C. Bagley, R.B. Altman, "Conserved features in the active site of nonhomologous serine proteases," Fold Des, 1, 5, 1996, pp. 371-379. ``BACKGROUND: Serine protease activity is critical for many biological processes and has arisen independently in a few different protein families. It is not clear, though, the degree to which these protease families share common biochemical and biophysical properties. We have used a computer program to study the properties that are shared by four serine protease active sites with no overall structural or sequence homology. The program systematically compares the region around the catalytic histidines from the four proteins with a set of noncatalytic histidines, used as controls. It reports the three-dimensional locations and level of statistical significance for those properties that distinguish the catalytic histidines from the noncatalytic ones. The method of analysis is general and can be applied easily to other active sites of interest. RESULTS: As expected, some of the reported properties correspond to previously known features of the serine protease active site, including the catalytic triad and the oxyanion hole. Novel properties are also found, including the spatial distribution of charged, polar, and hydrophobic groups arranged to stabilize the catalytic residues, and a relative abundance of some residues (Val, Tyr, Leu, and Gly) around the active site. CONCLUSIONS: Our findings show that in addition to some properties common to all the proteases examined, there are a set of preferred, but not required, properties that can be reliably observed only by aligning the sites and comparing them with carefully selected statistical controls.''
[Banatao03]	D.R. Banatao, R.B. Altman, T.E. Klein, "Microenvironment analysis and identification of magnesium binding sites in RNA," Nucleic Acids Res, 31, 15, 2003, pp. 4450-4460. Used the FEATURE algorithm to determine ``novel physicochemical descriptions of site-bound and diffusely bound Mg2+ ions in RNA that are useful for prediction. Electrostatic calculations using the Non-Linear Poisson Boltzmann (NLPB) equation provided further evidence for the locations of site-bound ions. We confirmed the locations of experimentally determined sites and further differentiated between classes of ion binding. We also identified potentially important, high scoring sites in the group I intron that are not currently annotated as Mg2+ binding sites.''
[Wei03]	L. Wei, R.B. Altman, "Recognizing Complex, Asymmetric functional sites in protein structures using a Bayesian scoring function," Journal of Bioinformatics and Computational Biology, 1, 1, 2003, pp. 119-138.
[Liang03a]	M.P. Liang, D.R. Banatao, T.E. Klein, D.L. Brutlag, R.B. Altman, "WebFEATURE: An interactive web tool for identifying and visualizing functional sites on macromolecular structures," Nucleic Acids Res, 31, 13, 2003, pp. 3324-3327. ``WebFEATURE (http://feature.stanford.edu/webfeature/) is a web-accessible structural analysis tool that allows users to scan query structures for functional sites in both proteins and nucleic acids. WebFEATURE is the public interface to the scanning algorithm of the FEATURE package, a supervised learning algorithm for creating and identifying 3D, physicochemical motifs in molecular structures. Given an input structure or Protein Data Bank identifier (PDB ID), and a statistical model of a functional site, WebFEATURE will return rank-scored ``hits'' in 3D space that identify regions in the structure where similar distributions of physicochemical properties occur relative to the site model. Users can visualize and interactively manipulate scored hits and the query structure in web browsers that support the Chime plug-in. Alternatively, results can be downloaded and visualized through other freely available molecular modeling tools, like RasMol, PyMOL and Chimera. A major application of WebFEATURE is in rapid annotation of function to structures in the context of structural genomics.''
[Banatao01]	D.R. Banatao, C.C. Huang, P.C. Babbitt, R.B. Altman, T.E. Klein, "ViewFeature: integrated feature analysis and visualization," Pac Symp Biocomput, 2001, pp. 240-250. ``We have developed an extension to the molecular visualization program Chimera that integrates Feature's statistical models and site predictions with 3-dimensional structures viewed in Chimera. We call this extension ViewFeature, and it is designed to help users understand the structural Features that define a site of interest. We applied ViewFeature in an analysis of the enolase superfamily; a functionally distinct class of proteins that share a common fold, the alpha/beta barrel, in order to gain a more complete understanding of the conserved physical properties of this superfamily. In particular, we wanted to define the structural determinants that distinguish the enolase superfamily active site scaffold from other alpha/beta barrel superfamilies and particularly from other metal-binding alpha/beta barrel proteins. Through the use of ViewFeature, we have found that the C-terminal domain of the enolase superfamily does not differ at the scaffold level from metal-binding alpha/beta barrels. We are, however, able to differentiate between the metal-binding sites of alpha/beta barrels and those of other metal-binding proteins. We describe the overall architectural Features of enolases in a radius of 10 Angstroms around the active site.''
[Liang03b]	M.P. Liang, D.L. Brutlag, R.B. Altman, "Automated construction of structural motifs for predicting functional sites on protein structures," Pac Symp Biocomput, 2003, pp. 204-215. ``We describe a method to predict functional sites by automatically creating three dimensional structural motifs from amino acid sequence motifs. These structural motifs perform comparably well with manually generated structural motifs and perform better than sequence motifs.''
[Waugh01]	A. Waugh, G.A. Williams, L. Wei, R.B. Altman, "Using meta computing tools to facilitate large-scale analyses of biological databases," Pac Symp Biocomput, 2001, pp. 360-371. ``We use a distributed computing environment, Legion, to enable large-scale computations on the Protein Data Bank (PDB). In particular, we employ the Feature program to scan all protein structures in the PDB in search for unrecognized potential cation binding sites. We evaluate the efficiency of Legion's parallel execution capabilities and analyze the initial biological implications that result from having a site annotation scan of the entire PDB. We discuss four interesting proteins with unannotated, high-scoring candidate cation binding sites.''
[Wei98]	L. Wei, R.B. Altman, "Recognizing protein binding sites using statistical descriptions of their 3D environments," Pac Symp Biocomput, 1998, pp. 497-508. This is the main reference for matching of binding sites with Feature. ``We have developed a new method for recognizing sites in three-dimensional protein structures. Our method is based on our previously reported algorithm for creating descriptions of protein microenvironments using physical and chemical properties at multiple levels of detail (including features at the atomic, chemical group, residue, and secondary structural levels). The recognition method takes three inputs: a set of sites that share some structural or functional role, a set of control nonsites that lack this role, and a single query site. The values of properties for the query site are compared to the distributions of values for both sites and nonsites to determine the group to which it is most similar. A log-odds scoring function, based on Bayes' Rule, computes a score that indicates the likelihood that the query region is a site of interest. In this paper, we apply the method to the task of identifying calcium binding sites in proteins. Cross-validation analysis shows that this recognition approach has high sensitivity and specificity. We also describe the results of scanning four calcium binding proteins (with the calcium removed) using a three-dimensional grid of probe points at 2 Å spacing. The probe points that have high scores cluster around the true calcium binding sites, with the highest scoring points at or near the binding sites. The method fails in only one case where a calcium binding site is created by four proteins in the crystal lattice, and is thus not recognizable within the crystallographic asymmetric unit. Our results show that property-based descriptions can be used for recognizing protein sites in unannotated structures.''
[Wei97]	L. Wei, R.B. Altman, J.T. Chang, "Using the radial distributions of physical features to compare amino acid environments and align amino acid sequences," Pac Symp Biocomput, 1997, pp. 465-476. ``We have performed a comprehensive analysis of the microenvironments surrounding the twenty amino acids. Our analysis includes comparison of amino acid environments with random control environments as well as with each of the other amino acid environments. We describe the amino acid environments with a set of 21 features summarizing atomic, chemical group, residue, and secondary structural features. The environments are divided into radial shells of 1 Å thickness to represent the distance of the features from the amino acid C beta atoms. We make the results of our analysis available graphically over the world wide web. To illustrate the validity and utility of our analysis, we used the amino acid comparative profiles to construct a substitution matrix, the WAC matrix, based on a simple summary of the computed environmental differences. We compared our matrix to BLOSUM62 and PAM250 in BLAST searches with query sequences selected from 39 protein families found in the PROSITE database. Although BLOSUM62 was the most sensitive matrix overall, our matrix was more sensitive for some families, and exhibited overall performance similar to PAM250. Our results suggest that the radial distribution of biochemical and biophysical features is useful for comparing amino acid environments, and that similarity matrices based on the geometric distribution of features around amino acids may produce improved search sensitivity.''
[Yoon07]	S. Yoon, J.C. Ebert, E-Y. Chung, G. De Micheli, R.B. Altman, "Clustering protein environments for function prediction: finding PROSITE motifs in 3D," BMC Bioinformatics, 8, 4, 2007.
[Ota03]	M. Ota, K. Kinoshita, K. Nishikawa, "Prediction of Catalytic Residues in Enzymes Based on Known Tertiary Structure, Stability Profile, and Sequence Conservation," Journal of Molecular Biology, 327, 5, 2003, pp. 1053-1064.
[Kinoshita05]	K. Kinoshita, H. Nakamura, "Identification of the ligand binding sites on the molecular surface of proteins," Protein Science, 14, 17, 2005, pp. 711-718. This paper contains results of an experiment for comparison of a few binding site surfaces to a large database (almost all binding site surfaces in the PDB) using the method described in [Kinoshita03]. Since partial surfaces are matched, a similarity scoring method is introduced that considers both a normalized score for the match of the geometry and electrostatics (Z-score) and the ``coverage'' of the match (fractions of the surfaces found to be in correspondence). Results are presented for 18 hypothetical proteins.
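The log-odds scoring described in the [Wei98] abstract (comparing a query site's property values to the distributions observed for sites and nonsites) can be illustrated with a naive Bayes style sketch. This is not the FEATURE implementation: the shared histogram bin edges, the pseudocount, the independence assumption across properties, and all function names are assumptions chosen for brevity.

```python
import math

def bin_index(value, edges):
    """Index of the histogram bin containing value (edges sorted ascending)."""
    return sum(1 for e in edges if value >= e)

def train(examples, edges, pseudocount=1.0):
    """Per-property bin probabilities from a list of property vectors."""
    n_props = len(examples[0])
    n_bins = len(edges) + 1
    counts = [[pseudocount] * n_bins for _ in range(n_props)]
    for vec in examples:
        for p, value in enumerate(vec):
            counts[p][bin_index(value, edges)] += 1.0
    return [[c / sum(row) for c in row] for row in counts]

def log_odds(query, site_model, nonsite_model, edges, prior_site=0.5):
    """Log prior odds plus the sum of per-property log likelihood ratios."""
    score = math.log(prior_site / (1.0 - prior_site))
    for p, value in enumerate(query):
        b = bin_index(value, edges)
        score += math.log(site_model[p][b] / nonsite_model[p][b])
    return score
```

A positive score means the query's property values are more typical of the site examples than of the nonsite controls; scanning a grid of probe points, as in the abstract, would amount to computing this score at each point.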

Protein-ligand binding site representations with pseudo-atoms and matching with association graphs

[Artymiuk94]	P.J. Artymiuk, A.R. Poirrette, H.M. Grindley, D.W. Rice, P. Willett, "A Graph-Theoretic Approach to the Identification of 3-Dimensional Patterns of Amino-Acid Side-Chains in Protein Structures," Journal of Molecular Biology, 243, 1994, pp. 327-344. ``This paper discusses the use of graph-theoretic methods for the representation and searching of three-dimensional patterns of side-chains in protein structures. The position of a side-chain is represented by pseudo-atoms, and the relative positions of pairs of side-chains by the distances between them. This description of the geometry can be represented by a labelled graph in which the nodes and the edges of the graph represent the pseudo-atoms and the sets of inter-pseudo-atomic distances, respectively. Given such a representation, a protein can be searched for the presence of a user-defined query pattern of side-chains by means of a subgraph-isomorphism algorithm which is implemented in the program ASSAM.''
[Schmitt02]	S. Schmitt, D. Kuhn, G. Klebe, "A new method to detect related function among proteins independent of sequence and fold homology," J Mol Biol, 323, 2002, pp. 387-406. This is the main reference for pseudo-centers. Also, describes Cavbase. Finds maximal clique in association graph to match sets of pseudo-centers.
[Weskamp04]	N. Weskamp, D. Kuhn, E. Hullermeier, G. Klebe, "Efficient similarity search in protein structure databases by k-clique hashing," Bioinformatics, 20, 2004, pp. 1522-1526. Describes search of Cavbase (sites represented by pseudo-centers) combining clique detection in association graphs with geometric hashing.
[Weskamp03]	N. Weskamp, D. Kuhn, E. Hullermeier, G. Klebe, "Efficient Similarity Search in Protein Structure Databases: Improving Clique-Detection through Clique Hashing," German Conference on Bioinformatics, Munich, Germany, 2003.
[Kupas04]	K. Kupas, A. Ultsch, G. Klebe, "An algorithm for finding similarities in protein active sites," ICBA, Fort Lauderdale, FL, 2004. This paper uses pseudo-centers to compare binding sites. ``The binding-site exposed physicochemical characteristics are described by assigning generic pseudocenters to the functional groups of the amino acids flanking a particular active site. These pseudocenters are assembled into small substructures. To find substructures with spatial similarity and appropriate chemical properties, an emergent self-organizing map is used for clustering. Two substructures which are found to be similar form the basis for an expanded comparison of the complete cavities. Preliminary results with four pairs of binding cavities show that similarities are detected correctly and motivate further studies.''
[Spriggs03]	R.V. Spriggs, P.J. Artymiuk, P. Willett, "Searching for patterns of amino acids in 3D protein structures," J Chem Inf Comput Sci, 43, 2003, pp. 412-421. Uses pseudo-centers and distance subgraph isomorphism. ASSAM represents an amino acid by a vector drawn from the main chain towards the functional part of the amino acid and then computes a graph representation of a protein in which the individual side-chain vectors are the nodes and the intervector distances are the edges. The presence of a query pattern in a Protein Data Bank structure can then be searched for by means of a subgraph isomorphism algorithm.
[Brakoulias04]	A. Brakoulias, R.M. Jackson, "Towards a structural classification of phosphate binding sites in protein-nucleotide complexes: an automated all-against-all structural comparison using geometric matching," Proteins, 56, 2, 2004, pp. 250-260.
[Pickering01]	S.J. Pickering, A.J. Bulpitt, N. Efford, N.D. Gold, D.R. Westhead, "AI-based algorithms for protein surface comparisons," Comput Chem, 26, 2001, pp. 79-84. This paper uses surface points and association graphs to match ligand binding sites.
[Milik03]	M. Milik, S. Szalma, K.A. Olszewski, "Common Structural Cliques: a tool for protein structure and function analysis," Protein Eng, 16, 2003, pp. 543-552. ``The compared protein structures are condensed to a graph representation, with atoms as nodes and distances as edge labels. Protein graphs are then compared to extract all possible Common Structural Cliques. These cliques are merged to create Structural Templates: graphs that describe structural analogies between compared proteins. Structures of serine endopeptidases were compared in pairs using the presented algorithm with different geometrical parameters.''
[Wangikar03]	P.P. Wangikar, A.V. Tendulkar, S. Ramya, D.N. Mali, S. Sarawagi, "Functional sites in protein families uncovered via an objective and automated graph theoretic approach," Journal of Molecular Biology, 326, 2003, pp. 955-978. ``We report a method for detection of recurring side-chain patterns (DRESPAT) using an unbiased and automated graph theoretic approach. We first list all structural patterns as sub-graphs where the protein is represented as a graph. The patterns from proteins are compared pair-wise to detect patterns common to a protein pair based on content and geometry criteria. The recurring pattern is then detected using an automated search algorithm from the all-against-all pair-wise comparison data of proteins. Intra-protein pattern comparison data are used to enable detection of patterns recurring within a protein. A method has been proposed for empirical calculation of statistical significance of recurring pattern. The method was tested on 17 protein sets of varying size, composed of non-redundant representatives from SCOP superfamilies.''
[Jambon03]	M. Jambon, A. Imberty, G. Deleage, C. Geourjon, "A new bioinformatic approach to detect common 3D sites in protein structures," Proteins, 52, 2003, pp. 137-145. ``The basis for this method is a representation of the protein structure by a set of stereochemical groups that are defined independently from the notion of amino acid. An efficient heuristic for finding similarities that uses graphs of triangles of chemical groups to represent the protein structures has been developed.''
[Barrow76]	H.G. Barrow, R.M. Burstall, "Subgraph isomorphism, matching relational structures and maximal cliques," Inf. Process. Lett., 4, 1976, pp. 83-84. This is the classical paper about using association graphs to match rigid point sets
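
The association-graph idea behind the entries above can be sketched as follows. This is a minimal illustration, not the code of any cited paper: the function names, the distance tolerance `tol`, and the toy coordinates are all assumptions. Each node pairs a point of one set with a point of the other, two nodes are joined when the corresponding intra-set distances agree, and a maximum clique gives the largest mutually consistent correspondence. Distances are also preserved under reflection, so this sketch does not distinguish mirror images.

```python
from itertools import combinations
from math import dist

def association_graph(a, b, tol=0.5):
    """Nodes are pairs (i, j); an edge joins pairs whose distances agree within tol."""
    nodes = [(i, j) for i in range(len(a)) for j in range(len(b))]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i != k and j != l and abs(dist(a[i], a[k]) - dist(b[j], b[l])) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def max_clique(adj):
    """Bron-Kerbosch with pivoting; returns one largest clique."""
    best = []
    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand([], set(adj), set())
    return best

# Toy data (hypothetical): b holds a rotated, translated copy of a plus one extra point.
a = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
b = [(10, 10, 10), (10, 13, 10), (6, 10, 10), (20, 20, 20)]
match = sorted(max_clique(association_graph(a, b)))
```

Clique finding is exponential in the worst case, which is why the papers above reduce the point sets or filter node pairs by chemical label before building the graph.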

Protein-ligand binding site representations with pseudo-atoms and matching with geometric hashing

[Shulman-Peleg04]	A. Shulman-Peleg, R. Nussinov, H.J. Wolfson, "Recognition of functional sites in protein structures," J Mol Biol, 339, 2004, pp. 607-633. Watson05: SiteEngine uses modified pseudo-centres and geometric hashing to compare surfaces with the aim of identifying conserved chemistry in similar pockets, which might indicate similar function. ``We achieve high efficiency and speed by introducing a low-resolution surface representation via chemically important surface points, by hashing triangles of physico-chemical properties and by application of hierarchical scoring schemes for a thorough exploration of global and local similarities. We proceed to rigorously apply this method to functional site recognition in three possible ways: first, we search a given functional site on a large set of complete protein structures. Second, a potential functional site on a protein of interest is compared with known binding sites, to recognize similar features. Third, a complete protein structure is searched for the presence of an a priori unknown functional site, similar to known sites.''
[Pennec98]	X. Pennec, N. Ayache, "A geometric algorithm to find small but highly similar 3D substructures in proteins," Bioinformatics, 14, 1998, pp. 516-522. ``We propose a new 3D substructure matching algorithm based on geometric hashing techniques. The key feature of the method is the introduction of a 3D reference frame attached to each residue.''
[Wolfson97]	H.J. Wolfson, I. Rigoutsos, "Geometric hashing: an overview," IEEE Computational Science \& Engineering, 4, 4, 1997, pp. 10-21.
[Lamdan88]	Y. Lamdan, H. Wolfson, "Geometric hashing: a general and efficient recognition scheme," 2nd International Conference on Computer Vision, Tarpon Springs, FL, 1988, pp. 238-251. This is the main reference for geometric hashing.
[Norel93]	D. Fischer, R. Norel, H. Wolfson, R. Nussinov, "Surface motifs by a computer vision technique: Searches, detection, and implications for protein-ligand recognition," Proteins: Structure, Function, and Genetics, 16, 3, 1993, pp. 278-292.
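
A minimal sketch of geometric hashing, under simplifying assumptions that are not taken from the papers above: points are 2D, the basis is an ordered pair of points (rigid motion only, no scaling), and the bin size `cell` and all names are hypothetical. The cited methods work in 3D with triangles or per-residue frames, but the two phases are the same: store every model in every basis frame, then let scene points vote for (model, basis) entries.

```python
from collections import defaultdict
from itertools import permutations
from math import hypot

def frame_coords(points, i, j):
    """Coordinates of the other points in the frame with origin points[i], x-axis toward points[j]."""
    ox, oy = points[i]
    ux, uy = points[j][0] - ox, points[j][1] - oy
    n = hypot(ux, uy)
    ux, uy = ux / n, uy / n
    out = []
    for k, (x, y) in enumerate(points):
        if k not in (i, j):
            dx, dy = x - ox, y - oy
            out.append((dx * ux + dy * uy, -dx * uy + dy * ux))
    return out

def quantize(c, cell):
    return (round(c[0] / cell), round(c[1] / cell))

def build_table(models, cell=0.5):
    """Preprocessing: hash every model point in every basis frame."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i, j in permutations(range(len(pts)), 2):
            for c in frame_coords(pts, i, j):
                table[quantize(c, cell)].append((name, i, j))
    return table

def recognize(table, scene, cell=0.5):
    """Recognition: try each scene basis, vote, and return (votes, (model, i, j))."""
    best = (0, None)
    for i, j in permutations(range(len(scene)), 2):
        votes = defaultdict(int)
        for c in frame_coords(scene, i, j):
            for entry in table.get(quantize(c, cell), ()):
                votes[entry] += 1
        for entry, v in votes.items():
            if v > best[0]:
                best = (v, entry)
    return best

# Toy data (hypothetical): the scene is "triad" rotated by 90 degrees and shifted, plus clutter.
models = {"triad": [(0, 0), (4, 0), (0, 3), (5, 5)],
          "other": [(0, 0), (1, 0), (0, 1), (1, 1)]}
scene = [(10, 20), (10, 24), (7, 20), (5, 25), (30, 30)]
result = recognize(build_table(models), scene)
```

Hard binning can miss points that fall near a bin boundary; practical implementations usually also vote in neighboring bins or verify candidate matches with a final superposition.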

Protein-ligand binding site representations with atoms/residues and matching with combinatorial extension

[Ferre04]	F. Ferre, G. Ausiello, A. Zanzoni, M. Helmer-Citterich, "SURFACE: a database of protein surface regions for functional annotation," Nucleic Acids Res, 32, 2004, pp. D240-D244. Describes a database of binding sites, each represented by a set of points (two per residue: CA and center of side chain). Matching is performed by a combinatorial extension algorithm. The database is available at http://cbm.bio.uniroma2.it/surface/.

[Ivanisenko04]	V.A. Ivanisenko, S.S. Pintus, D.A. Grigorovich, N.A. Kolchanov, "PDBSiteScan: a program for searching for active, binding and posttranslational modification sites in the 3D structures of proteins," Nucleic Acids Res, 32, 2004, pp. W549-W554. Alignment of point sets with combinatorial extension (like CE).

Protein-ligand binding site representations with atoms/residues and matching with ???

[Kobayashi97]	N. Kobayashi, N. Go, "A method to search for similar protein local structures at ligand binding sites and its application to adenine recognition," Eur Biophys J, 26, 1997, pp. 135-144. Utilizes bound ligand to define region of interest and align. ``We have developed a method of searching for similar spatial arrangements of atoms around a given chemical moiety in proteins that bind a common ligand. The first step in this method is to consider a set of atoms that closely surround a given chemical moiety. Then, to compare the spatial arrangements of such surrounding atoms in different proteins, they are translated and rotated so that the chemical moieties are superposed on each other. Spatial arrangements of surrounding atoms in a pair of proteins are judged to be similar, when there are many corresponding atoms occupying similar spatial positions.''
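
The final step of the Kobayashi97 description (counting corresponding atoms once the chemical moieties are superposed) might look like the sketch below. It assumes the two atom sets have already been transformed into the shared frame of the superposed moiety. The element-equality rule, the 1.0 angstrom `cutoff`, and the greedy closest-pair-first pairing are illustrative assumptions, not details from the paper.

```python
from math import dist

def count_corresponding(atoms_a, atoms_b, cutoff=1.0):
    """Count one-to-one pairs of same-element atoms lying within cutoff of each other.

    atoms_*: list of (element, (x, y, z)) expressed in the frame of the superposed moiety.
    """
    pairs = sorted(
        (dist(pa, pb), i, j)
        for i, (ea, pa) in enumerate(atoms_a)
        for j, (eb, pb) in enumerate(atoms_b)
        if ea == eb and dist(pa, pb) <= cutoff
    )
    used_a, used_b, n = set(), set(), 0
    for _, i, j in pairs:  # closest pairs are claimed first
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            n += 1
    return n

# Toy environments (hypothetical coordinates).
site_a = [("N", (0, 0, 0)), ("O", (2, 0, 0)), ("C", (5, 5, 5))]
site_b = [("N", (0.3, 0, 0)), ("O", (2, 0.5, 0)), ("C", (9, 9, 9)), ("N", (0.9, 0, 0))]
n_common = count_corresponding(site_a, site_b)
```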

Protein-ligand binding site representations with templates

[Jones04]	S. Jones, J.M. Thornton, "Searching for functional sites in protein structures," Curr Opin Chem Biol, 8, 2004, pp. 3-7. Contains brief overview of template-based methods
[Torrance05]	J.W. Torrance, G.J. Bartlett, C.T. Porter, J.M. Thornton, "Using a library of structural templates to recognise catalytic sites and explore their evolution in homologous families," J Mol Biol, 347, 2005, pp. 565-581. Watson05: The authors present a library of catalytic site structural templates based on information from the scientific literature. In an extension of previous work, a new web server is released that allows users to search the CSA using the JESS algorithm. The user can investigate a specific PDB code or submit a three-dimensional protein structure for analysis. (http://www.ebi.ac.uk/thornton-srv/databases/CSS).
[Porter04]	C.T. Porter, G.J. Bartlett, J.M. Thornton, "The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data," Nucleic Acids Res, 32, 2004, pp. D129-133. This is the main reference for the Catalytic Site Atlas (CSA) (http://www.ebi.ac.uk/thornton-srv/databases/CSA/index.html), a database of catalytic sites as determined by scanning the literature. A case is made that the SITE records of PDB files are used in an inconsistent fashion. So, the authors have gone through the literature and identified the catalytic residues for 177 proteins (as of the time of writing). They have also transferred annotations to 2608 proteins homologous to the originals. The paper does not demonstrate applications enabled by the database.
[Stark03a]	A. Stark, R.B. Russell, "Annotation in three dimensions PINTS: Patterns in Non-homologous Tertiary Structures," Nucleic Acids Res, 31, 2003, pp. 3341-3344. This is the main reference for PINTS.
[Stark03b]	A. Stark, S. Sunyaev, R.B. Russell, "A model for statistical significance of local similarities in structure," J Mol Biol, 2003, pp. . This provides analysis for the statistical significance of matches based on the RMSD between atom pairs. It is applied to evaluate matches for protein alignments.
[Stark04]	A. Stark, A. Shkumatov, R.B. Russell, "Finding functional sites in structural genomics proteins," Structure, 12, 2004, pp. 1405-1412. Watson05: The authors report the use of fold similarity and template methods to identify functional sites in a selection of proteins solved by structural genomics projects. The authors compare their method (PINTS) with two other template-based methods, PROCAT and RIGOR.
[Pazos04]	F. Pazos, M.J.E. Sternberg, "Automated prediction of protein function and detection of functional sites from structure," Proc Natl Acad Sci USA, 101, 2004, pp. 14754-14759. Watson05: The authors describe Phunctioner, a method for the automatic prediction of function. An initial structural alignment is split into functionally specific subalignments using GO annotation. The conserved residues in each subalignment are interpreted as functionally important residues and are used to construct PSSMs for scanning against a query sequence to find the best-fitting functional match. An additional benefit is that the method can identify functionally important residues for GO terms for which no such information is currently known.
[Laskowski05a]	R.A. Laskowski, J.D. Watson, J.M. Thornton, "Protein function prediction using local 3D templates," J. Mol. Biol., 351, 2005, pp. 614-626.
[Preissner98]	R. Preissner, A. Goede, C. Frommel, "Dictionary of interfaces in proteins (DIP) Data bank of complementary molecular surface patches," J Mol Biol, 280, 1998, pp. 535-550. ``We defined interfaces as pairs of matching molecular surface patches between neighboring secondary structural elements. All such interfaces from known protein structures were collected in a comprehensive data bank of interfaces in proteins (DIP). The up-to-date DIP contains interface files for 351 selected Brookhaven Protein Data Bank entries with a total of about 160,000 surface elements formed by 12,475 secondary structures. ... The existing retrieval system for the DIP allows selection (out of the set of molecular patches) according to different criteria, such as geometric features, atomic composition, type of secondary structure, contacts, etc. A fast, sequence-independent 3-D superposition procedure was developed for automatic searches for geometrically similar surface areas. Using this procedure, we found a large number of structurally similar interfaces of up to 30 atoms in completely unrelated protein structures.''
[Frommel03]	C. Frommel, C. Gille, A. Goede, C. Gropl, S. Hougardy, T. Nierhoff, R. Preissner, M. Thimm, "Accelerating screening of 3D protein data with a graph theoretical approach," Bioinformatics, 19, 2003, pp. 2442-2447. ``The Dictionary of Interfaces in Proteins (DIP) is a database collecting the 3D structure of interacting parts of proteins that are called patches. It serves as a repository, in which patches similar to given query patches can be found. In this work we address the question of how the patches similar to a given query can be identified by scanning only a small part of DIP. The answer to this question requires the investigation of the distribution of the similarity of patches.''
[Kleywegt99]	G.J. Kleywegt, "Recognition of spatial motifs in protein structures," J Mol Biol, 285, 1999, pp. 1887-1897. This paper describes two programs: SPASM and RIGOR. SPASM matches a single structural motif (a spatial arrangement of points) against a database of proteins, while RIGOR matches a single protein against many structural motifs. Each residue is represented by its CA atom and/or the centroid of its side chain. Possible point correspondences are enumerated exhaustively, considering points for correspondence when their residue types are within some threshold in a substitution matrix. Constraints may also be added requiring that matching residues be in the same order in the sequence, separated by gaps of the same size, etc. For every possible set of correspondences, the point sets are superposed, and the RMSD is checked against a threshold. Structural motifs are constructed in three ways: 1) manually, 2) from all sets of residues in spatial proximity that contain only hydrophobic, only polar and charged, or mixed hydrophobic and polar/charged residues, and 3) from sets of residues that all contact a single hetero-compound. Applications are shown for a few cases of main-chain recognition, active-site recognition, and metal-binding site recognition. (http://alpha2.bmc.uu.se/usf).
[Singh03]	R. Singh, M. Saha, "Identifying structural motifs in proteins," Pacific Symposium on Biocomputing, 8, 2003, pp. 228-239.
[Masden02]	D. Madsen, G.J. Kleywegt, "Interactive motif and fold recognition in protein structures," J. Appl. Cryst., 35, 2002, pp. 137-139.
[Dawe03]	J.H. Dawe, C.T. Porter, J.M. Thornton, A.B. Tabor, "A template search reveals mechanistic similarities and differences in β-ketoacyl synthases (KAS) and related enzymes," Proteins, 52, 2003, pp. 427-435.
[Hamelryck03]	T. Hamelryck, "Efficient identification of side-chain patterns using a multidimensional index tree," Proteins, 51, 1, 2003, pp. 96-108.
[Russell98]	R.B. Russell, "Detection of protein three-dimensional side-chain patterns: new examples of convergent evolution," J Mol Biol, 279, 1998, pp. 1211-1227.
[Barker03]	J.A. Barker, J.M. Thornton, "An algorithm for constraint-based structural template matching: application to 3D templates with statistical analysis," Bioinformatics, 19, 2003, pp. 1644-1649. This is the main reference for JESS, an improved version of TESS. It includes an empirical measure of statistical significance for every match.
[Wallace97]	A.C. Wallace, N. Borkakoti, J.M. Thornton, "TESS: A geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites," Prot. Sci, 6, 11, 1997, pp. 2308-2323. This is the main reference for TESS, the most commonly cited template method for representing binding sites. Templates are spatial arrangements of attributed points typical of a particular binding site. In this paper, they seem to be constructed manually with some knowledge of the binding specificity of particular enzyme classes. The paper describes a geometric hashing strategy for searching a protein for matches to a given template, where a coordinate frame is considered for each residue (rather than for every possible triple) with axes chosen in a manner specific to every amino acid. The method is shown for proteins having HIS-based catalytic triads, ribonucleases, and lysozymes.
[Wallace96]	A.C. Wallace, R.A. Laskowski, J.M. Thornton, "Derivation of 3D coordinate templates for searching structural databases: Application to Ser-His-Asp catalytic triads in the serine proteinases and lipases," Protein Science, 5, 1996, pp. 1001-1013. This paper is a precursor to Wallace97.
[Jonassen02]	I. Jonassen, I. Eidhammer, D. Conklin, W.R. Taylor, "Structure motif discovery and mining the PDB," Bioinformatics, 18, 2002, pp. 362-367.
[Jonassen99]	I. Jonassen, I. Eidhammer, W.R. Taylor, "Discovery of local packing motifs in protein structures," Proteins, 34, 1999, pp. 206-219.
[Bradley02]	P. Bradley, P.S. Kim, B. Berger, "TRILOGY: Discovery of sequence-structure patterns across diverse proteins," Proc Natl Acad Sci U S A, 99, 2002, pp. 8500-8505.
[Oldfield02]	T.J. Oldfield, "Data mining the protein data bank residue interactions," Proteins, 49, 2002, pp. 510-528.
[Binkowski03a]	T.A. Binkowski, P. Freeman, J. Liang, "pvSoar," http://pvsoar.bioengr.uic.edu, 2003.
[Binkowski03c]	T.A. Binkowski, L. Adamian, J. Liang, "Inferring functional relationships of proteins from local sequence and spatial surface patterns," J Mol Biol, 332, 2003, pp. 505-526.
[Binkowski04]	T.A. Binkowski, P. Freeman, J. Liang, "pvSOAR: detecting similar surface patterns of pocket and void surfaces of amino acid residues on proteins," Nucleic Acids Res, 32, 2004, pp. W555-W558. Watson05: The pvSOAR web server identifies similar surface regions among proteins, and takes advantage of the CASTp database of pockets and cavities. The authors describe how the server can be used to predict the function of hypothetical proteins, illustrating this with the E. coli BioH protein (PDB code 1m33) as an example.
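
The exhaustive-enumeration style of template search described for SPASM in [Kleywegt99] can be sketched as below, with two simplifications that are assumptions of this sketch rather than features of the program: residue types must match exactly (instead of passing a substitution-matrix threshold), and candidates are scored by the RMS difference of intra-set distances (distance RMSD) instead of the RMSD after superposition. The function name and threshold are hypothetical.

```python
from itertools import combinations, permutations
from math import dist, sqrt

def find_motif(motif, protein, max_drmsd=1.0):
    """Return sorted (drmsd, protein_indices) hits for a motif with at least two points.

    motif, protein: lists of (residue_type, (x, y, z)), one point per residue.
    """
    pairs = list(combinations(range(len(motif)), 2))
    hits = []
    for idx in permutations(range(len(protein)), len(motif)):
        # Residue-type filter: exact match stands in for a substitution-matrix test.
        if any(motif[m][0] != protein[p][0] for m, p in enumerate(idx)):
            continue
        sq = sum(
            (dist(motif[a][1], motif[b][1])
             - dist(protein[idx[a]][1], protein[idx[b]][1])) ** 2
            for a, b in pairs
        )
        d = sqrt(sq / len(pairs))
        if d <= max_drmsd:
            hits.append((d, idx))
    return sorted(hits)

# Toy data (hypothetical): a Ser-His-Asp arrangement hidden in a five-residue "protein".
motif = [("SER", (0, 0, 0)), ("HIS", (3, 0, 0)), ("ASP", (3, 4, 0))]
protein = [("GLY", (1, 1, 1)), ("ASP", (13, 14, 10)), ("SER", (10, 10, 10)),
           ("HIS", (13, 10, 10)), ("SER", (30, 30, 30))]
hits = find_motif(motif, protein)
```

Enumerating all ordered subsets is only feasible for small motifs; applying the type filter while the correspondence is being built, rather than afterwards, would prune most of the search.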

Protein-ligand binding site representations with surfaces and matching with association graphs

[Hofbauer04]	C. Hofbauer, H. Lohninger, A. Aszodi, "SURFCOMP: A Novel Graph-Based Approach to Molecular Surface Comparison," J. Chem. Inf. Comput. Sci., 44, 2004, pp. 837-847. This paper presents ``an approach that uses maximal common subgraph comparison and harmonic shape image matching to detect locally similar regions between two molecular surfaces augmented with properties such as the electrostatic potential or lipophilicity.'' The complexity of the problem is reduced by a set of filters that eliminate potential correspondences between vertices with different intramolecular distances, different electrostatic potentials, different lipophilic potentials, different principal curvatures on the Connolly surfaces, different harmonic shape images for their neighborhoods on the Connolly surfaces, and different orientations of the aligned harmonic shape images with respect to the line segment between any pair of points. ``The approach was tested on dihydrofolate reductase and thermolysin inhibitors and was shown to recover the correct alignments of the compounds bound in the active sites.''
[Kinoshita05]	K. Kinoshita, H. Nakamura, "Identification of the ligand binding sites on the molecular surface of proteins," Protein Science, 14, 17, 2005, pp. 711-718. This paper contains results of an experiment for comparison of a few binding site surfaces to a large database (almost all binding site surfaces in the PDB) using the method described in [Kinoshita03]. Since partial surfaces are matched, a similarity scoring method is introduced that considers both a normalized score for the match of the geometry and electrostatics (Z-score) and the ``coverage'' of the match (fractions of the surfaces found to be in correspondence). Results are presented for 18 hypothetical proteins.
[Kinoshita04]	K. Kinoshita, H. Nakamura, "eF-site and PDBViewer: database and viewer for protein functional sites," Bioinformatics, 20, 2004, pp. 1329-1330. http://ef-site.hgc.jp/eF-site/
[Kinoshita03]	K. Kinoshita, H. Nakamura, "Identification of protein biochemical functions by similarity search using the molecular surface database eF-site," Protein Science, 12, 2003, pp. 1589-1595. This paper describes matching of surfaces stored in the eF-site database of binding sites. Each binding site is represented by a mesh with electrostatic potential and the 2 principal curvatures at every vertex. The meshes are matched using association graphs, where the electrostatic potentials and principal curvatures have to match within some threshold, as well as the intramolecular distances. No reduction of the point set is performed (e.g., using critical points). Results are shown for matching examples of two SCOP folds and for predicting the biochemical function of one hypothetical protein.
[Kinoshita02]	K. Kinoshita, J. Furui, H. Nakamura, "Identification of Protein Functions from a Molecular Surface Database, eF-site," J. Struct. Func. Genomics, 2, 1, 2002, pp. 9-Binding.
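
The two quantities mentioned for [Kinoshita05], a Z-score and a coverage, can be illustrated generically as follows. The exact definitions used in the paper are not given in the annotation, so the formulas below (standard score against a background of scores, and matched fraction of each surface) and all names and numbers are assumptions.

```python
from statistics import mean, pstdev

def z_score(score, background):
    """Standard score of a raw match score relative to a background distribution."""
    return (score - mean(background)) / pstdev(background)

def coverage(n_matched, n_query, n_target):
    """Fractions of the query and target surfaces that take part in the match."""
    return n_matched / n_query, n_matched / n_target

# Hypothetical numbers: raw scores of unrelated site pairs, then one candidate match.
background = [10, 12, 14, 16, 18]
z = z_score(22, background)
cov_query, cov_target = coverage(30, 40, 120)
```

Reporting coverage alongside the Z-score matters for partial matching: a small patch can match very well by chance, so a high score with low coverage on both surfaces is weaker evidence than the same score with high coverage.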

Protein-ligand binding site representations with surfaces and matching with geometric hashing

[Lin94]	S.L. Lin, R. Nussinov, D. Fischer, H.J. Wolfson, "Molecular-Surface Representations By Sparse Critical-Points," Proteins-Structure Function and Genetics, 18, 1994, pp. 94-101. This paper describes a surface representation consisting of ``a limited number of critical points disposed at key locations over the surface. These points adequately represent the shape and the important characteristics of the surface, despite the fact that they are modest in number.'' Using this representation, they investigate protein-protein and protein-small molecule docking.
[Rosen98]	M. Rosen, S.L. Lin, H. Wolfson, R. Nussinov, "Molecular shape comparisons in searches for active sites and functional similarity," Protein Engineering, 11, 1998, pp. 263-277. This paper uses geometric hashing to examine ``the reliability of surface comparisons in searches for active sites in proteins. Specifically, we compare the efficacy of molecular surface comparisons with comparisons of surface atoms and of C(alpha) backbone atoms. We further investigate comparisons of specific atoms, belonging to a predefined pattern of catalytic residues versus comparisons of molecular surfaces and, separately, of surface atoms. We also explore active site comparisons versus comparisons in which the entire molecular surfaces are scanned. While here we focus on the geometrical aspect of the problem, we also investigate the effect of adding residue labels in these comparisons. Our extensive studies cover the serine proteases, containing the highly conserved triad motif, and the chorismate mutases. Our results show that molecular surface comparisons work best when the similarity is high. As the similarity deteriorates, the number of potential solutions increases rapidly, making their ranking difficult, particularly when scanning entire molecular surfaces. Utilizing atomic coordinates directly appears more adequate under such circumstances.''
[Fischer93]	D. Fischer, R. Norel, H. Wolfson, R. Nussinov, "Surface motifs by a computer vision technique: searches, detection, and implications for protein-ligand recognition," Proteins, 16, 1993, pp. 278-292. This paper uses geometric hashing to perform ``4 types of comparisons between pairs of molecules: (1) comparison of the backbones of two protein domains; (2) search for a predefined 3-D C alpha motif within the full backbone of a domain; and in particular, (3) comparison of the surfaces of two receptor proteins; and (4) comparison of the surface of a receptor to the surface of a ligand. ... Searches for 3-D surface motifs can be carried out on either receptors or on ligands.''
[Bachar93]	O. Bachar, D. Fischer, R. Nussinov, H. Wolfson, "A Computer Vision-Based Technique For 3-D Sequence-Independent Structural Comparison Of Proteins," Protein Engineering, 6, 1993, pp. 279-288. Uses geometric hashing.

Protein-ligand binding site representations with surfaces and matching with genetic algorithms

[Poirrette97]	A.R. Poirrette, P.J. Artymiuk, D.W. Rice, P. Willett, "Comparison of protein surfaces using a genetic algorithm," Journal of Computer-Aided Molecular Design, 11, 1997, pp. 557-569. ``A genetic algorithm (GA) is described which is used to compare the solvent-accessible surfaces of two proteins or fragments of proteins, represented by a dot surface calculated using the Connolly algorithm. The GA is used to move one surface relative to the other to locate the most similar surface region between the two. The matching process is enhanced by the use of the surface normals and shape terms provided by the Connolly program and also by a simple hydrogen-bonding descriptor and an additional shape descriptor. The algorithm has been tested in applications ranging from the comparison of small surface patches to the comparison of whole protein surfaces. Examples of the matches are given and a quantitative analysis of the quality of the matches is performed.''

Protein-ligand binding site representations with radial extents and matching with spherical harmonic surfaces

[Kahraman07]	A. Kahraman, R.J. Morris, R. Laskowski, J.M. Thornton, "Shape Variation in Protein Binding Pockets and their Ligands," J. Mol. Biol., 368, 2007, pp. 283-301. A common assumption about the shape of protein binding
[Morris05a]	R.J. Morris, R.J. Najmanovich, A. Kahraman, J.M. Thornton, "Real spherical harmonic expansion coefficients as 3D shape descriptors for protein binding pocket and ligand comparisons," Bioinformatics, 21, 10, 2005, pp. 2347-2355. This paper ``uses the coefficients of a real spherical harmonics expansion to describe the shape of a protein's binding pocket.'' Binding sites are represented by the radial extent of SURFNET spheres within 3.5 angstroms of a conserved residue. The resulting spherical functions are aligned with PCA, the expansion is truncated so that only the lowest-order spherical harmonic coefficients are retained, and shape similarity is computed as the L2 distance between corresponding spherical harmonic coefficients.
[Morris05b]	R.J. Morris, A. Kahraman, T. Funkhouser, R. Najmanovich, G. Stockwell, F. Glaser, R. Laskowski, J.M. Thornton, "Binding Pocket Shape Analysis for Protein Function Prediction," LASR Workshop on Quantitative Biology, Shape Analysis, and Wavelets, Leeds England, 2005.
[Cai02]	W. Cai, X. Shao, B. Maigret, "Protein-ligand recognition using spherical harmonic molecular surfaces: towards a fast and efficient filter for large virtual throughput screening," Journal of Molecular Graphics and Modeling, 20, 2002, pp. 313-328. ``In this paper, we present an extension of our work to spherical harmonic surfaces in order to approximate molecular surfaces of both ligands and receptor-cavities and to easily check the surface-shape complementarity. The method consists of (1) finding lobes and holes on both ligand and cavity surfaces using contour maps of radius functions with spherical harmonic expansions, (2) superposing the surfaces around a given binding site by minimizing the distance between their respective expansion coefficients. This docking procedure capabilities was demonstrated by application to 35 protein-ligand complexes of known crystal structures.''
[Cai98]	W. Cai, M. Zhang, B. Maigret, "New approach for representation of molecular surface," J. Comput. Chem, 19, 1998, pp. 1805-1815.
[Ritchie99]	D.W. Ritchie, G.J.L. Kemp, "Fast computation, rotation, and comparison of low resolution spherical harmonic molecular surfaces," J. Comput. Chem, 20, 1999, pp. 383-395. Describes Fourier search algorithm for optimal rotational alignment
[Duncan93a]	B.S. Duncan, A.J. Olson, "Shape analysis of molecular surfaces," Biopolymers, 33, 1993, pp. 231-238.
[Duncan93b]	B.S. Duncan, A.J. Olson, "Approximation and characterization of molecular surfaces," Biopolymers, 33, 1993, pp. 219-229.
[Leicester88]	S. Leicester, J.L. Finney, R.P. Bywater, "Description of molecular surface shape using Fourier descriptors," J. Mol. Graph, 6, 1988, pp. 104-108.
[Max88]	N.L. Max, E.D. Getzoff, "Spherical harmonic molecular surfaces," IEEE Comput. Graph. Appl, 8, 1988, pp. 42-50.
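
Once two shapes have been aligned and expanded in the same set of real spherical harmonics, the comparison described for [Morris05a] reduces to a Euclidean distance between coefficient vectors. The sketch below assumes the coefficients are already available as dictionaries keyed by (l, m); computing the expansion and the alignment is not shown, and the function name and numbers are hypothetical.

```python
from math import sqrt

def l2_distance(coeffs_a, coeffs_b):
    """Euclidean distance between two coefficient sets keyed by (l, m).

    A term missing from one set is treated as a zero coefficient.
    """
    keys = set(coeffs_a) | set(coeffs_b)
    return sqrt(sum((coeffs_a.get(k, 0.0) - coeffs_b.get(k, 0.0)) ** 2 for k in keys))

# Hypothetical coefficients for two pockets, orders l = 0 and l = 1 only.
pocket_a = {(0, 0): 1.0, (1, -1): 0.5, (1, 0): 0.0, (1, 1): 2.0}
pocket_b = {(0, 0): 1.0, (1, -1): 0.5, (1, 0): 3.0, (1, 1): -2.0}
d = l2_distance(pocket_a, pocket_b)
```

Because the basis is orthonormal, this coefficient-space distance equals the L2 distance between the two band-limited radial functions on the sphere, which is what makes the comparison cheap.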

Protein-ligand binding site representations and matching with other methods

[Goldman00]	B.B. Goldman, W.T. Wipke, "Quadratic Shape Descriptors 1. Rapid Superposition of Dissimilar Molecules Using Geometrically Invariant Surface Descriptors," J. Chem. Inf. Model., 40, 3, 2000, pp. 644-658.
[Funkhouser05b]	T. Funkhouser, F. Glaser, R. Laskowski, R. Morris, R. Najmanovich, G. Stockwell, J. Thornton, "Shape-Based Classification of Bound Ligands," LASR Workshop on Quantitative Biology, Shape Analysis, and Wavelets, Leeds England, 2005.
[Exner02a]	T.E. Exner, M. Keil, J. Brickmann, "Pattern recognition strategies for molecular surfaces. I. Pattern generation using fuzzy set theory," Journal of Computational Chemistry, 23, 2002, pp. 1176-1187.
[Exner02b]	T.E. Exner, M. Keil, J. Brickmann, "Pattern recognition strategies for molecular surfaces. II. Surface complementarity," Journal of Computational Chemistry, 23, 2002, pp. 1188-1197. ``Fuzzy logic based algorithms for the quantitative treatment of complementarity of molecular surfaces are presented. ... The algorithms are applied to 33 biomolecular complexes. ... After the optimization with a downhill simplex method, for all these complexes one structure was found, which is in very good agreement with the experimental results.''
[Gu03]	X. Gu, S.-T. Yau, "Surface Classification Using Conformal Structures," Ninth IEEE International Conference on Computer Vision (ICCV'03), 1, 2003, pp. 701. This paper provides a way to map a surface from 3D to 2D (flatten it) while retaining the angles between edges of the mesh as best as possible (a conformal map). The surfaces are compared/classified in the 2D domain.

Protein-ligand binding site representations with alpha-shapes

[Liang98a]	J. Liang, H. Edelsbrunner, P. Fu, P.V. Sudhakar, S. Subramaniam, "Analytical shape computing of macromolecules I: molecular area and volume through alpha shape," Proteins, 33, 1998, pp. 1-17.
[Liang98b]	J. Liang, H. Edelsbrunner, P. Fu, P.V. Sudhakar, S. Subramaniam, "Analytical shape computing of macromolecules II: identification and computation of inaccessible cavities inside proteins," Proteins, 33, 1998, pp. 18-29.
[Liang98c]	J. Liang, H. Edelsbrunner, C. Woodward, "Anatomy of protein pockets and cavities: Measurement of binding site geometry and implications for ligand design," Protein Science, 7, 1998, pp. 1884-1897.
[Edelsbrunner98]	H. Edelsbrunner, M. Facello, J. Liang, "On the definition and the construction of pockets in macromolecules," Disc. Appl. Math, 88, 1998, pp. 83-102.
[Edelsbrunner95]	H. Edelsbrunner, M. Facello, R. Fu, J. Liang, "Measuring Proteins and Voids in Proteins," Proceedings of the 28th Annual Hawaii International Conference on Systems Science, 1995, pp. 256-264.
[Binkowski03b]	T.A. Binkowski, S. Naghibzadeh, J. Liang, "CASTp: Computed Atlas of Surface Topography of proteins," Nucleic Acids Res, 31, 2003, pp. 3352-3355.

Protein-ligand binding site representation with grids and matching with correlation

[Katchalski-Katzir92]	E. Katchalski-Katzir, I. Shariv, M. Eisenstein, A.A. Friesem, C. Aflalo, I.A. Vakser, "Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques," Proc. Natl. Acad. Sci. U.S.A, 89, 1992, pp. 2195-2199. Rasterizes molecules into grid. Discretely samples rotations. Uses correlation in Fourier domain to search for best translation.
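
The translation search in [Katchalski-Katzir92] scores every shift of one grid against the other by correlation. The sketch below evaluates that same circular correlation directly with nested loops on a tiny 2D grid, purely for illustration; an FFT produces the identical score table far faster on realistic 3D grids. The grid values (1 on the receptor surface, a negative penalty in its interior, 1 inside the ligand) and the toy shapes are assumptions of this sketch, and the outer loop over sampled rotations is omitted.

```python
def best_translation(receptor, ligand):
    """Return (score, (dy, dx)) maximizing the circular correlation of two n x n grids."""
    n = len(receptor)
    best = None
    for dy in range(n):
        for dx in range(n):
            # Score of the ligand shifted by (dy, dx), with wrap-around as in an FFT.
            s = sum(
                receptor[y][x] * ligand[(y - dy) % n][(x - dx) % n]
                for y in range(n)
                for x in range(n)
            )
            if best is None or s > best[0]:
                best = (s, (dy, dx))
    return best

# Toy grids (hypothetical): three surface cells around one penalized interior cell.
receptor = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 1, -10, 0],
            [0, 0, 0, 0]]
ligand = [[1, 1, 0, 0],
          [1, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
score, shift = best_translation(receptor, ligand)
```

The direct loop costs on the order of N^2 operations for N grid cells, while the Fourier-domain version costs on the order of N log N, which is the reason the method is practical on full protein grids.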

Representation of binding sites with flexible structures

[Pitman01]	M.C. Pitman, W.K. Huber, H. Horn, A. Kramer, J.E. Rice, W.C. Swope, "FLASHFLOOD: A 3D field-based similarity search and alignment method for flexible molecules," J Comput Aided Mol Des, 15, 7, 2001, pp. 587-612. This is Wolfgang's paper.

Protein-ligand binding site mapping with probes

[Goodford85]	P.J. Goodford, "A computational procedure for determining energetically favorable binding sites on biologically important macromolecules," J. Med. Chem., 28, 1985, pp. 849-857. Uses GRID
[Kastenholz00]	M.A. Kastenholz, M. Pastor, G. Cruciani, E.E. Haaksma, T. Fox, "GRID/CPCA: a new computational tool to design selective ligands," J Med Chem, 43, 2000, pp. 3033-3044. Uses GRID to understand similarities/differences between binding sites
[Reynolds89]	C.A. Reynolds, R.C. Wade, P.J. Goodford, "Identifying targets for bioreductive agents: using GRID to predict selective binding regions of proteins," J Mol Graph., 7, 2, 1989, pp. 103-108.
[Ruppert97]	J. Ruppert, W. Welch, A. Jain, "Automatic identification and representation of protein binding sites for molecular docking," Protein Science, 6, 1997, pp. 524-533. This paper presents an algorithm for representing a protein's binding site in a way that is specifically suited to molecular docking applications. Initially the protein's surface is coated with a collection of molecular fragments that could potentially interact with the protein. Each fragment, or probe, serves as a potential alignment point for atoms in a ligand, and is scored to represent that probe's affinity for the protein. Probes are then clustered by accumulating their affinities, where high affinity clusters are identified as being the ``stickiest'' portions of the protein surface. The stickiest cluster is used as a computational binding ``pocket'' for docking.
[Pastor97]	Manuel Pastor, Gabriele Cruciani, Kimberly A. Watson, "A Strategy for the Incorporation of Water Molecules Present in a Ligand Binding Site into a Three-Dimensional Quantitative Structure-Activity Relationship Analysis," J. Med. Chem, 40, 25, 1997, pp. 4089-4102. Uses GRID descriptors input in statistical procedures like CoMFA, GOLPE or SIMCA for QSAR or 3D-QSAR analyses
[GRID]	Molecular Discovery, "GRID," http://www.moldiscovery.com/soft_grid.php, 2005.
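
The probe-clustering step summarized for [Ruppert97] (accumulate probe affinities per cluster and take the "stickiest" cluster as the pocket) could be sketched as below. The annotation does not say how probes are clustered, so the greedy seed-and-radius scheme, the `radius` value, and all names and numbers here are assumptions for illustration only.

```python
from math import dist

def stickiest_cluster(probes, radius=3.0):
    """Greedily cluster probes around high-affinity seeds; return (total_affinity, members).

    probes: list of (affinity, (x, y, z)); a higher affinity means stronger binding.
    """
    remaining = sorted(probes, reverse=True)
    clusters = []
    while remaining:
        seed = remaining[0]  # the strongest probe not yet assigned
        members = [p for p in remaining if dist(p[1], seed[1]) <= radius]
        clusters.append((sum(a for a, _ in members), members))
        remaining = [p for p in remaining if p not in members]
    return max(clusters, key=lambda c: c[0])

# Toy probes (hypothetical): one strong isolated probe and one patch of moderate probes.
probes = [(2.0, (0, 0, 0)), (1.5, (1, 0, 0)), (1.0, (0, 1, 0)),
          (3.0, (20, 0, 0)), (0.5, (21, 0, 0))]
total, members = stickiest_cluster(probes)
```

In this toy case the patch of three moderate probes outscores the single strongest probe, which is the point of accumulating affinities rather than ranking probes individually.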

Protein-ligand binding site mapping with multiple copy simultaneous search (MCSS)

[Miranker91]	A. Miranker, M. Karplus, "Functionality maps of binding sites: a multiple copy simultaneous search method," Proteins, 11, 1991, pp. 29-34. This is the main reference for the multiple copy simultaneous search (MCSS) method. ``A new method is proposed for determining energetically favorable positions and orientations for functional groups on the surface of proteins with known three-dimensional structure. From 1,000 to 5,000 copies of a functional group are randomly placed in the site and subjected to simultaneous energy minimization and/or quenched molecular dynamics. The resulting functionality maps of a protein receptor site, which can take account of its flexibility, can be used for the analysis of protein ligand interactions and rational drug design. Application of the method to the sialic acid binding site of the influenza coat protein, hemagglutinin, yields functional group minima that correspond with those of the ligand in a cocrystal structure.''
[Kortvelyesi03]	T. Kortvelyesi, M. Silberstein, S. Dennis, S. Vajda, "Improved mapping of protein binding sites," J Comput Aided Mol Des, 17, 2003, pp. 173-186.
[Mattos96]	Carla Mattos, Dagmar Ringe, "Locating and characterizing binding sites on proteins," Nature Biotechnology, 14, 1996, pp. 595-599. ``This review article begins with a discussion of fundamental differences between substrates and inhibitors, and some of the assumptions and goals underlying the design of a new ligand to a target protein. An overview is given of the methods currently used to locate and characterize ligand binding sites on protein surfaces, with focus on a novel approach: multiple solvent crystal structures (MSCS). In this method, the X-ray crystal structure of the target protein is solved in a variety of organic solvents. Each type of solvent molecule serves as a probe for complementary binding sites on the protein. The probe distribution on the protein surface allows the location of binding sites and the characterization of the potential ligand interactions within these sites. General aspects of the application of the MSCS method to porcine pancreatic elastase is discussed, and comparison of the results with those from X-ray crystal structures of elastase/inhibitor complexes is used to illustrate the potential of the method in aiding the process of rational drug design.''
[Stultz99]	C.M. Stultz, Martin Karplus, "MCSS Functionality Maps for a Flexible Protein," Proteins: Structure, Function, and Genetics, 37, 1999, pp. 512-529.
[Evensen97]	E. Evensen, D. Joseph-McCarthy, M. Karplus, "MCSS version 2.1," Harvard University, Cambridge, MA USA, 1997.
[Caflisch93]	A. Caflisch, A. Miranker, M. Karplus, "Multiple copy simultaneous search and construction of ligands in binding sites: application to inhibitors of HIV-1 aspartic proteinase," J. Med. Chem., 36, 1993, pp. 2142-2167.

Protein-ligand binding site mapping with knowledge-based algorithms

[Evers03]	A. Evers, H. Gohlke, G. Klebe, "Ligand-supported Homology Modelling of Protein Binding-sites using Knowledge-based Potentials," J. Mol. Biol., 334, 2003, pp. 327-345.
[Sotriffer02a]	C. Sotriffer, G. Klebe, "Identification and mapping of small-molecule binding sites in proteins: computational tools for structure-based drug design," Farmaco, 57, 2002, pp. 243-251.
[Verdonk01]	M.L. Verdonk, J.C. Cole, P. Watson, V. Gillet, P. Willett, "Superstar: improved knowledge-based interaction fields for protein binding sites," Journal of Molecular Biology, 307, 3, 2001, pp. 841-859.
[Laskowski96]	R.A. Laskowski, J.M. Thornton, C. Humblet, J. Singh, "X-SITE: use of empirically derived atomic packing preferences to identify favourable interaction regions in the binding sites of proteins," J. Mol. Biol., 259, 1996, pp. 175-201. This is the main reference for X-SITE.

Protein-ligand binding site representations with strings

[Karlin96]	S. Karlin, Z.Y. Zhu, "Characterizations of diverse residue clusters in protein three-dimensional structures," Proc Natl Acad Sci U S A, 93, 1996, pp. 8344-8349.

Protein-ligand docking overviews

[Halperin02]	I. Halperin, B. Ma, Haim Wolfson, Ruth Nussinov, "Principles of Docking: An Overview of Search Algorithms and a Guide to Scoring Functions," Proteins: Structure, Function, and Genetics, 47, 2002, pp. 409-443.
[Taylor02b]	R.D. Taylor, P.J. Jewsbury, J.W. Essex, "A review of protein-small molecule docking methods," Journal of Computer-Aided Molecular Design, 16, 2002, pp. 151-166.
[Krovat05]	E.M. Krovat, T. Steindl, T. Langer, "Recent Advances in Docking and Scoring," Current Computer-Aided Drug Design, 1, 1, 2005, pp. 93-102.
[Brooijmans03]	N. Brooijmans, I.D. Kuntz, "Molecular recognition and docking algorithms," Annu Rev Biophys Biomol Struct, 32, 2003, pp. 335-373.
[Kroemer03]	R.T. Kroemer, "Molecular modelling probes: docking and scoring," Biochemical Society Transactions, 31, 5, 2003, pp. 980-984.
[Kitchen04]	D.B. Kitchen, H. Decornez, J.R. Furr, J.D.B. Bajorath, "Docking and scoring in virtual screening for drug discovery: methods and applications," Nature Rev. Drug Discov., 3, 11, 2004, pp. 935-949.

Protein-ligand docking with Monte Carlo simulated annealing

[Friesner04]	R.A. Friesner, J.L. Banks, R.B. Murphy, T.A. Halgren, J.J. Klicic, D.T. Mainz, M.P. Repasky, E.H. Knoll, M. Shelley, J.K. Perry, D.E. Shaw, P. Francis, P.S. Shenkin, "Glide: A New Approach for Rapid, Accurate Docking and Scoring," J. Med. Chem., 47, 2004, pp. 1739-1749. This is the main reference for GLIDE.
[Liu99]	M. Liu, S. Wang, "MCDOCK: A Monte Carlo simulation approach to the molecular docking problem," Journal of Computer-Aided Molecular Design, 13, 5, 1999, pp. 435-451. This is the main reference for MCDock.
[Goodsell90]	D.S. Goodsell, A.J. Olson, "Automated Docking of Substrates to Proteins by Simulated Annealing," Proteins: Str. Func. and Genet., 8, 1990, pp. 195-202. This is the main reference for AutoDock 1.0.
[Mcmartin97]	C. Mcmartin, R.S. Bohacek, "QXP: Powerful, rapid computer algorithms for structure-based drug design," Journal of Computer-Aided Molecular Design, 11, 4, 1997, pp. 333-344. This is the main reference for QXP.

Protein-ligand docking with genetic algorithms

[Jones97]	G. Jones, P. Willett, R.C. Glen, A.R. Leach, R. Taylor, "Development and validation of a genetic algorithm for flexible docking," Journal of Molecular Biology, 267, 3, 1997, pp. 727-748. This is the main reference for GOLD.
[Verdonk03]	M.L. Verdonk, J.C. Cole, M.J. Hartshorn, C.W. Murray, R. D. Taylor, "Improved Protein-Ligand Docking Using GOLD," Proteins, 52, 2003, pp. 609-623. This is a more recent paper about GOLD.
[Morris98]	G.M. Morris, D.S. Goodsell, R.S. Halliday, R. Huey, W.E. Hart, R.K. Belew, A.J. Olson, "Automated Docking Using a Lamarckian Genetic Algorithm and an Empirical Binding Free Energy Function," J. Computational Chemistry, 19, 1998, pp. 1639-1662. This is the main reference for AutoDock 3.0 (http://www.scripps.edu/mb/olson/dock/autodock/).
[Oshiro95]	C.M. Oshiro, I.D. Kuntz, "Flexible ligand docking using a genetic algorithm," J. Comput-Aided Mol. Design, 9, 1995, pp. 113-130. DOCK
[Yang04]	J.M. Yang, C.C. Chen, "GEMDOCK: A generic evolutionary method for molecular docking," Proteins: Structure, Function, and Bioinformatics, 2004, pp. 288-304. This is the main reference for GemDock.

Protein-ligand docking with incremental construction

[Rarey96]	M. Rarey, B. Kramer, T. Lengauer, G. Klebe, "A Fast Flexible Docking Method using an Incremental Construction Algorithm," Journal of Molecular Biology, 261, 3, 1996, pp. 470-489. This is the main reference for FlexX (http://www.biosolveit.de/FlexX/).
[Zavodsky02]	M.I. Zavodszky, P.C. Sanschagrin, R.S. Korde, L.A. Kuhn, "Distilling the essential features of a protein surface for improving protein-ligand docking, scoring, and virtual screening," J. Comput. Aided Mol. Des., 16, 2002, pp. 883-902. This is the main reference for SLIDE.
[Jain03]	A.N. Jain, "Surflex: fully automatic flexible molecular docking using a molecular similarity-based search engine," J Med Chem, 46, 2003, pp. 499-511. This is the main reference for Surflex.

Protein-ligand docking with systematic search

[FRED]	OpenEye Scientific Software, "FRED: Fast Rigid Exhaustive Docking," http://www.eyesopen.com/docs/html/fred/, 2005. This is the main reference for FRED, which docks ligands in proteins using precomputed ligand conformations and systematic search over translations and rotations.
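
The search strategy described for FRED above (rigid, precomputed conformers tried at every rotation and translation on a grid) can be sketched as a toy exhaustive loop. Nothing here is FRED's actual code or scoring: the contact-counting score, the 90-degree angle step, the grid extent, and the coordinates are assumptions chosen only to keep the example small.

```python
import itertools
import math

def rot_z(p, a):
    x, y, z = p
    return (x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a), z)

def rot_y(p, a):
    x, y, z = p
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))

def place(conformer, angles, shift):
    """Rigidly rotate (z-y-z Euler angles) and then translate every atom."""
    a, b, c = angles
    placed = []
    for p in conformer:
        x, y, z = rot_z(rot_y(rot_z(p, a), b), c)
        placed.append((x + shift[0], y + shift[1], z + shift[2]))
    return placed

def score(ligand, receptor, clash=3.0, contact=4.5):
    """Toy score (an assumption): +1 per contact pair, -10 per clashing pair."""
    total = 0
    for l in ligand:
        for r in receptor:
            d = math.dist(l, r)
            if d < clash:
                total -= 10
            elif d <= contact:
                total += 1
    return total

def exhaustive_dock(conformers, receptor, angle_step=90, span=2, spacing=1.0):
    """Try every conformer x rotation x translation; return the best pose found."""
    angles = [math.radians(d) for d in range(0, 360, angle_step)]
    shifts = [i * spacing for i in range(-span, span + 1)]
    best = None
    for ci, conf in enumerate(conformers):
        for rot in itertools.product(angles, repeat=3):
            for shift in itertools.product(shifts, repeat=3):
                s = score(place(conf, rot, shift), receptor)
                if best is None or s > best[0]:
                    best = (s, ci, rot, shift)
    return best

# Hypothetical receptor atoms and two precomputed ligand conformers.
receptor = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
conformers = [[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
              [(0.0, 0.0, 0.0), (1.2, 0.9, 0.0)]]
best_score, best_conf, best_rot, best_shift = exhaustive_dock(conformers, receptor)
print(best_score, best_conf, best_shift)
```

The cost is the product of the number of conformers, rotations, and translations, which is why systematic methods rely on rigid conformers and fast scoring.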

Protein-ligand docking with tabu search methods

[Baxter97]	C.A. Baxter, C.W. Murray, D.E. Clark, D.R. Westhead, M.D. Eldridge, "Flexible docking using TABU search and an empirical estimate of binding affinity," Proteins, 33, 1997, pp. 367-382.

Protein-ligand docking with multiconformers

[McGann03]	M. McGann, H. Almond, A. Nicholls, J.A. Grant, F. Brown, "Gaussian Docking Functions," Biopolymers, 68, 2003, pp. 76-90. FRED
[Choi05]	V. Choi, "Yucca: An Efficient Algorithm for Small Molecule Docking," Algorithms in Molecular Biology (AlgBio2005), 2005, to appear. This is the main reference for Yucca.

Protein-ligand docking by consensus

[Paul02]	N. Paul, D. Rognan, "ConsDock: A new program for the consensus analysis of protein-ligand interactions," Proteins, 47, 4, 2002, pp. 521-533.

Protein-ligand docking with ...

[Kuntz82]	I.D. Kuntz, J.M. Blaney, S.J. Oatley, R. Langridge, T.E. Ferrin, "A geometric approach to macromolecule-ligand interactions," J. Mol. Biol., 161, 1982, pp. 269-288. This is the main reference for the first docking program (Dock 1.0).
[Jackson02]	R.M. Jackson, "Q-fit: a probabilistic method for docking molecular fragments by sampling low energy conformational space," J Comput Aided Mol Des, 16, 2002, pp. 43-57.
[Welch96]	W. Welch, J. Ruppert, A.N. Jain, "Hammerhead: fast, fully automated docking of flexible ligands to protein binding sites," Chemistry \& Biology, 3, 6, 1996, pp. 449-462. This is the main reference for Hammerhead.
[Abagyan94]	R. Abagyan, M. Totrov, D. Kuznetsov, "ICM - A new method for protein modeling and design: Applications to docking and structure prediction from the distorted native conformation," Journal of Computational Chemistry, 15, 5, 1994, pp. 488-506. This is the main reference for ICM.
[Schoichet92]	B.K. Shoichet, D.L. Bodian, I.D. Kuntz, "Molecular docking using shape descriptors," J. Comp. Chem., 13, 3, 1992, pp. 380-397. DOCK
[Meng92]	E.C. Meng, B.K. Shoichet, I.D. Kuntz, "Automated docking with grid-based energy evaluation," J. Comp. Chem., 13, 1992, pp. 505-524. DOCK (http://www.cmpharm.ucsf.edu/kuntz/dockinfo.html)
[Meng93]	E.C. Meng, D.A. Gschwend, J.M. Blaney, I.D. Kuntz, "Orientational sampling and rigid-body minimization in molecular docking," Proteins, 17, 3, 1993, pp. 266-278. DOCK
[Gschwend96]	D.A. Gschwend, I.D. Kuntz, "Orientational sampling and rigid-body minimization in molecular docking, revisited: On-the-fly optimization and degeneracy removal," J. Comput-Aided Mol. Design, 1996. DOCK
[Shoichet93]	B.K. Shoichet, I.D. Kuntz, "Matching chemistry and shape in molecular docking," Protein Engineering, 6, 1993, pp. 223-232. DOCK
[Ewing01]	T.J.A. Ewing, S. Makino, A.G. Skillman, I.D. Kuntz, "Dock 4.0: Search strategies for automated molecular docking of flexible molecule databases," J. Comp. Aided Mol. Design, 15, 2001, pp. 411-428. This is the main reference for Dock 4.0.
[Roche01]	O. Roche, R. Kiyama, C.L. Brooks, III, "Ligand-protein database: Linking protein-ligand complex structures to binding data," J. Med. Chem., 44, 2001, pp. 3592-3598.
[Marai04]	C. Marai, "Accommodating Protein Flexibility in Computational Drug Design," Mol Pharmacol, 57, 2, 2004, pp. 213-218.

Protein-ligand docking evaluations

[Kellenberger04]	E. Kellenberger, J. Rodrigo, P. Muller, D. Rognan, "Comparative evaluation of eight docking tools for docking and virtual screening accuracy," Proteins, 57, 2, 2004, pp. 225-242.
[Kontoyianni04a]	M. Kontoyianni, L.M. McClellan et al., "Evaluation of Docking Performance: Comparative Data on Docking Algorithms," J Med Chem, 47, 3, 2004, pp. 558-565.
[Kontoyianni04b]	M. Kontoyianni, G.S. Sokol, L.M. McClellan, "Evaluation of library ranking efficacy in virtual screening," Journal of Computational Chemistry, 26, 1, 2004, pp. 11-22.
[Perola04]	E. Perola, W.P. Walters, P.S. Charifson, "A detailed comparison of current docking and scoring methods on systems of pharmaceutical relevance," Proteins, 56, 2, 2004, pp. 235-249.
[Warren05]	G.L. Warren, C.W. Andrews, A.M. Capelli, B. Clarke, J. LaLonde, M.H. Lambert, M. Lindvall, N. Nevins, S.F. Semus, S. Senger, G. Tedesco, I.D. Wall, J.M. Woolven, C.E. Peishoff, M.S. Head, "A critical assessment of docking programs and scoring functions," J. Med. Chem., ASAP Article 10.1021/jm050362n, 2005.
[Erickson04]	J.A. Erickson, M. Jalaie, D.H. Robertson, R.A. Lewis, M. Vieth, "Lessons in Molecular Recognition: The Effects of Ligand and Protein Flexibility on Molecular Docking Accuracy," J. Med. Chem., 47, 1, 2004, pp. 45-55.
[Zavodszky05]	M.I. Zavodszky, L. Kuhn, "Lessons from Docking Validation," submitted for publication, 2005.
[Bursulaya03]	B.D. Bursulaya, M. Totrov, R. Abagyan, C.L. Brooks, III, "Comparative study of several algorithms for flexible ligand docking," J Comput. Aided Mol. Des, 17, 2003, pp. 755-763.
[Nissink02]	J.W.M. Nissink, C. Murray, M. Hartshorn, M.L. Verdonk, J.C. Cole, R. Taylor, "A new test set for validating predictions of protein-ligand interaction," Proteins, 49, 4, 2002, pp. 457-471.
[Bissantz00]	C. Bissantz, G. Folkers, D. Rognan, "Protein-based virtual screening of chemical databases. 1. Evaluation of different docking/scoring combinations," J. Med. Chem., 43, 2000, pp. 4759-4767.
[Perez01]	C. Perez, A.R. Ortiz, "Evaluation of docking functions for protein-ligand docking," J. Med. Chem., 44, 2001, pp. 3768-3785.
[Vieth98a]	M. Vieth, J. Hirst, B.N. Dominy, H. Daigler, C.L. Brooks, III, "Assessing search strategies for flexible docking," J. Comput. Chem., 19, 1998, pp. 1623-1631.
[Ha00]	S. Ha, R. Andreani, A. Robbins, I. Muegge, "Evaluation of docking/scoring approaches: A comparative study based on MMP3 inhibitors," Journal of Computer-Aided Molecular Design, 14, 5, 2000, pp. 435-448.
[Schulz-Gasch03]	T. Schulz-Gasch, M. Stahl, "Binding site characteristics in structure-based virtual screening: evaluation of current docking tools," Journal of Molecular Modeling, 9, 1, 2003, pp. 47-57.
[Merlitz02]	H. Merlitz, W. Wenzel, "Comparison of stochastic optimization methods for receptor-ligand docking," Chemical Physics Letters, 362, 3, 2002, pp. 271-277.
[McConkey02]	B. McConkey, V. Sobolev, M. Edelman, "The performance of current methods in ligand-protein docking," Current Science, 83, 7, 2002, pp. 845-856.
[Cummings05]	M.D. Cummings, R.L. DesJarlais, A.C. Gibbs, V. Mohan, E.P. Jaeger, "Comparison of Automated Docking Programs as Virtual Screening Tools," J. Med. Chem., 48, 2005, pp. 962-976. Related to data set provided by Joe Corkery

Protein-ligand scoring overviews

[Gohlke02]	H. Gohlke, G. Klebe, "Approaches to the description and prediction of the binding affinity of small-molecule ligands to macromolecular receptors," Angew. Chem., Int. Ed., 41, 2002, pp. 2644-2676.
[Buhm02]	H.J. Boehm, M. Stahl, "The use of scoring functions in drug discovery applications," Reviews in Computational Chemistry, 18, Wiley-VCH, New York, 2002, pp. 41-87.
[Tame05]	J. Tame, "Scoring Functions - the First 100 Years," Journal of Computer-Aided Molecular Design, 19, 6, 2005, pp. 445-451.

Protein-ligand scoring force field methods

[Brooks83]	Bernard R. Brooks, Robert E. Bruccoleri, Barry D. Olafson, David J. States, S. Swaminathan, Martin Karplus, "CHARMM: A program for macromolecular energy, minimization, and dynamics calculations," J. Comp. Chem., 4, 2, 1983, pp. 187-217. This is the main reference for CHARMM.
[Cornell95]	W.D. Cornell, P. Cieplak, C.I. Bayly, I.R. Gould, K.M. Merz, Jr., D.M. Ferguson, D.C. Spellmeyer, T. Fox, J.W. Caldwell, P.A. Kollman, "A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules," Journal of the American Chemical Society, 117, 19, 1995, pp. 5179-5197. This is the main reference for the AMBER force field.

Protein-ligand scoring with empirical methods

[Eldridge97]	M.D. Eldridge, C.W. Murray, T.R. Auton, G.V. Paolini, R.P. Mee, "Empirical scoring functions. I: The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes," J. Comput.-Aided Mol. Des., 11, 1997, pp. 425-445. This is the main reference for ChemScore.
[Boehm94]	H.J. Boehm, "The development of a simple empirical scoring function to estimate the binding constant for a protein-ligand complex of known three-dimensional structure," J. Comput.-Aided Mol. Des., 8, 1994, pp. 243-256.
[Buhm98]	H.J. Boehm, "Prediction of binding constants of protein ligands: A fast method for the prioritization of hits obtained from de novo design or 3d database search programs," J. Comput.-Aided Mol. Des., 12, 1998, pp. 309-323.

Protein-ligand scoring with knowledge-based methods

[Gohlke00]	H. Gohlke, M. Hendlich, G. Klebe, "Knowledge-based scoring function to predict protein-ligand interactions," J. Mol. Biol., 295, 2000, pp. 337-356. This is the main reference for DrugScore, a knowledge-based scoring method.
[Mitchell99a]	J.B.O. Mitchell, R. Laskowski, A. Alex, J.M. Thornton, "BLEEP - potential of mean force describing protein-ligand interactions: I. Generating potential," J. Comput. Chem., 20, 11, 1999, pp. 1165-1176. This is the main reference for BLEEP.
[Mitchell99b]	J.B.O. Mitchell, R. Laskowski, A. Alex, J.M. Thornton, "BLEEP - potential of mean force describing protein-ligand interactions: II. Calculation of binding energies and comparison with experimental data," J. Comput. Chem., 20, 11, 1999, pp. 1177-1185.
[Nobeli01]	I. Nobeli, J.B.O. Mitchell, A. Alex, J.M. Thornton, "Evaluation of a Knowledge-Based Potential of Mean Force for Scoring Docked Protein-Ligand Complexes," Journal of Computational Chemistry, 22, 7, 2001, pp. 673-688.
[Muegge99]	I. Muegge, Y.C. Martin, "A general and fast scoring function for protein-ligand interactions: A simplified potential approach," J. Med. Chem., 42, 1999, pp. 791-804.
[Tanaka76]	S. Tanaka, H.A. Scheraga, "Medium- and long-range interaction parameters between amino acids for predicting three-dimensional structures of proteins," Macromolecules, 9, 1976, pp. 945-950. Early paper on data-driven scoring.
[Ge05]	W. Ge, B. Schneider, W.K. Olson, "Knowledge-Based Elastic Potentials for Docking Drugs or Proteins with Nucleic Acids," Biophysical Journal, 88, 2005, pp. 1166-1190.

Protein-ligand scoring by consensus

[Charifson99]	P.S. Charifson, J.J. Corkery, M.A. Murcko, W.P. Walters, "Consensus scoring: A method for obtaining improved hit rates from docking databases of three-dimensional structures into proteins," J. Med. Chem., 42, 1999, pp. 5100-5109.
[Clark02]	R.D. Clark, A. Strizhev, J.M. Leonard, J.F. Blake, J.B. Matthew, "Consensus scoring for ligand/protein interactions," Journal of Molecular Graphics and Modelling, 20, 4, 2002, pp. 281-295.
[Paul02]	N. Paul, D. Rognan, "ConsDock: A new program for the consensus analysis of protein-ligand interactions," Proteins, 47, 4, 2002, pp. 521-533.

Protein-ligand scoring evaluations

[Marsden04]	P.M. Marsden, D. Puvanendrampillai, J.B.O. Mitchell, R.C. Glen, "Predicting protein ligand binding affinities: a low scoring game?," Organic Biomolecular Chemistry, 2, 2004, pp. 3267-3273. Compares binding affinities predicted by several scoring functions to measured values and finds poor correlations.
[Ferrara04]	P. Ferrara, H. Gohlke, D.J. Price, G. Klebe, C.L. Brooks, III, "Assessing Scoring Functions for Protein-Ligand Interactions," J. Med. Chem., 47, 2004, pp. 3032-3047.
[Xing04]	L. Xing, E. Hodgkin, Q. Liu, D. Sedlock, "Evaluation and application of multiple scoring functions for a virtual screening experiment," J Comput. Aided Mol. Des, 18, 2004, pp. 333-344.
[Wang03]	R. Wang, Y. Lu, S. Wang, "Comparative evaluation of 11 scoring functions for molecular docking," J. Med. Chem., 46, 2003, pp. 2287-2303.
[Wei02]	B.Q. Wei, W.A. Baase, L.H. Weaver, B.W. Matthews, B.K. Shoichet, "A model binding site for testing scoring functions in molecular docking," J. Mol. Biol., 322, 2002, pp. 339-355.
[Stahl01]	M. Stahl, M. Rarey, "Detailed analysis of scoring functions for virtual screening," J. Med. Chem., 44, 2001, pp. 1035-1042.
[Vieth98b]	M. Vieth, J. Hirst, A. Kolinski, C.L. Brooks, III, "Assessing energy functions for flexible docking," J. Comput. Chem., 19, 1998, pp. 1612-1622.
[Sotriffer02b]	C.A. Sotriffer, H. Gohlke, G. Klebe, "Docking into knowledge-based potential fields: A comparative evaluation of DrugScore," J. Med. Chem., 45, 2002, pp. 1967-1970.

Protein-protein binding site analysis

[Chakrabarti02]	P. Chakrabarti, J. Janin, "Dissecting protein-protein recognition sites," Proteins: Structure, Function, and Genetics, 47, 3, 2002, pp. 334-343.
[Jones00]	D.T. Jones, "Protein Structure Prediction in the Postgenomic Era," Current Opinion in Structural Biology, 10, 3, 2000, pp. 371-379.
[Bogan98]	A.A. Bogan, K.S. Thorn, "Anatomy of hot spots in protein interfaces," J. Mol. Biol., 280, 1998, pp. 1-9.
[DeLano02]	W.L. DeLano, "Unraveling hot spots in binding interfaces: Progress and challenges," Curr. Opin. Struct. Biol., 12, 2002, pp. 14-20.
[Hu00]	Z. Hu, B. Ma, H. Wolfson, R. Nussinov, "Conservation of polar residues as hot spots at protein-protein interfaces," Proteins, 39, 2000, pp. 331-342.
[Jones97]	S. Jones, J.M. Thornton, "Analysis of protein-protein interaction sites using surface patches," Journal of Molecular Biology, 272, 1, 1997, pp. 121-132.
[Jones96]	S. Jones, J.M. Thornton, "Principles of protein-protein interactions," PNAS, 93, 1, 1996, pp. 13-20.
[LoConte98]	L. Lo Conte, C. Chothia, J. Janin, "The atomic structure of protein-protein recognition sites," J Mol Biol, 285, 1998, pp. 2177-2198.
[Norel99]	R. Norel, D. Petrey, H.J. Wolfson, R. Nussinov, "Examination of shape complementarity in docking of unbound proteins," Proteins, 36, 1999, pp. 307-317.
[Larsen98]	T.A. Larsen, A.J. Olson, D.S. Goodsell, "Morphology of protein-protein interfaces," Structure, 6, 4, 1998, pp. 421-427.

Protein-protein binding site prediction

[Neuvirth04]	H. Neuvirth, R. Raz, G. Schreiber, "ProMate: A structure based prediction program to identify the location of protein-protein binding sites," J Mol Biol, 338, 2004, pp. 181-199.
[Espadaler05]	J. Espadaler, O. Romero-Isart, R.M. Jackson, B. Oliva, "Prediction of protein-protein interactions using distant conservation of sequence patterns and structure relationships," Bioinformatics, 21, 2005, pp. 3360-3368.

Protein-protein docking overviews

[Szilagyi05]	A. Szilagyi, V. Grimm, A.K. Arakaki, J. Skolnick, "Prediction of physical protein-protein interactions," Phys. Biol., 2, 2005, pp. S1-S16.
[Salwinski03]	L. Salwinski, D. Eisenberg, "Computational methods of analysis of protein-protein interactions," Curr Opin Struct Biol, 13, 2003, pp. 377-382.
[Valencia02]	A. Valencia, F. Pazos, "Computational methods for the prediction of protein interactions," Curr Opin Struct Biol, 12, 2002, pp. 368-373.
[Smith02]	G.R. Smith, M.J.E. Sternberg, "Prediction of protein-protein interactions by docking methods," Current Opinion in Structural Biology, 12, 2002, pp. 28-35.

Protein-protein docking methods

[Schueler-Furman05]	O. Schueler-Furman, C. Wang, D. Baker, "Progress in protein-protein docking: Atomic resolution predictions in the CAPRI experiment using RosettaDock with an improved treatment of side-chain flexibility," Proteins: Structure, Function, and Bioinformatics, 60, 2, 2005, pp. 187-194.
[Choi04]	V. Choi, N. Goyal, "A Combinatorial Shape Matching Algorithm for Rigid Protein Docking," The Fifteenth Annual Symposium on Combinatorial Pattern Matching (CPM 2004), LNCS 3109, 2004, pp. 285-296.
[Meyer96]	M. Meyer, P. Wilson, D. Schomburg, "Hydrogen bonding and molecular surface shape complementarity as a basis for protein docking," J. Mol. Biol., 264, 1996, pp. 199-210.
[Sobolev96]	V. Sobolev, R.C. Wade, G. Vriend, M. Edelman, "Molecular docking using surface complementarity," Proteins Struct. Func. Genet., 25, 1996, pp. 120-129.
[Helmer94]	M. Helmer-Citterich, A. Tramontano, "Puzzle: a new method for automated protein docking based on surface shape complementarity," J. Mol. Biol., 235, 1994, pp. 1021-1031.
[Norel95]	R. Norel, S.L. Lin, H.J. Wolfson, R. Nussinov, "Molecular surface complementarity at protein-protein interfaces: the critical role played by surface normals at well placed, sparse, points in docking," J. Mol. Biol., 252, 1995, pp. 263-273.
[Young94]	L. Young, R.L. Jernigan, D.G. Covell, "A role for surface hydrophobicity in protein-protein recognition," Protein Sci, 3, 5, 1994, pp. 717-29.
[Gabb97]	H. Gabb, R. Jackson, M. Sternberg, "Modelling protein docking using shape complementarity, electrostatics, and biochemical information," J. Mol. Biol., 272, 1997, pp. 106-120. ``A protein docking study was performed for two classes of biomolecular complexes: six enzyme/inhibitor and four antibody/antigen. Biomolecular complexes for which crystal structures of both the complexed and uncomplexed proteins are available were used for eight of the ten test systems. Our docking experiments consist of a global search of translational and rotational space followed by refinement of the best predictions. Potential complexes are scored on the basis of shape complementarity and favourable electrostatic interactions using Fourier correlation theory. Since proteins undergo conformational changes upon binding, the scoring function must be sufficiently soft to dock unbound structures successfully. Some degree of surface overlap is tolerated to account for side-chain flexibility. Similarly for electrostatics, the interaction of the dispersed point charges of one protein with the Coulombic field of the other is measured rather than precise atomic interactions. We tested our docking protocol using the native rather than the complexed forms of the proteins to address the more scientifically interesting problem of predictive docking. In all but one of our test cases, correctly docked geometries (interface Calpha RMS deviation

Protein-protein docking evaluations

[Janin05]	J. Janin, "Assessing predictions of protein-protein interaction: The CAPRI experiment," Protein Science, 14, 2005, pp. 278-283.
[Wodak04]	S.J. Wodak, R. Mendez, "Prediction of protein-protein interactions: the CAPRI experiment, its evaluation and implications," Current Opinion in Structural Biology, 14, 2004, pp. 242-249. Provides an overview of the status of methods for predicting protein-protein interactions
[Janin03]	J. Janin, K. Henrick, J. Moult, L.T. Eyck, M.J.E. Sternberg, S. Vajda, I. Vakser, S.J. Wodak, "CAPRI: A Critical Assessment of PRedicted Interactions," Proteins: Structure, Function, and Genetics, 52, 1, 2003, pp. 2-9.
[Mendez03]	R. Mendez, R. Leplae, L. De Maria, S.J. Wodak, "Assessment of blind predictions of protein-protein interactions: current status of docking methods," Proteins, 52, 1, 2003, pp. 51-67.
[Chen03a]	R. Chen, J. Mintseris, J. Janin, Z. Weng, "A protein-protein docking benchmark," Proteins: Structure, Function, and Genetics, 52, 1, 2003, pp. 88-91.

Alignment of point sets with least-squares fit of corresponding points

[Horn87]	B.K.P. Horn, "Closed-Form Solution of Absolute Orientation using Unit Quaternions," Journal of the Optical Society of America A, 4, 4, 1987, pp. 629-642.
[Arun87]	K.S. Arun, T.S. Huang, S.D. Blostein, "Least-Squares Fitting Of 2 3-D Point Sets," IEEE Transactions On Pattern Analysis And Machine Intelligence, 9, 1987, pp. 699-700.
[Lesk98]	A.M. Lesk, "Extraction of geometrically similar substructures: least-squares and Chebyshev fitting and the difference distance matrix," Proteins, 33, 1998, pp. 320-328.
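
As an illustration of the least-squares fit of corresponding points covered by the references above, here is a minimal dependency-free sketch in the spirit of the unit-quaternion formulation of [Horn87]: the optimal rotation is the eigenvector, for the largest eigenvalue, of a 4x4 symmetric matrix built from the cross-covariance sums. The Jacobi eigen-solver, the function names (jacobi_eigen, horn_fit, apply_fit), and the test points are assumptions made for this example, not part of any cited paper.

```python
import math

def jacobi_eigen(a, sweeps=50):
    """Eigenvalues and eigenvectors (columns of v) of a small symmetric matrix."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                d = a[q][q] - a[p][p]
                # Small-angle rotation that zeroes a[p][q].
                theta = 0.5 * math.atan2(2 * a[p][q] if d >= 0 else -2 * a[p][q], abs(d))
                c, s = math.cos(theta), math.sin(theta)
                for m in (a, v):            # right-multiply by the rotation
                    for k in range(n):
                        mp, mq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):          # left-multiply a by its transpose
                    ap, aq = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * ap - s * aq, s * ap + c * aq
    return [a[i][i] for i in range(n)], v

def horn_fit(src, dst):
    """Return (R, t) minimizing sum |R*src_i + t - dst_i|^2 over rigid motions."""
    n = len(src)
    cs = [sum(p[i] for p in src) / n for i in range(3)]
    cd = [sum(p[i] for p in dst) / n for i in range(3)]
    # S[i][j] = sum of (centered src coordinate i) * (centered dst coordinate j)
    S = [[sum((p[i] - cs[i]) * (q[j] - cd[j]) for p, q in zip(src, dst))
          for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    N = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
         [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
         [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
         [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    vals, vecs = jacobi_eigen(N)
    best = max(range(4), key=lambda i: vals[i])
    w, x, y, z = (vecs[k][best] for k in range(4))
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    R = [[1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
         [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
         [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]]
    t = [cd[i] - sum(R[i][j] * cs[j] for j in range(3)) for i in range(3)]
    return R, t

def apply_fit(R, t, p):
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

# Hypothetical data: dst is src rotated 90 degrees about z, then shifted by (1, 2, 3).
src = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
dst = [(1, 2, 3), (1, 3, 3), (-1, 2, 3), (1, 2, 6)]
R, t = horn_fit(src, dst)
print([[round(c, 6) for c in apply_fit(R, t, p)] for p in src])
```

Because the rotation comes from a unit quaternion, the result is always a proper rotation (no reflection), which is one practical reason to prefer this formulation for molecular superposition.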

Alignment of grids with fast rotational matching

[Kovacs02]	J.A. Kovacs, W. Wriggers, "Fast rotational matching," Acta Cryst., D58, 2002, pp. 1282-1286. ``A computationally efficient method is presented - `fast rotational matching' or FRM - that significantly accelerates the search of the three rotational degrees of freedom (DOF) in biomolecular matching problems. This method uses a suitable parametrization of the three-dimensional rotation group along with spherical harmonics, which allows efficient computation of the Fourier Transform of the rotational correlation function. Previous methods have used Fourier techniques only for two of the rotational DOFs, leaving the remaining angle to be determined by an exhaustive search. Here for the first time a formulation is presented that makes it possible to Fourier transform all three rotational DOFs, resulting in notable improvements in speed. Applications to the docking of atomic structures into electron-microscopy maps and the molecular-replacement problem in X-ray crystallography are considered.''

[Kovacs03]	J.A. Kovacs, P. Chacón, Y. Cong, E. Metwally, W. Wriggers, "Fast rotational matching of rigid bodies by fast Fourier transform acceleration of five degrees of freedom," Acta Cryst., D59, 2003, pp. 1371-1376. ``The `fast rotational matching' method is extended to the full six-dimensional (rotation and translation) matching scenario between two three-dimensional objects. By recasting this problem into a formulation involving five angles and just one translational parameter, it was possible to accelerate, by means of fast Fourier transforms, five of the six degrees of freedom of the problem. This method was successfully applied to the docking of atomic structures of components into three-dimensional low-resolution density maps. Timing comparisons performed with our method and with `fast translational matching' (the standard way to accelerate the translational parameters utilizing fast Fourier transforms) demonstrates that the performance gain can reach several orders of magnitude, especially for large map sizes. This gain can be particularly advantageous for spherical- and toroidal-shaped maps, since the scanning range of the translational parameter would be significantly constrained in these cases. The method can also be harnessed to the complementary surface (or `exterior docking') problem and to pattern recognition in image processing.''

Alignment of point sets with XXX

[Barequet97]	Barequet, Sharir, "Partial Surface and Volume Matching in Three Dimensions," IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 9, 1997, pp. 929-948. This paper aligns surfaces, treating ``separately the rotation and the translation components of the Euclidean motion. The algorithm steps through a sequence of rotations, in a steepest-descent style, and uses a novel technique for scoring the match for any fixed rotation. Experimental results on various examples, involving data from industrial applications, medical imaging, and molecular biology, are presented.''

Algorithms for clique detection (in any graph)

[Bron73]	C. Bron, J. Kerbosch, "Algorithm 457: finding all cliques of an undirected graph," Communications of the ACM, 16, 1973, pp. 575-577.
[Kresher98]	D. Kreher, D. Stinson, "Combinatorial Algorithms: Generation, Enumeration and Search," CRC Press, Boca Raton, Florida, 1998.
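
For readers who want to see the idea behind [Bron73] in code, the following is a minimal sketch of the basic recursive clique enumeration (without the pivoting refinement). The adjacency-dictionary representation, the function name, and the example graph are assumptions made for illustration.

```python
def bron_kerbosch(adj, r=None, p=None, x=None):
    """Yield every maximal clique of an undirected graph.

    adj maps each vertex to the set of its neighbours.
    r: current clique, p: candidates that can extend it, x: vertices already tried.
    """
    if r is None:
        r, p, x = set(), set(adj), set()
    if not p and not x:
        # Nothing can extend r and nothing excluded covers it: r is maximal.
        yield frozenset(r)
        return
    for v in list(p):
        yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        p.remove(v)   # v has been fully explored ...
        x.add(v)      # ... so later cliques must not be reported again through it

# Hypothetical graph: a triangle 1-2-3 with a pendant edge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(sorted(c) for c in bron_kerbosch(graph)))  # -> [[1, 2, 3], [3, 4]]
```

In structural applications the graph is typically a correspondence graph, where a clique represents a set of mutually compatible atom or feature pairings.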

Molecular dynamics

[Leckband01]	D. Leckband, J. Israelachvili, "Intermolecular forces in biology," Quarterly Reviews of Biophysics, 34, 2001, pp. 105-267.
[McCammon87]	J.A. McCammon, S.C. Harvey, "Dynamics of Proteins and Nucleic Acids," Cambridge University Press, 1987.
[Sharp99]	K. Sharp, "Electrostatic Interactions in Proteins," International Tables for Crystallography, International Union of Crystallography, Chester, UK, 1999.
[Karplus86]	M. Karplus, J.A. McCammon, "The dynamics of proteins," Sci. Am., 254, 1986, pp. 42-51. A good reference

Protein folding

[Levitt83]	M. Levitt, "Protein folding by restrained energy minimization and molecular dynamics," J Mol Biol, 170, 1983, pp. 723-764.
[Dill95]	K.A. Dill, S. Bromberg, K. Yue, K.M. Fiebig, D.P. Yee, P.D. Thomas, H.S. Chan, "Principles of protein folding - a perspective from simple exact models," Protein Sci, 4, 1995, pp. 561-602.
[Schulze-Kremer96]	S. Schulze-Kremer, "Genetic Algorithms and Protein Folding," http://www.techfak.uni-bielefeld.de/bcd/Curric/ProtEn/proten.html, 1996.

Computation of electrostatic charge

[Honig95]	B. Honig, A. Nicholls, "Classical electrostatics in biology and chemistry," Science, 268, 1995, pp. 1144-1149.
[Liang97]	J. Liang, S. Subramaniam, "Computation of molecular electrostatics with boundary element methods," Biophys. J., 73, 1997, pp. 1830-1841.
[Cao02]	J. Cao, D.K. Pham, L. Tonge, D.V. Nicolau, "Predicting surface properties of proteins on the Connolly molecular surface," Smart Mater. Struct., 11, 2002, pp. 772-777. This paper describes a method for computing electron charge, hydrophobicity as well as α-helix and β-sheet structural indices on the Connolly surface.
[Rocchia01]	W. Rocchia, E. Alexov, B. Honig, "Extending the Applicability of the Nonlinear Poisson-Boltzmann Equation: Multiple Dielectric Constants and Multivalent Ions," J. Phys. Chem. B, 105(28), 2001, pp. 6507-6514.
[Yang93]	A-S. Yang, M.R. Gunner, R. Sampogna, K. Sharp, B. Honig, "On the Calculation of pKas in Proteins," Proteins, 15, 3, 1993, pp. 252-265.
[Nicholls91]	A. Nicholls, B. Honig, "A Rapid Finite Difference Algorithm, Utilizing Successive Over-Relaxation to Solve the Poisson-Boltzmann Equation," J. Comp. Chem, 12, 1991, pp. 435-445.
[Gilson88]	M.K. Gilson, B. Honig, "Calculation of the Total Electrostatic Energy of a Macromolecular System: Solvation Energies, Binding Energies and Conformational Analysis," Proteins, 4, 1988, pp. 7-18.
[Gilson87]	M.K. Gilson, K. Sharp, B. Honig, "Calculating the Electrostatic Potential of Molecules in Solution: Method and error assessment," J. Comp. Chem, 9, 1987, pp. 327-335.
[Klapper86]	I. Klapper, R. Hagstrom, R. Fine, K. Sharp, B. Honig, "Focussing of Electric Fields in the Active Site of Cu-Zn Superoxide Dismutase: Effects of Ionic Strength and Amino Acid Modification," Proteins, 1, 1986, pp. 47-59.

Protein flexibility

[Echols03]	N. Echols, D. Milburn, M. Gerstein, "MolMovDB: analysis and visualization of conformational change and structural flexibility," Nucleic Acids Res, 31, 2003, pp. 478-482. Database of Macromolecular Movements (http://www.molmovdb.org/)
[Gerstein99]	M. Gerstein, R. Jansen, T. Johnson, J. Tsai, W. Krebs, "Motions in a Database Framework: from Structure to Sequence," Rigidity Theory and Applications, 1999, pp. 401-442.

Protein function classifications

[Ashburner00]	M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, G. Sherlock, "Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium," Nat Genet, 25, 2000, pp. 25-29. This is the main reference for the Gene Ontology
[Camon04]	E. Camon, M. Magrane, D. Barrell, V. Lee, E. Dimmer, J. Maslen, D. Binns, N. Harte, R. Lopez, R. Apweiler, "The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology," Nucleic Acids Res, 32, 2004, pp. D262-D266.

Protein function prediction

[Watson07]	J.D. Watson, S. Sanderson, A. Ezersky, A. Savchenko, A. Edwards, C. Orengo, A. Joachimiak, R.A. Laskowski, J.M. Thornton, "Towards Fully Automated Structure-based Function Prediction in Structural Genomics: A Case Study," Journal of Molecular Biology, 367, 5, 2007, pp. 1511-1522.
[Watson05]	J.D. Watson, R.A. Laskowski, J.M. Thornton, "Predicting protein function from sequence and structural data," Current Opinion in Structural Biology, 15, 2005, pp. 275-284.
[Wild04]	D.L. Wild, M.A.S. Saqi, "Structural proteomics: inferring function from protein structure," Current Proteomics, 1, 2004, pp. 59-65.
[Wilson00]	C.A. Wilson, J. Kreychman, M. Gerstein, "Assessing Annotation Transfer for Genomics: Quantifying the Relations between Protein Sequence, Structure and Function through Traditional and Probabilistic Scores," J. Molecular Biology, 297, 2000, pp. 233-249. Compares how effectively functional information can be transferred between proteins based on similar sequences (%ID and Smith-Waterman) vs. similar structures (RMS). Aligned 30,000 pairs of SCOP domains and studied the correlation between functional classification (EC and FLY) and sequence/structure similarity.
[Skolnick00]	J. Skolnick, J.S. Fetrow, A. Kolinski, "Structural genomics and its importance for gene function analysis," Nature Biotechnol, 18, 3, 2000, pp. 283-287.
[Moult00]	J. Moult, E. Melamud, "From fold to function," Curr. Opin. Struct. Biol, 10, 2000, pp. 384-389.
[Norin02]	M. Norin, M. Sundstrom, "Structural proteomics: developments in structure-to-function predictions," Trends Biotech, 20, 2002, pp. 79-84.
[Laskowski03]	R.A. Laskowski, J.D. Watson, J.M. Thornton, "From protein structure to biochemical function?," J. Struct. Func. Genomics, 4, 2003, pp. 167-177.
[Whisstock03]	J.C. Whisstock, A.M. Lesk, "Prediction of protein function from protein sequence and structure," Q Rev Biophys, 36, 2003, pp. 307-340.
[Zhang03]	C. Zhang, S.H. Kim, "Overview of structural genomics: from structure to function," Curr Opin Chem Biol, 7, 2003, pp. 28-32.
[Martin98]	A.C. Martin, C.A. Orengo, E.G. Hutchinson, S. Jones, M. Karmirantzou, R.A. Laskowski, J.B. Mitchell, C. Taroni, J.M. Thornton, "Protein folds and functions," Structure, 6, 1998, pp. 875-884.
[Shrager03]	J. Shrager, "The fiction of function," Bioinformatics, 19, 2003, pp. 1934-1936.
[Thornton00]	J.M. Thornton, A.E. Todd, D. Milburn, N. Borkakoti, C.A. Orengo, "From structure to function: Approaches and limitations," Nat. Struct. Biol., 7(Suppl), 2000, pp. 991-994.
[Jackson01]	R.M. Jackson, R.B. Russell, "Predicting function from structure: examples of the serine protease inhibitor canonical loop conformation found in extracellular proteins," Comput. & Chem, 26, 2001, pp. 31-39.
[Pal05]	D. Pal, D. Eisenberg, "Inference of protein function from protein structure," Structure, 13, 2005, pp. 1-10. Watson05: The ProKnow server uses several structural methods for function identification using GO terminology. The method's most important feature is that it attempts to weight the evidence using Bayes' theorem for each of the functional predictions, and provides details to the user concerning the confidence of each method and the confidence of each GO term assignment. This allows annotations to be ranked by significance. An overall success rate of 70% correct prediction is reported.
[Fetrow01]	J.S. Fetrow, N. Siew, J.A. Di Gennaro, M. Martinez-Yamout, H.J. Dyson, J. Skolnick, "Genomic-scale comparison of sequence- and structure-based methods of function prediction: does structure provide additional insight?," Protein Sci, 10, 5, 2001, pp. 1005-1014. Fuzzy Functional Forms
[Cammer03]	S.A. Cammer, B.T. Hoffman, J.A. Speir, M.A. Canady, M.R. Nelson, S. Knutson, M. Gallina, S.M. Baxter, J.S. Fetrow, "Structure-based Active Site Profiles for Genome Analysis and Functional Family Subclassification," Journal of Molecular Biology, 334, 3, 2003, pp. 387-401.
[Hegyi99]	H. Hegyi, M. Gerstein, "The Relationship between Protein Structure and Function: a Comprehensive Survey with Application to the Yeast Genome," J Mol. Biol., 288, 1999, pp. 147-164.
[Gold06]	N.D. Gold, R.M. Jackson, "Fold Independent Structural Comparisons of Protein-Ligand Binding Sites for Exploring Functional Relationships," Journal of Molecular Biology, 355, 5, 2006, pp. 1112-1124.

Protein function prediction with machine learning

[Dobson04a]	P.D. Dobson, A.J. Doig, "Predicting enzyme class from protein structure without alignments," J Mol Biol, 345, 2004, pp. 187-199.
[Dobson04b]	P.D. Dobson, Y.D. Cai, B.J. Stapley, A.J. Doig, "Prediction of protein function in the absence of significant sequence similarity," Curr Med Chem, 11, 2004, pp. 2135-2142.
[Cai04]	C.Z. Cai, L.Y. Han, Z.L. Ji, Y.Z. Chen, "Enzyme family classification by support vector machines," Proteins, 55, 2004, pp. 66-76.
[Lu04]	X. Lu, C. Zhai, V. Gopalakrishnan, B.G. Buchanan, "Automatic annotation of protein motif function with Gene Ontology terms," BMC Bioinformatics, 5, 2004, pp. 122.
[Vinayagam04]	A. Vinayagam, R. Konig, J. Moorman, F. Schubert, R. Eils, K-H. Glatting, S. Suhai, "Applying support vector machines for Gene Ontology based gene function prediction," BMC Bioinformatics, 5, 2004, pp. 116.
[Perez04]	A.J. Perez, G. Thode, O. Trelles, "AnaGram: protein function assignment," Bioinformatics, 20, 2004, pp. 291-292.
[Stahl00]	M. Stahl, C. Taroni, G. Schneider, "Mapping of protein surface cavities and prediction of enzyme class by a self-organizing neural network," Protein Engineering, 13, 2, 2000, pp. 83-88.
[Ben-Hur06]	A. Ben-Hur, W. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, 7, 2006.

Function prediction with interaction networks

[Martin04]	D.M.A. Martin, M. Berriman, G.J. Barton, "Gotcha: a new method for prediction of protein function assessed by the annotation of seven genomes," BMC Bioinformatics, 5, 2004, pp. 178.
[Smid04]	M. Smid, L.C.J. Dorssers, "GO-Mapper: functional analysis of gene expression data using the expression level as a score to evaluate Gene Ontology terms," Bioinformatics, 20, 2004, pp. 2618-2625.
[Deng04]	M. Deng, Z. Tu, F. Sun, T. Chen, "Mapping Gene Ontology to proteins based on protein-protein interaction data," Bioinformatics, 20, 2004, pp. 895-902.
[Huynen04]	T. Gabaldon, M.A. Huynen, "Prediction of protein function and pathways in the genomic era," Cell Mol Life Sci, 61, 2004, pp. 930-944.
[VonMeering05]	C. von Mering, L.J. Jensen, B. Snel, S.D. Hooper, M. Krupp, M. Foglierini, N. Jouffre, M.A. Huynen, P. Bork, "STRING: known and predicted protein-protein associations, integrated and transferred across organisms," Nucleic Acids Res, 33, 2005, pp. D433-D437. Watson05: The STRING database takes data from protein-protein interaction experiments, microarray expression data, genome organization and co-occurrence to identify functional associations. One of the key features of the database is the use of text processing of PubMed to identify potential interacting partners from the literature.
[Oinn04]	T. Oinn, M. Addis, J. Ferris, D. Marvin, M. Senger, M. Greenwood, T. Carver, K. Glover, M.R. Pocock, A. Wipat, P. Li, "Taverna: a tool for the composition and enactment of bioinformatics workflows," Bioinformatics, 20, 2004, pp. 3045-3054.

Drug screening

[Lengauer04]	T. Lengauer, C. Lemmen, M. Rarey, M. Zimmermann, "Novel technologies for virtual screening," Drug Discovery Today, 9, 1, 2004, pp. 27-34.
[Mestres00]	J. Mestres, R.M.A. Knegtel, "Similarity versus docking in 3D virtual screening," Perspectives in Drug Discovery and Design, 20, 2000, pp. 191-207. Compares ligand-based screening with receptor-based screening for thrombin ligands.
[Jansen04]	J.M. Jansen, E.J. Martin, "Target-biased scoring approaches and expert systems in structure-based virtual screening," Current Opinion in Chemical Biology, 8, 4, 2004, pp. 359-364.
[Hou04]	T. Hou, X. Xu, "Recent Development and Application of Virtual Screening in Drug Discovery: An Overview," Current Pharmaceutical Design, 10, 9, 2004, pp. 1011-1033.
[Bender05]	A. Bender, R.C. Glen, "A Discussion of Measures of Enrichment in Virtual Screening: Comparing the Information Content of Descriptors with Increasing Levels of Sophistication," J. Chem. Inf. Model., 45, 5, 2005, pp. 1369-1375.
[Shoichet02]	B.K. Shoichet, S.L. McGovern, B. Wei, J.J. Irwin, "Lead discovery using molecular docking," Curr Opin Chem Biol, 6, 2002, pp. 439-446.
[Lyne02]	P.D. Lyne, "Structure-based virtual screening: an overview," DDT, 7, 20, 2002, pp. 1047-1055.
[Abagyan01]	R. Abagyan, M. Totrov, "High-throughput docking for lead generation," Curr Opin Chem Biol, 5, 2001, pp. 375-382.
[Walters98]	W.P. Walters, M.T. Stahl, M.A. Murcko, "Virtual screening - an overview," Drug Discov Today, 3, 1998, pp. 160-178.
[Shirai01]	H. Shirai, J. Shi, T.L. Blundell, K. Mizuguchi, "Structural bioinformatics as an approach to genomics-based drug discovery," Global Outsourcing Review, 3, 2001, pp. 48-53.
[Bajorath02]	J. Bajorath, "Virtual screening in drug discovery: methods, expectations and reality," 2002. Provides overview of virtual screening, points to successes
[Barril04]	X. Barril, R.E. Hubbard, S.D. Morley, "Virtual Screening in Structure-Based Drug Discovery," Mini Reviews in Medicinal Chemistry, 4, 7, 2004, pp. 779-791.

Ligand-based drug screening methods

[Paul04]	N. Paul, E. Kellenberger, G. Bret, P. Muller, D. Rognan, "Recovering the True Targets of Specific Ligands by Virtual Screening of the Protein Data Bank," PROTEINS: Structure, Function, and Bioinformatics, 54, 2004, pp. 671-680.
[Zauhar03]	R.J. Zauhar, G. Moyna, L. Tian, Z. Li, W.J. Welsh, "Shape Signatures: A New Approach to Computer-Aided Ligand- and Receptor-Based Drug Design," J. Med. Chem., 46, 2003, pp. 5674-5690.
[Chen01]	Y.Z. Chen, D.G. Zhi, "Ligand-Protein Inverse Docking and Its Potential Use in the Computer Search of Protein Targets of a Small Molecule," PROTEINS: Structure, Function, and Genetics, 43, 2001, pp. 217-226.
[Labute05]	P. Labute, "On the perception of molecules from 3D atomic coordinates," J Chem Inf Model, 45, 2, 2005, pp. 215-221.

Structure-based drug design overviews

[Beavers02]	M.P. Beavers, X. Chen, "Structure-based combinatorial library design: methodologies and applications," Journal of Molecular Graphics and Modelling, 20, 2002, pp. 463-468.
[Veselovsky03]	A.V. Veselovsky, A.S. Ivanov, "Strategy of Computer-Aided Drug Design," Current Drug Targets - Infectious Disorders, 3, 1, 2003, pp. 33-40.
[Klebe00]	G. Klebe, "Recent developments in structure-based drug design," J Mol Med, 78, 2000, pp. 269-281.
[Gane00]	P.J. Gane, P.M. Dean, "Recent advances in structure-based rational drug design," Curr Opin Struct Biol, 10, 2000, pp. 401-404.
[Ooms00]	F. Ooms, "Molecular Modeling and Computer Aided Drug Design. Examples of their Applications in Medicinal Chemistry," Current Medicinal Chemistry, 7, 2000, pp. 141-158.
[Anderson02]	S. Anderson, J. Chiplin, "Structural genomics: shaping the future of drug design?," Drug Discov Today, 7, 2002, pp. 105-107.
[Marrone97]	T.J. Marrone, J.M. Briggs, J.A. McCammon, "Structure-based drug design: Computational Advances," Annu. Rev. Pharmacol. Toxicol., 37, 1997, pp. 71-90.
[Bohacek97]	R.S. Bohacek, C. McMartin, "Modern computational chemistry and drug discovery: structure generating programs," Curr. Opin. Chem. Biol., 1, 1997, pp. 157-161. "During 1996 and 1997, the first reports were disclosed of active enzyme inhibitors based entirely on novel structures created by de novo methods. De novo methods have also been used to modify and significantly improve the binding affinity of an HIV protease inhibitor. Work continues in the improvement of methods for the de novo design of compounds which fit and chemically complement a binding site. De novo algorithms that generate only synthetically feasible structures have also been reported. In addition, methods are being developed for the automatic computer generation of virtual molecular libraries which can be searched to identify molecules to match a pharmacophore or fit into a binding site."
[Charifson97]	P. Charifson, I.D. Kuntz, "Recent Successes and Continuing Limitations in Computer-Aided Drug Design," Practical Application of Computer-Aided Drug Design, Marcel-Dekker, New York, 1997, pp. 1-37.
[Kuntz92]	I.D. Kuntz, "Structure-based Strategies for Drug Design and Discovery," Science, 257, 1992, pp. 1078-1082.

Structure-based drug design with fragment-based methods

[Buhm92]	H.J. Böhm, "The Computer Program LUDI: A New Method for the De Novo Design of Enzyme Inhibitors," J. Comp. Aided Molec. Design, 6, 1992, pp. 61-78.

Structure-based drug design with knowledge-based methods

[Grzybowski02]	B.A. Grzybowski, A.V. Ishchenko, J. Shimada, E.I. Shakhnovich, "From Knowledge-Based Potentials to Combinatorial Lead Design in Silico," Acc. Chem. Res., 35, 2002, pp. 261-262.

Structure-based drug design with ...

[Eisen94]	M.B. Eisen, D.C. Wiley, M. Karplus, R.E. Hubbard, "HOOK: A Program for finding novel molecular architectures that satisfy the chemical and steric requirements of a macromolecule binding site," Proteins, 19, 1994, pp. 199-221.
[vonItzstein93]	Mark von Itzstein, Wen-Yang Wu, Gaik B. Kok, Michael S. Pegg, Jeffrey C. Dyason, Betty Jin, Tho Van Phan, Mark L. Smythe, Hume F. White, Stuart W. Oliver, Peter M. Colman, Joseph N. Varghese, D. Michael Ryan, Jacqueline M. Woods, Richard C. Bethell, Vanessa J. Hotham, Janet M. Cameron, Charles R. Penn, "Rational design of potent sialidase-based inhibitors of influenza virus replication," Nature, 363, 1993, pp. 418-423. Uses GRID for drug design

Quantitative structure activity relationship (QSAR) overviews

[Winkler01]	D.A. Winkler, "The role of quantitative structure-activity relationships (QSAR) in biomolecular discovery," Briefings in Bioinformatics, 3, 1, 2002, pp. 73-86.

Protein-DNA binding

[Havranek04]	J.J. Havranek, C.M. Duarte, D. Baker, "A simple physical model for the prediction and design of protein-DNA interactions," J Mol Biol, 344, 2004, pp. 59-70.

Molecular surfaces

[Connolly83]	M.L. Connolly, "Solvent-accessible surfaces of proteins and nucleic acids," Science, 221, 1983, pp. 709-713. This is the main reference for the Connolly surface
[Connolly83b]	M. L. Connolly, "Analytical Molecular Surface Calculation," Journal of Applied Crystallography, 16, 1983, pp. 548-558.
[Connolly85]	M.L. Connolly, "Molecular surface triangulation," J. Appl. Crystallogr., 18, 1985, pp. 499-505.
[Connolly86a]	M.L. Connolly, "Measurement of protein surface shape by solid angles," J. Mol. Graphics, 4, 1986, pp. 3-6.
[Connolly86b]	M.L. Connolly, "Plotting protein surfaces," J. Mol. Graphics, 4, 1986, pp. 93-96.
[Connolly93]	M.L. Connolly, "The molecular surface package," J. Mol. Graphics, 11, 1993, pp. 139-141. http://www.biohedron.com/
[Sanner96]	M.F. Sanner, J.C. Spehner, A.J. Olson, "Reduced surface: an efficient way to compute molecular surfaces," Biopolymers, 38, 3, 1996, pp. 305-320. http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html
[Eisenhaber93]	F. Eisenhaber, P. Argos, "Improved Strategy in Analytic Surface Calculation for Molecular Systems: Handling of Singularities and Computational Efficiency," Journal of Computational Chemistry, 14, 11, 1993, pp. 1272-1280. http://mendel.imp.univie.ac.at/SURFACE/ASC/asc2.html
[Eisenhaber95]	F. Eisenhaber, P. Lijnzaad, P. Argos, C. Sander, M. Scharf, "The Double Cubic Lattice Method: Efficient approaches to numerical integration of surface area and volume and to dot surface contouring of molecular assemblies," J. Comp. Chem., 16, 3, 1995, pp. 273-284.
[Lee71]	B. Lee, F.M. Richards, "The Interpretation of Protein Structures: Estimation of Static Accessibility," Journal of Molecular Biology, 55, 1971, pp. 379-400. This is the main reference for the solvent accessible surface
[Greer78]	J. Greer, B. Bush, "Macromolecular Shape and Surface Maps by Solvent Exclusion," Proceedings of the National Academy of Sciences USA, 75, 1978, pp. 303-307. Early method for computing solvent accessible surfaces
[ODonnell92]	T.J. O'Donnell, "Interactive Computation and Display of Molecular Surfaces," Journal of Molecular Graphics, 10, 1992, pp. 39-40.
[Klein90]	T. Klein, C. Huang, E. Pettersen, G. Couch, T. Ferrin, R. Langridge, "A Real-Time Malleable Molecular Surface," Journal of Molecular Graphics, 8, 1990, pp. 16-24 and 26-27.

Secondary Structure Prediction

[Kabsch83]	W. Kabsch, C. Sander, "Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features," Biopolymers, 22, 1983, pp. 2577-2637. This is the paper about DSSP, a method for assigning secondary structure from the atomic coordinates of a solved structure (it does not predict structure from sequence)
[Jones99]	D.T. Jones, "Protein secondary structure prediction based on position-specific scoring matrices," J. Mol. Biol., 292, 1999, pp. 195-202. PSIPRED: One of the leading protein secondary structure prediction methods
[King96]	R.D. King, M.J.E. Sternberg, "Identification and application of the concepts important for accurate and reliable protein secondary structure prediction," Prot. Sci., 5, 1996, pp. 2298-2310.
[Garnier96]	J. Garnier, J.F. Gibrat, B. Robson, "GOR method for predicting protein secondary structure from amino acid sequence," Methods Enzymol, 266, 1996, pp. 540-553.

Tertiary structure prediction overviews

[Sternberg97]	M.J.E. Sternberg, "Protein Structure Prediction - A practical approach," Oxford University Press, 1997.
[Baker01]	D. Baker, A. Sali, "Protein Structure Prediction and Structural Genomics," Science, 294, 5540, 2001, pp. 93-96.
[Jones00]	D.T. Jones, "Protein Structure Prediction in the Postgenomic Era," Current Opinion in Structural Biology, 10, 3, 2000, pp. 371-379.
[Simons01]	K.T. Simons, C. Strauss, D. Baker, "Prospects for ab initio Protein Structural Genomics," J. Molecular Biology, 306, 5, 2001, pp. 1191-1199.

Tertiary structure prediction evaluations

[Moult03]	J. Moult, K. Fidelis, A. Zemla, T. Hubbard, "Critical assessment of methods of protein structure prediction (CASP) - round V," Proteins: Structure, Function, and Genetics, 53, S6, 2003, pp. 334-339.

Tertiary structure prediction with ab initio methods

[Floudas06]	C.A. Floudas, H.K. Fung, S.R. McAllister, M. Mönnigmann, R. Rajgaria, "Advances in Protein Structure Prediction and De Novo Protein Design: A review," Chem. Eng. Sci., 61, 2006, pp. 966-988. Review paper for protein structure prediction and de novo protein design.
[Klepeis02]	. Method for predicting helical regions using detailed atomistic level modeling of overlapping oligopeptides.
[Klepeis03a]	J.L. Klepeis, C.A. Floudas, "Prediction of beta-sheet topology and disulfide bridges in polypeptides," J. Comput. Chem., 24, 2003, pp. 191-208. Method for predicting beta-strand locations and beta-sheet topology using optimization techniques.
[Klepeis03b]	J.L. Klepeis, C.A. Floudas, "ASTRO-FOLD: A Combinatorial and Global Optimization Framework for Ab Initio Prediction of Three-Dimensional Structures of Proteins from the Amino Acid Sequence," Biophys. J., 85, 2003, pp. 2119-2146. First principles framework for protein structure prediction
[Monningmann05]	M. Mönnigmann, C.A. Floudas, "Protein Loop Structure Prediction With Flexible Stem Geometries," Prot. Struct. Funct. Bioinf., 61, 2005, pp. 748-762. Loop structure prediction method for loops with flexible stems. Uses dihedral angle sampling and introduces a novel use of clustering.
[Liwo02]	A. Liwo, P. Arlukowicz, C. Czaplewski, S. Oldziej, J. Pillardy, H.A. Scheraga, "A method for optimizing potential-energy functions by hierarchical design of the potential-energy landscape: Application to the UNRES force field," PNAS, 99, 2002, pp. 1937-1942. One of the more recent papers by Scheraga and co-workers detailing the use of the united residue (UNRES) approach for protein tertiary structure prediction.
[Skolnick03]	J. Skolnick, Y. Zhang, A. K. Arakaki, A. Kolinski, M. Boniecki, A. Szilágyi, D. Kihara, "TOUCHSTONE: A Unified Approach to Protein Structure Prediction," Prot. Struct. Funct. Bioinf., 53, 2003, pp. 469-479.

Tertiary structure prediction with X-ray crystallography

[David03]	A.M. Davis, S.J. Teague, G.J. Kleywegt, "Application and limitations of X-ray crystallographic data in structure-based ligand and drug design," Angew. Chem., Int. Ed., 42, 2003, pp. 2718-2736.

Tertiary structure prediction with threading

[Lathrop94]	R.H. Lathrop, "The protein threading problem with sequence amino acid interaction preferences is NP-complete," Protein Eng, 7, 9, 1994, pp. 1059-1068.
[Xu03]	J. Xu, M. Li, D. Kim, Y. Xu, "RAPTOR: Optimal Protein Threading by Linear Programming," J. Bioinf. Comput. Biol., 1, 2003, pp. 95-117.

Protein structure evaluation

[Eyal05]	E. Eyal, S. Gerzon, V. Potapov, M. Edelman, V. Sobolev, "The Limit of Accuracy of Protein Modeling: Influence of Crystal Packing on Protein Structure," J. Mol. Biol., 351, 2005, pp. 431-442.
[Wei99]	L. Wei, E.S. Huang, R.B. Altman, "Are predicted structures good enough to preserve functional sites?," Structure Fold Des, 7, 6, 1999, pp. 643-650. "BACKGROUND: A principal goal of structure prediction is the elucidation of function. We have studied the ability of computed models to preserve the microenvironments of functional sites. In particular, 653 model structures of a calcium-binding protein (generated using an ab initio folding protocol) were analyzed, and the degree to which calcium-binding sites were recognizable was assessed. RESULTS: While some model structures preserve the calcium-binding microenvironments, many others, including some with low root mean square deviations (rmsds) from the crystal structure of the native protein, do not. There is a very weak correlation between the overall rmsd of a structure and the preservation of calcium-binding sites. Only when the quality of the model structure is high (rmsd less than 2 Å for atoms in the 7 Å local neighborhood around calcium) does the modeling of the binding sites become reliable. CONCLUSIONS: Protein structure prediction methods need to be assessed in terms of their preservation of functional sites. High-resolution structures are necessary for identifying binding sites such as calcium-binding sites."

Prediction of protein side-chain rotamers

[Bower99]	M.J. Bower, F.E. Cohen, R.L. Dunbrack, "Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: a new homology modeling tool," J. Mol. Biol., 267, 1999, pp. 1268-1282. This is a reference for the Dunbrack rotamer library

Analysis of residue conservation

[Mayrose04]	I. Mayrose, D. Graur, N. Ben-Tal, T. Pupko, "Comparison of site-specific rate-inference methods for protein sequences: Bayesian methods are superior," Mol Biol Evol, 21, 2004, pp. 1781-1791.
[Glaser05]	F. Glaser, Y. Rosenberg, A. Kessel, T. Pupko, N. Ben-Tal, "The ConSurf-HSSP database: The mapping of evolutionary conservation among homologs onto PDB structures," PROTEINS: Structure, Function, and Bioinformatics, 58, 2005, pp. 610-617.

Not classified yet

[Aloy01]	P. Aloy, E. Querol, F.X. Aviles, M.J.E. Sternberg, "Automated structure-based prediction of functional sites in proteins: applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking," J. Mol. Biol, 311, 2001, pp. 395-408.
[TenEyck95]	L.F. Ten Eyck, J. Mandell, V.A. Roberts, M.E. Pique, "Surveying molecular interactions with DOT," 1995 ACM/IEEE Supercomputing Conference, New York, 1995.
[Zhang05]	Y. Zhang, J. Skolnick, "The protein structure prediction problem could be solved using the current PDB library," PNAS, 102, 4, 2005, pp. 1029-1034.
[Glick02]	M. Glick, D.D. Robinson, G.H. Grant, W.G. Richards, "Identification of ligand binding sites on proteins using a multiscale approach," J. Am. Chem. Soc., 124, 2002, pp. 2337-2344.
[Todd02]	A.E. Todd, C.A. Orengo, J.M. Thornton, "Plasticity of enzyme active sites," Trends in Biochemical Sciences, 27, 2002, pp. 419-426. The expectation is that any similarity in reaction chemistry shared by enzyme homologues is mediated by common functional groups conserved through evolution. However, detailed enzyme studies have revealed the flexibility of many active sites, in that different functional groups, unconserved with respect to position in the primary sequence, mediate the same mechanistic role. Nevertheless, the catalytic atoms might be spatially equivalent. More rarely, the active sites have completely different locations in the protein scaffold. This variability could result from: (1) the hopping of functional groups from one position to another to optimize catalysis; (2) the independent specialization of a low-activity primordial enzyme in different phylogenetic lineages; (3) functional convergence after evolutionary divergence; or (4) circular permutation events.
[Pearl93]	L. Pearl, "Similarity Of Active-Site Structures," Nature, 362, 1993, p. 24. This paper observes the similarities in the active sites of serine proteases.
[Carlson02]	H.A. Carlson, "Protein Flexibility is an Important Component of Structure-Based Drug Discovery," Current Pharmaceutical Design, 8, 17, 2002, pp. 1571-1578.
[Betz02]	S.F. Betz, S.M. Baxter, J.S. Fetrow, "Function first: a powerful approach to post-genomic drug discovery," DDT, 7, 16, 2002.
[Xie05]	L. Xie, P.E. Bourne, "Functional Coverage of the Human Genome by Existing Structures, Structural Genomics Targets and Homology Models," PLoS Comp Biol, 1, 3, 2005, pp. e31.
[Nikolova04]	N. Nikolova, J. Jaworska, "Approaches to measure chemical similarity - A review," QSAR Comb. Sci., 22, 2004, pp. 1006-1026.
[Hert04]	J. Hert, P. Willett, D.J. Wilton, "Comparison of fingerprint-based methods for virtual screening using multiple bioactive reference structures," J. Chem. Inf. Comput. Sci., 44, 2004, pp. 1177-1185.
[Gelly06]	J. Gelly, A.G. de Brevern, S. Hazout, "Protein Peeling: an approach for splitting a 3D protein structure into compact fragments," Bioinformatics, 22, 2, 2006, pp. 129-133.