Structural bioinformatics books

[Bourne03] P.E. Bourne, H. Weissig,
"Structural Bioinformatics,"
Wiley-Liss, Hoboken, NJ, 2003.
This is a good introductory book on structural bioinformatics. It is practical rather than theoretical: it reviews the main sources of structural data (e.g., PDB, NDB) and surveys the most popular methods for predicting structure, aligning structures, and predicting function.
 
[Orengo04] C.A. Orengo, D.T. Jones, J.M. Thornton,
"Bioinformatics: Genes, Proteins, & Computers,"
BIOS Scientific Publishers, Abingdon, UK, 2004.
This is a good book on bioinformatics, with particular emphasis on structural bioinformatics.
 


Cheminformatics books

[Gasteiger] J. Gasteiger, T. Engel,
"Cheminformatics,"
Wiley-VCH, Weinheim, Germany, 2003.
 
[Leach03] A. Leach, V. Gillet,
"An Introduction to Cheminformatics,"
Springer, 2003.
 
[Bajorath04] J. Bajorath,
"Chemoinformatics: Concepts, Methods, and Tools for Drug Discovery (Methods in Molecular Biology),"
Humana Press, 2004.
 


Molecular modeling books

[Leach97] A. Leach,
"Molecular Modelling: Principles and Applications,"
Longman Pub Group, 1997.
 
[Schlick02] T. Schlick,
"Molecular Modeling and Simulation,"
Springer, 2002.
 
[Holtje03] Holtje, Sippl, Rognan, Folkers,
"Molecular Modeling: Basic Principles and Applications,"
Wiley-VCH, 2003.
 


Structural bioinformatics overviews

[Goldsmith-Fischman03] S. Goldsmith-Fischman, B. Honig,
"Structural genomics: Computational methods for structure analysis,"
Protein Science, 12, 2003, pp. 1813-1821.
 
[Blundell00] T.L. Blundell, K. Mizuguchi,
"Structural genomics: an overview,"
Progress in Biophysics & Molecular Biology, 73, 2000, pp. 289-295.
 


Sequence databases

[Apweiler04] R. Apweiler, A. Bairoch, C.H. Wu,
"Protein sequence databases,"
Current Opinion in Chemical Biology, 8, 1, 2004, pp. 76-80.
This is the main reference for the UniProt database.
 
[Bairoch05] A. Bairoch, R. Apweiler, C.H. Wu, W.C. Barker, B. Boeckmann, S. Ferro, E. Gasteiger, H. Huang, R. Lopez, M. Magrane, M.J. Martin, D.A. Natale, C. O'Donovan, N. Redaschi, L.S. Yeh,
"The Universal Protein Resource (UniProt),"
Nucleic Acids Res., 33, 2005, pp. D154-D159.
This is a paper about the UniProt database.
 


Structure databases

[Berman00] H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne,
"The Protein Data Bank,"
Nucleic Acids Research, 28, 2000, pp. 235-242.
This is the main reference for the PDB.
 
[Berman92] H.M. Berman, W.K. Olson, D.L. Beveridge, J. Westbrook, A. Gelbin, T. Demeny, S.H. Hsieh, A.R. Srinivasan, B. Schneider,
"The Nucleic Acid Database: A Comprehensive Relational Database of Three-Dimensional Structures of Nucleic Acids,"
Biophys. J., 63, 1992, pp. 751-759.
This is the main reference for the NDB.
 


Structure database annotations

[Laskowski97] R.A. Laskowski, E.G. Hutchinson, A.D. Michie, A.C. Wallace, M.L. Jones, J.M. Thornton,
"PDBsum: A Web-based database of summaries and analyses of all PDB structures,"
Trends Biochem. Sci., 22, 1997, pp. 488-490.
This is the original paper about PDBsum, a very useful web-based service that summarizes known information about every PDB file (http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/).
 
[Laskowski01] R.A. Laskowski,
"PDBsum: summaries and analyses of PDB structures,"
Nucleic Acids Res, 29, 2001, pp. 221-222.
This is an update to the PDBsum paper.
 
[Laskowski05b] R.A. Laskowski, V.V. Chistyakov, J.M. Thornton,
"PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids,"
Nucleic Acids Res, 33 (Database Issue), 2005, pp. D266-D268.
This is another update to the PDBsum paper.
 
[Velankar05] S. Velankar, P. McNeil, V. Mittard-Runte, A. Suarez, D. Barrell, R. Apweiler, K. Henrick,
"E-MSD: an integrated data resource for bioinformatics,"
Nucleic Acids Res (Database Issue), 33, 2005, pp. D262-D265.
This is the main reference for the MSD, the macromolecular structural database (http://www.ebi.ac.uk/msd/). It provides a relational database with structures (e.g., PDB), quaternary structure predictions (PQS), classifications (e.g., SCOP, CATH, EC, etc.) and results of analyses (e.g., protein-ligand contacts).
 
[Schomburg00] I. Schomburg, O. Hofmann, C. Bansch, A. Chang, D. Schomburg,
"Enzyme data and metabolic information: BRENDA, a resource for research in biology, biochemistry, and medicine,"
Gene Funct. Dis., 3, 4, 2000, pp. 109-118.
This is the original reference for BRENDA, a database with information about enzymes (http://www.brenda.uni-koeln.de/).
 
[Bairoch00] A. Bairoch,
"The ENZYME database in 2000,"
Nucleic Acids Res, 28, 2000, pp. 304-305.
This is the main reference for the ENZYME database, which contains information about binding sites in enzymes (contacts, cofactors, etc.) (http://www.expasy.org/enzyme/).
 
[Hobohm92] U. Hobohm, M. Scharf, R. Schneider, C. Sander,
"Selection of a representative set of structures from the Brookhaven Protein Data Bank,"
Protein Science, 1, 1992, pp. 409-417.
This is the original reference for PDBSelect.
 
[Hobohm94] U. Hobohm, C. Sander,
"Enlarged representative set of protein structures,"
Protein Science, 3, 1994, p. 522.
This is an update for PDBSelect.
 
[Henrick98] K. Henrick, J.M. Thornton,
"PQS: a protein quaternary structure file server,"
Trends in Biochemical Sciences, 23, 9, 1998, pp. 358-361.
 


Databases of small molecules

[Irwin05] J.J. Irwin, B.K. Shoichet,
"ZINC - A Free Database of Commercially Available Compounds for Virtual Screening,"
J. Chem. Inf. Model, 45, 1, 2005, pp. 177-182.
This is the main reference for ZINC (http://blaster.docking.org/zinc/).
 


Protein-ligand complex databases

[Chalk04] A.J. Chalk, C.L. Worth, J.P. Overington, A.W.E. Chan,
"PDBLIG: Classification of Small Molecular Protein Binding in the Protein Data Bank,"
J. Med. Chem., 47, 15, 2004, pp. 3807-3816.
 
[Feng04] Z. Feng, L. Chen, H. Maddula, O. Akcan, R. Oughtred, H.M. Berman, J. Westbrook,
"Ligand Depot: a data warehouse for ligands bound to macromolecules,"
Bioinformatics, 20, 13, 2004, pp. 2153-2155.
 
[Puvanendrampillai03] D. Puvanendrampillai, J. Mitchell,
"Protein Ligand Database (PLD): additional understanding of the nature and specificity of protein-ligand complexes,"
Bioinformatics, 19, 14, 2003, pp. 1856-1857.
This is the main reference for the Protein Ligand Database (PLD).
 
[Golovin05] A. Golovin, D. Dimitropoulos, T. Oldfield, A. Rachedi, K. Henrick,
"MSDsite: A Database Search and Retrieval System for the Analysis and Viewing of Bound Ligands and Active Sites,"
PROTEINS: Structure, Function, and Bioinformatics, 58, 1, 2005, pp. 190-199.
 
[Bergner02] A. Bergner, J. Gunther, M. Hendlich, G. Klebe, M. Verdonk,
"Use of Relibase for Retrieving Complex 3D Interaction Patterns Including Crystallographic Packing Effects,"
Biopolymers (Nucleic Acid Sci.), 61, 2002, pp. 99-110.
 
[Hendlich98] M. Hendlich,
"Databases for Protein-Ligand Complexes,"
Acta Crystallographica, D54, 1998, pp. 1178-1182.
This is the main reference for Relibase, a database of protein-ligand interactions (http://relibase.ebi.ac.uk/).
 
[Sheu05] S.H. Sheu, D.R. Lancia Jr., K.H. Clodfelter, M.R. Landon, S. Vajda,
"PRECISE: a Database of Predicted and Consensus Interaction Sites in Enzymes,"
Nucleic Acids Research, 33 (Database issue), 2005, pp. D206-D211.
 


Protein structure fundamentals

[Branden99] Carl-Ivar Branden, John Tooze,
"Introduction to Protein Structure,"
Garland Publishing; 2nd edition, 1999.
This is a classic book on protein structure.
 
[Lesk01] Arthur M. Lesk,
"Introduction to Protein Architecture: The Structural Biology of Proteins,"
Oxford University Press, 2001.
This is a good book on protein structure.
 
[Lehninger04] David L. Nelson, Michael M. Cox,
"Lehninger Principles of Biochemistry,"
W.H. Freeman; 4th edition, 2004.
This is a classic book on biochemistry.
 
[Hunter93] L. Hunter,
"Molecular Biology for Computer Scientists,"
Artificial Intelligence and Molecular Biology, AAAI Press, 1993.
This is a high-level review article covering all of molecular biology.
 


Protein structure characterization

[Varrazzo05] D. Varrazzo, A. Bernini, O. Spiga, A. Ciutti, S. Chiellini, V. Venditti, L. Bracci, N. Niccolai,
"Three-dimensional computation of atom depth in complex molecular structures,"
Bioinformatics, 21, 12, 2005, pp. 2856-2860.
 
[Gerstein00] M. Gerstein, F.M. Richards,
"Protein Geometry: Volumes, Areas, and Distances,"
International Tables for Crystallography (Molecular Geometry and Features in Macromolecular Crystallography), Chapter 22, Volume F, 2000.
 
[Tsai99] J. Tsai, R. Taylor, C. Chothia, M. Gerstein,
"The Packing Density in Proteins: Standard Radii and Volumes,"
J. Mol. Biol., 290, 1999, pp. 253-266.
 
[Singh92] Juswinder Singh, J.M. Thornton,
"Protein Side-Chain Interactions,"
Oxford University Press, 1992.
 
[Sobolev99] V. Sobolev, A. Sorokine, J. Prilusky, E.E. Abola, M. Edelman,
"Automated analysis of interatomic contacts in proteins,"
Bioinformatics, 15, 4, 1999, pp. 327-332.
 


Protein fold classification

[Murzin95] A.G. Murzin, S.E. Brenner, T. Hubbard, C. Chothia,
"SCOP: a structural classification of proteins database for the investigation of sequences and structures,"
J. Mol. Biol, 247, 1995, pp. 536-540.
This is the main reference for the SCOP hierarchy.
 
[Andreeva04] A. Andreeva, D. Howorth, S.E. Brenner, T.J.P. Hubbard, C. Chothia, A.G. Murzin,
"SCOP database in 2004: refinements integrate structure and sequence family data,"
Nucleic Acids Research, 32, 2004, pp. D226-D229.
This is a more recent reference for the SCOP hierarchy.
 
[Orengo97] C.A. Orengo, A.D. Michie, S. Jones, D.T. Jones, M.B. Swindells, J.M. Thornton,
"CATH - A Hierarchic Classification of Protein Domain Structures,"
Structure, 5, 8, 1997, pp. 1093-1108.
This is the original reference for the CATH hierarchy.
 
[Pearl05] F. Pearl, A. Todd, I. Sillitoe, M. Dibley, O. Redfern, T. Lewis, C. Bennett, R. Marsden, et al,
"The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis,"
Nucleic Acids Research, 33, 2005, pp. D247-D251.
This is a more recent reference for the CATH hierarchy.
 
[Taylor02a] W.R. Taylor,
"A 'periodic table' for protein structures,"
Nature, 416, 6881, 2002, pp. 657-660.
This paper formalizes both secondary and tertiary links to allow the rigorous and automatic definition of protein topology.
 
[Holm96] L. Holm, C. Sander,
"The FSSP database: fold classification based on structure-structure alignment of proteins,"
Nucleic Acids Research, 24, 1, 1996, pp. 206-209.
This is the main reference for the FSSP classification, which is based on DALI structural alignments.
 


Protein fold space

[Chothia92] C. Chothia,
"One thousand families for the molecular biologist,"
Nature, 357, 1992, pp. 543-544.
This is the classic paper in which Chothia predicted that the number of folds observed in nature is quite small compared to the number of proteins.
 
[Chothia86] C. Chothia, A.M. Lesk,
"The relation between the divergence of sequence and structure in proteins,"
The EMBO Journal, 5, 1986, pp. 823-826.
 
[Orengo94] C.A. Orengo, D.T. Jones, J.M. Thornton,
"Protein superfamilies and domain superfolds,"
Nature, 372, 1994, pp. 631-634.
 
[Sander91] C. Sander, R. Schneider,
"Database of homology-derived protein structures and the structural meaning of sequence alignment,"
Proteins, 9, 1, 1991, pp. 56-68.
 
[Wang96] Z-X. Wang,
"How many fold types of protein are there in nature?,"
Proteins, 26, 1996, pp. 186-191.
 
[Zhang97] C-T. Zhang,
"Relations of the numbers of protein sequences, families and folds,"
Protein Engineering, 10, 7, 1997, pp. 757-761.
 


Pairwise sequence alignment

[Needleman71] S.B. Needleman, C.D. Wunsch,
"A general method applicable to the search for similarities in the amino acid sequence of two proteins,"
J. Mol. Biol., 48, 1971, pp. 443-453.
This is the original paper on global sequence alignment.
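As a rough illustration of global alignment, the sketch below computes a global alignment score by dynamic programming in the textbook formulation. The function name, the linear gap penalty, and the match/mismatch/gap values are assumptions chosen for illustration; they are not taken from [Needleman71].

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score of the best end-to-end alignment of strings a and b.

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + s,    # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
        prev = curr
    return prev[-1]
```

Because every residue of both sequences must be accounted for, length differences are always paid for with gaps.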
 
[Smith81] T.F. Smith, M.S. Waterman,
"Identification of common molecular subsequences,"
J. Mol. Biol., 147, 1981, pp. 195-197.
This is the original paper on local sequence alignment. It provides the main reference for the Smith-Waterman alignment score.
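A minimal sketch of a local alignment score in the Smith-Waterman style is given below. The function name, the linear gap penalty, and the scoring values are assumptions chosen for illustration, not the parameters of [Smith81].

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best score over all pairs of substrings of a and b.

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The floor at 0 lets an alignment restart anywhere,
            # which is what makes the alignment local.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

The score is never negative: unrelated sequences simply score 0, while a shared segment contributes its own score regardless of the flanking residues.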
 
[McGinnis04] S. McGinnis, T.L. Madden,
"BLAST: at the core of a powerful and diverse set of sequence analysis tools,"
Nucleic Acids Res, 32, 2004, pp. W20-W25.
This is the main reference for BLAST.
 
[Pearson90] W.R. Pearson,
"Rapid and sensitive sequence comparison with FASTP and FASTA,"
Methods Enzymol, 183, 1990, pp. 63-98.
This is the main reference for FASTA.
 
[Altschul94] S.F. Altschul, M.S. Boguski, W. Gish, J.C. Wootton,
"Issues in searching molecular sequence databases,"
Nature Genetics, 6, 2, 1994, pp. 119-129.
This is an overview of sequence alignment issues and methods. It provides a good reference for sequence alignment methods as a whole.
 


Multiple sequence alignment

[Higgins96] D.G. Higgins, J.D. Thompson, T.J. Gibson,
"Using CLUSTAL for multiple sequence alignments,"
Methods Enzymol, 266, 1996, pp. 383-402.
 


Sequence motifs

[Altshul97] S.F. Altschul, T.L. Madden, A.A. Schaffer, J. Zhang, Z. Zhang, W. Miller, D.J. Lipman,
"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,"
Nucleic Acids Research, 25, 17, 1997, pp. 3389-3402.
This is the main reference for PSI-BLAST.
 
[Bateman02] A. Bateman, E. Birney, L. Cerruti, R. Durbin, L. Etwiller, S.R. Eddy, S. Griffiths-Jones, K.L. Howe, M. Marshall, E.L.L. Sonnhammer,
"The Pfam protein families database,"
Nucleic Acids Res, 30, 2002, pp. 276-280.
This is the main reference for Pfam.
 
[Soding04] J. Soding,
"Protein homology detection by HMM-HMM comparison,"
Bioinformatics, 21, 2004, pp. 951-960.
 
[Falquet02] L. Falquet, M. Pagni, P. Bucher, N. Hulo, C.J. Sigrist, K. Hofmann, A. Bairoch,
"The PROSITE database, its status in 2002,"
Nucleic Acids Res, 30, 2002, pp. 235-238.
Describes the PROSITE database, which contains sequence patterns and profiles.
 
[Jonassen97] I Jonassen,
"Efficient discovery of conserved patterns using a pattern graph,"
Comput Appl Biosci, 13, 1997, pp. 509-522.
 


Pairwise structure alignment (overviews)

[Brown96] N. Brown, C.A. Orengo,
"A protein structure comparison methodology,"
Computers Chem, 20, 1996, pp. 359-380.
This provides a nice review of structural alignment issues and methods.
 
[Sierk04a] M.L. Sierk, G.J. Kleywegt,
"Deja vu all over again: finding and analyzing protein structure similarities,"
Structure (Camb), 12, 2004, pp. 2103-2111.
"This article is meant to guide the structural biologist in the basics of structural alignment, and to provide an overview of the available software tools. The main purpose is to encourage users to gain some understanding of the strengths and limitations of structural alignment, and to take these factors into account when interpreting the results of different programs."
 
[Sierk04b] M.L. Sierk, W.R. Pearson,
"Sensitivity and selectivity in protein structure comparison,"
Protein Science, 13, 2004, pp. 773-785.
This paper compares alignment methods with ROC curves on the CATH database. "Seven protein structure comparison methods and two sequence comparison programs were evaluated on their ability to detect either protein homologs or domains with the same topology (fold) as defined by the CATH structure database. The structure alignment programs Dali, Structal, Combinatorial Extension (CE), VAST, and Matras were tested along with SGM and PRIDE, which calculate a structural distance between two domains without aligning them. We also tested two sequence alignment programs, SSEARCH and PSI-BLAST. Depending upon the level of selectivity and error model, structure alignment programs can detect roughly twice as many homologous domains in CATH as sequence alignment programs. ... These results help quantify the statistical distinction between analogous and homologous structures, and provide a benchmark for structure comparison statistics."
 
[Godzik96] A. Godzik,
"The structural alignment between two proteins: is there a unique answer?,"
Protein Sci, 5, 1996, pp. 1325-1338.
This paper studies "the problem of uniqueness and stability of structural alignments with the help of visualization of the suboptimal alignments. It is shown that alignments are often degenerate and whole families of alignments can be generated with almost the same score as the optimal alignment."
 
[Holm94] L. Holm, C. Sander,
"Searching protein structure databases has come of age,"
Proteins, 19, 1994, pp. 165-173.
 
[Lemmen00] C. Lemmen, T. Lengauer,
"Computational methods for the structural alignment of molecules,"
J Comput Aided Mol Des, 14, 3, 2000, pp. 215-232.
This paper reviews "the past six years of scientific publishing on molecular superposition. Our focus lies on automatic procedures to be performed on arbitrary molecular structures. Methodical aspects are our main concern here ... providing pointers to the recent literature providing important contributions to computational methods for the structural alignment of molecules. Finally we provide a perspective on how superposition methods can effectively be used for the purpose of virtual database screening."
 
[Eidhammer00] I. Eidhammer, I. Jonassen, W.R. Taylor,
"Structure comparison and structure patterns,"
J. Comput. Biol., 7, 2000, pp. 658-716.
"This article investigates aspects of pairwise and multiple structure comparison, and the problem of automatically discover common patterns in a set of structures. Descriptions and representation of structures and patterns are described, as well as scoring and algorithms for comparison and discovery. A framework and nomenclature is developed for classifying different methods, and many of these are reviewed and placed into this framework."
 
[Kolodny04] R. Kolodny, N. Linial,
"Approximate protein structural alignment in polynomial time,"
PNAS, 101, 33, 2004, pp. 12201-12206.
"Here, we study the structural alignment problem as a family of optimization problems and develop an approximate polynomial-time algorithm to solve them. We argue that such approximate solutions are, in fact, of greater interest than exact ones because of the noisy nature of experimentally determined protein coordinates."
 


Pairwise structure alignment (methods)

[Holm93] Lisa Holm, Chris Sander,
"Protein Structure Comparison by Alignment of Distance Matrices,"
J. Mol. Biol, 233, 1993, pp. 123-138.
This is the main reference for the DALI alignment algorithm.
 
[Holm95] L. Holm, C. Sander,
"Dali: a network tool for protein structure comparison,"
Trends Biochem Sci, 20, 1995, pp. 478-480.
This is a reference for the DALI website (http://www.ebi.ac.uk/dali/).
 
[Holm00] L. Holm, J. Park,
"DaliLite workbench for protein structure comparison,"
Bioinformatics, 16, 6, 2000, pp. 566-567.
This is the reference for DaliLite (http://ekhidna.biocenter.helsinki.fi:9801/dali_lite/start).
 
[Subbiah93] S. Subbiah, D.V. Laurents, M. Levitt,
"Structural Similarity of DNA-binding Domains of Bacteriophage Repressors and the Globin Core,"
Current Biol, 3, 1993, pp. 141-148.
This is the original reference for STRUCTAL, which uses an EM algorithm that alternates between solving for the best superposition (least squares) and the best correspondences (dynamic programming).
 
[Gerstein98] M. Gerstein, M. Levitt,
"Comprehensive Assessment of Automatic Structural Alignment against a Manual Standard, the SCOP Classification of Proteins,"
Protein Science, 7, 1998, pp. 445-456.
This is the second reference for STRUCTAL.
 
[Krissinel04] E. Krissinel, K. Henrick,
"Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions,"
Acta Crystallogr D Biol Crystallogr, D60, 2004, pp. 2256-2268.
This is the main reference for SSM (http://www.ebi.ac.uk/msd-srv/ssm/), which aligns protein structures in two phases. The first phase aligns the main alpha-helix and beta-sheet secondary structure elements. The second phase aligns the alpha-carbon atoms of residues more precisely.
 
[Harrison03] A. Harrison, F. Pearl, I. Sillitoe, T. Slidel, R. Mott, J.M. Thornton, C. Orengo,
"Recognising the fold of a protein structure,"
Bioinformatics, 19, 2003, pp. 1748-1759.
This is the main reference for GRATH.
 
[Taylor89] W.R. Taylor, C.A. Orengo,
"Protein Structure Alignment,"
J. Mol. Biol., 208, 1, 1989.
This is the original reference for SSAP, which employs double dynamic programming.
 
[Orengo90] C.A. Orengo, W.R. Taylor,
"A Rapid Method for Protein Structure Alignment,"
J. Theor Biol, 147, 1990, pp. 517-551.
 
[Orengo92] C.A. Orengo, N.P. Brown, W.R. Taylor,
"Fast structure alignment for protein databank searching,"
Proteins, 14, 1992, pp. 139-167.
This describes a fast version of SSAP suitable for database searching. It is used to build the 2nd (A) and 3rd (T) levels of the CATH hierarchy.
 
[Orengo96] C.A. Orengo, W.R. Taylor,
"SSAP: sequential structure alignment program for protein structure comparison,"
Methods Enzymol, 266, 1996, pp. 617-635.
This is a reference for SSAP (http://www.biochem.ucl.ac.uk/~orengo/ssap.html).
 
[Madej95] T. Madej, J.F. Gibrat, S.H. Bryant,
"Threading a database of protein cores,"
Proteins, 23, 1995, pp. 356-369.
This is the main reference for VAST.
 
[Gibrat96] J.F. Gibrat, T. Madej, S.H. Bryant,
"Surprising similarities in structure comparison,"
Curr Opin Struct Biol, 6, 3, 1996, pp. 377-385.
Describes results achieved with VAST.
 
[Shindyalov98] I.N. Shindyalov, P.E. Bourne,
"Protein structure alignment by incremental combinatorial extension (CE) of the optimal path,"
Protein Eng, 11, 1998, pp. 739-747.
This is the main reference for CE (http://cl.sdsc.edu/ce.html).
 
[Zhu05] J. Zhu, Z. Weng,
"FAST: a novel protein structure alignment algorithm,"
Proteins, 58, 2005, pp. 618-627.
This is the main reference for FAST (http://biowulf.bu.edu/FAST/).
 
[Maiti04] R. Maiti, G.H. Van Domselaar, H. Zhang, D.S. Wishart,
"SuperPose: a simple server for sophisticated structural superposition,"
Nucleic Acids Res, 1, 32, 2004, pp. W590-W594.
This is the main reference for SuperPose (http://wishart.biology.ualberta.ca/SuperPose/).
 
[Lessel94] U. Lessel, D. Schomburg,
"Similarities between protein 3-D structures,"
Protein Engineering, 7, 10, 1994, pp. 1175-1187.
This is the reference for Protein3Dfit (http://biotool.uni-koeln.de:8080/3dalign_neu/cgi-bin/3daligner.py).
 
[Szustakowski00] J.D. Szustakowski, Z. Weng,
"Protein structure alignment using a genetic algorithm,"
Proteins, 38, 4, 2000, pp. 428-440.
This is the main reference for K2/K2SA (http://zlab.bu.edu/k2sa/).
 
[Chen05] L. Chen, T. Zhou, Y. Tang,
"Protein structure alignment by deterministic annealing,"
Bioinformatics, 21, 2005, pp. 51-62.
 
[Ilyin04] V. A. Ilyin, A. Abyzov, C. M. Leslin,
"Structural alignment of proteins by a novel TOPOFIT method, as a superimposition of common volumes at a topomax point,"
Protein Sci, 13, 7, 2004, pp. 1865-1874.
 


Pairwise structure alignment (comparisons)

[Kolodny05] Rachel Kolodny, Patrice Koehl, Michael Levitt,
"Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures,"
J Mol Biol, 346, 2005, pp. 1173-1188.
 
[Novotny04] M. Novotny, D. Madsen, G.J. Kleywegt,
"Evaluation of protein fold comparison servers,"
Proteins, 54, 2004, pp. 260-270.
Watson05: The authors perform a wide-ranging evaluation of 11 publicly available fold comparison servers. They use the CATH database as a reference for their tests. The results show that no one server provides 100% accuracy and therefore multiple methods should be used to assess similarities to known structures.
 
[Leplae02] R. Leplae, T.J.P. Hubbard,
"MaxBench: evaluation of sequence and structure comparison methods,"
Bioinformatics, 18, 3, 2002, pp. 494-495.
Compares alignment methods with ROC curves on SCOP database.
 


Multiple structure alignment

[Russell92] R.B. Russell, G.J. Barton,
"Multiple protein sequence alignment from tertiary structure comparison: assignment of global and residue confidence levels,"
Proteins, 14, 1992, pp. 309-323.
This is the main reference for STAMP (bioinfo.ucr.edu/pise/stamp.html).
 
[Ye05] Y. Ye, A. Godzik,
"Multiple flexible structure alignment using partial order graphs,"
Bioinformatics, 21, 10, 2005, pp. 2362-2369.
 
[Shatsky04] M. Shatsky, R. Nussinov, H.J. Wolfson,
"A method for simultaneous alignment of multiple protein structures,"
Proteins, 56, 1, 2004, pp. 143-156.
This is the main reference for MultiProt (http://bioinfo3d.cs.tau.ac.il/MultiProt/).
 
[Dror03] O. Dror, H. Benyamini, R. Nussinov, H.J. Wolfson,
"Multiple structural alignment by secondary structures: Algorithm and applications,"
Protein Sci, 12, 11, 2003, pp. 2492-2507.
 
[Gud04] C. Guda, S. Lu, E.D. Scheeff, P.E. Bourne, I.N. Shindyalov,
"CE-MC: a multiple protein structure alignment server,"
Nucleic Acids Res, 32, 2004, pp. W100-W103.
This is the multiple alignment version of CE (http://cemc.sdsc.edu/).
 
[Lupyan05] D. Lupyan, A. Leo-Macias, A.R. Ortiz,
"A new progressive-iterative algorithm for multiple structure alignment,"
Bioinformatics, 21, 15, 2005, pp. 3255-3263.
 
[Leibowitz01] N. Leibowitz, R. Nussinov, H.J. Wolfson,
"MUSTA - a general, efficient, automated method for multiple structure alignment and detection of common motifs: application to proteins,"
J Comput Biol, 8, 2, 2001, pp. 93-121.
 
[Taylor94] W.R. Taylor, T.P. Flores, C.A. Orengo,
"Multiple protein structure alignment,"
Protein Science, 3, 10, 1994, pp. 1858-1870.
 


Protein-ligand binding site representation overviews

[Campbell03] S.J. Campbell, N.D. Gold, R.M. Jackson, D.R. Westhead,
"Ligand binding functional site location, similarity and docking,"
Curr Opin Struct Biol, 13, 2003, pp. 389-395.
Overview of ways to find and compare protein-ligand binding sites.
 
[Sotriffer02] C. Sotriffer, G. Klebe,
"Identification and mapping of small-molecule binding sites in proteins: computational tools for structure-based drug design,"
Il Farmaco, 57, 2002, pp. 243-251.
 
[Via00] A. Via, F. Ferre, B. Brannetti, M. Helmer-Citterich,
"Protein surface similarities: a survey of methods to describe and compare protein surfaces,"
Cellular and Molecular Life Sciences, 57, 2000, pp. 1970-1977.
 


Protein-ligand binding site analysis

[Stockwell05] Gareth Stockwell,
"Structural Diversity of Biological Ligands and their Binding Sites in Proteins,"
2005.
 
[Bartlett02] G.J. Bartlett, C.T. Porter, N. Borkakoti, J.M. Thornton,
"Analysis of catalytic residues in enzyme active sites,"
J. Mol. Biol, 324, 1, 2002, pp. 105-121.
 
[Puvanendrampillai03] D. Puvanendrampillai, J. Mitchell,
"Protein Ligand Database (PLD): additional understanding of the nature and specificity of protein-ligand complexes,"
Bioinformatics, 19, 14, 2003, pp. 1856-1857.
This is the main reference for the Protein Ligand Database (PLD).
 
[Ringe95] D. Ringe,
"What makes a binding site a binding site?,"
Curr Opin Struct Biol, 5, 6, 1995, pp. 825-829.
 
[Vajda06] S. Vajda, F. Guarnieri,
"Characterization of protein-ligand interaction sites using experimental and computational methods,"
Curr Opin Drug Discov Devel, 9, 3, 2006, pp. 354-362.
 
[Kelly05] M.S. Kelly, R.L. Mancera,
"A new method for estimating the importance of hydrophobic groups in the binding site of a protein,"
J Med Chem, 48, 4, 2005, pp. 1069-1078.
 
[Lian94] L.Y. Lian, I.L. Barsukov, M.J. Sutcliffe, K.H. Sze, G.C. Roberts,
"Protein-ligand interactions: exchange processes and determination of ligand conformation and protein-ligand contacts,"
Methods Enzymol, 239, 1994, pp. 657-700.
 


Protein-ligand binding site detection from geometry

[Weisel07] M. Weisel, E. Proschak, G. Schneider,
"PocketPicker: analysis of ligand binding-sites with shape descriptors,"
Chemistry Central Journal, 1, 7, 2007.
 
[Brady00] G.P. Brady, Jr., P.F.W. Stouten,
"Fast prediction and visualization of protein binding pockets with PASS,"
Journal of Computer-Aided Molecular Design, 14, 4, 2000, pp. 383-401.
This is the main reference for PASS, a system for detecting pockets in proteins that successively constructs layers of points starting at the surface of the protein and working towards the middle of voids. Points are rejected if they are "too" solvent accessible, thus leaving points only inside pockets.
 
[Peters96] K.P. Peters, J. Fauck, C. Frommel,
"The automatic search for ligand binding sites in proteins of known three-dimensional structure using only geometric criteria,"
J Mol Biol, 256, 1996, pp. 201-213.
This is the main reference for APROPOS, a system for detecting protein pockets with alpha shapes.
 
[Hendlich97] M. Hendlich, F. Rippmann, G. Barnickel,
"LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins,"
J. Mol. Graph., 15, 1997, pp. 359-363.
This paper is the main reference for LIGSITE, a method to detect binding site pockets. Following POCKET [Levitt92], it fills a grid with values representing the number of angles from which every point is visible to the outside (sampling only 7 angles), thereby providing a measure of how deeply a point is embedded in a concave pocket.
 
[Levitt92] D. Levitt, L. Banaszak,
"POCKET: A computer graphics method for identifying and displaying protein cavities and their surrounding amino acids,"
J. Mol. Graphics, 10, 1992, pp. 229-234.
This is the main reference for POCKET, a system for identifying free-space points deeply buried in pockets by counting the number of axial directions for which the point is occluded from both directions. This method is followed up by LIGSITE, which considers more than just 3 axial directions.
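The counting idea described above can be sketched on a voxel grid as follows. This is a toy illustration, not the POCKET or LIGSITE implementation: the names `buriedness` and `hits_protein`, the set-of-voxels representation, and the choice of the four cube diagonals as the extra LIGSITE-style directions are all assumptions.

```python
AXES = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
# Assumed extra directions for a 7-direction scan: the four cube diagonals.
DIAGONALS = [(1, 1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1)]


def hits_protein(occupied, shape, start, step):
    """Walk from start along step; True if an occupied voxel is met before the grid edge."""
    x, y, z = start
    dx, dy, dz = step
    x, y, z = x + dx, y + dy, z + dz
    while 0 <= x < shape[0] and 0 <= y < shape[1] and 0 <= z < shape[2]:
        if (x, y, z) in occupied:
            return True
        x, y, z = x + dx, y + dy, z + dz
    return False


def buriedness(occupied, shape, directions=AXES):
    """Map each empty voxel to the number of directions occluded on both sides."""
    scores = {}
    for x in range(shape[0]):
        for y in range(shape[1]):
            for z in range(shape[2]):
                p = (x, y, z)
                if p in occupied:
                    continue
                scores[p] = sum(
                    1
                    for d in directions
                    if hits_protein(occupied, shape, p, d)
                    and hits_protein(occupied, shape, p, tuple(-c for c in d))
                )
    return scores


# Toy "protein": a hollow 3x3x3 shell centred in a 5x5x5 grid.
shell = {(x, y, z) for x in range(1, 4) for y in range(1, 4) for z in range(1, 4)}
shell.discard((2, 2, 2))
# The same shell with one face voxel removed, giving an open pocket.
opened = shell - {(2, 2, 3)}
```

Empty voxels with high scores are candidates for pocket interiors; passing `AXES + DIAGONALS` as `directions` gives a finer-grained measure than the three axes alone.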
 
[Nayal06] M. Nayal, B. Honig,
"On the nature of cavities on protein surfaces: Application to the identification of drug-binding sites,"
Proteins: Structure, Function, and Bioinformatics, 63, 4, 2006, pp. 892-906.
 
[Ho90] C.M.W. Ho, G.R. Marshall,
"Cavity Search: an algorithm for the isolation and display of cavity-like binding regions,"
J Comput-Aided Mol Des, 1990, pp. 337-354.
This is the main reference for Cavity Search.
 
[Coleman06] R.G. Coleman, K.A. Sharp,
"Travel depth, a new shape descriptor for macromolecules: application to ligand binding,"
J Mol Biol, 362, 2006, pp. 441-458.
This is the main reference for travel depth.
 
[Kim06] D. Kim, C. Cho, D. Kim, Y. Cho,
"Recognition of docking sites on a protein using beta-shape based on Voronoi diagram of atoms,"
Computer-Aided Design, 38, 5, 2006, pp. 431-443.
 
[Frommel96] C. Frommel, K.P. Peters, J. Fauck,
"The automatic search for ligand binding sites in proteins of known three-dimensional structure using only geometric criteria,"
J. Mol. Biol., 256, 1996, pp. 201-213.
 
[Pettit99] F.K. Pettit, J.U. Bowie,
"Protein surface roughness and small molecular binding sites,"
J. Mol. Biol., 285, 1999, pp. 1377-1382.
 
[Laskowski96a] R.A. Laskowski, N.M. Luscombe, M.B. Swindells, J.M. Thornton,
"Protein clefts in molecular recognition and function,"
Prot. Sci, 5, 12, 1996, pp. 2438-2452.
This paper analyzes the properties of binding sites predicted with Surfnet.
 
[Laskowski95] R.A. Laskowski,
"Surfnet: a program for visualizing molecular surfaces, cavities, and intermolecular interactions,"
J Mol Graph, 13, 1995, pp. 323-330.
This is the main reference for Surfnet, a program that detects binding site pockets by constructing spheres whose diameters are chords between solvent accessible residues of the protein. Spheres are rejected if the center of the chord lies within a certain distance (4 angstroms) of the protein surface or if the chord is more than a certain length (10 angstroms). The pocket is predicted to be the volume covered by the union of the spheres.
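The sphere-placement rule as summarized in this annotation can be sketched as below. This is a simplification and not the Surfnet program itself: the function name `gap_spheres` is hypothetical, atoms are treated as bare points, and the distance from the chord midpoint to any third atom center is used as a stand-in for the distance to the protein surface.

```python
import math
from itertools import combinations


def gap_spheres(atoms, max_chord=10.0, clearance=4.0):
    """Return (center, radius) for every atom pair whose chord sphere is kept.

    atoms is a list of (x, y, z) tuples in angstroms. The thresholds follow
    the annotation above; the clearance test against atom centers is an
    assumed proxy for distance to the protein surface.
    """
    spheres = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        chord = math.dist(a, b)
        if chord > max_chord:
            continue  # chord too long
        center = tuple((p + q) / 2 for p, q in zip(a, b))
        too_close = any(
            math.dist(center, c) < clearance
            for k, c in enumerate(atoms)
            if k not in (i, j)
        )
        if too_close:
            continue  # midpoint too near another atom
        spheres.append((center, chord / 2))
    return spheres
```

The predicted pocket would then be the union of the returned spheres, for example by marking every grid point that falls inside at least one of them.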
 
[Masuya95] M. Masuya, J. Doi,
"Detection and geometric modeling of molecular surfaces and cavities using digital mathematical morphological operations,"
J Mol Graph, 13, 1995, pp. 331-336.
Uses mathematical morphology operations (erode, dilate, close) to detect cavities in a protein surface as the difference between the closure of the protein surface using a certain radius and the molecule itself. The method is demonstrated for two proteins.
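The closure-minus-molecule idea can be illustrated on a 2D grid of cells. The sketch below is an assumption-laden toy, not the method of [Masuya95]: it works in 2D rather than 3D, represents the molecule as a set of integer cells, and uses a discrete disk as the structuring element; all function names are hypothetical.

```python
def disk(radius):
    """Offsets of a discrete disk used as the structuring element."""
    r = int(radius)
    return [
        (dx, dy)
        for dx in range(-r, r + 1)
        for dy in range(-r, r + 1)
        if dx * dx + dy * dy <= radius * radius
    ]


def dilate(cells, offsets):
    """Every cell covered when the element is centred on a cell of the set."""
    return {(x + dx, y + dy) for (x, y) in cells for (dx, dy) in offsets}


def erode(cells, offsets):
    """Cells where the whole element fits inside the set."""
    return {
        (x, y)
        for (x, y) in cells
        if all((x + dx, y + dy) in cells for (dx, dy) in offsets)
    }


def cavities(molecule, radius):
    """Closing (dilate, then erode) minus the molecule itself."""
    element = disk(radius)
    closed = erode(dilate(molecule, element), element)
    return closed - molecule


# Toy shapes: a 5x5 block with an enclosed hole, and a 7x5 block with a notch.
ring = {(x, y) for x in range(5) for y in range(5)} - {(2, 2)}
notched = {(x, y) for x in range(7) for y in range(5)} - {(3, 3), (3, 4)}
```

A larger radius fills wider and shallower depressions, so the radius controls which concavities count as cavities.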
 
[DelCarpio93] C.A. Del Carpio, Y. Takahashi, S. Sasaki,
"A New Approach to the Automatic Identification of Candidates for Ligand Receptor Sites in Proteins: (I) Search for Pocket Regions,"
J. Mol. Graph., 11, 1993, pp. 23-29.
 
[Chang04] D.T. Chang, C.Y. Chen, W.C. Chung, Y.J. Oyang, H.F. Juan, H.C. Huang,
"ProteMiner-SSM: a web server for efficient analysis of similar protein tertiary substructures,"
Nucleic Acids Res, 32, 2004, pp. W76-W82.
Uses probability distributions derived from splats at atoms to detect binding sites.
 
[Halperin03] I. Halperin, H. Wolfson, R. Nussinov,
"SiteLight: Binding-site prediction using phage display libraries,"
Protein Science, 12, 2003, pp. 1344-1359.
 
[BenShimon05] A. Ben-Shimon, M. Eisenstein,
"Looking at Enzymes from the Inside out: The Proximity of Catalytic Residues to the Molecular Centroid can be used for Detection of Active Sites and Enzyme-Ligand Interfaces,"
J. Mol. Biol., 351, 2005, pp. 309-326.
 


Protein-ligand binding site detection from conservation

[Pils06] B. Pils, R.R. Copley, J. Schultz,
"Variation in structural location and amino acid conservation of functional sites in protein domain families,"
BMC Bioinformatics, 6, 2005.
 
[Cheng05] G. Cheng, B. Qian, R. Samudrala, D. Baker,
"Improvement in protein functional site prediction by distinguishing structural and functional constraints on protein family evolution using computational design,"
Nucleic Acids Res, 33, 18, 2005, pp. 5861-5867.
 
[Huang06] B. Huang, M. Schroeder,
"LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation,"
BMC Struct Biol, 6, 2006, pp. 19-29.
 
[Glaser06] F. Glaser, R. Morris, R. Najmanovich, R. Laskowski, J. Thornton,
"A method for localizing ligand binding pockets in protein structures,"
Proteins, 62, 2006, pp. 479-488.
 
[Nimrod05] G. Nimrod, F. Glaser, D. Steinberg, N. Ben-Tal, T. Pupko,
"In silico identification of functional regions in proteins,"
Bioinformatics, 21 Suppl., 2005, pp. i328-i337.
 
[Chelliah04] V. Chelliah, L. Chen, T. Blundell, S. Lovell,
"Distinguishing structural and functional restraints in evolution in order to identify interaction sites,"
J Mol Biol, 342, 2004, pp. 1487-1504.
Watson05: This method distinguishes residues conserved for functional reasons from those that are highly conserved because they are constrained by the structure. By comparing the observed sequence conservation with the predicted conservation (based on amino acid type and environmental constraints), the authors construct environment-specific substitution tables for use in identifying functionally conserved residues.
 
[Innis04] C.A. Innis, A.P. Anand, R. Sowdhamini,
"Prediction of functional sites in proteins using conserved functional group analysis,"
J Mol Biol, 337, 2004, pp. 1053-1068.
Watson05: This new method describes the conservation of a protein surface using chemical groups rather than the amino acids. A multiple sequence alignment is used to identify conserved functional group clusters, the size of which is determined by the number of proteins contributing to it. These are mapped onto the surface to identify active sites.
 
[Lichtarge03] O. Lichtarge, H. Yao, D.M. Kristensen, S. Madabushi, I. Mihalek,
"Accurate and scalable identification of functional sites by evolutionary tracing,"
J Struct Funct Genomics, 4, 2003, pp. 159-166.
 
[Lichtarge02] O. Lichtarge, M.E. Sowa,
"Evolutionary predictions of binding surfaces and interactions,"
Curr Opin Struct Biol, 12, 2002, pp. 21-27.
 
[Joachimiak02] M.P. Joachimiak, F.E. Cohen,
"JEvTrace: refinement and variations of the evolutionary trace in JAVA,"
Genome Biol, 3, 2002, pp. RESEARCH0077.
 
[DelSolMesa03] A. Del Sol Mesa, F. Pazos, A. Valencia,
"Automatic methods for predicting functionally important residues,"
J Mol Biol, 326, 2003, pp. 1289-1302.
 
[Yao03] H. Yao, D.M. Kristensen, I. Mihalek, M.E. Sowa, C. Shaw, M. Kimmel, L. Kavraki, O. Lichtarge,
"An accurate, sensitive and scalable method to identify functional sites in protein structures,"
J Mol Biol, 334, 2003, pp. 387-401.
 
[Sjolander04] K. Sjolander,
"Phylogenomic inference of protein molecular function: advances and challenges,"
Bioinformatics, 20, 2004, pp. 170-179.
 
[La05] D. La, B. Sutch, D.R. Livesay,
"Predicting protein functional sites with phylogenetic motifs,"
Proteins, 58, 2005, pp. 309-320.
 
[Abhiman05] S. Abhiman, E.L.L. Sonnhammer,
"FunShift: a database of function shift analysis on protein subfamilies,"
Nucleic Acids Res, 33, 2005, pp. D197-D200.
 
[Zhang99] B. Zhang, L. Rychlewski, K. Pawlowski, J.S. Fetrow, J. Skolnick, A. Godzik,
"From fold predictions to function predictions: automation of functional site conservation analysis for functional genome predictions,"
Protein Sci, 8, 5, 1999, pp. 1104-1115.
SITE/Site Match
 


Protein-ligand binding site detection from probe energetics

[Laurie05] A.T.R. Laurie, R.M. Jackson,
"Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites,"
Bioinformatics, 2005.
Predicts binding site location by filling a grid with van der Waals interaction potentials for a methyl probe. Evaluated on a data set of 134 proteins.
 
[An05] J. An, M. Totrov, R. Abagyan,
"Pocketome via comprehensive identification and classification of ligand binding envelopes,"
Mol Cell Proteomics, 4, 6, 2005, pp. 752-761.
This is a reference for PocketFinder, a system for detecting binding site pockets by filling a grid with values representing a van der Waals force field (according to the Lennard-Jones formula). A suitable threshold is chosen, and the grid points with values above the threshold are considered ``inside'' the binding pocket. The paper contains an evaluation of the method for a large number of PDB files, both with and without bound ligands.
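A grid-and-threshold scheme of this kind can be sketched with the standard 12-6 Lennard-Jones form. This is a sketch under stated assumptions: a single probe type, one `epsilon` and `sigma` for every atom (hypothetical values), and "value above the threshold" interpreted as attraction strength, that is, the negated potential. The published method applies further processing not shown here.

```python
import math

def lj_potential(point, atoms, epsilon=0.2, sigma=3.5):
    """Sum of 12-6 Lennard-Jones terms between a probe point and all atoms."""
    total = 0.0
    for atom in atoms:
        r = math.dist(point, atom)
        if r < 1e-6:
            return math.inf  # probe sits on an atom center
        sr6 = (sigma / r) ** 6
        total += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return total

def pocket_points(atoms, grid_points, threshold):
    """Grid points whose attraction (negated potential) exceeds the threshold."""
    return [p for p in grid_points if -lj_potential(p, atoms) > threshold]
```

With a single atom at the origin, only probe points near the potential minimum at `2 ** (1 / 6) * sigma` are selected; points inside the repulsive wall or far from the atom are not.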
 
[An04] J. An, M. Totrov, R. Abagyan,
"Comprehensive Identification of ``Druggable'' Protein Ligand Binding Sites,"
Genome Informatics, 15, 2, 2004, pp. 31-41.
This is very similar to [An05].
 


Protein-ligand binding site detection from electrostatic potential

[Bate04] P. Bate, J. Warwicker,
"Enzyme/non-enzyme discrimination and prediction of enzyme active site location using charge-based methods,"
J Mol Biol, 340, 2, 2004, pp. 263-276.
 
[Shanahan04] H.P. Shanahan, M.A. Garcia, S. Jones, J.M. Thornton,
"Identifying DNA-binding proteins using structural motifs and the electrostatic potential,"
Nucleic Acids Res, 32, 2004, pp. 4732-4741.
 


Protein-ligand binding site detection from residue instability

[Elcock01] A.H. Elcock,
"Prediction of functionally important residues based solely on the computed energetics of protein structure,"
J. Mol. Biol., 312, 4, 2001, pp. 885-896.
Uses instability of residues to predict which ones are in the active site: among conserved residues, stable ones are predicted to be in the core and unstable ones in the active site.
 


Protein-ligand binding site detection from residue packing

[Amitai04] G. Amitai, A. Shemesh, E. Sitbon, M. Shklar, D. Netanely, I. Venger, S. Pietrokovski,
"Network analysis of protein structures identifies functional residues,"
J Mol Biol, 344, 2004, pp. 1135-1146.
``We transformed protein structures into residue interaction graphs (RIGs), where amino acid residues are graph nodes and their interactions with each other are the graph edges. We found that active site, ligand-binding and evolutionary conserved residues, typically have high closeness values. Residues with high closeness values interact directly or by a few intermediates with all other residues of the protein. Combining closeness and surface accessibility identified active site residues in 70\% of 178 representative structures.''
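The closeness measure mentioned in this abstract can be sketched with a breadth-first search over an unweighted residue interaction graph. The adjacency-dictionary input and the function name are assumptions for illustration; how residue contacts are turned into edges is left to the caller.

```python
from collections import deque

def closeness(graph):
    """Closeness centrality for each node of an unweighted graph.

    graph maps each node to an iterable of its neighbors. The score is
    (number of reachable nodes) / (sum of shortest-path lengths to them).
    """
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores
```

In a three-residue chain A-B-C, the middle residue B gets the highest score (1.0), since it reaches every other residue in one step.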
 


Protein-ligand binding site detection from microscopic titration curves

[Ko05] J. Ko, J.L.F. Murga, Y. Wei, M.J. Ondrechen,
"Prediction of active sites for protein structures from computed chemical properties,"
Bioinformatics, 21, 1, 2005, pp. i258-i265.
Uses microscopic titration curves to detect functional residues.
 
[Ondrechen01] M.J. Ondrechen, J.G. Clifton, D. Ringe,
"Thematics: a simple computational predictor of enzyme function from structure,"
Proc. Natl Acad. Sci., 98, 2001, pp. 12473-12478.
 
[Ringe04] D. Ringe, Y. Wei, K.R. Boino, M.J. Ondrechen,
"Protein structure to function: insights from computation,"
Cell Mol Life Sci, 61, 2004, pp. 387-392.
Finds binding sites using theoretical microscopic titration curves.
 


Protein-ligand binding site detection from docking analyses

[Silberstein03] Michael Silberstein, Sheldon Dennis, Lawrence Brown III, Tamas Kortvelyesi, Karl Clodfelter, Sandor Vajda,
"Identification of Substrate Binding Sites in Enzymes by Computational Solvent Mapping,"
J. Mol. Biol., 332, 2003, pp. 1095-1113.
Docks many small-molecule fragments and then predicts that the active residues are the ones closest to the docked positions of the fragments.
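The final ranking step can be sketched as a contact count between residues and docked probe positions. The docking itself is not shown; the input layout, the 4 angstrom cutoff, and the function name are hypothetical choices made for illustration.

```python
import math
from collections import Counter

def rank_residues(residue_atoms, probe_positions, cutoff=4.0):
    """Rank residues by how many docked probes lie within cutoff of any atom.

    residue_atoms maps a residue name to a list of atom coordinates.
    Residues with no probe contacts are omitted from the ranking.
    """
    contacts = Counter()
    for probe in probe_positions:
        for residue, atoms in residue_atoms.items():
            if any(math.dist(probe, atom) <= cutoff for atom in atoms):
                contacts[residue] += 1
    return [residue for residue, _ in contacts.most_common()]
```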
 
[Dennis02] Sheldon Dennis, Tamas Kortvelyesi, Sandor Vajda,
"Computational mapping identifies the binding sites of organic solvents on proteins,"
PNAS, 99, 7, 2002, pp. 4290-4295.
 
[Bliznyuk99] A. Bliznyuk, J. Gready,
"Simple method for locating possible ligand binding sites on protein surfaces,"
J. Comput. Chem., 9, 1999, pp. 983-988.
Uses FFT-based docking of a rigid ligand with a simple shape correlation function to find the correct binding site, which can later be analyzed by more detailed (energetic) docking methods.
 


Protein-ligand binding site analysis and prediction from multiple properties

[Guo05] T. Guo, Y. Shi, Z. Sun,
"A novel statistical ligand-binding site predictor: application to ATP-binding sites,"
Protein Engineering, Design and Selection, 18, 2, 2005, pp. 65-70.
 
[Rossi06] A. Rossi, M.A. Marti-Renom, A. Sali,
"Localization of binding sites in protein structures by optimization of a composite scoring function,"
Protein Sci, 15, 10, 2006, pp. 2366-2380.
 
[Zvelebil88] M.J.J.M. Zvelebil, M.J.E. Sternberg,
"Analysis and prediction of the location of catalytic residues in enzymes,"
Protein Engineering, 2, 2, 1988, pp. 127-138.
 
[Huan-Xiang01] H.-X. Zhou, Y. Shan,
"Prediction of protein interaction sites from sequence profile and residue neighbor list,"
Proteins, 44, 2001, pp. 336-343.
 
[Cilia07] Elisa Cilia,
"Protein Active Site Detection using SVMs and Kernel Methods,"
Contribution to the Learning and Intelligent Optimization Workshop (LION), Andalo (TN), Italy, 2007.
This looks like a report to the author's research group.
 
[Cilia06] Elisa Cilia, Alessandro Moschitti, Sergio Ammendola, Roberto Basili,
"Structured Kernels for Automatic Detection of Protein Active Sites,"
Mining and Learning with Graphs Workshop (MLG), 2006.
 
[Petrova06] N.V. Petrova, C.H. Wu,
"Prediction of catalytic residues using support vector machine with selected protein sequence and structural properties,"
BMC Bioinformatics, 7, 2006, pp. 312-324.
 
[Keil04] M. Keil, T.E. Exner, J. Brickmann,
"Pattern recognition strategies for molecular surfaces: III. Binding site prediction with a neural network,"
J Comput Chem, 25, 6, 2004, pp. 779-789.
 
[Gutteridge03] A. Gutteridge, G.J. Bartlett, J.M. Thornton,
"Using a neural network and spatial clustering to predict the location of active sites in enzymes,"
J Mol Biol, 330, 2003, pp. 719-734.
 
[Bradford04] J.R. Bradford, D.R. Westhead,
"Improved prediction of protein-protein binding sites using a support vector machines approach,"
Bioinformatics, 2004.
 
[Chen04] S-C. Chen, I. Bahar,
"Mining frequent patterns in protein structures: a study of protease families,"
Bioinformatics, 20, 1, 2004, pp. i1-i9.
 
[Alexandrov94] N.N. Alexandrov, N. Go,
"Biological meaning, statistical significance, and classification of local spatial similarities in nonhomologous proteins,"
Protein Sci, 3, 1994, pp. 866-875.
 
[Bagley95a] S.C. Bagley, R.B. Altman,
"Characterizing the microenvironment surrounding protein sites,"
Protein Sci, 4, 4, 1995, pp. 622-635.
This is the main reference for Feature. ``Sites are microenvironments within a biomolecular structure, distinguished by their structural or functional role. A site can be defined by a three-dimensional location and a local neighborhood around this location in which the structure or function exists. We have developed a computer system to facilitate structural analysis (both qualitative and quantitative) of biomolecular sites. Our system automatically examines the spatial distributions of biophysical and biochemical properties, and reports those regions within a site where the distribution of these properties differs significantly from control nonsites. The properties range from simple atom-based characteristics such as charge to polypeptide-based characteristics such as type of secondary structure. Our analysis of sites uses non-sites as controls, providing a baseline for the quantitative assessment of the significance of the features that are uncovered. In this paper, we use radial distributions of properties to study three well-known sites (the binding sites for calcium, the milieu of disulfide bridges, and the serine protease active site). We demonstrate that the system automatically finds many of the previously described features of these sites and augments these features with some new details. In some cases, we cannot confirm the statistical significance of previously reported features. Our results demonstrate that analysis of protein structure is sensitive to assumptions about background distributions, and that these distributions should be considered explicitly during structural analyses.''
 
[Bagley95b] S.C. Bagley, L. Wei, C. Cheng, R.B. Altman,
"Characterizing oriented protein structural sites using biochemical properties,"
Proc Int Conf Intell Syst Mol Biol, 3, 1995, pp. 12-20.
``A protein site is a region of a three-dimensional protein structure with a distinguishing functional or structural role. Certain sites recur in different protein structures (for example catalytic sites, calcium binding sites, and some types of turns), but maintain critical shared features. To facilitate the analysis of such protein sites, we have developed a computer system for analyzing the spatial distributions of biochemical properties around a site. The system takes a set of similar sites and a set of control nonsites, and finds differences between them. Specifically, it compares distributions of the properties surrounding the sites with those surrounding the nonsites, and reports statistically significant differences. In this paper, we use our method to analyze the features in the active site of the serine protease enzymes. We compare the use of radial distributions (shells) with 3-D grids (blocks) in the analysis of the active site. We demonstrate three different strategies for focusing attention on significant findings, based on properties of interest, spatial volumes of interest, and on the level of statistical significance. Finally, we show that the program automatically identifies conserved sequential, secondary structural and biophysical features of the serine protease active site, using noncatalytic histidine residues as a control environment.''
 
[Bagley96] S.C. Bagley, R.B. Altman,
"Conserved features in the active site of nonhomologous serine proteases,"
Fold Des, 1, 5, 1996, pp. 371-379.
``BACKGROUND: Serine protease activity is critical for many biological processes and has arisen independently in a few different protein families. It is not clear, though, the degree to which these protease families share common biochemical and biophysical properties. We have used a computer program to study the properties that are shared by four serine protease active sites with no overall structural or sequence homology. The program systematically compares the region around the catalytic histidines from the four proteins with a set of noncatalytic histidines, used as controls. It reports the three-dimensional locations and level of statistical significance for those properties that distinguish the catalytic histidines from the noncatalytic ones. The method of analysis is general and can be applied easily to other active sites of interest. RESULTS: As expected, some of the reported properties correspond to previously known features of the serine protease active site, including the catalytic triad and the oxyanion hole. Novel properties are also found, including the spatial distribution of charged, polar, and hydrophobic groups arranged to stabilize the catalytic residues, and a relative abundance of some residues (Val, Tyr, Leu, and Gly) around the active site. CONCLUSIONS: Our findings show that in addition to some properties common to all the proteases examined, there are a set of preferred, but not required, properties that can be reliably observed only by aligning the sites and comparing them with carefully selected statistical controls.''
 
[Banatao03] D.R. Banatao, R.B. Altman, T.E. Klein,
"Microenvironment analysis and identification of magnesium binding sites in RNA,"
Nucleic Acids Res, 31, 15, 2003, pp. 4450-4460.
Used the FEATURE algorithm to determine ``novel physicochemical descriptions of site-bound and diffusely bound Mg2+ ions in RNA that are useful for prediction. Electrostatic calculations using the Non-Linear Poisson Boltzmann (NLPB) equation provided further evidence for the locations of site-bound ions. We confirmed the locations of experimentally determined sites and further differentiated between classes of ion binding. We also identified potentially important, high scoring sites in the group I intron that are not currently annotated as Mg2+ binding sites.''
 
[Wei03] L. Wei, R.B. Altman,
"Recognizing Complex, Asymmetric functional sites in protein structures using a Bayesian scoring function,"
Journal of Bioinformatics and Computational Biology, 1, 1, 2003, pp. 119-138.
 
[Liang03a] M.P. Liang, D.R. Banatao, T.E. Klein, D.L. Brutlag, R.B. Altman,
"WebFEATURE: An interactive web tool for identifying and visualizing functional sites on macromolecular structures,"
Nucleic Acids Res, 31, 13, 2003, pp. 3324-3327.
``WebFEATURE (http://feature.stanford.edu/webfeature/) is a web-accessible structural analysis tool that allows users to scan query structures for functional sites in both proteins and nucleic acids. WebFEATURE is the public interface to the scanning algorithm of the FEATURE package, a supervised learning algorithm for creating and identifying 3D, physicochemical motifs in molecular structures. Given an input structure or Protein Data Bank identifier (PDB ID), and a statistical model of a functional site, WebFEATURE will return rank-scored ``hits'' in 3D space that identify regions in the structure where similar distributions of physicochemical properties occur relative to the site model. Users can visualize and interactively manipulate scored hits and the query structure in web browsers that support the Chime plug-in. Alternatively, results can be downloaded and visualized through other freely available molecular modeling tools, like RasMol, PyMOL and Chimera. A major application of WebFEATURE is in rapid annotation of function to structures in the context of structural genomics.''
 
[Banatao01] D.R. Banatao, C.C. Huang, P.C. Babbitt, R.B. Altman, T.E. Klein,
"ViewFeature: integrated feature analysis and visualization,"
Pac Symp Biocomput, 2001, pp. 240-250.
``We have developed an extension to the molecular visualization program Chimera that integrates Feature's statistical models and site predictions with 3-dimensional structures viewed in Chimera. We call this extension ViewFeature, and it is designed to help users understand the structural Features that define a site of interest. We applied ViewFeature in an analysis of the enolase superfamily; a functionally distinct class of proteins that share a common fold, the alpha/beta barrel, in order to gain a more complete understanding of the conserved physical properties of this superfamily. In particular, we wanted to define the structural determinants that distinguish the enolase superfamily active site scaffold from other alpha/beta barrel superfamilies and particularly from other metal-binding alpha/beta barrel proteins. Through the use of ViewFeature, we have found that the C-terminal domain of the enolase superfamily does not differ at the scaffold level from metal-binding alpha/beta barrels. We are, however, able to differentiate between the metal-binding sites of alpha/beta barrels and those of other metal-binding proteins. We describe the overall architectural Features of enolases in a radius of 10 Angstroms around the active site.''
 
[Liang03b] M.P. Liang, D.L. Brutlag, R.B. Altman,
"Automated construction of structural motifs for predicting functional sites on protein structures,"
Pac Symp Biocomput, 2003, pp. 204-215.
``We describe a method to predict functional sites by automatically creating three dimensional structural motifs from amino acid sequence motifs. These structural motifs perform comparably well with manually generated structural motifs and perform better than sequence motifs.''
 
[Waugh01] A. Waugh, G.A. Williams, L. Wei, R.B. Altman,
"Using meta computing tools to facilitate large-scale analyses of biological databases,"
Pac Symp Biocomput, 2001, pp. 360-371.
``We use a distributed computing environment, Legion, to enable large-scale computations on the Protein Data Bank (PDB). In particular, we employ the Feature program to scan all protein structures in the PDB in search for unrecognized potential cation binding sites. We evaluate the efficiency of Legion's parallel execution capabilities and analyze the initial biological implications that result from having a site annotation scan of the entire PDB. We discuss four interesting proteins with unannotated, high-scoring candidate cation binding sites.''
 
[Wei98] L. Wei, R.B. Altman,
"Recognizing protein binding sites using statistical descriptions of their 3D environments,"
Pac Symp Biocomput, 1998, pp. 497-508.
This is the main reference for matching of binding sites with Feature. ``We have developed a new method for recognizing sites in three-dimensional protein structures. Our method is based on our previously reported algorithm for creating descriptions of protein microenvironments using physical and chemical properties at multiple levels of detail (including features at the atomic, chemical group, residue, and secondary structural levels). The recognition method takes three inputs: a set of sites that share some structural or functional role, a set of control nonsites that lack this role, and a single query site. The values of properties for the query site are compared to the distributions of values for both sites and nonsites to determine the group to which it is most similar. A log-odds scoring function, based on Bayes' Rule, computes a score that indicates the likelihood that the query region is a site of interest. In this paper, we apply the method to the task of identifying calcium binding sites in proteins. Cross-validation analysis shows that this recognition approach has high sensitivity and specificity. We also describe the results of scanning four calcium binding proteins (with the calcium removed) using a three-dimensional grid of probe points at 2 A spacing. The probe points that have high scores cluster around the true calcium binding sites, with the highest scoring points at or near the binding sites. The method fails in only one case where a calcium binding site is created by four proteins in the crystal lattice, and is thus not recognizable within the crystallographic asymmetric unit. Our results show that property-based descriptions can be used for recognizing protein sites in unannotated structures.''
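The log-odds scoring described in this abstract can be sketched as a naive-Bayes style sum over binned property values. This is a simplified illustration, not the Feature program: it assumes every property shares the same bin edges, treats properties as independent, and uses additive smoothing; all names and defaults are hypothetical.

```python
import math

def bin_index(value, edges):
    """Index of the bin that value falls into, given ascending bin edges."""
    return sum(value >= edge for edge in edges)

def train(examples, edges, pseudo=1.0):
    """Per-property bin probabilities with additive (pseudo-count) smoothing."""
    n_bins = len(edges) + 1
    counts = [[pseudo] * n_bins for _ in examples[0]]
    for example in examples:
        for j, value in enumerate(example):
            counts[j][bin_index(value, edges)] += 1
    return [[c / sum(row) for c in row] for row in counts]

def log_odds(query, site_model, nonsite_model, edges, prior_site=0.5):
    """Log-odds that the query environment is a site rather than a nonsite."""
    score = math.log(prior_site / (1.0 - prior_site))
    for j, value in enumerate(query):
        b = bin_index(value, edges)
        score += math.log(site_model[j][b] / nonsite_model[j][b])
    return score
```

A positive score means the query's property values look more like the training sites than the control nonsites; scanning a grid of probe points and keeping the high scorers would follow the usage outlined in the abstract.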
 
[Wei97] L. Wei, R.B. Altman, J.T. Chang,
"Using the radial distributions of physical features to compare amino acid environments and align amino acid sequences,"
Pac Symp Biocomput, 1997, pp. 465-476.
``We have performed a comprehensive analysis of the microenvironments surrounding the twenty amino acids. Our analysis includes comparison of amino acid environments with random control environments as well as with each of the other amino acid environments. We describe the amino acid environments with a set of 21 features summarizing atomic, chemical group, residue, and secondary structural features. The environments are divided into radial shells of 1 A thickness to represent the distance of the features from the amino acid C beta atoms. We make the results of our analysis available graphically over the world wide web. To illustrate the validity and utility of our analysis, we used the amino acid comparative profiles to construct a substitution matrix, the WAC matrix, based on a simple summary of the computed environmental differences. We compared our matrix to BLOSUM62 and PAM250 in BLAST searches with query sequences selected from 39 protein families found in the PROSITE database. Although BLOSUM62 was the most sensitive matrix overall, our matrix was more sensitive for some families, and exhibited overall performance similar to PAM250. Our results suggest that the radial distribution of biochemical and biophysical features is useful for comparing amino acid environments, and that similarity matrices based on the geometric distribution of features around amino acids may produce improved search sensitivity.''
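The radial-shell description of an environment can be sketched as a count of labeled features per shell. The input layout, the 10 angstrom outer limit, and the function name are assumptions for illustration; the abstract specifies only the 1 angstrom shell thickness.

```python
import math
from collections import Counter

def shell_counts(center, features, shell_width=1.0, max_radius=10.0):
    """Count labeled features in concentric shells around a center point.

    features is a list of (label, coordinates) pairs. Returns a Counter
    keyed by (shell_index, label), where shell 0 covers [0, shell_width).
    """
    counts = Counter()
    for label, position in features:
        r = math.dist(center, position)
        if r < max_radius:
            counts[(int(r // shell_width), label)] += 1
    return counts
```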
 
[Yoon07] S. Yoon, J.C. Ebert, E-Y. Chung, G. De Micheli, R.B. Altman,
"Clustering protein environments for function prediction: finding PROSITE motifs in 3D,"
BMC Bioinformatics, 8, 4, 2007.
 
[Ota03] M. Ota, K. Kinoshita, K. Nishikawa,
"Prediction of Catalytic Residues in Enzymes Based on Known Tertiary Structure, Stability Profile, and Sequence Conservation,"
Journal of Molecular Biology, 327, 5, 2003, pp. 1053-1064.
 
[Kinoshita05] K. Kinoshita, H. Nakamura,
"Identification of the ligand binding sites on the molecular surface of proteins,"
Protein Science, 14, 17, 2005, pp. 711-718.
This paper contains results of an experiment for comparison of a few binding site surfaces to a large database (almost all binding site surfaces in the PDB) using the method described in [Kinoshita03]. Since partial surfaces are matched, a similarity scoring method is introduced that considers both a normalized score for the match of the geometry and electrostatics (Z-score) and the ``coverage'' of the match (fractions of the surfaces found to be in correspondence). Results are presented for 18 hypothetical proteins.
 


Protein-ligand binding site representations with pseudo-atoms and matching with association graphs

[Artymiuk94] P.J. Artymiuk, A.R. Poirrette, H.M. Grindley, D.W. Rice, P. Willett,
"A Graph-Theoretic Approach to the Identification of 3-Dimensional Patterns of Amino-Acid Side-Chains in Protein Structures,"
Journal of Molecular Biology, 243, 1994, pp. 327-344.
``This paper discusses the use of graph-theoretic methods for the representation and searching of three-dimensional patterns of side-chains in protein structures. The position of a side-chain is represented by pseudo-atoms, and the relative positions of pairs of side-chains by the distances between them. This description of the geometry can be represented by a labelled graph in which the nodes and the edges of the graph represent the pseudo-atoms and the sets of inter-pseudo-atomic distances, respectively. Given such a representation, a protein can be searched for the presence of a user-defined query pattern of side-chains by means of a subgraph-isomorphism algorithm which is implemented in the program ASSAM.''
 
[Schmitt02] S. Schmitt, D. Kuhn, G. Klebe,
"A new method to detect related function among proteins independent of sequence and fold homology,"
J Mol Biol, 323, 2002, pp. 387-406.
This is the main reference for pseudo-centers. It also describes Cavbase, which finds a maximal clique in an association graph to match sets of pseudo-centers.
 
[Weskamp04] N. Weskamp, D. Kuhn, E. Hullermeier, G. Klebe,
"Efficient similarity search in protein structure databases by k-clique hashing,"
Bioinformatics, 20, 2004, pp. 1522-1526.
Describes search of Cavbase (sites represented by pseudo-centers) combining clique detection in association graphs with geometric hashing.
 
[Weskamp03] N. Weskamp, D. Kuhn, E. Hullermeier, G. Klebe,
"Efficient Similarity Search in Protein Structure Databases: Improving Clique-Detection through Clique Hashing,"
German Conference on Bioinformatics, Munich, Germany, 2003.
 
[Kupas04] K. Kupas, A. Ultsch, G. Klebe,
"An algorithm for finding similarities in protein active sites,"
ICBA, Fort Lauderdale, FL, 2004.
This paper uses pseudo-centers to compare binding sites. ``The binding-site exposed physicochemical characteristics are described by assigning generic pseudocenters to the functional groups of the amino acids flanking a particular active site. These pseudocenters are assembled into small substructures. To find substructures with spatial similarity and appropriate chemical properties, an emergent self-organizing map is used for clustering. Two substructures which are found to be similar form the basis for an expanded comparison of the complete cavities. Preliminary results with four pairs of binding cavities show that similarities are detected correctly and motivate further studies.''
 
[Spriggs03] R.V. Spriggs, P.J. Artymiuk, P. Willett,
"Searching for patterns of amino acids in 3D protein structures,"
J Chem Inf Comput Sci, 43, 2003, pp. 412-421.
Uses pseudo-centers and distance subgraph isomorphism. ASSAM represents an amino acid by a vector drawn from the main chain towards the functional part of the amino acid and then computes a graph representation of a protein in which the individual side-chain vectors are the nodes and the intervector distances are the edges. The presence of a query pattern in a Protein Data Bank structure can then be searched for by means of a subgraph isomorphism algorithm.
 
[Brakoulias04] A. Brakoulias, R.M. Jackson,
"Towards a structural classification of phosphate binding sites in protein-nucleotide complexes: an automated all-against-all structural comparison using geometric matching,"
Proteins, 56, 2, 2004, pp. 250-260.
 
[Pickering01] S.J. Pickering, A.J. Bulpitt, N. Efford, N.D. Gold, D.R. Westhead,
"AI-based algorithms for protein surface comparisons,"
Comput Chem, 26, 2001, pp. 79-84.
This paper uses surface points and association graphs to match ligand binding sites.
 
[Milik03] M. Milik, S. Szalma, K.A. Olszewski,
"Common Structural Cliques: a tool for protein structure and function analysis,"
Protein Eng, 16, 2003, pp. 543-552.
``The compared protein structures are condensed to a graph representation, with atoms as nodes and distances as edge labels. Protein graphs are then compared to extract all possible Common Structural Cliques. These cliques are merged to create Structural Templates: graphs that describe structural analogies between compared proteins. Structures of serine endopeptidases were compared in pairs using the presented algorithm with different geometrical parameters.''
 
[Wangikar03] P.P. Wangikar, A.V. Tendulkar, S. Ramya, D.N. Mali, S. Sarawagi,
"Functional sites in protein families uncovered via an objective and automated graph theoretic approach,"
Journal of Molecular Biology, 326, 2003, pp. 955-978.
``We report a method for detection of recurring side-chain patterns (DRESPAT) using an unbiased and automated graph theoretic approach. We first list all structural patterns as sub-graphs where the protein is represented as a graph. The patterns from proteins are compared pair-wise to detect patterns common to a protein pair based on content and geometry criteria. The recurring pattern is then detected using an automated search algorithm from the all-against-all pair-wise comparison data of proteins. Intra-protein pattern comparison data are used to enable detection of patterns recurring within a protein. A method has been proposed for empirical calculation of statistical significance of recurring pattern. The method was tested on 17 protein sets of varying size, composed of non-redundant representatives from SCOP superfamilies.''
 
[Jambon03] M. Jambon, A. Imberty, G. Deleage, C. Geourjon,
"A new bioinformatic approach to detect common 3D sites in protein structures,"
Proteins, 52, 2003, pp. 137-145.
``The basis for this method is a representation of the protein structure by a set of stereochemical groups that are defined independently from the notion of amino acid. An efficient heuristic for finding similarities that uses graphs of triangles of chemical groups to represent the protein structures has been developed.''
 
[Barrow76] H.G. Barrow, R.M. Burstall,
"Subgraph isomorphism, matching relational structures and maximal cliques,"
Inf. Process. Lett., 4, 1976, pp. 83-84.
This is the classical paper on using association graphs to match rigid point sets.
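As an illustration of the association-graph idea (a minimal sketch, not the formulation of any specific paper listed here): each node pairs a point of one set with a point of the other, two nodes are connected when the corresponding intra-set distances agree, and a maximum clique gives the largest mutually consistent correspondence. The function names, the tolerance `tol`, and the toy coordinates are assumptions made for this example. Distances alone cannot distinguish a motif from its mirror image.

```python
from itertools import combinations
from math import dist


def association_graph(a, b, tol=0.5):
    """Nodes are candidate pairings (i, j) of a[i] with b[j]; an edge joins
    two pairings whose intra-set distances agree to within tol."""
    nodes = [(i, j) for i in range(len(a)) for j in range(len(b))]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # each point may be paired at most once
        if abs(dist(a[i], a[k]) - dist(b[j], b[l])) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def max_clique(adj):
    """Bron-Kerbosch with pivoting; returns one largest clique, sorted."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand([], set(adj), set())
    return sorted(best)


# Toy data: b holds a rotated and translated copy of a (in shuffled order)
# plus one decoy point at index 0.
a = [(0, 0, 0), (3, 0, 0), (0, 4, 0), (0, 0, 5)]
b = [(20, 20, 20), (6, 10, 10), (10, 10, 10), (10, 10, 15), (10, 13, 10)]
matches = max_clique(association_graph(a, b, tol=0.1))
print(matches)
```

The association graph has |A| x |B| nodes, so this exhaustive form is practical only for small sites; the binding-site methods above reduce the graph by pairing only points with compatible chemical labels.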
 


Protein-ligand binding site representations with pseudo-atoms and matching with geometric hashing

[Shulman-Peleg04] A. Shulman-Peleg, R. Nussinov, H.J. Wolfson,
"Recognition of functional sites in protein structures,"
J Mol Biol, 339, 2004, pp. 607-633.
Watson05: SiteEngine uses modified pseudo-centres and geometric hashing to compare surfaces with the aim of identifying conserved chemistry in similar pockets, which might indicate similar function. ``We achieve high efficiency and speed by introducing a low-resolution surface representation via chemically important surface points, by hashing triangles of physico-chemical properties and by application of hierarchical scoring schemes for a thorough exploration of global and local similarities. We proceed to rigorously apply this method to functional site recognition in three possible ways: first, we search a given functional site on a large set of complete protein structures. Second, a potential functional site on a protein of interest is compared with known binding sites, to recognize similar features. Third, a complete protein structure is searched for the presence of an a priori unknown functional site, similar to known sites.''
 
[Pennec98] X. Pennec, N. Ayache,
"A geometric algorithm to find small but highly similar 3D substructures in proteins,"
Bioinformatics, 14, 1998, pp. 516-522.
``We propose a new 3D substructure matching algorithm based on geometric hashing techniques. The key feature of the method is the introduction of a 3D reference frame attached to each residue.''
 
[Wolfson97] H.J. Wolfson, I. Rigoutsos,
"Geometric hashing: an overview,"
IEEE Computational Science \& Engineering, 4, 4, 1997, pp. 10-21.
 
[Lamdan88] Y. Lamdan, H. Wolfson,
"Geometric hashing: a general and efficient recognition scheme,"
2nd International Conference on Computer Vision, Tarpon Springs, FL, 1988, pp. 238-251.
This is the main reference for geometric hashing.
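A minimal sketch of the geometric hashing scheme, reduced to 2D rigid matching for brevity (the binding-site methods in this section work in 3D, typically with triangles of points as bases). In the preprocessing stage, every ordered pair of model points defines a reference frame, and the quantized coordinates of the remaining points in that frame are stored in a hash table. In the recognition stage, scene bases are tried in turn and table hits vote for (model, basis) pairs. Including the basis length in the key, the cell size, and all names and coordinates below are assumptions of this example.

```python
from collections import defaultdict
from itertools import permutations
from math import hypot


def frame_coords(points, i, j):
    """Basis length, and coordinates of all other points in the frame with
    origin points[i] and x-axis pointing toward points[j]."""
    (px, py), (qx, qy) = points[i], points[j]
    length = hypot(qx - px, qy - py)
    ex, ey = (qx - px) / length, (qy - py) / length
    coords = []
    for k, (x, y) in enumerate(points):
        if k in (i, j):
            continue
        dx, dy = x - px, y - py
        coords.append((dx * ex + dy * ey, -dx * ey + dy * ex))
    return length, coords


def key(length, u, v, cell):
    """Quantize a basis length and a frame coordinate into a hash key."""
    return (round(length / cell), round(u / cell), round(v / cell))


def build_table(models, cell=0.5):
    """Preprocessing: store every model point under every model basis."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i, j in permutations(range(len(pts)), 2):
            length, coords = frame_coords(pts, i, j)
            for u, v in coords:
                table[key(length, u, v, cell)].append((name, i, j))
    return table


def recognise(table, scene, cell=0.5):
    """Recognition: return (votes, model, model basis, scene basis)."""
    best = (0, None, None, None)
    for i, j in permutations(range(len(scene)), 2):
        length, coords = frame_coords(scene, i, j)
        votes = defaultdict(int)
        for u, v in coords:
            for entry in table.get(key(length, u, v, cell), ()):
                votes[entry] += 1
        for (name, mi, mj), n in votes.items():
            if n > best[0]:
                best = (n, name, (mi, mj), (i, j))
    return best


models = {
    "site": [(0, 0), (4, 0), (4, 3), (1, 5), (-2, 2)],
    "other": [(0, 0), (20, 0), (0, 20), (20, 20), (10, 35)],
}
# Scene: "site" rotated by 90 degrees and translated, plus one clutter point.
scene = [(10, 20), (10, 24), (7, 24), (5, 21), (8, 18), (40, -15)]
votes, name, model_basis, scene_basis = recognise(build_table(models), scene)
print(votes, name)
```

Fixed-grid quantization can split near-identical coordinates across neighbouring bins; practical implementations usually also probe adjacent bins or verify the top-voted candidates by superposition.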
 
[Norel93] D. Fischer, R. Norel, H. Wolfson, R. Nussinov,
"Surface motifs by a computer vision technique: Searches, detection, and implications for protein-ligand recognition,"
Proteins: Structure, Function, and Genetics, 16, 3, 1993, pp. 278-292.
 


Protein-ligand binding site representations with atoms/residues and matching with combinatorial extension

[Ferre04] F. Ferre, G. Ausiello, A. Zanzoni, M. Helmer-Citterich,
"SURFACE: a database of protein surface regions for functional annotation,"
Nucleic Acids Res, 32, 2004, pp. D240-D244.
Describes a database of binding sites, each represented by a set of points (two per residue: the CA atom and the center of the side chain). Matching is performed by a combinatorial expansion algorithm. The database is available at http://cbm.bio.uniroma2.it/surface/.
 
[Ivanisenko04] V.A. Ivanisenko, S.S. Pintus, D.A. Grigorovich, N.A. Kolchanov,
"PDBSiteScan: a program for searching for active, binding and posttranslational modification sites in the 3D structures of proteins,"
Nucleic Acids Res, 32, 2004, pp. W549-W554.
Alignment of point sets with combinatorial extension (like CE).
 


Protein-ligand binding site representations with atoms/residues and matching with ???

[Kobayashi97] N. Kobayashi, N. Go,
"A method to search for similar protein local structures at ligand binding sites and its application to adenine recognition,"
Eur Biophys J, 26, 1997, pp. 135-144.
Utilizes bound ligand to define region of interest and align. ``We have developed a method of searching for similar spatial arrangements of atoms around a given chemical moiety in proteins that bind a common ligand. The first step in this method is to consider a set of atoms that closely surround a given chemical moiety. Then, to compare the spatial arrangements of such surrounding atoms in different proteins, they are translated and rotated so that the chemical moieties are superposed on each other. Spatial arrangements of surrounding atoms in a pair of proteins are judged to be similar, when there are many corresponding atoms occupying similar spatial positions.''
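The procedure quoted above can be sketched as follows, under stated assumptions: instead of a least-squares superposition of the chemical moieties, three non-collinear moiety atoms define a local frame (equivalent only when the moieties are rigid and identical), and "corresponding" atoms are counted greedily as one-to-one pairs closer than an arbitrary cutoff. The function names, the cutoff, and the coordinates are hypothetical and not taken from the paper.

```python
from math import dist, sqrt


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = sqrt(dot(a, a))
    return tuple(x / n for x in a)


def to_moiety_frame(moiety, atoms):
    """Express atoms in an orthonormal frame built from three non-collinear
    moiety atoms (origin, x-axis atom, in-plane atom)."""
    o, p, q = moiety
    e1 = unit(sub(p, o))
    e3 = unit(cross(e1, sub(q, o)))
    e2 = cross(e3, e1)
    return [tuple(dot(sub(a, o), e) for e in (e1, e2, e3)) for a in atoms]


def count_corresponding(site_a, site_b, cutoff=1.0):
    """Greedy one-to-one count of atom pairs closer than cutoff."""
    pairs = sorted((dist(a, b), i, j)
                   for i, a in enumerate(site_a)
                   for j, b in enumerate(site_b))
    used_a, used_b, n = set(), set(), 0
    for d, i, j in pairs:
        if d > cutoff:
            break
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            n += 1
    return n


# Protein 1: moiety atoms and the atoms surrounding them.
moiety1 = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
around1 = [(2, 0, 0), (0, 3, 0), (0, 0, 4), (5, 5, 5)]
# Protein 2: same moiety in another orientation; three surrounding atoms
# occupy equivalent positions, the fourth does not.
moiety2 = [(10, 0, -3), (10, 1, -3), (9, 0, -3)]
around2 = [(10, 2, -3), (7, 0, -3), (10, 0, 1), (30, 30, 30)]

shared = count_corresponding(to_moiety_frame(moiety1, around1),
                             to_moiety_frame(moiety2, around2))
print(shared)
```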
 


Protein-ligand binding site representations with templates

[Jones04] S. Jones, J.M. Thornton,
"Searching for functional sites in protein structures,"
Curr Opin Chem Biol, 8, 2004, pp. 3-7.
Contains a brief overview of template-based methods.
 
[Torrance05] J.W. Torrance, G.J. Bartlett, C.T. Porter, J.M. Thornton,
"Using a library of structural templates to recognise catalytic sites and explore their evolution in homologous families,"
J Mol Biol, 347, 2005, pp. 565-581.
Watson05: The authors present a library of catalytic site structural templates based on information from the scientific literature. In an extension of previous work, a new web server is released that allows users to search the CSA using the JESS algorithm. The user can investigate a specific PDB code or submit a three-dimensional protein structure for analysis. (http://www.ebi.ac.uk/thornton-srv/databases/CSS).
 
[Porter04] C.T. Porter, G.J. Bartlett, J.M. Thornton,
"The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data,"
Nucleic Acids Res, 32, 2004, pp. D129-133.
This is the main reference for the Catalytic Site Atlas (CSA) (http://www.ebi.ac.uk/thornton-srv/databases/CSA/index.html), a database of catalytic sites compiled from the literature. The authors argue that the SITE records of PDB files are used inconsistently, so they went through the literature and identified the catalytic residues for 177 proteins (at the time of writing). They also transferred annotations to 2608 proteins homologous to the originals. The paper does not demonstrate applications enabled by the database.
 
[Stark03a] A. Stark, R.B. Russell,
"Annotation in three dimensions. PINTS: Patterns in Non-homologous Tertiary Structures,"
Nucleic Acids Res, 31, 2003, pp. 3341-3344.
This is the main reference for PINTS.
 
[Stark03b] A. Stark, S. Sunyaev, R.B. Russell,
"A model for statistical significance of local similarities in structure,"
J Mol Biol, 2003, pp. .
This paper analyzes the statistical significance of matches based on the RMSD between atom pairs. The analysis is applied to evaluate matches in protein alignments.
 
[Stark04] A. Stark, A. Shkumatov, R.B. Russell,
"Finding functional sites in structural genomics proteins,"
Structure, 12, 2004, pp. 1405-1412.
Watson05: The authors report the use of fold similarity and template methods to identify functional sites in a selection of proteins solved by structural genomics projects. The authors compare their method (PINTS) with two other template-based methods, PROCAT and RIGOR.
 
[Pazos04] F. Pazos, M.J.E. Sternberg,
"Automated prediction of protein function and detection of functional sites from structure,"
Proc Natl Acad Sci USA, 101, 2004, pp. 14754-14759.
Watson05: The authors describe Phunctioner, a method for the automatic prediction of function. An initial structural alignment is split into functionally specific subalignments using GO annotation. The conserved residues in each subalignment are interpreted as functionally important residues and are used to construct PSSMs for scanning against a query sequence to find the best-fitting functional match. An additional benefit is that the method can identify functionally important residues for GO terms for which no such information is currently known.
 
[Laskowski05a] R.A. Laskowski, J.D. Watson, J.M. Thornton,
"Protein function prediction using local 3D templates,"
J. Mol. Biol., 351, 2005, pp. 614-626.
 
[Preissner98] R. Preissner, A. Goede, C. Frommel,
"Dictionary of interfaces in proteins (DIP). Data bank of complementary molecular surface patches,"
J Mol Biol, 280, 1998, pp. 535-550.
``We defined interfaces as pairs of matching molecular surface patches between neighboring secondary structural elements. All such interfaces from known protein structures were collected in a comprehensive data bank of interfaces in proteins (DIP). The up-to-date DIP contains interface files for 351 selected Brookhaven Protein Data Bank entries with a total of about 160,000 surface elements formed by 12,475 secondary structures. ... The existing retrieval system for the DIP allows selection (out of the set of molecular patches) according to different criteria, such as geometric features, atomic composition, type of secondary structure, contacts, etc. A fast, sequence-independent 3-D superposition procedure was developed for automatic searches for geometrically similar surface areas. Using this procedure, we found a large number of structurally similar interfaces of up to 30 atoms in completely unrelated protein structures.''
 
[Frommel03] C. Frommel, C. Gille, A. Goede, C. Gropl, S. Hougardy, T. Nierhoff, R. Preissner, M. Thimm,
"Accelerating screening of 3D protein data with a graph theoretical approach,"
Bioinformatics, 19, 2003, pp. 2442-2447.
``The Dictionary of Interfaces in Proteins (DIP) is a database collecting the 3D structure of interacting parts of proteins that are called patches. It serves as a repository, in which patches similar to given query patches can be found. In this work we address the question of how the patches similar to a given query can be identified by scanning only a small part of DIP. The answer to this question requires the investigation of the distribution of the similarity of patches.''
 
[Kleywegt99] G.J. Kleywegt,
"Recognition of spatial motifs in protein structures,"
J Mol Biol, 285, 1999, pp. 1887-1897.
This paper describes two programs: SPASM and RIGOR. SPASM matches a single structural motif (a spatial arrangement of points) against a database of proteins, while RIGOR matches a single protein against many structural motifs. Each residue is represented by its CA atom and/or the centroid of its side chain. Possible point correspondences are enumerated exhaustively, with two points considered for correspondence when the similarity of their residue types in a substitution matrix passes some threshold. Constraints may also be added, for example that matching residues appear in the same order in the sequence or are separated by gaps of the same size. For every possible set of correspondences, the point sets are superposed, and the RMSD is checked against a threshold. Structural motifs are constructed in three ways: 1) manually, 2) as all sets of residues in spatial proximity that contain only hydrophobic, only polar and charged, or mixed hydrophobic and polar/charged residues, and 3) as sets of residues that all contact a single hetero-compound. Applications are shown for a few cases of main-chain recognition, active-site recognition, and metal-binding site recognition. (http://alpha2.bmc.uu.se/usf).
 
[Singh03] R. Singh, M. Saha,
"Identifying structural motifs in proteins,"
Pacific Symposium on Biocomputing, 8, 2003, pp. 228-239.
 
[Masden02] D. Madsen, G.J. Kleywegt,
"Interactive motif and fold recognition in protein structures,"
J. Appl. Cryst., 35, 2002, pp. 137-139.
 
[Dawe03] J.H. Dawe, C.T. Porter, J.M. Thornton, A.B. Tabor,
"A template search reveals mechanistic similarities and differences in β-ketoacyl synthases (KAS) and related enzymes,"
Proteins, 52, 2003, pp. 427-435.
 
[Hamelryck03] T. Hamelryck,
"Efficient identification of side-chain patterns using a multidimensional index tree,"
Proteins, 51, 1, 2003, pp. 96-108.
 
[Russell98] R.B. Russell,
"Detection of protein three-dimensional side-chain patterns: new examples of convergent evolution,"
J Mol Biol, 279, 1998, pp. 1211-1227.
 
[Barker03] J.A. Barker, J.M. Thornton,
"An algorithm for constraint-based structural template matching: application to 3D templates with statistical analysis,"