``It is God's privilege to conceal things, but the kings' pride is
to research them.''
(Proverbs 25:2; ascribed to King Solomon of Israel, ca. 1000 B.C.)
The protein folding problem entails the mathematical prediction of the (tertiary, 3-dimensional) structure of a protein from its (primary, linear) structure, defined by the sequence of its amino acids. It is one of the most challenging problems in current biochemistry and a very rich source of interesting problems in mathematical modeling and numerical analysis. It requires an interplay of techniques from eigenvalue calculations, stiff differential equations, stochastic differential equations, local and global optimization, nonlinear least squares, multidimensional approximation of functions, design of experiments, and statistical classification of data. Even topological concepts such as the Morse index and knot invariants (Jones polynomials) have been discussed in this context. An extensive recent report from the U.S. National Research Council on the mathematical challenges from theoretical and computational chemistry places the protein folding problem within a large variety of other mathematical challenges in chemistry.
Molecular biology is mankind's attempt to figure out how God engineered His greatest invention: life. As with all great inventions, the details are top secret; however, even top secrets may become known. I find it a great privilege to live in a time when God allows us to gain some insight into His construction plans, only a short step away from giving us the power to control life processes genetically. I hope it will be to the benefit of mankind, and not to its destruction.
This document is updated only sporadically.
Molecular modeling of proteins and mathematical prediction of
``The aims of the present paper are to introduce mathematicians to the subject, to provide enough background that the problems in the mathematical modeling of proteins become transparent, to expose the merits and deficiencies of current models, to describe the numerical difficulties in structure prediction when a model is specified, and to point out possible ways of improving model formulation and prediction techniques.''
New techniques for the construction of residue potentials for protein
A smooth empirical potential is constructed for use in off-lattice protein folding studies. Our potential is a function of the amino acid labels and of the distances between the C(alpha) atoms of a protein. The potential is a sum of smooth surface potential terms that model solvent interactions and of pair potentials that are functions of a distance, with a smooth cutoff at 12 Ångström. Techniques include the use of a fully automatic and reliable estimator for smooth densities, of cluster analysis to group together amino acid pairs with similar distance distributions, and of quadratic programming to find appropriate weights with which the various terms enter the total potential.
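To make the phrase "pair potentials with a smooth cutoff" concrete, here is a minimal sketch in Python. It is not the potential constructed in the paper: the Gaussian well, the polynomial switching factor, and the names `smooth_switch`, `pair_term`, `total_pair_energy`, `depth`, `r0`, and `width` are all assumptions chosen for illustration. Only the 12 Ångström cutoff and the dependence on C(alpha) distances come from the abstract. The dependence on amino acid labels is reduced here to an optional weight per pair.

```python
import math

CUTOFF = 12.0  # Angstrom; the cutoff distance quoted in the abstract


def smooth_switch(r, cutoff=CUTOFF):
    """Hypothetical switching factor: 1 at r = 0, 0 for r >= cutoff.

    (1 - (r/cutoff)^2)^2 has zero value and zero slope at the cutoff,
    so a term multiplied by it vanishes there without a kink.
    """
    if r >= cutoff:
        return 0.0
    t = r / cutoff
    return (1.0 - t * t) ** 2


def pair_term(r, depth=1.0, r0=6.0, width=1.5):
    """Toy pair potential (assumed form): a Gaussian well times the switch."""
    well = -depth * math.exp(-(((r - r0) / width) ** 2))
    return well * smooth_switch(r)


def total_pair_energy(coords, weights=None):
    """Sum the pair terms over all C(alpha) pairs i < j.

    coords  : list of (x, y, z) tuples in Angstrom
    weights : optional matrix weights[i][j], standing in for the
              label-dependent weights mentioned in the abstract
    """
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            w = 1.0 if weights is None else weights[i][j]
            energy += w * pair_term(r)
    return energy
```

Pairs farther apart than the cutoff contribute exactly zero, and the contribution approaches zero continuously as the distance approaches 12 Ångström, which is the property a gradient-based optimizer needs.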
Hydrophobicity Analysis of Amino Acids
``Based on a principal component analysis of 47 published attempts to quantify hydrophobicity in terms of a single scale, we define a representation of the 20 amino acids as points in a 3-dimensional hydrophobicity space and display it by means of a minimal spanning tree. The dominant scale is found to be close to two scales derived from contact potentials.''
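The display step mentioned in this abstract, a minimal spanning tree over points in a 3-dimensional space, can be sketched as follows. This is only an illustration using Prim's algorithm in Python; the paper does not say which algorithm was used, the function name `minimal_spanning_tree` is hypothetical, and the coordinates below are arbitrary placeholders, not hydrophobicity values from the paper. The principal component analysis that produces the 3-dimensional coordinates is assumed to have been done already.

```python
import math


def minimal_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    points : dict mapping a label to a coordinate tuple
    returns: list of (label_in_tree, new_label, distance) edges
    """
    labels = list(points)
    if not labels:
        return []
    root = labels[0]
    # best[v] = (distance from v to the current tree, nearest tree vertex)
    best = {v: (math.dist(points[v], points[root]), root) for v in labels[1:]}
    edges = []
    while best:
        # Attach the vertex that is currently closest to the tree.
        v = min(best, key=lambda k: best[k][0])
        d, u = best.pop(v)
        edges.append((u, v, d))
        # The new tree vertex may bring remaining vertices closer.
        for w in best:
            dw = math.dist(points[w], points[v])
            if dw < best[w][0]:
                best[w] = (dw, v)
    return edges


# Placeholder coordinates for illustration only, NOT data from the paper.
demo = {
    "P1": (0.0, 0.0, 0.0),
    "P2": (1.0, 0.0, 0.0),
    "P3": (1.0, 2.0, 0.0),
    "P4": (5.0, 0.0, 0.0),
}
tree = minimal_spanning_tree(demo)
for a, b, d in tree:
    print(f"{a} - {b}: {d:.2f}")
```

For 20 amino acids the complete graph has only 190 edges, so this simple quadratic-time version is more than fast enough.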
Mathematics and Molecules
(movies and images on molecular modeling)
``The objectives of MathMol are: 1) to provide students, teachers and the general public with information about the rapidly growing fields of molecular modeling and related areas; 2) to provide K-12 students with basic concepts in mathematics and their connection to molecular modeling...''
The Principles of Protein Structure
An internet course with many links in the course material
A Guide to Structure Prediction (by Robert B. Russell)
A common thread to guide the practitioner through the available techniques
Sisyphus and protein structure prediction (by Burkhard Rost)
Molecular Surfaces: A Review (Network Science, April 1996)
Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology (A report by the National Research Council USA)
Directory containing all PDB Entries (directory is huge!!)
PDB Structures: Summary Information (from University College London)
STRIDE, secondary structure assignment from atomic coordinates, based on H-bond patterns and mainchain dihedral angles (Frishman and Argos)
DEFINE_S secondary and first level supersecondary structure from C_alpha trace (Richards and Kundrot)
Amino Acid Information
Amino Acids from Wikipedia
ProStar Decoy Library (local copy; the original site no longer exists)
with C_alpha coordinates, secondary structure assignment and surface accessibility for all protein entries in the Protein Data Bank (PDB).
SWISS-PROT + TrEMBL non redundant database
WWW services for sequence analysis (Schneider and Rost)
Rotamer Libraries (Lovell et al.)
Dirichlet Mixtures and other Regularizers
CASP6 Home Page
CASP6 issue of Proteins
CASP5 issue of Proteins
CASP4 issue of Proteins
CASP3 issue of Proteins
CAFASP3, Critical Assessment of Fully Automated Prediction
EVA: EValuation of Automatic protein structure prediction
EVA measures for secondary structure prediction accuracy
EMBL MaxSprout server
``MaxSprout is a fast database algorithm for generating protein backbone and side chain co-ordinates from a C(alpha) trace. The backbone is assembled from fragments taken from known structures. Side chain conformations are optimized in rotamer space using a rough potential energy function to avoid clashes.''
LiveBench benchmarking program for protein prediction
ProStar, a pool of potentials, test decoys, and potential evaluation reviews
Decoys 'R' Us, a database of computer-generated conformations of protein sequences that possess some characteristics of native proteins but are not biologically real.
Protein Structure Prediction Links
Protein Secondary Structure Prediction Servers and FTP Sites
Commented Web Resources for Protein Scientists (by the Protein Society)
Pedro's BioMolecular Research Tools (frozen in 1995; many links no longer work)
The Amber Molecular Dynamics Package
Structural Classification of Proteins (SCOP),
European mirror site
``The scop database ... aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. As such, it provides a broad survey of all known protein folds, detailed information about the close relatives of any particular protein, and a framework for future research and classification.''
Molecular Visualization for X11 and Windows
WHAT IF molecular modelling package
``WHAT IF allows the molecular engineer to sit in front of a computer terminal or better, a graphics workstation, and ask questions that start with "What if ...."''
PyMOL molecular visualization system (open source)
William E. Hart
Martin Karplus and CHARMM
Richard H. Lathrop
Tamar Schlick (New York University)
... more people
Research groups interested in protein folding
WWW Chemistry Sites (67K, a list from UCLA)
NIH Computational Structural Biology (links to home pages of NIH researchers)
Pittsburgh Supercomputing Center Projects in Scientific Computing
(with several protein projects)
ExPASy Molecular Biology Server (analysis of protein and nucleic acid sequences)
France - Institute of Biology and Chemistry of Proteins
Fred Cohen Laboratory - University of California, San Francisco
(Langevin dynamics, secondary and tertiary structure prediction)
Birkbeck College, Department of Crystallography
University College London, Structure and Modelling Group
NMRWeb: Links to NMR information
Vienna RNA Secondary Structure Prediction and Comparison
Computational and Molecular Biology Initiative at Caltech
Genetic Algorithms and Protein Folding (a paper by S. Schulze-Kremer)
Arnold Neumaier (Arnold.Neumaier@univie.ac.at)