Molecular Modeling of Proteins

``It is God's privilege to conceal things, but the kings' pride is to research them.''
(Proverbs 25:2; ascribed to King Solomon of Israel, ca. 1000 B.C.)

The protein folding problem entails the mathematical prediction of (tertiary, 3-dimensional) protein structure given the (primary, linear) structure defined by the sequence of amino acids of the protein. It is one of the most challenging problems in current biochemistry, and is a very rich source of interesting problems in mathematical modeling and numerical analysis, requiring an interplay of techniques in eigenvalue calculations, stiff differential equations, stochastic differential equations, local and global optimization, nonlinear least squares, multidimensional approximation of functions, design of experiments, and statistical classification of data. Even topological concepts like the Morse index and invariants in knot theory (Jones polynomials) have been discussed in this context. An extensive recent report from the U.S. National Research Council on the mathematical challenges from theoretical and computational chemistry shows the protein folding problem embedded in a large variety of other mathematical challenges in chemistry.

Molecular biology is mankind's attempt to figure out how God engineered His greatest invention - life. As with all great inventions, details are top secret; however, even top secrets may become known. I find it a great privilege to live in a time where God allows us to gain some insight into His construction plans, only a short step away from giving us the power to control life processes genetically. I hope it will be to the benefit of mankind, and not to its destruction.

This document is updated only sporadically.

Our Own Work

Molecular modeling of proteins and mathematical prediction of protein structure
``The aims of the present paper are to introduce mathematicians to the subject, to provide enough background that the problems in the mathematical modeling of proteins become transparent, to expose the merits and deficiencies of current models, to describe the numerical difficulties in structure prediction when a model is specified, and to point out possible ways of improving model formulation and prediction techniques.''

New techniques for the construction of residue potentials for protein folding
A smooth empirical potential is constructed for use in off-lattice protein folding studies. Our potential is a function of the amino acid labels and of the distances between the C(alpha) atoms of a protein. The potential is a sum of smooth surface potential terms that model solvent interactions and of pair potentials that are functions of a distance, with a smooth cutoff at 12 Ångström. Techniques include the use of a fully automatic and reliable estimator for smooth densities, of cluster analysis to group together amino acid pairs with similar distance distributions, and of quadratic programming to find appropriate weights with which the various terms enter the total potential.
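As a rough illustration of the pair part of such a potential, the following Python sketch sums label-dependent pair terms over C(alpha) distances and switches them off smoothly at 12 Ångström. Only the cutoff radius and the dependence on labels and distances come from the abstract above; the switching function, the Gaussian well shape, and the names `smooth_cutoff`, `pair_term`, `total_pair_energy` and `depth` are assumptions made for illustration, not the functional forms of the paper. The surface terms and the fitted weights are omitted.

```python
import math

R_CUT = 12.0  # cutoff radius in Ångström, as stated in the abstract


def smooth_cutoff(r, r_cut=R_CUT):
    """Hypothetical switching function: 1 at r = 0, reaching 0 with zero slope at r_cut."""
    if r >= r_cut:
        return 0.0
    x = r / r_cut
    return (1.0 - x * x) ** 2


def pair_term(label_a, label_b, r, depth):
    """Hypothetical pair potential: a Gaussian well centred at 6 Ångström.

    `depth` maps a sorted pair of amino acid labels to a well depth;
    pairs missing from the table contribute nothing.
    """
    key = tuple(sorted((label_a, label_b)))
    eps = depth.get(key, 0.0)
    return -eps * math.exp(-((r - 6.0) / 2.0) ** 2) * smooth_cutoff(r)


def total_pair_energy(labels, coords, depth):
    """Sum the pair terms over all distinct pairs of C(alpha) positions."""
    energy = 0.0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            energy += pair_term(labels[i], labels[j], r, depth)
    return energy
```

Because the switching function and its first derivative both vanish at the cutoff, the total energy in this sketch has no jump or kink in its gradient when a pair of residues crosses the 12 Ångström boundary, which is the property a local optimizer needs.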

Hydrophobicity Analysis of Amino Acids
``Based on a principal component analysis of 47 published attempts to quantify hydrophobicity in terms of a single scale, we define a representation of the 20 amino acids as points in a 3-dimensional hydrophobicity space and display it by means of a minimal spanning tree. The dominant scale is found to be close to two scales derived from contact potentials.''
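The display step mentioned in this abstract, a minimal spanning tree over points in a 3-dimensional space, can be sketched with Prim's algorithm. The Python code below is only an illustration under assumptions: the function name `minimal_spanning_tree` and the Euclidean distance are choices made here, and the principal component analysis that would produce the actual coordinates is not shown.

```python
import math


def minimal_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    `points` maps a name to a coordinate tuple; the result is a list of
    edges (name_in_tree, new_name, length) in the order they were added.
    """
    names = list(points)
    if not names:
        return []
    root = names[0]
    # best[name] = (distance to the current tree, nearest vertex in the tree)
    best = {n: (math.dist(points[n], points[root]), root) for n in names[1:]}
    edges = []
    while best:
        nxt = min(best, key=lambda n: best[n][0])
        d, parent = best.pop(nxt)
        edges.append((parent, nxt, d))
        # Vertices still outside the tree may now be closer to it via nxt.
        for n in best:
            d_new = math.dist(points[n], points[nxt])
            if d_new < best[n][0]:
                best[n] = (d_new, nxt)
    return edges
```

Called with a dictionary of 20 named points, the function returns 19 edges; drawing just those edges gives a compact picture of which points lie closest together without plotting all pairwise distances.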

Protein Folding - Introductions and Surveys

Mathematics and Molecules (movies and images on molecular modeling)
``The objectives of MathMol are: 1) to provide students, teachers and the general public with information about the rapidly growing fields of molecular modeling and related areas; 2) to provide K-12 students with basic concepts in mathematics and their connection to molecular modeling...''

The Principles of Protein Structure
An internet course with many links in the course material

A Guide to Structure Prediction (by Robert B. Russell)
A guiding thread for the practitioner through the available techniques

Sisyphus and protein structure prediction (by Burkhard Rost)

Molecular Surfaces: A Review (Network Science, April 1996)

Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology (A report by the National Research Council USA)

The Brookhaven Protein Data Bank

PDB Home Page (with links to other sites)

Directory containing all PDB Entries (directory is huge!!)

PDB Structures: Summary Information (from University College London)

Secondary Structure Definitions

DSSP, secondary structure assignment from atomic coordinates, based on H-bond patterns (Kabsch and Sander)

STRIDE, secondary structure assignment from atomic coordinates, based on H-bond patterns and mainchain dihedral angles (Frishman and Argos)

DEFINE_S, secondary and first-level supersecondary structure assignment from the C_alpha trace (Richards and Kundrot)

Other Databases

Amino Acids Guide

Amino Acid Information

Amino Acids from Wikipedia

ProStar Decoy Library (local copy; the original site no longer exists)

DSSP database
with C_alpha coordinates, secondary structure assignment and surface accessibility for all protein entries in the Protein Data Bank (PDB).

SWISS-PROT + TrEMBL non redundant database

WWW services for sequence analysis (Schneider and Rost)

Rotamer Libraries (Lovell et al.)

Dirichlet Mixtures and other Regularizers

Critical Assessment of Techniques for Protein Structure Prediction

CASP7 Home Page

CASP6 Home Page

CASP6 issue of Proteins

CASP5 issue of Proteins

CASP4 issue of Proteins

CASP3 issue of Proteins

CAFASP3, Critical Assessment of Fully Automated Prediction

EVA: EValuation of Automatic protein structure prediction

EVA measures for secondary structure prediction accuracy

EMBL MaxSprout server
``MaxSprout is a fast database algorithm for generating protein backbone and side chain co-ordinates from a C(alpha) trace. The backbone is assembled from fragments taken from known structures. Side chain conformations are optimized in rotamer space using a rough potential energy function to avoid clashes.''

LiveBench benchmarking program for protein prediction

ProStar, a pool of potentials, test decoys, and potential evaluation reviews

Decoys 'R' Us, a database of computer-generated conformations of protein sequences that possess some characteristics of native proteins but are not biologically real.


Protein Structure Prediction Center of the Lawrence Livermore National Laboratory

Protein Structure Prediction Links

Protein Secondary Structure Prediction Servers and FTP Sites

BioCatalog Proteins

Commented Web Resources for Protein Scientists (by the Protein Society)

Pedro's BioMolecular Research Tools (frozen 1995, many links are no longer working)

The Amber Molecular Dynamics Package

Structural Classification of Proteins (SCOP), European mirror site
``The scop database ... aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. As such, it provides a broad survey of all known protein folds, detailed information about the close relatives of any particular protein, and a framework for future research and classification.''

Molecular Visualization for X11 and Windows

WHAT IF molecular modelling package
``WHAT IF allows the molecular engineer to sit in front of a computer terminal or better, a graphics workstation, and ask questions that start with "What if ...."''

PyMOL molecular visualization system (open source)

Some Books

Reviews in Computational Chemistry

Some People

Bernard Brooks

David Covell

Gordon Crippen

William E. Hart

Barry Honig

Robert Jernigan

Martin Karplus and CHARMM

Richard H. Lathrop

Burkhard Rost

Andrej Sali

Tamar Schlick (New York University)

Eugene Shakhnovich

... more people

Research groups interested in protein folding

Some Other Institutions

European Molecular Biology Laboratory (EMBL)

WWW Chemistry Sites (67K, a list from UCLA)

Theoretical Bioinformatics Heidelberg (Deutsches Krebsforschungszentrum)

NIH Computational Structural Biology (links to home pages of NIH researchers)

Pittsburgh Supercomputing Center Projects in Scientific Computing
(with several protein projects)

ExPASy Molecular Biology Server (analysis of protein and nucleic acid sequences)

France - Institute of Biology and Chemistry of Proteins

Fred Cohen Laboratory - University of California, San Francisco
(Langevin dynamics, secondary and tertiary structure prediction)

Birkbeck College, Department of Crystallography

University College London, Structure and Modelling Group

Related Sites (RNA, NMR)

NMR Information Server

NMRWeb: Links to NMR information

Vienna RNA Secondary Structure Prediction and Comparison


Mathematical Challenges from Theoretical/Computational Chemistry
A very informative 1995 report of the National Academy of Sciences USA

Computational and Molecular Biology Initiative at Caltech

Genetic Algorithms and Protein Folding (a paper by S. Schulze-Kremer)


Arnold Neumaier