Comparative modeling:
Proteins are chains of amino acid residues joined in a specific order, as determined by the corresponding genetic code in the DNA, and they vary in length (from 20 residues in small peptides to thousands of residues in large proteins).
The functioning of proteins in living organisms depends on their 3-D conformations.
The 3-D structure is mainly dictated by the dihedral angles – φ (phi) and ψ (psi) – formed by two adjacent amino acid residues.
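To make the two angles concrete: by the usual convention, φ of residue i is the dihedral defined by the backbone atoms C(i-1), N(i), Cα(i), C(i), and ψ is the dihedral defined by N(i), Cα(i), C(i), N(i+1). The following is a minimal sketch of how a signed dihedral angle can be computed from four 3-D points; the function name `dihedral` is hypothetical and is not taken from our code.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four 3-D points (x, y, z).

    Hypothetical helper for illustration only.
    For phi pass C(i-1), N(i), CA(i), C(i); for psi pass N(i), CA(i), C(i), N(i+1).
    """
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p1, p0)          # first bond vector
    b1 = sub(p2, p1)          # central bond (rotation axis)
    b2 = sub(p3, p2)          # last bond vector
    n1 = cross(b0, b1)        # normal of the plane through p0, p1, p2
    n2 = cross(b1, b2)        # normal of the plane through p1, p2, p3
    b1_len = math.sqrt(dot(b1, b1))
    b1_unit = tuple(c / b1_len for c in b1)
    x = dot(n1, n2)
    y = dot(b1_unit, cross(n1, n2))
    return math.degrees(math.atan2(y, x))

# Example with made-up coordinates (not real protein atoms)
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

The `atan2` form is used instead of `acos` because it returns the sign of the angle and remains numerically stable near 0 and 180 degrees.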
Predicting the native 3-D form of a protein is very challenging because it requires scanning every possible φ and ψ value for every amino acid and comparing the energetics of each resulting conformation.
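A rough back-of-the-envelope count shows why such a scan is out of reach. The sketch below assumes each angle is sampled on a fixed grid and that every residue contributes one φ and one ψ; the grid size (36 bins, i.e. 10-degree steps) and the chain length (100 residues) are illustrative assumptions, not values from our method.

```python
def conformation_count(n_residues, bins_per_angle):
    """Number of grid points in a brute-force phi/psi scan.

    Assumes two independent angles per residue, each sampled at
    `bins_per_angle` values (an illustrative simplification).
    """
    return bins_per_angle ** (2 * n_residues)

# Illustrative values only: 100 residues, 10-degree steps (36 bins per angle)
count = conformation_count(100, 36)
print(len(str(count)), "digits")
```

Even for this modest chain, the count has more than 300 digits, so evaluating an energy for every grid point is not feasible.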
Although determining the correct values of φ and ψ (i.e., determining the minimum-energy conformation) by exhaustive search is practically impossible, it is worth noting that there are only 20 standard amino acids common to all proteins.
Studies have shown that there are sequence-dependent patterns among protein structures; for example, each amino acid prefers certain φ/ψ values.
Because φ and ψ involve adjacent amino acids, there are also preferred values of these angles as a function of amino acid pairs.
Our method scans the Protein Data Bank (PDB), which hosts 3-D coordinates of over 40,000 distinct proteins (based on 90% non-redundancy), and extracts the ψ/φ and φ/ψ distributions for each amino acid (20 distributions) and each amino acid pair (20x20=400 distributions), respectively.
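The bookkeeping behind this step can be sketched as follows. This is only an illustration under stated assumptions: the input format (a list of chains, each a list of `(residue, phi, psi)` tuples with `None` for undefined terminal angles) and the function name `collect_distributions` are hypothetical, and the sketch assumes that the pair distribution combines ψ of the first residue with φ of the second, which may differ from the actual pairing used.

```python
from collections import defaultdict

def collect_distributions(chains):
    """Group backbone angles by amino acid and by adjacent amino acid pair.

    `chains` is a list of chains; each chain is a list of
    (residue_name, phi, psi) tuples, with None where an angle is undefined.
    Hypothetical input format, for illustration only.
    """
    single = defaultdict(list)  # residue -> [(phi, psi), ...]
    pair = defaultdict(list)    # (residue_i, residue_i+1) -> [(psi_i, phi_i+1), ...]
    for chain in chains:
        for i, (res, phi, psi) in enumerate(chain):
            if phi is not None and psi is not None:
                single[res].append((phi, psi))
            if i + 1 < len(chain):
                next_res, next_phi, _ = chain[i + 1]
                # Assumed pairing: psi of this residue with phi of the next one
                if psi is not None and next_phi is not None:
                    pair[(res, next_res)].append((psi, next_phi))
    return single, pair

# Made-up angles for a three-residue chain
example = [[("ALA", None, 150.0), ("GLY", -60.0, -45.0), ("ALA", -120.0, None)]]
single, pair = collect_distributions(example)
```

With 20 standard amino acids, `single` can hold at most 20 keys and `pair` at most 400, matching the counts given above.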
It then converts these distributions into a score that expresses the distance from the median in number of standard deviations.
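The idea of measuring distance from the median in units of standard deviation can be sketched as below. This is not the D2 formula itself, which is not spelled out here; the function name `deviation_score` is hypothetical, and the sketch treats values as ordinary numbers, ignoring the periodicity of angles.

```python
import statistics

def deviation_score(value, reference):
    """Distance of `value` from the median of `reference`,
    in units of the reference standard deviation.

    Illustrative only: uses the population standard deviation and
    does not handle the 360-degree wrap-around of angles.
    """
    median = statistics.median(reference)
    spread = statistics.pstdev(reference)
    if spread == 0:
        raise ValueError("reference values have zero spread")
    return abs(value - median) / spread

# Made-up reference values
reference = [1, 2, 3, 4, 5]
score = deviation_score(5, reference)
```

A score below 1 means the value lies within one standard deviation of the reference median.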
For the predicted structure of Staphylococcal nuclease (inset of the D2 score above), an enzyme with a significant role in digesting DNA, our method finds that it is less than one standard deviation away from the median of all the protein structures in the PDB.
Although useful in evaluating the likelihood of a given protein structure as a whole, D2 does not provide amino acid level information.
Therefore, we have developed an amino-acid-level description of D2 that shows the likelihood of each amino acid in the test protein.
This is a visual description in the form of a color-coded strip that identifies the likely amino acids (green) and unlikely amino acids (red and blue).
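A color assignment of this kind could look like the sketch below. The threshold of one standard deviation, the function name `strip_colors`, and in particular the rule that separates red from blue (here, the sign of the deviation) are assumptions made purely for illustration; the text above does not state what distinguishes the two "unlikely" colors.

```python
def strip_colors(signed_scores, threshold=1.0):
    """Map per-residue signed deviation scores to strip colors.

    Assumptions (not from the source): residues within `threshold`
    are green; beyond it, positive deviations are red and negative
    deviations are blue.
    """
    colors = []
    for score in signed_scores:
        if abs(score) <= threshold:
            colors.append("green")
        elif score > 0:
            colors.append("red")
        else:
            colors.append("blue")
    return colors

# Made-up per-residue scores
colors = strip_colors([0.2, 1.5, -2.0, -0.9])
```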
Such a color strip can indicate the areas of the protein structure that need further refinement, making it a valuable tool in protein structure prediction.