The SCOP database
Structural Classification of Proteins database, developed in Cambridge
For example: Small proteins (class) -> Cystine-knot cytokines (fold) -> Cystine-knot cytokines (superfamily) -> Transforming growth factor beta (family) - more of a functional than a structural classification
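As a minimal sketch, the lineage above can be split into labelled levels. The level names (class, fold, superfamily, family) are the standard SCOP hierarchy; the helper `parse_lineage` is a hypothetical name used only for illustration.

```python
# Standard SCOP levels, from most general to most specific.
SCOP_LEVELS = ("class", "fold", "superfamily", "family")

def parse_lineage(text):
    """Split an 'A->B->C->D' lineage string into a level -> name mapping."""
    names = [part.strip() for part in text.split("->")]
    if len(names) != len(SCOP_LEVELS):
        raise ValueError("expected one name per SCOP level")
    return dict(zip(SCOP_LEVELS, names))

lineage = parse_lineage(
    "Small proteins->Cystine-knot cytokines->"
    "Cystine-knot cytokines->Transforming growth factor beta"
)
for level in SCOP_LEVELS:
    print(f"{level:12s} {lineage[level]}")
```

The fold and the superfamily share a name here, which is why the same entry appears twice in the lineage.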
α-helical
β-sheet
α+β - α-helices and β-sheets in different parts of the protein, no β-α-β motifs
α/β - Helices and sheets assembled from β-α-β motifs
α/β-linear - Line through centres of strands of sheet roughly linear
α/β-barrels - Line through centres of strands of sheet roughly circular
Proteins with little/no secondary structure e.g. soft proteins
What does SCOP struggle with?
Domains
Many proteins are multi-domain and SCOP assumes they are single-domain
For example: clotting factors like factor XII:
Fn2-EGF-Fn1-EGF-Kr-SerPr
Factor IX:
Gla-EGF-EGF-SerPr
Difficult to classify evolutionary origins due to domain shuffling.
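The effect of domain shuffling can be made concrete by comparing the domain composition of the two clotting factors. This is only a sketch: it ignores the order of the domains and uses the abbreviations from the notes; the variable names are illustrative.

```python
from collections import Counter

# Domain composition of each factor (counts only, order ignored).
factor_xii = Counter({"Fn2": 1, "EGF": 2, "Fn1": 1, "Kr": 1, "SerPr": 1})
factor_ix = Counter({"Gla": 1, "EGF": 2, "SerPr": 1})

shared = factor_xii & factor_ix     # domains present in both proteins
only_xii = factor_xii - factor_ix   # domains found only in factor XII
only_ix = factor_ix - factor_xii    # domains found only in factor IX

print("shared:", dict(shared))
print("only factor XII:", dict(only_xii))
print("only factor IX:", dict(only_ix))
```

The two proteins share the EGF and serine protease domains but differ in the rest, so no single position in a whole-protein classification fits either of them well.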
How do we define domains?
The Gō plot method
Defines domains from the distances between residues: residues in the same domain lie close together, while residues in different domains are further apart than the radius of the protein.
Constructing a Gō plot:
1. Calculate the radius of a sphere with the same volume as the protein
2. Calculate the distance from the α-carbon of each amino acid to those of all the others
3. If the distance is greater than the spherical radius, score +.
Lines are drawn on the plot to form triangles that identify the protein domains
Real proteins never give completely clean triangles
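The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the radius is passed in as a parameter rather than derived from the protein volume, the coordinates are invented toy values, and `go_plot` and `render` are hypothetical helper names.

```python
import math

def go_plot(ca_coords, radius):
    """Upper-triangular matrix: True where two alpha-carbons are further apart than radius."""
    n = len(ca_coords)
    plot = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            plot[i][j] = math.dist(ca_coords[i], ca_coords[j]) > radius
    return plot

def render(plot):
    """Draw the matrix as text: '+' for a scored pair, '.' otherwise."""
    return "\n".join(
        "".join("+" if cell else "." for cell in row) for row in plot
    )

# Toy "protein": two compact clusters of three residues each (made-up coordinates).
toy_coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0),
              (10, 0, 0), (11, 0, 0), (10, 1, 0)]
toy_plot = go_plot(toy_coords, radius=5.0)
print(render(toy_plot))
```

In this toy case the + scores fill the block that pairs residues 1-3 with residues 4-6, leaving two empty triangles along the diagonal, one per domain.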
What are the disadvantages of Gō plots?
Explain modular evolution of proteins
How does Pfam build domains?
Obtaining a protein from the PDB
Input sequence and protein sequence databank -> BLAST search
BLAST search -> Filter results (E<threshold) -> Multiple sequence alignment -> Position-specific scoring matrix
Hidden Markov modelling
Sequence
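The filtering and scoring-matrix steps of the pipeline above can be sketched as follows. This does not run BLAST; it assumes the hits and the multiple sequence alignment are already available. The uniform background frequencies, the pseudocount of 1, the default threshold, and the function names `filter_hits` and `build_pssm` are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def filter_hits(hits, e_threshold=1e-3):
    """Keep (sequence_id, e_value) pairs whose E-value is below the threshold."""
    return [hit for hit in hits if hit[1] < e_threshold]

def build_pssm(alignment, pseudocount=1.0):
    """Log-odds score (base 2) for every amino acid at every alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    pssm = []
    for column in zip(*alignment):
        counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm

# Toy data: invented hit list and a three-sequence alignment.
kept = filter_hits([("hit_a", 1e-10), ("hit_b", 0.5)])
toy_pssm = build_pssm(["ACD", "ACE", "ACD"])
print(kept)
print(round(toy_pssm[0]["A"], 2), round(toy_pssm[0]["C"], 2))
```

Residues seen often in a column score above zero and residues never seen there score below zero, which is what makes the matrix position-specific.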
Showing patterns using sequence logos
The size of each nucleotide/amino acid letter illustrates how often it occurs at that position across a series of aligned sequences.
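One common way to set the letter sizes is by information content: the height of a column is the maximum possible entropy minus the observed entropy, shared between letters in proportion to their frequency. The sketch below assumes a gap-free DNA alignment and omits the small-sample correction that logo software usually applies; `logo_heights` is a hypothetical name.

```python
import math
from collections import Counter

def logo_heights(alignment, alphabet="ACGT"):
    """Per-column letter heights in bits for a gap-free alignment."""
    max_bits = math.log2(len(alphabet))
    heights = []
    for column in zip(*alignment):
        counts = Counter(column)
        freqs = {s: counts[s] / len(column) for s in alphabet if counts[s]}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        info = max_bits - entropy  # information content of the column
        heights.append({s: f * info for s, f in freqs.items()})
    return heights

# Toy alignment of four invented DNA sequences.
toy_heights = logo_heights(["ACGT", "ACGA", "ACCA", "ATCA"])
for position, column in enumerate(toy_heights, start=1):
    print(position, {s: round(h, 2) for s, h in column.items()})
```

A fully conserved position gives one tall letter, while a position split evenly between two bases gives two short ones.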
What software shows DNA/amino acid patterns using sequence logos?
Pfam on a larger scale
How does Pfam build domains?
Use the HMM to query GenPept – hmmsearch
Align the new hits to the HMM – hmmalign
Rebuild the HMM to include the new hits – hmmbuild
Repeat as desired, or until there are no new hits
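The search, align, rebuild cycle above is an iteration that stops when a round finds nothing new. The sketch below shows only that control flow. It does not call HMMER: `search`, `align` and `build` are hypothetical stand-ins for hmmsearch, hmmalign and hmmbuild, and the toy database is invented.

```python
def iterate_profile(seed_members, search, align, build, max_rounds=10):
    """Grow a family by repeated search/align/rebuild until no new hits appear."""
    members = set(seed_members)
    model = build(sorted(members))
    for _ in range(max_rounds):
        new_hits = set(search(model)) - members
        if not new_hits:          # converged: nothing new was found
            break
        members |= new_hits
        alignment = align(model, sorted(members))
        model = build(alignment)  # rebuild the model to include the new hits
    return model, members

# Toy stand-ins: each sequence can "find" only its nearest relative.
RELATIVES = {"s1": ["s2"], "s2": ["s3"], "s3": []}

def toy_search(model):
    return {rel for member in model for rel in RELATIVES[member]} | set(model)

def toy_align(model, sequences):
    return list(sequences)

def toy_build(alignment):
    return frozenset(alignment)

final_model, family = iterate_profile({"s1"}, toy_search, toy_align, toy_build)
print(sorted(family))
```

Starting from s1 alone, each round pulls in one more distant relative, which mirrors how an iterated profile search reaches sequences the seed alone would miss.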
“Structure, structure, structure” (Alex Bateman, founder of Pfam)
Disadvantages of Pfam?
HOMSTRAD
Homologous Structure Alignment Database
Used to collect good seed alignments
Used in construction of globin molecules
Not enough structures
The structural genomics consortium