How does protein sequence affect protein structure and function?
Protein sequence determines protein structure, which in turn is linked to protein function.
What are the key concepts in sequence-structure relation?
1+3
Protein sequence determines structure
Unique fold: single lowest energy state
Stable fold: resistant to small environmental changes
Accessible fold: can reach stable fold normally
Unique fold tests (3)
Denatured proteins refold into the same fold.
Different conditions result in the same fold.
> not dependent on environment
Structure determination with various methods under various conditions results in the same fold.
Unique fold limitations (3)
Multi-domain proteins are hard to refold in vitro.
Beta-amyloids (proteins unfold and aggregate into these instead).
1% of proteins switch fold upon a stimulus.
What is the energy landscape of protein folding like?
It resembles a rugged-surface funnel with shallow local minima.
For accessible folds, is energy-consuming assistance required?
No, it should not be required to reach the lowest energy state.
However, chaperones can increase folding efficiency (with or without energy consumption) by helping the protein avoid local minima.
What is the correlation between sequence identity and protein folds?
Proteins with >20% sequence identity generally share the same fold.
This follows from the stability of folds against mutations.
Exceptions exist, though: a single AA change can result in a fold change.
Are protein oligomerisation (4), multi-domain structures and enzyme specificity less correlated to sequence than principal domain folds?
True.
Oligomer and multi-domain interfaces typically involve only a small fraction of all residues
= less dependent on sequence
and enzyme specificity can be determined by a single AA
= a single mutation can change or abolish it
Only at the domain level is the fold linked to its sequence.
What factors are less correlated to sequence than domain folds?
Protein oligomerisation, multi-domain structures and enzyme specificity.
What physicochemical parameters can be predicted from single sequences?
Isoelectric point (pI) and extinction coefficient.
Both follow from the amino acid composition, e.g. as a weighted sum over residue counts.
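The weighted-sum idea can be sketched for the extinction coefficient at 280 nm. This is a minimal sketch, not a definitive implementation: it assumes the commonly used per-residue values (Trp 5500, Tyr 1490, cystine 125 M^-1 cm^-1), and the function name and the `disulfides` flag are hypothetical, chosen only for illustration.

```python
# Assumed molar extinction coefficients at 280 nm (M^-1 cm^-1).
# Only Trp and Tyr (and disulfide-bonded Cys pairs) absorb noticeably at 280 nm.
EXT_280 = {"W": 5500, "Y": 1490}
CYSTINE_280 = 125  # per disulfide bond, i.e. per pair of Cys


def extinction_coefficient(seq: str, disulfides: bool = False) -> int:
    """Weighted sum over residue counts (hypothetical helper)."""
    seq = seq.upper()
    total = sum(EXT_280.get(aa, 0) for aa in seq)
    if disulfides:
        # Assumption: every Cys is paired; an odd Cys is left unpaired.
        total += (seq.count("C") // 2) * CYSTINE_280
    return total


print(extinction_coefficient("WWYCC", disulfides=True))
```

The pI is not a plain sum: it is usually found by searching for the pH at which the summed charges of the ionisable residues cancel, so it is left out of this sketch.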
What other predictions can be made from single sequences?
Linear motifs.*
Secondary structure.
Based on the tendency of each AA to occur in specific secondary structures.
60% accurate.
* Linear motifs: short sequence patterns within proteins that act as functional signals, e.g. a kinase recognition site.
Sequence known vs Structures known
1000x more sequences known than structures.
Using sequence data and evolutionary relationships is therefore important for sequence-structure relationship analysis.
Sequence similarity vs sequence identity w/ example
Sequence similarity is often measured by a physicochemical property.
Sequence identity is measured on the AA sequence itself (exact matches).
e.g. 30% similar with respect to AA hydrophobicity.
60% identical.
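The difference between the two measures can be sketched on an already aligned pair. This is an illustrative sketch only: the hydrophobic residue set, the function names and the choice of denominator (aligned, non-gap positions) are assumptions, and other conventions exist.

```python
# Hypothetical hydrophobic class, used only to illustrate "similarity".
HYDROPHOBIC = set("AVLIMFWC")


def _aligned_pairs(a: str, b: str):
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where either sequence has a gap.
    return [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]


def percent_identity(a: str, b: str) -> float:
    pairs = _aligned_pairs(a, b)
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def percent_similarity(a: str, b: str) -> float:
    # Two residues count as similar if they fall in the same class.
    pairs = _aligned_pairs(a, b)
    same_class = sum((x in HYDROPHOBIC) == (y in HYDROPHOBIC) for x, y in pairs)
    return 100.0 * same_class / len(pairs)


print(percent_identity("AVLKD", "AILRE"), percent_similarity("AVLKD", "AILRE"))
```

In this toy pair only two positions are identical, but every position keeps its hydrophobic or non-hydrophobic character, so the similarity score is higher than the identity score.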
What are the types of gene relationships in evolution?
Homolog genes: from common ancestor
Ortholog genes: separated by speciation
Paralog genes: from duplication events within species
Analog genes: similar functions, different origins
Evolutionary events:
InDel: insertion or deletion of residues.
Point mutation:
= missense (AA change), nonsense (premature stop), silent (no AA change).
Rearrangements: incl. indels, inversions, duplications etc.
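The three point-mutation classes can be sketched as a codon comparison. This is a minimal sketch assuming the standard genetic code on DNA codons; the function name `classify` is hypothetical, and changes that remove a stop codon are simply reported as missense here.

```python
# Standard genetic code, built from the usual TCAG ordering of codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as silent, nonsense or missense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"      # same amino acid (or stop) encoded
    if alt_aa == "*":
        return "nonsense"    # new stop codon
    return "missense"        # different amino acid


for ref, alt in [("GAA", "GAG"), ("TGG", "TGA"), ("GAG", "GTG")]:
    print(ref, alt, classify(ref, alt))
```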
What is the purpose of sequence alignments?
To match corresponding residues between sequences by accounting for evolutionary events.
> introducing indels (gaps).
The optimal arrangement is defined by residue-pair scoring and the overall (summed) score.
Identity scoring matrix pro/con
+ easy to compute.
- doesn't account for AA similarity.
Physicochemical scoring matrix pro/con
+ good for structure/function analysis.
- generally inferior to other scoring methods for alignment.
What is the Needleman-Wunsch algorithm used for?
Calculating pairwise GLOBAL sequence alignment.
Includes gap penalties.
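A global alignment with a linear gap penalty can be sketched as follows. This is a minimal sketch, not a reference implementation: the scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions, and a real protein alignment would use a substitution matrix instead of match/mismatch.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # leading gaps are penalised
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,     # gap in b
                              score[i][j - 1] + gap)     # gap in a
    # Traceback from the bottom-right corner to the top-left corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```

The whole of both sequences ends up in the result, which is what makes the alignment global.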
What is the Smith-Waterman algorithm used for?
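Calculating pairwise LOCAL sequence alignment.
Traceback starts from the highest value in the matrix and works back until a score of 0.
Negative scores are set to 0, so unaligned ends are not penalised (gaps inside the alignment still are).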
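The local variant can be sketched with the same recurrence, floored at zero. Again this is only a sketch under assumed scoring values (match +2, mismatch -1, gap -2); the function name is hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]  # first row/column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,                          # restart the alignment here
                              score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Traceback: start at the highest cell, stop when the score reaches 0.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTGATTACATTT", "CCCGATTACACCC"))
```

Only the shared core of the two sequences is reported; the unrelated flanks are dropped because extending into them would lower the score.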
Pairwise sequence alignments, pros/cons
+ mathematically defined optimal solution.
- not feasible for multiple sequences (computing time increases exponentially with the number of sequences).
What is the purpose of heuristic methods in sequence alignment?
To perform faster multiple sequence alignments
What heuristic method is established for faster multiple sequence alignments?
CLUSTAL
Calculate all pairwise alignments.
Build a guide (phylogenetic) tree from the pairwise distances.
Perform progressive alignment following the tree, most similar sequences first.
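The first two steps above can be sketched as a distance calculation followed by greedy clustering. This is a rough sketch under loud assumptions: `difflib.SequenceMatcher` stands in for a real pairwise alignment score, average-linkage clustering stands in for whichever tree method a given CLUSTAL version uses, and the progressive alignment step itself is not implemented. All names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def distance(a: str, b: str) -> float:
    # Assumption: a crude stand-in for 1 - (pairwise alignment identity).
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def guide_tree(seqs: dict):
    """Return a nested tuple describing the join order of the sequences."""
    # Each cluster maps its tree (a name or nested tuple) to its leaf names.
    clusters = {name: (name,) for name in seqs}
    dist = {frozenset(p): distance(seqs[p[0]], seqs[p[1]])
            for p in combinations(seqs, 2)}

    def average_distance(c1, c2):
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two closest clusters first.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average_distance(*p))
        merged_leaves = clusters.pop(c1) + clusters.pop(c2)
        clusters[(c1, c2)] = merged_leaves
    return next(iter(clusters))


example = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQK", "c": "GGSPLWWNDE"}
print(guide_tree(example))
```

A progressive aligner would then walk this tree from the leaves upward, aligning the closest pair first and adding the remaining sequences or groups in join order.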
How are large sets of sequences represented for progressive alignments/CLUSTAL?
As Profiles
As linear Hidden Markov Models (HMMs)
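A profile in its simplest form can be sketched as per-column residue frequencies of an existing alignment. This is a minimal sketch: the function name is hypothetical, gaps are counted as an ordinary symbol, and real profiles usually add pseudocounts and sequence weighting. A profile HMM additionally models insert and delete states, which this sketch does not attempt.

```python
from collections import Counter


def build_profile(alignment: list) -> list:
    """Return one {symbol: frequency} dict per alignment column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):          # iterate column by column
        counts = Counter(column)
        profile.append({sym: n / len(column) for sym, n in counts.items()})
    return profile


for position, freqs in enumerate(build_profile(["ACD", "ACE", "A-E"]), start=1):
    print(position, freqs)
```

A new sequence can then be aligned against these column frequencies instead of against every sequence in the set individually.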