Position-specific gap-opening and gap-extension penalties
Uses different amino acid substitution matrices depending on how divergent the sequences are: one suited to close relatives, another to distant relatives
When the structure is known, the gap penalty is increased within helices and strands and decreased between them, so gaps fall preferentially in loops
If no structure is known, simple rules can be used instead, based on the residues present at each position and on how often gaps already occur there
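The position-specific penalty idea can be sketched as a function that returns one gap-opening penalty per residue. This is a minimal illustration under stated assumptions, not any particular tool's scheme: the secondary-structure codes (`H`, `E`, `C`), the scaling factors, the hydrophilic residue set, the minimum run length and the `gap_fraction` adjustment are all hypothetical choices made for the example.

```python
from typing import List, Optional

# Hypothetical scaling factors (illustrative values, not taken from any tool).
SS_FACTOR = {"H": 2.0, "E": 2.0, "C": 0.5}  # helix, strand, coil/loop
LOOP_FACTOR = 0.5                             # applied inside hydrophilic runs
HYDROPHILIC = set("DEGKNPQRS")                # hypothetical "loop-like" residues


def gap_open_penalties(
    seq: str,
    base: float = 10.0,
    ss: Optional[str] = None,
    gap_fraction: Optional[List[float]] = None,
    min_run: int = 5,
) -> List[float]:
    """Return one gap-opening penalty per position of seq."""
    n = len(seq)
    if ss is not None:
        # Known structure: raise penalties in helices/strands, lower them in loops.
        if len(ss) != n:
            raise ValueError("ss must be the same length as seq")
        pen = [base * SS_FACTOR[s] for s in ss]
    else:
        # No structure: lower penalties inside long runs of hydrophilic residues,
        # which are more likely to be surface loops.
        pen = [base] * n
        i = 0
        while i < n:
            j = i
            while j < n and seq[j] in HYDROPHILIC:
                j += 1
            if j - i >= min_run:
                for k in range(i, j):
                    pen[k] *= LOOP_FACTOR
            i = max(j, i + 1)
    if gap_fraction is not None:
        # Hypothetical rule: positions where gaps already occur get cheaper openings.
        if len(gap_fraction) != n:
            raise ValueError("gap_fraction must be the same length as seq")
        pen = [p * (1.0 - 0.5 * f) for p, f in zip(pen, gap_fraction)]
    return pen


print(gap_open_penalties("ACDE", ss="HHCC"))  # [20.0, 20.0, 5.0, 5.0]
print(gap_open_penalties("AAKDEGSAA"))        # hydrophilic run KDEGS is halved
```

Multiplying a base penalty keeps the structure-based and residue-based rules interchangeable; a real aligner would combine such per-position values with the gap-extension penalty during dynamic programming.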
2
Q
Profile-based sequence search methods
A
We can identify patterns of conserved residues by comparing related sequences within a protein family
Even the most distant members of the family tend to retain these patterns of conserved residues
We can make a profile which encapsulates these patterns and use it to detect more distantly related sequences
Highly conserved positions usually correspond either to residues important for folding or packing in the buried core, or to functional residues in the active site
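One simple way to find the conserved positions a profile captures is to compute residue frequencies per alignment column and flag columns with low Shannon entropy. The sketch below assumes a gapped alignment of equal-length strings with `-` as the gap symbol; the entropy cutoff of 0.5 bits and the example family are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def column_profile(alignment: List[str]) -> List[Dict[str, float]]:
    """Residue frequencies at each column of an alignment (gaps ignored)."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        residues = [s[col] for s in alignment if s[col] != "-"]
        counts = Counter(residues)
        total = len(residues)
        profile.append({aa: c / total for aa, c in counts.items()})
    return profile


def shannon_entropy(freqs: Dict[str, float]) -> float:
    """0 bits for a fully conserved column; higher means more variable."""
    return -sum(p * math.log2(p) for p in freqs.values())


def conserved_positions(alignment: List[str], max_entropy: float = 0.5) -> List[int]:
    """Indices of columns whose entropy is at or below a (hypothetical) cutoff."""
    return [
        i
        for i, freqs in enumerate(column_profile(alignment))
        if freqs and shannon_entropy(freqs) <= max_entropy
    ]


family = ["ACDEF", "ACDQF", "GCDEW", "ACNEF"]  # hypothetical aligned family
print(conserved_positions(family))  # [1]
```

Only column 1 (all `C`) is flagged; every other column has one substitution and an entropy of about 0.81 bits.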
3
Q
PSI-BLAST
A
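First, it runs a standard BLAST search and builds a multiple alignment of the significant hits against the query
Then it estimates the residue frequencies at each position of that alignment to construct a position-specific score matrix (PSSM), also known as a weight matrix or 1D profile
Then it uses the PSSM to search the database again; newly found significant hits are added to the alignment and the cycle repeats until no new sequences are found (convergence) or a set number of iterations is reached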
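A single PSSM-building and scanning step can be sketched as follows. This is a simplified illustration, not PSI-BLAST's actual procedure: it assumes a uniform background distribution, a flat pseudocount of 1 per residue and ungapped scanning, whereas PSI-BLAST uses sequence weighting, more elaborate pseudocounts and gapped alignment, and rebuilds the matrix on every iteration. The aligned hits are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(
    alignment: List[str],
    pseudocount: float = 1.0,
    background: Optional[Dict[str, float]] = None,
) -> List[Dict[str, float]]:
    """Log-odds score (bits) for every residue at every alignment column."""
    if background is None:
        # Assumption: uniform background frequencies.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    pssm = []
    for col in range(len(alignment[0])):
        counts = {aa: 0 for aa in AMINO_ACIDS}
        for seq in alignment:
            if seq[col] in counts:  # gaps and unknown symbols are skipped
                counts[seq[col]] += 1
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in AMINO_ACIDS
        })
    return pssm


def score_window(pssm: List[Dict[str, float]], window: str) -> float:
    return sum(column[aa] for column, aa in zip(pssm, window))


def best_hit(pssm: List[Dict[str, float]], target: str) -> Tuple[int, float]:
    """Slide the PSSM along target (ungapped) and return the best offset and score."""
    width = len(pssm)
    offsets = range(len(target) - width + 1)
    best = max(offsets, key=lambda i: score_window(pssm, target[i:i + width]))
    return best, score_window(pssm, target[best:best + width])


hits = ["ACDE", "ACDE", "ACNE"]  # hypothetical aligned BLAST hits
pssm = build_pssm(hits)
print(best_hit(pssm, "WWACDEWW")[0])  # 2
```

Residues seen often in a column get positive log-odds scores and unseen residues get small negative ones, which is what lets the profile pick up distant relatives that share the conserved pattern.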
4
Q
HMMs for protein family recognition
A
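Sequences are aligned to a probabilistic model (a profile HMM) built from interconnected match, insert and delete states
The model contains position-specific statistics on observed and expected variation: residue emission probabilities at each position and the probabilities of insertions and deletions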
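Scoring a sequence against a profile HMM can be sketched with the Viterbi algorithm in log space over match (M), insert (I) and delete (D) states. This is a toy model under several assumptions: transition probabilities are hypothetical and shared by every position, insert states emit uniformly, I-to-D and D-to-I transitions are omitted, and match emissions are generated from a consensus string rather than estimated from a real family alignment.

```python
import math
from typing import Dict, List

NEG_INF = float("-inf")
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical transition probabilities, shared by every position for brevity.
TRANS = {"MM": 0.9, "MI": 0.05, "MD": 0.05,
         "IM": 0.6, "II": 0.4,
         "DM": 0.7, "DD": 0.3}


def safe_log(p: float) -> float:
    return math.log(p) if p > 0 else NEG_INF


def match_emissions(consensus: str, p_cons: float = 0.8) -> List[Dict[str, float]]:
    """Toy match emissions: consensus residue gets p_cons, the rest share the remainder."""
    other = (1.0 - p_cons) / (len(AMINO_ACIDS) - 1)
    return [{aa: p_cons if aa == c else other for aa in AMINO_ACIDS} for c in consensus]


def viterbi_score(seq: str, match_em: List[Dict[str, float]]) -> float:
    """Log-probability of the best state path through the model emitting seq."""
    L, n = len(match_em), len(seq)
    t = {k: safe_log(v) for k, v in TRANS.items()}
    ins = safe_log(1.0 / len(AMINO_ACIDS))  # uniform insert-state emissions
    # vm/vi/vd[j][i]: best log-prob of emitting seq[:i] and ending in M_j/I_j/D_j
    vm = [[NEG_INF] * (n + 1) for _ in range(L + 1)]
    vi = [[NEG_INF] * (n + 1) for _ in range(L + 1)]
    vd = [[NEG_INF] * (n + 1) for _ in range(L + 1)]
    vm[0][0] = 0.0  # the begin state plays the role of M_0
    for i in range(n + 1):
        for j in range(L + 1):
            if i > 0 and j > 0:  # match: consume seq[i-1] at model position j
                vm[j][i] = safe_log(match_em[j - 1][seq[i - 1]]) + max(
                    vm[j - 1][i - 1] + t["MM"],
                    vi[j - 1][i - 1] + t["IM"],
                    vd[j - 1][i - 1] + t["DM"],
                )
            if i > 0:  # insert: consume seq[i-1] without advancing in the model
                vi[j][i] = ins + max(vm[j][i - 1] + t["MI"], vi[j][i - 1] + t["II"])
            if j > 0:  # delete: skip model position j without consuming a residue
                vd[j][i] = max(vm[j - 1][i] + t["MD"], vd[j - 1][i] + t["DD"])
    # Transition from the last model position into the end state.
    return max(vm[L][n] + t["MM"], vi[L][n] + t["IM"], vd[L][n] + t["DM"])


model = match_emissions("ACDE")
for s in ["ACDE", "ACE", "WWWW"]:
    print(s, round(viterbi_score(s, model), 2))
```

The consensus sequence scores highest, the sequence with a deletion pays for the M-to-D and D-to-M transitions, and an unrelated sequence scores far lower. Production tools estimate per-position emissions and transitions from the family alignment and report scores relative to a null model.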
5
Q
Protein Language Models (PLMs)
A
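Adapted from natural language processing: trained on large collections of protein sequences to learn which features best predict a masked (or the next) residue from its context
These features can be represented as a vector (embedding) for each residue and used to train classifiers for specific tasks, e.g. with ProtBERT or other ProtTrans models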