BLASTP and BLASTX filter out low complexity regions with the program SEG. Why?
If you perform permutation tests and you get a distribution of scores far below the real score, is this alignment homologous?
Yes
Permutation tests are more sensitive in general and are good for very distant relationships.
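A minimal sketch of such a permutation (shuffle) test, assuming a simple Smith-Waterman scorer with made-up match/mismatch/gap values rather than a real substitution matrix. The function names, the scoring parameters and the number of shuffles are illustrative assumptions, not part of any particular tool.

```python
import random
import statistics


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are placeholders; real searches would use a
    substitution matrix such as BLOSUM62.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def permutation_test(a, b, n_shuffles=200, seed=0):
    """Compare the real score with scores against shuffled copies of b.

    Shuffling keeps the composition and length of b but destroys its order,
    so the shuffled scores approximate what chance alone would produce.
    """
    rng = random.Random(seed)
    real = local_score(a, b)
    residues = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        null_scores.append(local_score(a, "".join(residues)))
    mean = statistics.mean(null_scores)
    sd = statistics.pstdev(null_scores)
    z = (real - mean) / sd if sd > 0 else float("inf")
    return real, mean, sd, z
```

A real score many standard deviations above the shuffled mean (a large z) is the situation the question describes.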
What is the twilight zone of evolutionary distance?
Around 15-23% identity
List the rules of thumb for the following alignments
A) Sequence > 100 AA, 25% identical
B) Sequence > 100 AA, 15-25% identity
C) <15% identity
A) Sequence > 100 AA, 25% identical
- Probably significantly homologous
B) Sequence > 100 AA, 15-25% identity
- Probably homologous, but need rigorous testing (including permutation tests)
C) <15% identity
- Not significant; look for motifs in multiple alignments as well as tertiary structure
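The three rules of thumb can be written as a small lookup, sketched here. The function name is hypothetical, and how the exact boundaries (25% and 15%) are assigned is an assumption, since the notes do not say which side they fall on.

```python
def rule_of_thumb(length_aa, percent_identity):
    """Return the rule-of-thumb verdict for a pairwise alignment.

    Assumption: exactly 25% counts as case A and exactly 15% as case B.
    """
    if percent_identity < 15:
        return "not significant; look for motifs and tertiary structure"
    if length_aa > 100 and percent_identity >= 25:
        return "probably homologous"
    if length_aa > 100:
        return "possibly homologous; needs rigorous testing"
    return "no rule of thumb given for sequences of 100 AA or fewer"


print(rule_of_thumb(250, 31))
print(rule_of_thumb(250, 19))
print(rule_of_thumb(250, 9))
```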
List two benefits to multiple alignments
Describe progressive multiple alignments
E.g. Clustal
Pairwise alignments: n!/(2(n-2)!) = n(n-1)/2
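The pairwise count is the number of ways to choose 2 sequences out of n. A short check, with a hypothetical helper name, shows how quickly it grows:

```python
from math import comb, factorial


def n_pairwise_alignments(n):
    """Number of distinct sequence pairs among n sequences (n choose 2)."""
    return comb(n, 2)


for n in (3, 10, 100):
    # The factorial form and the simplified form agree.
    assert n_pairwise_alignments(n) == factorial(n) // (2 * factorial(n - 2))
    assert n_pairwise_alignments(n) == n * (n - 1) // 2
    print(n, n_pairwise_alignments(n))
# prints:
# 3 3
# 10 45
# 100 4950
```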
Give the steps of ClustalW
Give 1 advantage and 3 disadvantages of ClustalW
Pros
- Fast
Cons
How are sequences weighted with ClustalW?
Weights allow you to take advantage of similar sequences when you already know the phylogeny or other information that is relevant to weighting.
How does Clustal deal with gap penalties?
There are two: the gap opening penalty (GOP) and the gap extension penalty (GEP).
These can be set by the user, but Clustal will also adjust them according to the following criteria:
The percent identity of the sequences is used as a scaling factor to increase the GOP for closely related sequences and decrease it for more distantly related sequences.
Describe Clustal’s position-specific gap penalties and its reactions to gaps already present at a position
If there are already gaps at a position, the GOP is reduced in proportion to the number of sequences with a gap at this position and GEP is lowered by half.
Positions near existing gaps (within 8 residues) have an increased GOP.
These rules discourage the opening of too many gaps close together but encourage new gaps to line up exactly with existing ones.
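The rules above can be sketched as a per-column adjustment. This is an illustration only: the function name is hypothetical, and the exact multipliers (a linear reduction by the fraction of gapped sequences, and a linear increase that fades with distance) are assumptions chosen to match the description, not Clustal's actual constants.

```python
def position_specific_penalties(gap_counts, n_seqs, gop, gep, window=8):
    """Return a (GOP, GEP) pair for every column of an existing alignment.

    gap_counts[i] is the number of sequences that have a gap in column i.
    """
    gap_cols = [i for i, count in enumerate(gap_counts) if count > 0]
    penalties = []
    for i, count in enumerate(gap_counts):
        if count > 0:
            # Gaps already present: lower the GOP in proportion to the
            # number of gapped sequences and halve the GEP.
            col_gop = gop * (n_seqs - count) / n_seqs
            col_gep = gep / 2
        else:
            dist = min((abs(i - g) for g in gap_cols), default=None)
            if dist is not None and dist <= window:
                # Close to an existing gap: raise the GOP (assumed form).
                col_gop = gop * (1 + (window - dist) / window)
            else:
                col_gop = gop
            col_gep = gep
        penalties.append((col_gop, col_gep))
    return penalties


# Column 2 has gaps in 2 of 4 sequences; all other columns have none.
for column, pair in enumerate(
    position_specific_penalties([0, 0, 2] + [0] * 11, n_seqs=4, gop=10.0, gep=1.0)
):
    print(column, pair)
```

Opening a gap is cheapest in the column that already has gaps and most expensive right next to it, which is what pushes new gaps to line up with old ones.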
Describe Clustal’s treatment of gaps in protein loops
Why is it better to delay the alignment of divergent sequences when making multiple alignments?
The most divergent sequences are usually the most difficult to align.
The user can set an identity cutoff so that sequences falling below it are not aligned until all the others have been aligned.
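A rough sketch of the idea, assuming that a sequence is delayed when its best percent identity to any other sequence falls below the cutoff. The function name, the criterion and the 40% default are assumptions for illustration, not Clustal's documented behaviour.

```python
def split_by_divergence(names, identity, cutoff=40.0):
    """Split sequences into those aligned first and those delayed.

    identity[i][j] is the percent identity between sequences i and j.
    Requires at least two sequences.
    """
    align_first, delayed = [], []
    for i, name in enumerate(names):
        best = max(identity[i][j] for j in range(len(names)) if j != i)
        if best < cutoff:
            delayed.append(name)
        else:
            align_first.append(name)
    return align_first, delayed


names = ["A", "B", "C", "D"]
identity = [
    [100, 80, 60, 20],
    [80, 100, 65, 25],
    [60, 65, 100, 30],
    [20, 25, 30, 100],
]
print(split_by_divergence(names, identity))
```

Here D is set aside and added last, after A, B and C have produced a more reliable profile to align it against.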
What are the two major changes in Clustal Omega?
- Incorporates a Hidden Markov Model into the main alignment engine
Why should the output of a multiple alignment algorithm always be checked?
How does searching a database with the conserved elements of a multiple sequence alignment (motifs, patterns, or profiles) improve sensitivity?
By upweighting important (conserved) sequence elements and downweighting less important (less conserved) ones.
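One way to see this weighting is a toy profile (position-specific scoring matrix) built from an alignment. The sketch below assumes a uniform background frequency and a pseudocount of 1, both simplifications; the function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Build per-column log-odds scores from equal-length aligned sequences.

    Assumes a uniform background (1/20 per residue); gap characters are ignored.
    """
    background = 1.0 / len(ALPHABET)
    profile = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def score_segment(profile, segment):
    """Sum the column scores for a segment as long as the profile."""
    return sum(column[aa] for column, aa in zip(profile, segment))


# Columns 0, 1 and 3 are fully conserved; column 2 is variable.
profile = build_profile(["ACDW", "ACEW", "ACFW", "ACGW"])
print(round(profile[0]["A"], 2), round(profile[2]["D"], 2))
```

A match in a conserved column earns a higher score than a match in the variable column, so conserved positions dominate the search.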
A query may be similar to all sequences in an alignment but not very similar to any single one (less than 40% identity), so you need some way of summarizing information from all the sequences in the multiple alignment at once: