a bioinformatics method for finding and quantifying similarities between two biological sequences, like DNA or protein, by inserting gaps to optimize a scoring function that penalizes gaps and mismatches while rewarding matches.
Pairwise sequence alignment
Aligns the entire length of two sequences (from start to end), even if parts don’t match well.
end-to-end alignment
Global alignment
Global alignment example
Needleman–Wunsch
Finds and aligns only the best matching region(s) within the sequences — useful when only a segment is similar.
find and align the similar local region
local alignment
can be used to reduce the time and space cost incurred by dynamic programming. For this, the most widely applied method is to limit the state transition and conduct the alignment in a smaller search space.
do not guarantee that there will be no poor results, ideal alignments can be achieved in many types of software because the sequences to be aligned are usually quite similar.
Heuristic alignment
local alignment example
Smith–Waterman
alignment is the basis of multiple sequence alignment and mainly divided into local alignment and global alignment
Pairwise sequence
has become the basic algorithm that is used in many types of multiple sequence alignment software.
Needleman–Wunsch
Heuristic alignment example
BLAST
heuristic methods
homologous segments (or seeds) are found and used as the “anchors” for the alignment. Each anchor point can divide the dynamic programming matrix into four sub-matrices located at the four corners. When more anchor points distributed throughout the sequences are found, the scale of the dynamic programming matrix can be greatly reduced, thereby reducing the time and space complexity
divide and conquer.
Many alignment algorithms are designed to find such seeds to accelerate alignment, give 2 main ones
FASTA
BLAST
is based on a heuristic idea: if two sequences have close similarity, then the number of gaps inserted in the sequences during alignment will be relatively small.
Therefore, the possible backtracking paths will be close to the diagonal of the dynamic programming matrix for similar sequences. The possible interval can be seen as a strip with certain width parallel to the diagonal
Bounded dynamic programming
core of a pairwise sequence alignment is the sum of the
scores of all aligned pairs
Multiple Sequence Alignment
is a heuristic method that aligns multiple sequences by the consistencies among pairwise alignments of each of the sequences to be aligned and a center sequence, which could be either generated or selected from the input
Star Alignment Strategy
Multiple Sequence Alignment
uses the star topology to simulate the relationship between sequences, and the selection of the center sequence determines the accuracy of the alignment to some extent.
Star Alignment Strategy
Multiple Sequence Alignment
another heuristic method, which performs multiple sequence alignment based on a pre-built guide tree (Figure 2B). Each leaf node in the guide tree represents a sequence, and each internal node represents an alignment of the sequences corresponding to its descendant leaf nodes (also known as profile).
Progressive Alignment Strategy
The star alignment and the progressive alignment strategies have two main defects.
once a wrong gap is inserted in the alignment, it will continue to exist and affect all the subsequent alignment.
Although this rule causes the gap error to spread, it does not create a baseless error, and the root of the error lies in another defect
Local optima usually arise from the greedy principle. Specifically, the optimum selected for a particular sub-problem is not necessarily optimal for the overall problem, and this will adversely affect the final alignment.
Defects of Heuristic Algorithms and Countermeasures
Gap penalties are already considered in the pairwise sequence alignment. The matching score of the two residues obtained by re-estimation with the consistency objective function is the score after the gaps have been penalized. Therefore, gap penalties need not be considered in the subsequent multiple sequence alignment process.
Consistency Objective Function
Defects of Heuristic Algorithms and Countermeasures
processes the results of multiple sequence alignment to remove errors caused by the local minimum trap and the “once a gap, always a gap” rule. There are several ways to perform the iterative refinement, two of which will be introduced in this section.
Iterative Refinement
software uses different algorithms, and therefore, the alignment quality can vary.
Multiple sequence alignment
Quality Estimation of Multiple Sequence Alignment Software
Classical metrics