What is the objective of a global alignment?
optimal alignment that includes all characters from each sequence (ex: Clustal generates global alignments)
What is the objective of a local alignment?
optimal alignment that includes only the most similar local region(s)
ex: BLAST generates local alignments
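A minimal sketch of the global vs. local distinction, assuming a simple scoring scheme (match +1, mismatch -1, gap -2; these values and the function name `alignment_score` are illustrative, not taken from Clustal or BLAST). Global scoring is in the style of Needleman-Wunsch, local scoring in the style of Smith-Waterman:

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score by dynamic programming.

    local=False: global alignment, every character of both sequences is used.
    local=True:  local alignment, only the best-scoring region is scored.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment does not.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)      # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


global_score = alignment_score("GGGGACGT", "ACGT")              # forced to span everything
local_score = alignment_score("GGGGACGT", "ACGT", local=True)   # only the shared ACGT
```

Here the global score is dragged down by the four unmatched G characters, while the local score only reflects the shared ACGT region.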
What are the two main groups aligners can be classified into?
Why are statistics critical for sequence similarity searching?
Statistics are used to discriminate between real and artifactual matches, by estimating the probability that a match occurred by chance
Statistics also allow us to rank-order the matches and find the optimal solution
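A hedged illustration of "probability that the match occurred by chance": BLAST-style searches report an E-value, the expected number of chance hits with at least a given score, commonly written as E = K * m * n * exp(-lambda * S). In the sketch below the values of `K` and `lam`, the lengths, and the hit scores are all made-up placeholders, not real BLAST parameters:

```python
import math


def e_value(score, query_len, db_len, K=0.1, lam=0.3):
    """Expected number of chance matches scoring at least `score`.

    K and lam are placeholder values chosen for illustration only.
    """
    return K * query_len * db_len * math.exp(-lam * score)


# Hypothetical hits (name -> raw alignment score).
hits = {"hit_a": 60, "hit_b": 25}

# Rank hits from most to least significant (smallest E-value first).
ranked = sorted(hits, key=lambda name: e_value(hits[name], 100, 1_000_000))
```

A higher score gives a smaller E-value, so the ranking puts the least chance-like match first.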
Why are references important in bioinformatic analysis?
Why doesn’t BLAST scale to NGS requirements?
What are some examples where sequence alignment strategies are reference-specific?
Ex 1: Use BWA for genomic alignments
Ex 2: Use the STAR aligner for RNA alignments
Strategy used depends on the research question being asked
What is the objective of short read alignment?
To align 100s of millions of short reads against a known reference
Note: repeats longer than read length are problematic
What are heuristics in computational biology?
There is a trade-off between computational efficiency/resources and accuracy/precision
The goal of a heuristic is to produce a good-enough solution in a reasonable time
A choice is made between completeness and speed
What are some (4) examples of heuristics?
How is indexing used?
To significantly increase alignment speed by converting the genome and/or reads into an index table of short “words”
T/F: Indexing of the reference genome is 0-based
T
What does position refer to in an index lookup table?
The location in the genome that the sequence occurs
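A minimal sketch of an index lookup table, assuming a plain dictionary of fixed-length words (the function name `build_index`, the toy genome, and the word length 3 are illustrative; real NGS aligners use more compact index structures):

```python
from collections import defaultdict


def build_index(genome, k):
    """Map every k-length word to the 0-based positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return dict(index)


genome = "ACGTACGA"
index = build_index(genome, 3)

# Looking up a word returns its genome position(s) without scanning the genome.
acg_positions = index["ACG"]   # word occurs twice, so two positions are stored
cga_positions = index["CGA"]   # word occurs once
```

A word that occurs more than once keeps every position, which is why repeated sequence leads to ambiguous lookups.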
How are BLAST and NGS aligners (ex: BWA) fundamentally different?
BWA extracts a seed from the 5’ end of the read (which has higher quality). The 5’ end serves as the search space anchor
BLAST takes a 3-character word and searches for it all across the read
What are the 3 steps in indexing?
What is the purpose of seed extension?
To attempt to resolve the optimal alignment (since the seed that matches part of the read may occur in more than one place)
The read is extended past the seed and compared with the corresponding adjacent reference sequence. This shows which of the original seed hits the read aligns to
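The seed-and-extend idea above can be sketched as follows. This is a simplified illustration, not BWA's implementation: it assumes a dictionary index, an ungapped extension, and a plain mismatch count, and the names `build_index` and `seed_and_extend` are hypothetical:

```python
from collections import defaultdict


def build_index(genome, k):
    """Map every k-length word to its 0-based start positions."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index


def seed_and_extend(read, genome, index, k):
    """Return (position, mismatches) for each seed hit, best hit first."""
    seed = read[:k]                       # seed taken from the 5' end of the read
    hits = []
    for pos in index.get(seed, []):
        window = genome[pos:pos + len(read)]
        if len(window) < len(read):       # read would run off the end of the genome
            continue
        mismatches = sum(1 for r, g in zip(read, window) if r != g)
        hits.append((pos, mismatches))
    return sorted(hits, key=lambda hit: hit[1])


genome = "ACGTTTGGACGTAACC"
index = build_index(genome, 4)
hits = seed_and_extend("ACGTAA", genome, index, 4)
```

The seed ACGT occurs at two positions; only after extension is it clear that position 8 is the better placement.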
How does the aligner decide where a read aligns to (in BWA)?
Is it better to have higher or lower Phred scores for mismatched bases during alignment?
Lower. Higher-quality bases are penalized more in the extension because there is high confidence that the mismatch is real, compared with a base with a lower Phred score
What effect does altering the Phred score threshold have on aligning?
Decreasing the threshold allows fewer mismatches to be tolerated, so the alignment is more specific
Use a higher threshold to allow more mismatches
Note that there will always be mismatches to the reference due to, for example, polymorphisms
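A hedged sketch of quality-aware mismatch scoring. It assumes the penalty is the sum of Phred scores at mismatched positions and that a read passes when this sum is at or below a threshold; that scheme, the threshold values, and the function names are assumptions for illustration, not BWA's documented scoring. The only standard fact used is the Phred definition Q = -10 * log10(P):

```python
def error_probability(q):
    """Probability that a base call is wrong, from its Phred score."""
    return 10 ** (-q / 10)


def mismatch_penalty(read, quals, ref):
    """Sum of Phred scores at positions where the read differs from the reference."""
    return sum(q for r, g, q in zip(read, ref, quals) if r != g)


def passes(read, quals, ref, threshold):
    """Accept the alignment if the total mismatch penalty is within the threshold."""
    return mismatch_penalty(read, quals, ref) <= threshold


ref = "ACGTAC"
read = "ACCTAC"                                 # one mismatch, at index 2
confident = [30, 30, 40, 30, 30, 30]            # mismatched base called with high quality
uncertain = [30, 30, 10, 30, 30, 30]            # mismatched base called with low quality

strict_confident = passes(read, confident, ref, threshold=30)   # penalty 40 exceeds 30
strict_uncertain = passes(read, uncertain, ref, threshold=30)   # penalty 10 is tolerated
loose_confident = passes(read, confident, ref, threshold=70)    # higher threshold tolerates it
```

The same mismatch costs more when the base call is confident, and raising the threshold lets more (or more confident) mismatches through.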
How does the BWA align reads to the reference?
What is a mapping quality?
Why are indels problematic in aligning?
What happens when a read aligns to more than one place in the reference genome?
What are the 2 main modules in BWA and how do they differ?
They differ in how seeds are used and how gaps are handled in the alignment