What is a permutation?
An arrangement of items in a specific order. In our case, the items are DNA bases. E.g. codons: AGC and TAG are different permutations.
How many distinct permutations can a stretch of DNA 2 bases long have?
There are 4 possible bases (A, G, C, T) at each position, so a stretch of n bases has 4^n possible permutations.
So a stretch of 2 bases has 4^2 = 16 different permutations.
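The 4^n count can be checked by brute force: list every possible stretch of n bases and count them. A minimal sketch; `all_sequences` is a hypothetical helper name, not something from these notes.

```python
from itertools import product

BASES = "AGCT"  # the 4 possible bases at each position

def all_sequences(n: int) -> list[str]:
    """Return every ordered arrangement (permutation with repetition) of n bases."""
    return ["".join(p) for p in product(BASES, repeat=n)]

two_base = all_sequences(2)
print(len(two_base))  # 16, i.e. 4**2
print(two_base[:4])   # ['AA', 'AG', 'AC', 'AT']
```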
How do we determine the number of sites of n bases in a genome of size m?
Example: if n = 1000
and m = 5.0x10^6
number of sites = m - (n - 1)
= (5.0x10^6) - (1000 - 1)
= 4,999,001 ≈ 4.999x10^6
There are about 4.999x10^6 sites of n = 1000 bases in a 5.0x10^6-base genome
Note: this assumes single-stranded DNA. For double-stranded DNA, both strands can be read, so double your answer.
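The site count can be written as a small function. A minimal sketch, assuming a linear genome and the same single- vs double-stranded convention as the note; `number_of_sites` is a hypothetical name.

```python
def number_of_sites(m: int, n: int, double_stranded: bool = False) -> int:
    """Number of positions where a stretch of n bases fits in a linear genome of m bases."""
    if n > m:
        return 0  # the stretch is longer than the genome
    sites = m - (n - 1)  # the last n - 1 positions cannot start a full stretch
    return 2 * sites if double_stranded else sites

print(number_of_sites(5_000_000, 1000))                        # 4999001
print(number_of_sites(5_000_000, 1000, double_stranded=True))  # 9998002
```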
How long does a DNA sequence have to be for us to expect it to be unique within a given genome?
We can expect a sequence to be unique in a genome when the number of possible permutations (4^n) is greater than the number of sites of n bases in the genome.
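How do we calculate n when the genome size is known?
Example: a bacterial genome of approx. 8x10^6 bp. When is the number of permutations greater than the length of the genome? Set 4^n = 8x10^6, then solve for n using logs: n = log(8x10^6) / log(4) ≈ 11.5. Since n must be a whole number, round up: a sequence of 12 bases (4^12 ≈ 1.7x10^7 > 8x10^6) is expected to be unique.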
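The same calculation in code: `math.log(m, 4)` gives the exact solution of 4^n = m, and an integer loop finds the smallest whole n with 4^n > m without any floating-point rounding issues. A sketch; the function name is hypothetical, and 8x10^6 is the example genome size from the notes.

```python
import math

def min_unique_length(genome_size: int) -> int:
    """Smallest n for which the number of permutations 4**n exceeds genome_size."""
    n = 0
    while 4 ** n <= genome_size:
        n += 1
    return n

genome = 8_000_000
print(round(math.log(genome, 4), 2))  # 11.47
print(min_unique_length(genome))      # 12
```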
What are Universal Bits?
Universal bits refer to measuring DNA sequence information in terms of bits of information, rather than just counting bases. Each position in DNA can hold one of 4 bases, so (assuming all four are equally likely) each position carries log2(4) = 2 bits of information.
Ex. GGCCC is 5 bases long, so it carries 2 × 5 = 10 bits.
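Counting information in bits just multiplies the sequence length by log2(4) = 2. A minimal sketch, assuming all four bases are equally likely at every position; the function name is hypothetical.

```python
import math

def sequence_bits(seq: str, alphabet_size: int = 4) -> float:
    """Information content of seq in bits, assuming equally likely symbols."""
    bits_per_position = math.log2(alphabet_size)  # 2.0 for DNA
    return len(seq) * bits_per_position

print(sequence_bits("GGCCC"))  # 10.0
```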
Information Theory: Minimum number of bits to find our target object
The minimum number of bits needed to pick out one target among N equally likely objects is log2(N), because each bit halves the number of candidates. Ex. if N = 8, the minimum is log2(8) = 3 bits (8 → 4 → 2 → 1). If we only have 1 bit when we need 3, we can only narrow the 8 objects down to 4, so additional objects match along with our selected target and we cannot accurately identify it.
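The log2(N) rule and the "too few bits" case can be sketched with exact integer math. `min_bits` and `candidates_left` are hypothetical helper names; the sketch assumes each bit splits the remaining candidates exactly in half.

```python
def min_bits(n_objects: int) -> int:
    """Smallest b with 2**b >= n_objects, i.e. ceil(log2(N)), computed exactly."""
    return (n_objects - 1).bit_length()

def candidates_left(n_objects: int, bits: int) -> int:
    """Objects still matching after `bits` halvings (ceiling division)."""
    return -(-n_objects // 2 ** bits)

print(min_bits(8))            # 3
print(candidates_left(8, 1))  # 4: one bit is not enough to single out the target
print(candidates_left(8, 3))  # 1
```

Applied to a genome, `min_bits(8_000_000)` is 23, and at 2 bits per base that rounds up to 12 bases, the same answer as solving 4^n = 8x10^6 with logs.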
What is a BLAST Search?
BLAST stands for Basic Local Alignment Search Tool. It is a method used to compare DNA or protein sequences against a database to find similar sequences.
How does BLAST Work?
You give BLAST a DNA or protein sequence and it searches a database to find similar sequences, homologous genes, evolutionary relatives, or possible functions. BLAST looks for local regions of similarity rather than matches across the full length of the sequences.
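"Local regions of similarity" can be made concrete with the classic Smith-Waterman local alignment score. BLAST itself does not run this full calculation (it uses a faster heuristic that approximates it), and the scoring values here (match +2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman score for any pair of substrings of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # h[i][j]: best local score ending at a[i-1], b[j-1]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap    # a[i-1] aligned to a gap in b
            left = h[i][j - 1] + gap  # b[j-1] aligned to a gap in a
            h[i][j] = max(0, diag, up, left)  # 0 lets a new local alignment start here
            best = max(best, h[i][j])
    return best

# Only the shared "ACGT" core aligns; the unrelated flanks are ignored.
print(local_alignment_score("TTACGTTT", "GGACGTGG"))  # 8
```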
BLAST: Query Strand
BLAST breaks your sequence of interest, the query, into shorter segments (words). These segments are then searched across the database to find potential matches, which are extended into longer alignments.
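The "break the query into segments" step can be sketched as cutting the query into overlapping words of length k and looking each one up in an index of a database sequence; each exact hit is a seed that BLAST would then try to extend. A simplified sketch: the word length k = 4 and the function names are assumptions (BLAST's default word sizes for DNA are larger), and there is no extension or scoring step here.

```python
def query_words(query: str, k: int) -> list[tuple[int, str]]:
    """All overlapping words of length k in the query, with their start positions."""
    return [(i, query[i:i + k]) for i in range(len(query) - k + 1)]

def find_seed_hits(query: str, subject: str, k: int) -> list[tuple[int, int]]:
    """(query_pos, subject_pos) pairs where a query word exactly matches the subject."""
    index: dict[str, list[int]] = {}
    for j in range(len(subject) - k + 1):  # index every k-base word in the subject
        index.setdefault(subject[j:j + k], []).append(j)
    hits = []
    for i, word in query_words(query, k):
        for j in index.get(word, []):
            hits.append((i, j))
    return hits

print(query_words("ACGTAC", 4))                 # [(0, 'ACGT'), (1, 'CGTA'), (2, 'GTAC')]
print(find_seed_hits("ACGTAC", "TTACGTAA", 4))  # [(0, 2), (1, 3)]
```

A query of length L yields L - (k - 1) words, the same count as the number-of-sites formula m - (n - 1).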
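What are the Two Main Statistics a BLAST Search Gives?
Score and E-value.
- Alignment score: the higher the score, the better the match
- E-value: the number of matches with a score at least this good that we would expect to find by chance in a database of this size (the lower the E-value, the better)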
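BLAST's E-value is based on Karlin-Altschul statistics, roughly E = K·m·n·e^(-λS), where m is the query length, n the total database length, S the raw alignment score, and K and λ are constants that depend on the scoring system. The sketch below uses placeholder values for K and λ (not the values for any real scoring matrix) and ignores BLAST's edge-effect length corrections; it only shows how the E-value responds to score and database size.

```python
import math

def e_value(score: float, query_len: int, db_len: int,
            k: float = 0.1, lam: float = 0.3) -> float:
    """Expected number of chance hits scoring >= score (k and lam are placeholders)."""
    return k * query_len * db_len * math.exp(-lam * score)

low = e_value(40, query_len=300, db_len=10**9)
high = e_value(80, query_len=300, db_len=10**9)
print(f"{low:.2e}  {high:.2e}")  # the higher score gives a much smaller E-value
```

Doubling the database size doubles the E-value for the same score, which is why the same alignment can look less significant against a larger database.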
What are the 4 Methods of DNA Sequencing?
Sanger (works through chain termination with dideoxynucleotides)
Illumina (DNA sequencing by synthesis, uses fluorescence and imaging)
Nanopore (measures the change in electrical current as nucleotides pass through a pore)
PacBio (uses DNA polymerase and circularized targets, tracking which bases are added with fluorescence)
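Sanger chain termination can be sketched as a toy model: each fragment stops where a dideoxynucleotide (ddNTP) was incorporated, so fragment length gives the position and the terminating ddNTP gives the base; sorting fragments by length (as a gel or capillary does) reads off the sequence. Assumptions: the template is written 3'→5' so the new strand is its base-by-base complement read 5'→3', every position is terminated at least once, and the function names are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def chain_termination_fragments(template: str) -> list[tuple[int, str]]:
    """One (length, terminating base) fragment per position of the new strand."""
    new_strand = "".join(COMPLEMENT[b] for b in template)
    return [(i + 1, base) for i, base in enumerate(new_strand)]

def read_by_size(fragments: list[tuple[int, str]]) -> str:
    """Sort fragments from shortest to longest and read the terminating bases."""
    return "".join(base for _, base in sorted(fragments))

fragments = chain_termination_fragments("TACGGA")
random.shuffle(fragments)  # fragments come out of the reaction in no particular order
print(read_by_size(fragments))  # ATGCCT
```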
Illumina Sequencing
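Illumina uses fluorescently labelled nucleotides that also act as reversible terminators, so each added nucleotide stops further elongation. In each cycle, one labelled nucleotide is added to every growing strand, unincorporated nucleotides are washed away, and an image is taken to record each cluster's fluorescence. The label and blocking group are then removed and the cycle repeats.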
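Turning each cycle's image into a base can be sketched as picking the brightest of four fluorescence channels for a cluster. A minimal sketch, assuming one dye per base and made-up intensity values; real instruments (some use two-colour chemistry) and their base callers are more involved, and `call_bases` is a hypothetical name.

```python
def call_bases(cycles: list[dict[str, float]]) -> str:
    """Call one base per cycle: the base whose dye is brightest in that image."""
    return "".join(max(intensities, key=intensities.get) for intensities in cycles)

# Hypothetical intensities for one cluster over three cycles.
cluster = [
    {"A": 0.91, "C": 0.05, "G": 0.03, "T": 0.02},
    {"A": 0.04, "C": 0.10, "G": 0.85, "T": 0.06},
    {"A": 0.02, "C": 0.07, "G": 0.05, "T": 0.88},
]
print(call_bases(cluster))  # AGT
```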
Steps 1 and 2 of Illumina Sequencing
Steps 3 and 4 of Illumina Sequencing
Steps 5 and 6 of Illumina Sequencing
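Pacific Biosciences (PacBio) Sequencing
A method that uses many tiny wells (microwells), each of which immobilizes a single DNA polymerase together with the target DNA. The target is circularized, so the polymerase can copy it around and around, and each base fluoresces as it is added; this signal is recorded in real time from each microwell.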
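Circularizing the target lets the polymerase read the same molecule several times, and combining these passes averages out random errors. A minimal sketch of that idea as a per-position majority vote; it assumes the passes are already aligned and of equal length (real PacBio consensus calling also handles insertions and deletions), and `circular_consensus` is a hypothetical name.

```python
from collections import Counter

def circular_consensus(passes: list[str]) -> str:
    """Majority base at each position across repeated reads of one circular molecule."""
    if any(len(p) != len(passes[0]) for p in passes):
        raise ValueError("this sketch assumes equal-length, pre-aligned passes")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))

passes = ["ACGTTA", "ACCTTA", "ACGTTA", "ACGTGA"]  # hypothetical passes, each with its own error
print(circular_consensus(passes))  # ACGTTA
```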
Steps 1 and 2 of PacBio Sequencing
Steps 3, 4 and 5 of PacBio Sequencing
Nanopore Sequencing
Method of DNA sequencing which measures the change in electrical current as nucleotides pass through a nanopore (a tiny pore set in a membrane); the change in current is used to identify the bases.
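A toy sketch of turning current into bases: compare each current measurement to a typical level for each base and pick the closest. The current levels below are made up, and real nanopore signals depend on several bases sitting in the pore at once, so real base callers are far more sophisticated; `decode_current` is a hypothetical name.

```python
# Hypothetical mean current level (pA) for each base, for illustration only.
LEVELS = {"A": 80.0, "C": 60.0, "G": 70.0, "T": 50.0}

def decode_current(samples: list[float]) -> str:
    """Assign each current measurement to the base with the nearest typical level."""
    return "".join(min(LEVELS, key=lambda base: abs(LEVELS[base] - x)) for x in samples)

print(decode_current([79.0, 52.5, 61.0, 71.2]))  # ATCG
```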
Steps of Nanopore Sequencing
Comparison of 4 Methodologies
What is DNA Sequencing and DNA Sequence Alignment?
DNA Sequencing: The process of determining the exact order of nucleotides in a DNA molecule
DNA Sequence Alignment: The process of aligning two or more DNA sequences to identify regions that are similar or different
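Once two sequences are aligned, the similar and different regions can be marked column by column. A minimal sketch, assuming the sequences are already aligned with "-" for gaps; the marking symbols ("|" match, "." mismatch, space for gap) and the function names are hypothetical choices.

```python
def match_line(a: str, b: str) -> str:
    """Mark each aligned column: '|' match, '.' mismatch, ' ' gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    marks = []
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            marks.append(" ")
        elif x == y:
            marks.append("|")
        else:
            marks.append(".")
    return "".join(marks)

def percent_identity(a: str, b: str) -> float:
    """Percentage of alignment columns that are identical bases."""
    line = match_line(a, b)
    return 100 * line.count("|") / len(line)

print("ACGT-A")
print(match_line("ACGT-A", "ACCTTA"))  # ||.| |
print("ACCTTA")
```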
What is a Contig?
A continuous DNA sequence assembled from a collection of overlapping sequences (reads). Ideally, the contigs will represent/span the whole genome.
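Building a contig can be sketched as repeatedly joining reads at their longest suffix-prefix overlap. A simplified sketch: it assumes the reads are already in the correct order and error-free (real assemblers must work out the order themselves), and the minimum overlap of 3 bases and the function names are hypothetical choices.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def build_contig(reads: list[str], min_len: int = 3) -> str:
    """Merge ordered, overlapping reads into one continuous sequence."""
    contig = reads[0]
    for read in reads[1:]:
        size = overlap(contig, read, min_len)
        if size == 0:
            raise ValueError("no overlap: the reads do not form one contig")
        contig += read[size:]  # append only the part of the read not already covered
    return contig

print(build_contig(["ACGTAC", "GTACGG", "ACGGTT"]))  # ACGTACGGTT
```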