Sequencing Whole Genomes
What are the two basic approaches to whole genome sequencing?
i) minimum tiling path - determining the smallest number of overlapping cloned sequences that cover the entire genome
ii) random shotgun sequencing - sequence clones at random and assemble the sequences by alignment with each other
Minimum Tiling Path
1) construct libraries of cloned genomic DNA by shotgun cloning total genomic DNA using restriction/ligation
2) create different libraries, all made with BAC clones, which carry large genomic inserts
3) map the restriction enzyme sites in each cloned sequence using restriction digests
4) this gives you a map of the fragments that make up each cloned sequence allowing you to see where they overlap with the other sequences and how they fit together in the genome
5) this allows you to construct a CONTIG, a continuous sequence of the genome made up using the minimum number of overlapping fragments
6) digest the long sequence clones again using restriction enzymes and sequence each fragment by synthesis
7) from the RE mapping you already know how the fragments fit together in the genome so you can just put the fragment sequences together to make up the whole genome sequence
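The clone-selection idea in steps 4 and 5 can be sketched as a greedy interval cover. This is only an illustration: it assumes each clone has already been placed on the map as a (name, start, end) interval, and the function name and clone coordinates are hypothetical, not part of the original notes.

```python
def minimum_tiling_path(clones, genome_length):
    """Pick the fewest mapped clones that together span 0..genome_length.

    clones: list of (name, start, end) map positions, end exclusive.
    Returns the chosen clone names in genome order.
    Raises ValueError if the clones leave a gap.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered = 0  # the genome is covered up to (not including) this position
    i = 0
    while covered < genome_length:
        best = None
        # Of all clones starting at or before the covered edge,
        # keep the one that reaches furthest along the genome.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"no clone covers position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical BAC clones placed on a 320 kb region (positions in kb)
example_clones = [
    ("A", 0, 100), ("B", 50, 180), ("C", 90, 200),
    ("D", 150, 300), ("E", 190, 320),
]
tiling = minimum_tiling_path(example_clones, 320)
```

Here clones B and D are redundant, so the tiling path is A, C, E. Real mapping works from restriction fingerprints rather than known coordinates, so this only shows the selection step.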
BAC Plasmid
Random Shotgun Sequencing
1) construct libraries of cloned genomic DNA by shotgun cloning
2) create different libraries containing different size classes: BAC clones (100kb), plasmids with 20kb inserts and plasmids with 8kb inserts
3) sequence 800 to 1000 bp from each end of every clone in each size class
4) enter all of these short sequences into a computer; sets of CONTIGs built from the smaller size-class sequences are linked using sequences from the larger insert libraries
5) genome sequence is then assembled using sequence analysis software to overlap all of the individual sequences
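The overlap step in 5) can be sketched with a simple greedy merge. This is a toy illustration, not the method used by real assemblers: it assumes error-free reads from one strand, ignores the paired-end linking described in step 4, and the function names and example reads are made up.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more: separate contigs remain
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical 20-base "genome" and four overlapping reads taken from it
genome = "ATGCGTACGTTAGCCGATAA"
reads = [genome[0:8], genome[5:13], genome[10:18], genome[14:20]]
assembled = greedy_assemble(reads)
```

With these reads the merge rebuilds the original 20-base sequence as a single contig. Repeated sequences break this simple approach, which is one reason large genomes are hard to assemble this way.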
Minimum Tiling Path vs. Random Shotgun Sequencing
Minimum Tiling Path
Minimum Tiling Path vs. Random Shotgun Sequencing
Random Shotgun Sequencing
- difficult to assemble CONTIGs and align them correctly in a large genome
Next Generation Sequencing
1) DNA randomly sheared into ~400bp fragments
2) ligate “adapter” oligonucleotides to each end, these contain a sequence to which a primer can anneal
3) each molecule is denatured and bound to a solid surface via primers immobilised by covalent bonding
4) the fragments are PCR amplified in situ, and DNA polymerisation is initiated by adding DNA polymerase with labelled dNTPs (G, C, A or T); each dNTP has a fluor blocking the 3’ -OH, so once it is added to the DNA molecule no further nucleotides can be attached
5) a bright light causes the fluor to fluoresce and the position of each signal is recorded using a CCD camera
6) the bond between the incorporated base and the fluor is broken and the fluor is washed off; this restores the 3’ -OH on the newly incorporated base, so the next nucleotide can be attached to it
7) another cycle of polymerisation is initiated by adding DNA polymerase and another fluorescent base, the signal is recorded
8) this cycle of single-base polymerisation/fluorescence detection/regeneration is repeated until a sequence of each fragment has been built up
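The cycle in steps 4 to 8 can be mimicked with a very small simulation. This is a sketch under simplifying assumptions: each surface position holds one template written in the order the polymerase reads it, exactly one base is added per cycle, and there are no errors. The names and example templates are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def run_cycles(clusters, n_cycles):
    """Simulate single-base polymerisation / detection / regeneration.

    clusters: dict mapping a surface position (x, y) to its template strand,
              written in the order the polymerase reads it.
    Returns a dict mapping each position to the read built up so far.
    """
    reads = {position: "" for position in clusters}
    for cycle in range(n_cycles):
        for position, template in clusters.items():
            if cycle < len(template):
                # Only the dNTP that pairs with the template base is added;
                # its fluor is what the camera records at this position.
                signal = COMPLEMENT[template[cycle]]
                reads[position] += signal
                # Removing the fluor restores the 3'-OH for the next cycle.
    return reads


# Two hypothetical clusters at different positions on the surface
example_reads = run_cycles({(0, 0): "TACG", (5, 2): "GGCA"}, n_cycles=3)
```

After three cycles the two positions have produced the reads ATG and CCG, one base per cycle.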
Can you sequence an entire genome with NGS?
What does NGS allow us to do?
Genome Browsers
Definition
-allow users to scan genome sequence data for gene coding sequences
Eukaryotic Genes Structure
Exons = regions present in the mRNA, made up of translated and non-translated regions
Introns = regions not present in the mRNA, removed by splicing
Eukaryotic Genes
Primary Transcript
exons, introns and poly A signal added by polyadenylation of the 3’ end of the molecule
Eukaryotic Genes
mRNA
exons and poly A signal
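The difference between the primary transcript and the mRNA can be shown by joining the exon regions and leaving out the intron. This is a minimal sketch: the sequence and exon coordinates are invented, and the function name is hypothetical.

```python
def splice(primary_transcript, exons):
    """Join the exon regions of a primary transcript, dropping the introns.

    exons: list of (start, end) positions, end exclusive.
    """
    return "".join(primary_transcript[start:end] for start, end in sorted(exons))


# Hypothetical transcript: exon 1 + intron + exon 2
exon_1 = "AUGGCC"
intron = "GUAAGUUUUCAG"
exon_2 = "UUUUAA"
primary = exon_1 + intron + exon_2
spliced = splice(primary, [(0, 6), (18, 24)])
```

The spliced result is exon 1 followed directly by exon 2, which is what cDNA copied from the mRNA would represent.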
cDNA Synthesis
1) start with mRNA
2) short poly-T primer annealed to the poly-A tail on the mRNA
3) reverse transcriptase (an RNA dependent DNA polymerase) copies mRNA as a single stranded cDNA
4) DNA ligase ligates a linker primer to the single stranded cDNA
5) DNA polymerase (a DNA dependent DNA polymerase) can synthesise a second strand to form double stranded cDNA
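At the sequence level, steps 2 to 5 amount to two rounds of complementary copying. The sketch below assumes sequences are written 5' to 3', treats the poly-T primer as part of the first strand, and leaves out the linker from step 4; the function names and the example mRNA are hypothetical.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence, pairing):
    """Complement each base and reverse, so the result also reads 5' to 3'."""
    return "".join(pairing[base] for base in reversed(sequence))


def first_strand_cdna(mrna):
    """Reverse transcriptase step: copy the mRNA into single-stranded cDNA."""
    return reverse_complement(mrna, RNA_TO_DNA)


def second_strand_cdna(first_strand):
    """DNA polymerase step: copy the first strand into the second strand."""
    return reverse_complement(first_strand, DNA_TO_DNA)


# Hypothetical short mRNA ending in a poly-A tail
mrna = "AUGGCCUAAAAAA"
first = first_strand_cdna(mrna)
second = second_strand_cdna(first)
```

The first strand begins with the poly-T stretch from the primer, and the second strand has the same sequence as the mRNA with T in place of U.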
What does sequencing cDNA allow us to do?
The Polymerase Chain Reaction
Definition
The Polymerase Chain Reaction
Typical PCR Cycle
1) denature DNA at 95 °C for 30 s so that it is single stranded
2) anneal gene-specific primers at 45-60 °C for 30 s
3) DNA polymerase reaction, for 1 to 2 mins
4) repeat steps 1 to 3 for 35 cycles
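The arithmetic behind the cycle can be sketched as follows. It assumes ideal doubling in every cycle unless a lower efficiency is given, and it ignores the time spent changing temperature; the function names and the 90 s extension default are illustrative choices, not values from the notes.

```python
def pcr_copies(start_copies, cycles, efficiency=1.0):
    """Copies after a number of cycles.

    efficiency = 1.0 means every template is copied in every cycle (doubling).
    """
    return start_copies * (1 + efficiency) ** cycles


def run_time_minutes(cycles, denature_s=30, anneal_s=30, extend_s=90):
    """Total time spent in the three steps, ignoring temperature ramping."""
    return cycles * (denature_s + anneal_s + extend_s) / 60


ideal_copies = pcr_copies(1, 35)   # one starting molecule, 35 cycles
total_minutes = run_time_minutes(35)
```

With perfect doubling, one starting molecule gives 2^35 copies (about 3.4 x 10^10) after 35 cycles. Real reactions fall short of this because efficiency drops in later cycles.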
PCR Cycles
Duration of Polymerase Reaction
-the length of the sequence copied from the primer depends on how long you leave the polymerase reaction to run
Thermostable DNA Polymerase for PCR
Uses of PCR - DNA Fingerprinting
PCR Cycle
Uses of PCR - DNA Fingerprinting
DNA Sequences
-many genes contain short elements in their non-coding regions made up of simple sequence repeats e.g. ATATAT or GCGCGC
-these sequences are highly susceptible to insertion/deletion of bases due to DNA polymerase slippage during DNA replication
-where there is no selection for sequence conservation (as in non-protein coding regions) these sequences may exhibit high levels of polymorphism between individuals in a species
-these tandem repeats of simple sequences are known as
VNTRs - variable number tandem repeats
SSRs - simple sequence repeats
or microsatellites
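Polymorphism at a simple sequence repeat can be shown by counting how many copies of the repeat unit each allele carries. This is a minimal sketch with invented alleles and a hypothetical function name; in practice the difference is seen as a difference in PCR product length.

```python
import re


def longest_tandem_repeat(sequence, unit):
    """Largest number of back-to-back copies of unit found in sequence."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


# Two hypothetical alleles with the same flanking sequence
allele_1 = "GGC" + "AT" * 6 + "CCG"
allele_2 = "GGC" + "AT" * 9 + "CCG"
repeats_1 = longest_tandem_repeat(allele_1, "AT")
repeats_2 = longest_tandem_repeat(allele_2, "AT")
length_difference = len(allele_2) - len(allele_1)
```

Here the alleles carry 6 and 9 copies of AT, so a PCR product spanning the repeat would differ in length by 6 bp between them.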
Uses of PCR - DNA Fingerprinting
Identification