Explain the human genome project
Human Genome Project (1990 - 2003)
* 3 billion base pairs long
* All done with traditional Sanger
Sequencing
* Unravelled the first Human Genome
Sequence to drive genetics research
* Billion-dollar cost
Explain the principles of PCR
Fundamental principle for any DNA
sequencing application
3-step process, repeated every cycle: denaturation, primer annealing, extension
* PCR is used to amplify a specific
region of DNA; primers flank the
region you want to amplify.
* Each cycle doubles the amount of
DNA copies of your target sequence
https://www.shmoop.com/dna/pcr-polymerase-chain-reaction.html
* The aim is to amplify enough DNA molecules for downstream analysis such as sequencing
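The doubling per cycle can be sketched as a short calculation. This is a minimal illustration, not a lab protocol: the function name `pcr_copies` and the `efficiency` parameter are assumptions introduced here, and perfect doubling (efficiency 1.0) is an idealisation.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected number of target copies after a number of PCR cycles.

    efficiency=1.0 models perfect doubling in every cycle (idealised);
    lower values model cycles in which only a fraction of templates are copied.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# One starting molecule after 1, 10 and 30 ideal cycles.
for n in (1, 10, 30):
    print(n, pcr_copies(1, n))
```

Under ideal doubling, 30 cycles turn one molecule into 2^30 copies, roughly a billion.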
Explain characteristics of Sanger sequencing
Invented by Fred Sanger in 1977
* Cycle Sequencing
* Based on PCR (and needs a PCR product
as input)
* Identify single nucleotide
polymorphisms (SNPs), or
mutations
* We can identify monogenic
disease-causing mutations
* Used often for single gene tests
* E.g. CFTR in cystic fibrosis
* Needs primers
* Modified nucleotides
● Chain Terminators
● Nucleotide specific colour tag
* A small proportion of the free nucleotides
are modified this way to allow every base
in the sequence to be read
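Why only a small proportion of nucleotides is modified can be seen in a toy simulation: each growing strand stops at a random position, so across many molecules every length is represented, and sorting by length reads out the sequence. This is a sketch under simplifying assumptions; the function names, the 5% terminator fraction and the example sequence are all made up for illustration.

```python
import random


def sanger_fragments(new_strand, n_molecules=2000, ddntp_fraction=0.05, seed=0):
    """Simulate chain-terminated fragments as (length, terminal_base) pairs.

    At each position a growing strand has a small chance of incorporating
    a dye-labelled chain terminator, which stops extension at that base.
    Strands that never pick up a terminator carry no dye and are ignored.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        for position, base in enumerate(new_strand, start=1):
            if rng.random() < ddntp_fraction:
                fragments.append((position, base))
                break
    return fragments


def read_sequence(fragments):
    """Order fragments by length (as electrophoresis does) and read each terminal dye."""
    terminal_base_by_length = dict(fragments)
    return "".join(terminal_base_by_length[n] for n in sorted(terminal_base_by_length))


print(read_sequence(sanger_fragments("ATGCCGTAAGCT")))
```

If every nucleotide were a terminator, all strands would stop at the first base and nothing beyond it could be read.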
Explain the next generation of DNA sequencing I
-technological advances since the end of the human genome project
-decrease in the cost of DNA sequencing
-the cost has dropped at a rate faster than that of Moore’s law since the end of 2007
Explain the next generation of DNA sequencing II
-development of new NGS methods began 13 years ago with 454 pyrosequencing
-DNA sequencing throughput jumped 10 orders of magnitude
-Solexa sequencing by synthesis (SBS) developed end of 2005
Outline the 4 steps in NGS sequencing
1) DNA library construction
2) Cluster generation
3) Sequencing by synthesis
4) Data analysis
Explain DNA library construction
1) DNA library construction:
-in the wet lab, we need to prepare the DNA sample for sequencing
-DNA is chopped into small fragments; this is called shearing
-this can be achieved chemically, enzymatically or physically
2) We have to repair the ends of the sheared DNA fragments
-Adenine nucleotide overhangs are added to the end of fragments
-Adapters with thymine overhangs can be ligated to the DNA fragments
3) Adapters contain the essential components to allow the library fragments to be sequenced
-sequencing primer binding sites
-P5 and P7 anchors for attachment of library fragments to the flow cell
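The library construction steps can be sketched in code. This is only a schematic: the adapter strings are placeholder labels, NOT real adapter sequences, and the function names, fragment sizes and example DNA are assumptions made for illustration. End repair and A-tailing are represented only by a comment.

```python
import random

# Placeholder labels standing in for adapter components (not real sequences).
P5_ADAPTER = "[P5][read1-primer-site]"
P7_ADAPTER = "[read2-primer-site][P7]"


def shear(dna, mean_size=8, seed=0):
    """Chop DNA into consecutive fragments of random size around mean_size."""
    rng = random.Random(seed)
    fragments = []
    start = 0
    while start < len(dna):
        size = rng.randint(mean_size // 2, mean_size * 3 // 2)
        fragments.append(dna[start:start + size])
        start += size
    return fragments


def ligate_adapters(fragment):
    """Attach an adapter to each end of a fragment.

    In the wet lab the fragment ends are first repaired and given an
    adenine overhang so that the thymine overhang on the adapter can pair
    with it; that chemistry is not modelled here.
    """
    return f"{P5_ADAPTER}{fragment}{P7_ADAPTER}"


sample_dna = "ATGCCGTAAGCTTAGGCTAACGT"
library = [ligate_adapters(f) for f in shear(sample_dna)]
for molecule in library:
    print(molecule)
```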
Explain cluster generation
Cluster generation:
-hybridise DNA library fragments to the flow cell
-we need to amplify each fragment into many copies for a stronger signal
-perform bridge amplification to generate clusters
-many billions of clusters originating from single DNA library molecules
-clusters are now big enough to be visualised and the flow cell is now ready to be loaded
Explain sequencing the library
-DNA libraries deposited on flow cell:
- bridge amplification
-amplified to form ‘clusters’
Explain sequencing by synthesis
Sequencing by synthesis:
-modified 4 bases (ATCG) with:
* chain terminators
* different fluorescent colour dye
-sequence each single nucleotide 1 cycle at a time in a controlled manner
-single nucleotide incorporation (DNA polymerase)
-flow cell wash
-image the 4 bases
-cleave terminator chemical group and dye with enzyme
-camera sequentially images all 4 bases on the surface of the flow cell each cycle
-each cycle image is converted to a nucleotide base (ACGT)
-the number of cycles sets the read length, anywhere between 50 and 600 bases
-there are millions of short read sequences representing our original DNA library
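The cycle-by-cycle logic above can be sketched as a loop. This is a simplified model, not a description of any instrument: the dye colours are arbitrary, the function name is hypothetical, and each cluster is represented directly by the strand being synthesised.

```python
# Arbitrary dye colours, for illustration only.
DYE_FOR_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


def sequence_by_synthesis(clusters, cycles):
    """Return one read per cluster, built up one base per cycle."""
    reads = ["" for _ in clusters]
    for cycle in range(cycles):
        # 1) Incorporation: the terminator lets the polymerase add exactly one
        #    dye-labelled base per strand. 2) The flow cell is washed.
        # 3) Imaging: record the dye colour seen at each cluster.
        image = [
            DYE_FOR_BASE[strand[cycle]] if cycle < len(strand) else None
            for strand in clusters
        ]
        # 4) Base calling: convert each colour back to a base.
        for i, dye in enumerate(image):
            if dye is not None:
                reads[i] += BASE_FOR_DYE[dye]
        # 5) Cleavage: terminator group and dye are removed, so the next
        #    cycle can add the following base.
    return reads


print(sequence_by_synthesis(["ACGTTG", "TTGACA"], cycles=4))
```

The number of cycles caps the read length: four cycles give four-base reads regardless of how long the fragment is.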
Explain analysis of NGS data
-short read sequences from the sequencing machine need to be pieced together like a jigsaw
-mapping locations of our sequence reads on the reference genome sequence
-to generate a consensus sequence of our original DNA sample library
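A toy version of mapping and consensus calling is sketched below. It uses brute-force placement by fewest mismatches, which is a stand-in chosen for clarity and not how production aligners work; the function names, reference and reads are invented for the example.

```python
from collections import Counter


def map_read(reference, read):
    """Return the reference offset where the read has the fewest mismatches."""
    best_offset, best_mismatches = 0, len(read) + 1
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(window, read))
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset


def consensus(reference, reads):
    """Pile up mapped reads and take the most common base at each position."""
    pileup = [Counter() for _ in reference]
    for read in reads:
        offset = map_read(reference, read)
        for i, base in enumerate(read):
            pileup[offset + i][base] += 1
    # Positions with no coverage are reported as N.
    return "".join(c.most_common(1)[0][0] if c else "N" for c in pileup)


reference = "ATGCCGTAAGCTTAGG"
# Reads from a sample that differs from the reference at one position (A to T).
reads = ["ATGCCGTT", "GCCGTTAG", "CGTTAGCT", "TTAGCTTA", "AGCTTAGG"]
print(consensus(reference, reads))
```

Four overlapping reads support the same T at the variant position, which is why the consensus is more trustworthy than any single read.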
Explain NGS vs Sanger
NGS produces a digital readout. Sanger produces an analogue readout
Sanger is one sequence read
NGS is a consensus sequence of many reads
Explain whole genome sequencing
-there are roughly 21,000 genes in the human genome
-often we are only interested in the protein-coding exons of genes, or the ‘exome’, which represents 1-2% of the genome
-some 80% of pathogenic mutations are in protein-coding regions
-target enrichment
-capture target regions of interest with baits
-potential to capture several Mb genomic regions of interest
-exome would be 50Mb in size
-patient DNA sample subjected to exome sequencing
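The figures quoted in these notes are consistent with each other, as a quick check shows. The constants come from the notes (3 billion base pairs, 50 Mb exome target); the variable names are just for illustration.

```python
GENOME_SIZE_BP = 3_000_000_000  # about 3 billion base pairs
EXOME_TARGET_BP = 50_000_000    # about 50 Mb captured by the baits

exome_fraction = EXOME_TARGET_BP / GENOME_SIZE_BP
print(f"{exome_fraction:.1%}")  # falls inside the 1-2% range quoted above
```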
Explain application of exome sequencing
-collecting disease affected individuals and their families
-use of NGS in disease gene identification
-perform exome sequencing
-compare variant profiles of affected individuals
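Comparing variant profiles can be sketched with set operations: keep variants shared by all affected individuals and drop any also seen in unaffected relatives. This is a deliberately naive filter that ignores inheritance model and variant frequency; the function name and the variant labels are hypothetical placeholders, not real variants.

```python
def shared_candidate_variants(affected_profiles, unaffected_profiles):
    """Variants present in every affected profile and in no unaffected profile."""
    if not affected_profiles:
        return set()
    candidates = set.intersection(*(set(p) for p in affected_profiles))
    for profile in unaffected_profiles:
        candidates -= set(profile)
    return candidates


# Placeholder variant labels, for illustration only.
affected = [
    {"variant_A", "variant_B", "variant_C"},
    {"variant_A", "variant_C", "variant_D"},
]
unaffected = [{"variant_C", "variant_E"}]

print(shared_candidate_variants(affected, unaffected))
```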
Explain RNA sequencing
-RNA-seq experiments use the total RNA (or mRNA) from a collection of cells or tissue
-RNA is first converted to cDNA prior to library construction
-NGS of RNA samples determines which genes are actively expressed
-the number of sequencing reads produced from each gene can be used as a measure of gene abundance
-quantification of the expression levels
-calculation of the differences in gene expression of all genes in the experimental conditions.
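Quantification and comparison can be sketched as read counting followed by a fold-change calculation. Counts-per-million scaling and a log2 fold change with a pseudocount are common simple choices assumed here, not something these notes prescribe; the function names and gene labels are hypothetical, and real analyses use dedicated statistical tools.

```python
import math
from collections import Counter


def counts_per_million(gene_of_each_read):
    """Count reads per gene and scale so that the library totals one million."""
    counts = Counter(gene_of_each_read)
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def log2_fold_change(control_cpm, treated_cpm, pseudocount=1.0):
    """log2(treated / control) per gene; the pseudocount avoids division by zero."""
    genes = set(control_cpm) | set(treated_cpm)
    return {
        g: math.log2(
            (treated_cpm.get(g, 0.0) + pseudocount)
            / (control_cpm.get(g, 0.0) + pseudocount)
        )
        for g in genes
    }


# Each list entry is the gene a read was assigned to (placeholder names).
control_reads = ["geneA"] * 50 + ["geneB"] * 50
treated_reads = ["geneA"] * 75 + ["geneB"] * 25

changes = log2_fold_change(
    counts_per_million(control_reads), counts_per_million(treated_reads)
)
for gene, lfc in sorted(changes.items()):
    print(gene, round(lfc, 2))
```

A positive value means the gene has relatively more reads in the treated sample, a negative value fewer.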