What are the two general approaches to genome sequencing, and what happens in each approach? (7)
Define bacterial artificial chromosome (BAC).
A plasmid vector (based on the bacterial F plasmid) that carries a large insert of foreign DNA.
Discuss the approach to organizing the sequencing of a large DNA molecule into pieces of known relative position. Provide all necessary examples or features.
E.g. Arabidopsis thaliana has a haploid genome size of about 10^8 bp. A 3948-clone BAC library for A. thaliana contains ~100 kb inserts per clone, giving approximately four-fold coverage.
What is the principle of the whole-genome shotgun approach?
To sequence random pieces of the (genomic) DNA and put them together in the right order.
- If this can be done, one can skip the laborious stage of creating a map as the basis for assembling partial sequences.
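The idea of putting random pieces together by their overlaps can be sketched in a few lines. This is only a toy illustration, not how real assemblers work: the function names, the minimum overlap of 3 bases and the example reads are all assumptions made for this sketch, and it ignores sequencing errors, repeats and reverse-complement reads.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Toy greedy assembly: keep merging the pair with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        # Discard pieces that lie entirely inside another piece.
        contigs = [c for c in contigs
                   if not any(c != o and c in o for o in contigs)]
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge; gaps remain between contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Hypothetical reads sampled from the made-up sequence ATGGCGTACGTTAGC
reads = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]
print(greedy_assemble(reads))
```

When the reads overlap sufficiently, the result is a single contig; reads with no overlap are left as separate contigs, which is where gaps come from.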
Explain what happens in the whole-genome shotgun sequencing of the Drosophila melanogaster (fruit fly) genome.
Define a ‘Golden path’.
The fully assembled genome sequence, built by coalescence of contigs.
Define coverage and give the appropriate formula for it.
The average number of times each base appears in the (sequenced) fragments.
Coverage = NL/G
•N, number of reads
•L, length of a read
•G, genome length
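A quick worked check of the formula, as a sketch. The function names are made up for illustration. The first example uses the A. thaliana BAC library figures given earlier in these notes; the second assumes a hypothetical 500-base read length and a 10-fold target coverage for a 120 Mb genome.

```python
import math


def coverage(n_reads, read_length, genome_length):
    """Coverage = N * L / G."""
    return n_reads * read_length / genome_length


def reads_needed(target_coverage, read_length, genome_length):
    """Rearranged formula: N = coverage * G / L, rounded up to a whole read."""
    return math.ceil(target_coverage * genome_length / read_length)


# A. thaliana BAC library: 3948 clones, ~100 kb inserts, ~10^8 bp genome
print(coverage(3948, 100_000, 10**8))        # close to four-fold

# Hypothetical: 10-fold coverage of a 120 Mb genome with 500-base reads
print(reads_needed(10, 500, 120_000_000))
```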
Explain what can be done to close the gaps, once you’ve assembled the contigs and identified the gaps?
Finishing, which involves synthesis and sequencing of specific fragments to close the gaps.
Provide the two concerns of whole-genome shotgun (WGS) sequencing.
(1) WGS worked smoothly for prokaryotes, which contain relatively little internal repetitive sequence; repeat-rich eukaryotic genomes are harder to assemble.
> the Drosophila genome published in 2000 contained 120 Mb of finished sequence, with about 1600 gaps.
> later, the gaps had been reduced to less than 1% of the total sequence.
(2) Genomes with highly skewed base composition also complicate the application of WGS.
E.g. Plasmodium falciparum, whose genome contains ~80 mol% AT.
Provide the two positives about the WGS approach.
(1) It may be possible to identify genes in a partly assembled genome with many gaps, provided that the genes are contained within contigs.
(2) The fruit fly WGS sequencing by Celera served as a ‘proof of principle’; Celera then completed the ‘commercial’ human genome project using the academic sequence as a reference.
Name the two main differences between BAC-to-BAC and WGS approaches and explain them.
(1) BAC-to-BAC methods are more robust than WGS methods.
• In diploid organisms, fragments arising from homologous regions of the two chromosomes of a pair may have sequence differences.
• Correct assembly must place them at the same location while noting the discrepancies; the assembly must not split these reads into different contigs because of the imperfect matches (BAC ordering helps here).
(2) Highly inbred laboratory strain vs outbred population or pooled DNA
• An outbred population or pooled DNA would present a more severe assembly challenge than a highly inbred laboratory strain (in light of the point above).
State the common and different steps in ‘BAC-to-BAC’ and WGS methods.
WGS: partially sequence 1.5 kb subfragments of individual plasmid clones.
Define single-end read
A technique in which sequence is reported from only one end of a fragment.
Define paired-end read.
A technique in which sequence is reported from both ends of a fragment/template (with a number of undetermined bases between the reads that is known only approximately).
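As a rough sketch of what a paired-end read looks like as data: two short reads taken from the two ends of one fragment, plus an unsequenced stretch in between. The function names and the example fragment are made up, and reporting the second read as the reverse complement of the fragment's far end is an assumed convention here. In a real experiment the fragment length, and hence the gap, is known only approximately.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment, read_length):
    """Return (forward_read, reverse_read, number_of_unsequenced_bases)."""
    forward = fragment[:read_length]
    # Assumed convention: the second read comes from the opposite strand.
    reverse = reverse_complement(fragment[-read_length:])
    gap = max(len(fragment) - 2 * read_length, 0)
    return forward, reverse, gap


# Hypothetical 15-base fragment, 4-base reads from each end
print(paired_end_reads("ATGGCGTACGTTAGC", 4))
```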
Define read length
The number of bases reported/sequenced from a single experiment on a single fragment/template.
Define assembly/sequence
The inference of the complete sequence of a region from the data on individual fragments from the region, by piecing together overlaps.
Define contig
A partial assembly of data from overlapping fragments into a contiguous region of sequence.
Define de novo sequencing
Determination of a full-genome sequence without a known reference sequence from an individual of the species, so the assembly step cannot be avoided.
Define resequencing
Determination of the sequence of an individual of a species for which a reference genome sequence is known. The assembly process is replaced by mapping the fragments onto the reference genome.
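Mapping can be illustrated with a deliberately naive sketch: slide each read along the reference and keep the position with the fewest mismatches. The function name, the mismatch limit and the sequences are assumptions for illustration only; real read mappers use indexed references and allow insertions and deletions.

```python
def map_read(read, reference, max_mismatches=1):
    """Return (position, mismatches) of the best placement, or (None, None)."""
    best_pos, best_mm = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best_mm:  # strictly better, so the leftmost best hit wins
            best_pos, best_mm = pos, mismatches
    if best_pos is None:
        return None, None
    return best_pos, best_mm


reference = "ATGGCGTACGTTAGC"            # made-up reference sequence
print(map_read("GCGTAC", reference))     # exact match
print(map_read("GCGAAC", reference))     # one base differs from the reference
print(map_read("TTTTTT", reference))     # no acceptable placement
```

A read that maps with a mismatch marks a position where the individual may differ from the reference.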
Define exome sequencing
Targeted sequencing of regions in DNA that code for parts of expressed proteins (exons). This method targets only the approximately 180000 exons in the human genome, for example.
Define RNAseq
Determining the content and composition of the RNAs in the cell (called the ‘transcriptome’) by converting the RNA to complementary DNA and sequencing the result.
What are the general approaches to improving the throughput/cost ratio?
Miniaturization and parallelization or multiplexing
Provide the common preparation steps in NGS (high-throughput DNA sequencing).
(1) target DNA is fragmented
(2) common adaptors are attached to one or both ends
(3) amplification (via PCR) - generates a library of short regions
(4) spatial distribution of the library, either in an array of wells or fixed to a solid medium, and sequencing in parallel
(5) ‘de novo’ sequence assembly or mapping to a reference genome