Name three applications for DNA sequencing.
And sooo much more!
In the history of DNA sequencing, what were the five most prominent discoveries/developments?
There are three kinds of sequencing methods in use today, for different purposes. Which, and what purposes are they used for?
How does Sanger sequencing work in detail?
Sanger sequencing, also known as the chain-termination method, involves making many copies of a target DNA region using the same principle as PCR, but with some tweaks:
What are the three major limitations of Sanger sequencing?
What is de novo sequencing vs resequencing?
De novo sequencing is when you’re sequencing a genome for the first time, so there’s no reference genome to compare to. From the millions of reads, you need to align overlapping sequences and puzzle them together to get the whole genome sequence. This needs a lot of computing power!
Resequencing is when you align sequences to a reference genome and by that find their correct positions quickly. This uses less computing power as it is a lot less complex.
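A toy sketch of the difference, assuming error-free reads and exact overlaps (real assemblers and aligners are far more sophisticated). The function names and example reads are made up for illustration:

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """De novo (toy): repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        # All-against-all comparison: this is what makes de novo assembly expensive.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


def map_read(reference: str, read: str) -> int:
    """Resequencing (toy): look up where the read sits in a known reference (-1 if absent)."""
    return reference.find(read)


reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contigs = greedy_assemble(reads)
position = map_read("ATGGCGTACCTGA", "GCGTACC")
```

The de novo part has to compare every sequence with every other one, while the resequencing part only looks each read up in the reference, which is why it needs much less computing power.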
Explain the terms “read” vs “contig”.
The read length is very important when selecting an HTS method, why?
It’s easier to align a few long sequences than many short ones, but current methods produce far more short reads than long ones, so short- and long-read methods suit different applications. Combining both is usually the surest way to go, but it is expensive.
Long reads are better for de novo sequencing, while short reads are better for tasks such as finding point mutations or measuring gene expression.
What is meant by a “library” in HTS?
When performing HTS you need to do library prep, and the library is your prepped sample. Library prep often includes fragmenting the DNA and adding the components the HTS method needs, such as adapters/ends.
What are single- vs. paired-end reads?
Read ends are useful in short-read sequencing for determining both the order and orientation of the reads. With single-end reads, each fragment is sequenced from one end only. With paired-end (or mate-pair) reads, each fragment is sequenced from both ends, and the known orientation and approximate distance between the two reads help to order longer sequences, which is useful when building the scaffold.
What does “coverage” mean in an HTS context?
Coverage is calculated as the total number of sequenced bases divided by the total number of bases in the genome. Higher coverage is better, especially in de novo sequencing, but it also means more work, so aim for a balance.
Note: coverage is just an average, so a coverage of 1 can still mean you have gaps of unsequenced DNA in some places and many reads in others. This is important to keep in mind when choosing a method. For example, you don’t need full coverage when building a scaffold, but if you’re determining a point mutation, you want a high enough coverage to confidently say that you have a true variation (a majority of reads showing the same variation, which is hard if you only have two reads with conflicting info) rather than a sequencing error.
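A small sketch of the coverage calculation. The second function rests on an assumption that is not in these notes: that reads land uniformly at random on the genome, so the depth at each position is roughly Poisson distributed. Real data is less even than this. The numbers are made up for illustration:

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average coverage = total sequenced bases / bases in the genome."""
    return n_reads * read_length / genome_size


def expected_uncovered_fraction(coverage: float) -> float:
    """Fraction of positions with zero reads, ASSUMING Poisson-distributed depth."""
    return math.exp(-coverage)


# Hypothetical run: 1 million reads of 150 bases on a 5 Mb genome.
cov = mean_coverage(1_000_000, 150, 5_000_000)

# Even at an average coverage of 1, a large share of the genome is expected to be missed.
gap_at_1x = expected_uncovered_fraction(1.0)
```

Under this idealized model, an average coverage of 1 still leaves roughly a third of the positions without any read, which illustrates why the average alone says little about gaps.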
What is “metagenomics”?
Metagenomics = sequencing of mixed populations and then separating organisms by bioinformatics (= in silico).
What is metagenomics used for and what are the advantages?
Metagenomics is commonly used in ecology to determine the genomic content of a certain environment from a small sample. The advantage is that no cultivation of organisms is required (cultivation-based experiments are estimated to miss 99% of microorganisms).
Two main kinds:
1) Whole-genome sequencing
2) Targeted sequencing (usually 16S rDNA)
When preparing your library for HTS, there are many things to think about. What does GIGO stand for in this context and what does it mean?
GIGO = Garbage in, garbage out: It’s basically a warning reminding you to prepare your sample using the correct approach for your chosen method, to know your sample well and quality check it.
Using HTS methods is very expensive, so sequencing a lot of samples cuts the cost per sample drastically. What approaches can you use if you want to sequence many samples at the same time?
Give each sample its own barcode during library prep and pool the samples in one run. This removes error sources and saves both money and workload!
Which three HTS methods are the biggest players today?
What are the four steps needed for library prep before HTS?
1) Target enrichment (optional)
- Depletion of short DNA molecules
- Targeted amplification of specific regions
- polyA-selection / rRNA depletion
- Enrichment of sRNA
2) Input DNA/RNA fragmentation and end-repair
- Size selection
- Not performed when maximum read length is wanted (e.g. Nanopore)
3) Add adapters and barcodes
- PCR or ligation
- Barcoded samples can be pooled after this step
4) PCR amplification of library
- Not always necessary
- PCR bias (which needs to be kept in mind when analyzing results)
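Pooling barcoded samples only works because the reads can be sorted back to their sample afterwards (demultiplexing). A minimal sketch of that idea, assuming the barcode sits at the start of each read and matches exactly. In practice this is typically handled by the sequencing platform's own software, and the barcodes, sample names and reads below are invented for illustration:

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Sort pooled reads by sample.

    barcodes: dict mapping barcode sequence -> sample name (hypothetical values).
    Assumes an exact-match barcode at the start of each read; the barcode is trimmed off.
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched: keep the read aside instead of guessing.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


barcodes = {"ACGT": "sampleA", "TTAG": "sampleB"}
pooled = ["ACGTGGGCCC", "TTAGAAATTT", "CCCCGGGG", "ACGTTTTT"]
result = demultiplex(pooled, barcodes)
```

Reads without a recognizable barcode end up under "unassigned", which is one simple way to keep sequencing errors in the barcode from contaminating a sample.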
Describe in as much detail as possible how the Illumina procedure works.
Analysis:
- If paired-end sequencing is used, forward and reverse strands are labeled. The sequences are then aligned and clustered where they share similar base sequences.
Pros: millions/billions of reads, fairly low error rate. Cons: short reads, so not great for de novo sequencing or exon determination.
Describe in as much detail as possible how the SMRT PacBio procedure works.
Pros: long reads with high accuracy, better for de novo sequencing, much faster than Illumina, and can detect DNA methylation etc. (based on how long it takes to incorporate each nucleotide). Cons: less data than Illumina.
Describe in as much detail as possible how the Oxford Nanopore procedure works.
Note: A new advancement is to keep the complementary strand at the same pore and read it afterwards, which gives you double the info to correct for errors.
Pros: accessible, very long reads, can also be used to detect modifications. Cons: higher error rate, but it is getting better.
Which HTS method(s) would you use for de novo sequencing and why?
For de novo sequencing you’d want to use long-read tech, PacBio or Nanopore, because longer reads are easier to puzzle together. For a big genome, Nanopore would probably be best as it generates the longest reads of the two.
Note: The best would be a combination of short- and long-read data. The long-read data is good for scaffolding but has more errors, while the short-read data is more precise and gives more reads, so together they make it possible to produce an accurate sequence. But that’s expensive!
Which HTS method would you use to evaluate gene expression in different cells?
Short-read tech, Illumina. For this question you’d want as much data as possible, first to detect differences in gene expression, and second to have enough data to say that the differences are true and not false positives.
Which HTS method would you use to find new isoforms of a gene-product?
Long-read tech, so either PacBio or Nanopore. With long-read tech you can easily see gaps within the same read or alternative lengths of the sequence, which makes it clear if any alternative isoforms are present. With short-read tech the reads rarely span more than an exon, so you could easily miss isoforms.
What is bioinformatics?
Bioinformatics is an interdisciplinary field of science (biology, computer science and statistics) that develops methods and software tools for understanding biological data, especially when the data sets are large and complex.