Sequencing platforms
* Illumina NovaSeq X Plus: This is Novogene’s primary platform for high-throughput sequencing of premade libraries, offering high accuracy and extraordinary sequencing power.
* PacBio Sequel IIe and Revio systems: Used for long-read sequencing applications, providing HiFi reads with high consensus accuracy and coverage.
* Oxford Nanopore PromethION: Offers real-time, long-read sequencing of DNA and RNA, including ultra-long reads for complex genomes and structural variation detection.
Flow cell types
* NovaSeq X Plus:
* 10B flow cell: Offers ~1.25 billion paired reads per lane.
* 25B flow cell: Provides significantly higher throughput, reaching up to ~3.2 billion paired reads per lane, allowing for more data-intensive applications.
Sequencing strategies
Novogene provides a range of sequencing strategies for premade libraries, catering to various project needs:
* Paired-End 150 bp (PE150): Recommended for mRNA sequencing and other applications requiring high-quality short reads.
* Paired-End 250 bp (PE250): Available on the NovaSeq 6000 SP flow cell, suitable for applications needing longer read lengths.
* Paired-End 50 bp (PE50) and Single-End 50 bp (SE50): Offered for specific applications and projects with lower read length requirements.
Outsourcing partners
Novogene collaborates with various partners to ensure access to a diverse range of technologies and expertise, according to their partnership page. These partnerships expand their capabilities and allow them to leverage the most effective technologies available, including those from Illumina, Pacific Biosciences, Oxford Nanopore, and Life Technologies.
In summary, Novogene offers extensive options for premade library sequencing, featuring a combination of cutting-edge platforms, high-throughput flow cells, diverse sequencing strategies, and a strong network of outsourcing partners to meet the needs of various research projects.
In essence, the NovaSeq X Plus represents an advancement over the NovaSeq 6000, offering increased throughput, speed, cost efficiency, and streamlined features while maintaining high data quality.
Note: in practice we mainly use the NovaSeq X Plus.
10B = 375Gb raw data/lane – 1,249.875M
- $1799/lane (0-7 lanes)
25B = 1000Gb raw data/lane – 3,333M
- $3149/lane (0-7 lanes)
50Gb - $600
100Gb - $700
150Gb - $1050
200Gb - $1200
250Gb - $1500
300Gb - $1800
350Gb - $2100
4000Gb - $275/50Gb
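The per-lane read counts quoted above for the 10B and 25B flow cells can be cross-checked with simple arithmetic. The sketch below assumes PE150 reads (300 bases per read pair) and 1 Gb = 10^9 bases; the function names are illustrative, and the yields and prices are the full-lane figures listed in these notes.

```python
def read_pairs_from_gb(gigabases, read_length=150):
    """Read pairs implied by a raw yield in Gb, assuming paired-end reads."""
    return gigabases * 1e9 / (2 * read_length)


def price_per_gb(lane_price_usd, lane_yield_gb):
    """Cost per Gb of raw data for a full lane."""
    return lane_price_usd / lane_yield_gb


# Full-lane yields (Gb) and prices (USD) as listed in these notes.
lanes = {"10B": (375, 1799), "25B": (1000, 3149)}
for name, (gb, usd) in lanes.items():
    pairs_m = read_pairs_from_gb(gb) / 1e6
    print(f"{name}: ~{pairs_m:,.0f} M read pairs/lane, ${price_per_gb(usd, gb):.2f}/Gb")
```

The same conversion applies to the partial-lane tiers, for example when deciding how many Gb to order for a target number of read pairs.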
Yes, Novogene allows for batched submissions via partial lane sequencing, also referred to as shared lane sequencing.
This service enables you to submit multiple projects or samples to be sequenced on a portion of a lane on sequencing platforms like the NovaSeq X Plus, according to Novogene. This offers cost-efficiency and flexibility, especially for smaller projects or those with lower data volume requirements.
Key aspects of Novogene’s batch partial lane submission
* Sharing Lanes: Multiple projects can share a single sequencing lane.
* Cost-Efficiency: This provides an economical option compared to full lane sequencing.
* Flexibility: You have control over your projects with options for multiple lane or data amount purchases.
* Sample Pooling: It’s recommended to send samples before pooling, but if pooling is necessary, limit it to no more than 10 sub-libraries in one tube.
* Data Quality: Novogene guarantees high data quality with a Q30 score of ≥ 80% for partial lane sequencing, exceeding the Illumina official guarantee of ≥ 75%.
In essence, Novogene’s partial lane sequencing facilitates efficient and affordable batch processing of samples by allowing projects to share sequencing resources.
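The Q30 guarantee mentioned above refers to Phred quality scores, where Q = -10 * log10(error probability), so Q30 corresponds to one expected error in 1,000 base calls. A minimal sketch of both calculations follows; the function names and the example scores are illustrative only.

```python
def phred_to_error_prob(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def percent_q30(qualities):
    """Percentage of base calls with a Phred score of at least 30."""
    if not qualities:
        raise ValueError("no quality scores supplied")
    return 100 * sum(q >= 30 for q in qualities) / len(qualities)


print(phred_to_error_prob(30))        # error probability at Q30
print(percent_q30([30, 35, 20, 40]))  # share of these four calls at or above Q30
```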
The amount of library needed for full lanes in sequencing depends on several factors, including the sequencing platform (e.g., Illumina, PacBio), the flow cell type, and the specific reagent chemistry being used.
Here’s a breakdown of general guidelines and considerations:
Illumina platforms
* NovaSeq X Plus:
* Uses patterned flow cells: 1.5B (2 lanes), and 10B and 25B (8 lanes each).
* Library preparation instructions provide specific volumes of libraries to be denatured and diluted to achieve the recommended final loading concentration for the different flow cell types (e.g., 34 µl for 10B or 1.5B, and 56 µl for 25B).
* MiSeq and NextSeq 500/550:
* Recommended final loading concentrations are typically in the picomolar (pM) range, varying based on the reagent kit version and library type.
* For example, MiSeq sequencing with V2 or V3 reagents suggests 8-10 pM.
* NextSeq 500/550 recommends 1.2-1.4 pM.
* NextSeq 1000/2000:
* Loading volumes and concentrations vary by library type.
* For onboard denature/dilute, a starting concentration of 650 pM is recommended for unlisted library types, with optimization over subsequent runs.
* General Considerations for Illumina:
* Accurate Quantification: Use reliable methods like qPCR or Qubit to accurately quantify your libraries to avoid over- or under-clustering.
* Loading Concentration Optimization: The optimal loading concentration might need fine-tuning for specific libraries and experiments.
* PhiX Spike-in: For low-diversity libraries, adding a PhiX control can help ensure proper cluster density and data quality.
* NaOH Denaturation: Ensure the correct concentration of NaOH is used for denaturation, as excess can inhibit cluster formation.
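Reaching a target loading concentration from a quantified stock is a C1 x V1 = C2 x V2 dilution. The sketch below assumes the stock is given in nM and the target in pM, and it ignores any volume added during denaturation; the 24 µl final volume in the example is a made-up number, not a value from an Illumina protocol.

```python
def dilution_volumes(stock_nM, target_pM, final_volume_ul):
    """Return (library_ul, diluent_ul) needed to reach target_pM in final_volume_ul."""
    stock_pM = stock_nM * 1000  # 1 nM = 1000 pM
    if target_pM > stock_pM:
        raise ValueError("target concentration exceeds stock concentration")
    library_ul = target_pM * final_volume_ul / stock_pM
    return library_ul, final_volume_ul - library_ul


# Example: 2 nM stock diluted to 650 pM in a hypothetical 24 µl final volume.
lib_ul, diluent_ul = dilution_volumes(2.0, 650, 24)
print(f"library: {lib_ul:.1f} µl, diluent: {diluent_ul:.1f} µl")
```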
PacBio platforms
* PacBio Revio and Sequel II:
* Requires high-quality DNA and RNA samples.
* Specific guidelines exist for sample requirements, including concentration and integrity (e.g., RIN score for RNA).
* The amount of DNA or RNA needed for library preparation can be in the microgram range.
* General Considerations for PacBio:
* High Molecular Weight (HMW) DNA: Handling and assessment of HMW DNA is crucial for optimal results.
* Library Size: The desired library insert size affects the preparation methods and expected yields.
* Sample Quality Control: Thorough QC of the starting material is essential for successful sequencing.
Important notes
* Always consult the latest documentation and protocols provided by the manufacturer for the specific sequencing platform and reagent kits you are using.
* Library preparation and sequencing are complex procedures that require careful attention to detail and good laboratory practices.
In conclusion, determining the precise amount of library needed requires careful consideration of the specific platform, flow cell, and reagents being used, along with appropriate quantification and potentially optimization of loading concentrations based on experimental results.
Novogene implements a rigorous Library QC process to ensure the quality and integrity of sequencing libraries before proceeding to sequencing.
Steps Involved in Novogene Library QC and their Purpose:
1. Qubit 2.0 (Preliminary Library Concentration Assessment):
* Purpose: To quickly determine the concentration of the prepared library using a sensitive fluorescent-based quantitation method. This provides an initial estimate before more precise quantification.
2. Agilent 2100 Bioanalyzer (Insert Size and Fragment Distribution Analysis):
* Purpose: To assess the size distribution and integrity of the library fragments. This ensures the prepared library contains fragments within the expected size range for optimal sequencing performance.
3. qPCR (Precise Library Quantification):
* Purpose: To accurately quantify the number of amplifiable library molecules (effective concentration). This is critical for pooling multiple libraries accurately for sequencing and ensuring even read distribution.
Overall Purpose of Library QC at Novogene:
* Ensure optimal sequencing performance: By confirming the library quality, Novogene can predict and optimize sequencing outcomes.
* Prevent sequencing failures: Detecting and addressing potential issues like low library concentration or incorrect fragment size before sequencing minimizes the risk of sequencing failure.
* Generate high-quality data: Quality control measures throughout the process, including library QC, contribute to the generation of reliable and accurate sequencing data for downstream analysis.
* Enable efficient sequencing: Precise library quantification allows for accurate pooling and efficient utilization of sequencing capacity.
Note: The specific steps might vary slightly depending on the type of library and the sequencing application. However, the core principles of assessing concentration, size, and effective concentration remain essential for ensuring a successful sequencing project.
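The link between qPCR quantification and accurate pooling can be illustrated with an equimolar pooling calculation. Since 1 nM equals 1 fmol/µl, the volume to take from each library is the desired amount in fmol divided by its qPCR concentration in nM. This is a sketch with made-up library names and concentrations, not Novogene's actual pooling procedure.

```python
def equimolar_volumes(conc_nM, fmol_each):
    """Volume (µl) of each library that contributes fmol_each to the pool."""
    volumes = {}
    for name, conc in conc_nM.items():
        if conc <= 0:
            raise ValueError(f"{name}: concentration must be positive")
        volumes[name] = fmol_each / conc  # nM is fmol per µl
    return volumes


# Hypothetical qPCR results (nM) for two libraries, pooling 20 fmol of each.
print(equimolar_volumes({"libA": 10.0, "libB": 5.0}, 20.0))
```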
Significant discrepancies between Qubit/Fragment Analyzer and qPCR derived nanomolar concentrations of nucleic acid libraries can provide valuable insights into sample quality and the success of library preparation.
Here’s an interpretation of these differences:
1. Qubit/Fragment Analyzer concentrations significantly higher than qPCR
This often suggests that the sample contains a substantial amount of double-stranded DNA that is not amplifiable by the qPCR assay.
Possible reasons include:
* Presence of non-target DNA: The Qubit/Fragment Analyzer measures all dsDNA present in the sample, while qPCR only quantifies sequences flanked by specific adapter sequences that can be amplified.
* Failed or inefficient adapter ligation: If the adapters haven’t been successfully ligated to all DNA fragments, Qubit/Fragment Analyzer will still detect the non-ligated fragments, leading to a higher concentration compared to qPCR.
2. Qubit/Fragment Analyzer concentrations significantly lower than qPCR
This scenario might indicate that the Qubit/Fragment Analyzer is underestimating the concentration of functional library molecules available for sequencing, which could lead to overloading the sequencer.
Possible reasons include:
* Single-stranded DNA: Some sample preparation methods leave part of the library single-stranded. The dsDNA-specific dyes used by Qubit and Fragment Analyzer assays detect ssDNA poorly, but qPCR can still amplify it, so qPCR reports the higher concentration when a large amount of amplifiable ssDNA is present.
* Underestimation due to degraded DNA: If the DNA is degraded or fragmented, Qubit might underestimate the concentration of the usable library, while qPCR, targeting smaller amplicons, could still quantify these fragments.
* Contamination or inhibitors affecting Qubit fluorescence: The presence of contaminants or substances that interfere with the fluorescent dye used in Qubit assays can lead to an underestimation of the DNA concentration.
* Presence of specific sequences amplified efficiently by qPCR: It’s also possible that qPCR primers are particularly efficient at amplifying specific sequences present in the library, leading to a higher detected concentration compared to the overall dsDNA quantification by Qubit, according to a 10x Genomics article.
In summary, when faced with discrepancies, it is crucial to consider the type of nucleic acid, potential issues during sample preparation, and the principles behind each quantification method. In some cases, combining multiple methods can provide a more accurate assessment of the usable library concentration.
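Comparing the two measurements requires putting them in the same units. Qubit reports mass concentration (ng/µl), which is commonly converted to molarity using the average fragment size from the Fragment Analyzer trace and an average mass of about 660 g/mol per base pair of dsDNA. The sketch below applies that conversion and reports the Qubit-to-qPCR ratio; the example values are made up, and what ratio counts as a "significant" discrepancy is left to the reader.

```python
def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA mass concentration to nM using the mean fragment length."""
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)


def qubit_to_qpcr_ratio(qubit_ng_per_ul, mean_fragment_bp, qpcr_nM):
    """Ratio above 1: Qubit reads higher than qPCR; below 1: qPCR reads higher."""
    return ng_per_ul_to_nM(qubit_ng_per_ul, mean_fragment_bp) / qpcr_nM


# Hypothetical library: 2 ng/µl by Qubit, 400 bp mean size, 3 nM by qPCR.
print(f"Qubit-derived: {ng_per_ul_to_nM(2.0, 400):.2f} nM")
print(f"Qubit/qPCR ratio: {qubit_to_qpcr_ratio(2.0, 400, 3.0):.2f}")
```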
PhiX is a DNA control used in Illumina next-generation sequencing to assess run quality, calibrate the sequencing process, and introduce base diversity into low-diversity libraries. It serves as a technical control for cluster generation, sequencing accuracy, and alignment quality.
Purpose of PhiX:
* Quality Control: PhiX provides a known DNA sequence that allows for the assessment of cluster generation, sequencing accuracy, and alignment quality.
* Calibration: It acts as a calibration control for cross-talk matrix generation, phasing, and prephasing calculations.
* Diversity: PhiX is used to introduce base diversity into low-complexity libraries, which can improve base-calling and overall run quality.
* Troubleshooting: It helps in troubleshooting sequencing runs by providing a reference point for comparison.
When to Use PhiX:
* Low-Complexity Libraries: PhiX is particularly useful when sequencing libraries with low base diversity, as it provides the necessary complexity for accurate base-calling.
* Run Validation: It’s a standard practice to use PhiX for validating new sequencing runs.
* Troubleshooting: When encountering issues with sequencing quality, adding PhiX can help identify the source of the problem.
* Specific Illumina Platforms: Illumina recommends PhiX for certain platforms like MiSeq and HiSeq, especially when sequencing low-diversity libraries.
Example:
If you’re sequencing a library with a highly repetitive sequence (low diversity), adding PhiX can improve the quality of the sequencing run by providing a more diverse set of sequences for the instrument to analyze.
Next-generation sequencing experiments often require the addition of PhiX control DNA to libraries, especially those with low nucleotide diversity.
Here’s why and which library types are particularly susceptible:
* Why is PhiX needed? Next-generation sequencing platforms, particularly Illumina systems, rely on balanced nucleotide signals during the initial sequencing cycles for accurate base calling, cluster mapping, and overall data quality. Low diversity libraries, lacking this balance, can lead to:
* Negative impact on cluster mapping and template registration: If the library’s nucleotide composition is heavily skewed, the sequencing machine might struggle to properly distinguish and register individual clusters on the flow cell.
* Reduced data quality and output: Unbalanced signals can lead to errors in base calling and lower quality scores for reads, potentially compromising the overall data output.
* Challenges with phasing/pre-phasing, color matrix corrections, and pass filter calculations: These initial steps in the sequencing process are crucial for accurate data generation and are affected by low nucleotide diversity.
* How does PhiX help? PhiX Control v3 Library has a diverse base composition (45% GC and 55% AT), providing the balanced fluorescent signals required for optimal sequencing performance, according to Illumina. By spiking in PhiX, you effectively increase the overall nucleotide diversity of the sequencing run, improving template registration, cluster mapping, and overall run quality.
* Library types requiring higher than average PhiX:
* Restriction-site Associated DNA sequencing (RAD) Libraries: These libraries often exhibit low diversity due to the nature of their preparation, involving restriction enzyme digestion and subsequent amplification, says Novogene.
* Genotyping by Sequencing (GBS) Libraries: Similar to RAD libraries, GBS libraries involve restriction digestion and are prone to low diversity, requiring higher PhiX spike-in amounts.
* 10x Single Cell RNA Libraries and Single-cell DNA/RNA Libraries: Single-cell sequencing methods can generate libraries with limited diversity, especially when studying specific cell populations, according to Novogene.
* Amplicon Libraries: Libraries created by amplifying specific DNA regions, such as 16S rRNA gene libraries for microbiome studies, often have very low diversity due to the focus on a limited set of sequences.
* Important considerations: The optimal PhiX concentration can vary depending on factors such as the sequencing platform, the specific library type, and the clustering efficiency of the library compared to PhiX, says Illumina. It’s often recommended to start with a higher PhiX spike-in for low diversity libraries and adjust based on run performance and quality control metrics. While PhiX improves sequencing quality, it also reduces the number of reads available for your target library, as a portion of the reads will be from the PhiX genome.
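The read cost of a PhiX spike-in can be estimated directly. The sketch below assumes PhiX clusters as efficiently as the library, so the spiked-in fraction equals the fraction of reads lost; as noted above, real clustering efficiency can differ. The lane yield and spike-in percentage in the example are illustrative.

```python
def library_reads_after_phix(total_reads, phix_fraction):
    """Reads left for the target library after a PhiX spike-in."""
    if not 0 <= phix_fraction < 1:
        raise ValueError("phix_fraction must be in [0, 1)")
    return total_reads * (1 - phix_fraction)


# Example: a 1,250 M read-pair lane with a 20% PhiX spike-in.
remaining = library_reads_after_phix(1250e6, 0.20)
print(f"~{remaining / 1e6:,.0f} M read pairs remain for the library")
```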
Understanding traces from Bioanalyzer, TapeStation, and Fragment Analyzer
These instruments use automated electrophoresis (microfluidic chips on the Bioanalyzer, ScreenTape on the TapeStation, and capillary arrays on the Fragment Analyzer) to analyze the size, quantity, and quality of DNA and RNA samples, playing a crucial role in applications like next-generation sequencing (NGS). Here’s a summary of what to look for and the information extracted from the traces:
I. Bioanalyzer and TapeStation traces for RNA quality control
1. Ribosomal RNA Peaks:
* For eukaryotic RNA, expect two sharp, distinct ribosomal RNA (rRNA) peaks, 18S and 28S, in an intact RNA sample.
* The 28S peak should ideally be approximately twice as intense as the 18S peak (2:1 ratio).
* In prokaryotic RNA, look for 16S and 23S rRNA peaks.
2. Baseline: A relatively flat baseline between the rRNA peaks indicates high RNA integrity.
3. Degradation:
* Degraded RNA shows a smeared appearance, lacks sharp rRNA bands, or does not exhibit the 2:1 ratio of high-quality RNA.
* Completely degraded RNA appears as a very low molecular weight smear.
* Degraded RNA can also manifest as small, rounded peaks between the marker peak and the ribosomal peaks, or a noisy baseline with multiple peaks.
4. RNA Integrity Number (RIN): The Bioanalyzer algorithm calculates a RIN, a value from 1 to 10 that indicates RNA integrity, with 1 being degraded and 10 being intact. Generally, RIN scores of 7 to 10 are considered acceptable, but specific requirements depend on the downstream application.
5. Genomic DNA Contamination: Unexpected large peaks beyond the ribosomal peaks or an increase in the inter-region (between the ribosomal units) might indicate genomic DNA contamination.
II. Bioanalyzer, TapeStation, and Fragment Analyzer traces for DNA quality control
1. Lower and Upper Markers: These internal markers are used for accurate sizing and concentration calculations and are not part of the sample.
2. Library Peak: For Illumina sequencing, a typical library trace shows a main peak in the range of 200–1000 base pairs (bp).
3. Size Distribution: The shape and width of the library peak indicate the size distribution of the DNA fragments, which should be within the expected range for the library prep.
4. Contamination:
* Primer dimers or adapter dimers, appearing as peaks around 130 bp, limit usable sequencing reads and require clean-up.
* Free primers, around 65 bp, can also cause issues with cluster generation and reduce sequencing yield.
5. Concentration: The software uses the known concentration of the upper marker to determine the sample concentration. The height of the peaks can also provide a general idea of the concentration, with taller peaks suggesting higher quantities.
6. DNA Integrity Number (DIN): TapeStation systems can provide a DIN for genomic DNA, assessing its degradation on a scale of 1 (severely degraded) to 10 (highly intact).
III. General considerations
* Bubbles: Bubbles in the trace can interfere with analysis. Re-running the sample might be necessary if the bubble significantly affects the trace.
* Sample Mixing: Proper mixing of the sample with the diluent marker or dilution buffer is crucial before loading onto the instrument.
* Reagent Quality: Running a ladder or high-quality control samples can help identify issues with kit reagents or instrument performance.
* Software Analysis: The instruments’ software provides features for data analysis, including calculating integrity numbers (RIN, DIN), quantifying fragments, and visualizing traces.
By understanding these key elements of Bioanalyzer, TapeStation, and Fragment Analyzer traces, researchers can effectively assess the quality and integrity of their RNA and DNA samples for various molecular biology applications.
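As a rough illustration of reading a DNA library trace, the sketch below sorts peak sizes into the categories described above (free primers around 65 bp, primer or adapter dimers around 130 bp, library peak at 200–1000 bp). The exact cut-offs of 100 bp and 200 bp are assumptions chosen for the example, not instrument or vendor thresholds, and the peak list is made up.

```python
def classify_peak(size_bp):
    """Assign a peak to a coarse category based on its size in bp."""
    if size_bp < 100:
        return "possible free primers"
    if size_bp < 200:
        return "possible primer/adapter dimer"
    if size_bp <= 1000:
        return "library"
    return "larger than expected"


# Hypothetical peak sizes exported from the instrument software.
for size in (65, 130, 420):
    print(size, "bp:", classify_peak(size))
```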
A primer dimer is an unintended by-product formed during the polymerase chain reaction (PCR), where two primers anneal (bind) to each other instead of to the target DNA sequence. This happens when primers have complementary base pairs, particularly at their 3′ ends.
Why primer dimers are an issue
1. Reduced PCR efficiency: Primer dimers deplete PCR reagents, such as primers and deoxynucleotide triphosphates (dNTPs), which are essential for the amplification of the target DNA sequence. This competition for reagents hinders the amplification of the desired DNA segment, leading to reduced efficiency and yield of the target product.
2. False positives: In PCR techniques like quantitative PCR (qPCR) that use fluorescent dyes like SYBR Green to detect double-stranded DNA, primer dimers can produce a fluorescent signal indistinguishable from the target amplicon, leading to false-positive results and inaccurate quantification.
3. Interference with analysis: Primer dimers are typically short DNA fragments (usually below 100 base pairs) and appear as a smear on gel electrophoresis, making it difficult to distinguish them from the desired PCR product. This can complicate downstream analysis and interpretation of results.
4. Reduced sensitivity: The formation of primer dimers, especially when low concentrations of the target gene are present, can lead to reduced amplification efficiency and potentially false-negative results, as the primers are not effectively binding to the target.
In summary, primer dimers significantly compromise the accuracy and reliability of PCR experiments by diverting resources from the target amplification and potentially leading to misinterpretations of the results.
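Complementarity at the 3′ ends, the main driver of dimer formation described above, can be screened for before ordering primers. The sketch below checks only whether the last n bases of two primers are exact reverse complements of each other; it is a simplified check with made-up sequences, and it does not model partial matches or thermodynamics the way primer design tools do.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def three_prime_complementary(primer_a, primer_b, n=5):
    """True if the last n bases of the two primers can pair with each other."""
    if len(primer_a) < n or len(primer_b) < n:
        raise ValueError("primers must be at least n bases long")
    return primer_a[-n:].upper() == reverse_complement(primer_b[-n:])


# Hypothetical primers: the first pair can anneal at their 3' ends.
print(three_prime_complementary("TTTTGCATG", "AAAACATGC"))
print(three_prime_complementary("TTTTGCATG", "AAAAAAAAA"))
```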
Understanding indexing strategies in DNA sequencing
In next-generation sequencing (NGS), especially when performing multiplexing (sequencing multiple samples in a single run), it’s crucial to differentiate between reads originating from different samples. Indexing, achieved by adding unique short DNA sequences (barcodes) to each sample during library preparation, allows for this demultiplexing.
Here are the different indexing methods:
1. Single indexing
* Mechanism: Only one index sequence (typically the i7 index, or Index 1) is added to each DNA fragment during library preparation.
* Advantages:
* Simpler and faster workflow as only one index read is performed.
* Shorter sequencing run time.
* Disadvantages:
* Less accurate demultiplexing, especially with higher multiplexing levels, due to the increased risk of index mis-assignment (index hopping).
* Not recommended for applications requiring high accuracy, like oncology research.
* Use cases: Workflows requiring low levels of multiplexing or not requiring ultra-high resolution.
2. Dual indexing
Dual indexing involves using two index sequences per DNA fragment, typically one on each end (i7/Index 1 and i5/Index 2). This significantly enhances demultiplexing accuracy compared to single indexing. There are two main approaches to dual indexing:
* A. Combinatorial dual indexing (CDI)
* Mechanism: Uses a fixed set of i7 and i5 indices in various combinations, creating unique pairs for each sample within the pool.
* Advantages:
* Enables high-scale multiplexing (thousands of samples per run).
* Provides increased efficiency and reduced cost per sample compared to single indexing.
* Disadvantages:
* More susceptible to index hopping, where reads can be mistakenly assigned to the wrong sample, particularly on patterned flow cell instruments.
* Index adapters may share some sequences.
* B. Unique dual indexing (UDI)
* Mechanism: Utilizes entirely unique i7 and i5 index sequences for each sample, with no overlap or reuse of individual index sequences within the set.
* Advantages:
* Greatly mitigates index hopping, by allowing unexpected index combinations to be filtered out during demultiplexing.
* Higher accuracy in sample identification and demultiplexing.
* Reduced per-sample costs through increased multiplexing efficiency compared to CDI or single indexing.
* Disadvantages:
* May lead to a higher rate of discarded data compared to CDI if index hopping occurs, because incorrect combinations are filtered out.
* Use cases: High sensitivity applications, such as low allele fraction tumor sequencing or detection of rare transcripts. Recommended for instruments with patterned flow cells, like the NovaSeq 6000.
3. Inline barcodes
* Mechanism: Barcode sequences are placed directly adjacent to the sample DNA and read as part of the same sequencing read (insert read).
* Advantages:
* Easy to add and integrate into the sequencing workflow.
* Cheaper and potentially faster to analyze and process, as no dedicated index reads are required.
* Disadvantages:
* Consumes part of the sequencing read length, potentially limiting the length of the biological sequence that can be read.
* Requires bioinformatics analysis for demultiplexing, as the barcodes are part of the sequence data.
* Use cases: Recommended for long reads of well-annotated genomes, such as in ChIP-Seq or RNA-seq. Can be combined with multiplex indexing for greater flexibility.
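Because inline barcodes are part of the read, demultiplexing happens in software: the barcode is matched at the start of the read and then trimmed off, which is also why it costs read length. A minimal sketch with exact matching only, using made-up barcodes and sample names; real demultiplexers usually also tolerate mismatches.

```python
def demux_inline(read, barcodes):
    """Return (sample, trimmed_read); sample is None if no barcode matches."""
    for barcode, sample in barcodes.items():
        if read.startswith(barcode):
            return sample, read[len(barcode):]
    return None, read


# Hypothetical inline barcodes (equal length, so at most one can match).
barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
print(demux_inline("ACGTGGGATTTACC", barcodes))
print(demux_inline("CCCCGGGATTTACC", barcodes))
```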
In summary
* Single indexing is simpler and faster but less accurate.
* Dual indexing provides higher accuracy and better multiplexing capacity, with Unique Dual Indexes (UDI) being the preferred method to mitigate index hopping, especially on newer Illumina instruments.
* Inline barcodes are integrated into the read itself, offering ease of use and cost-effectiveness, but consuming read length.
The choice of indexing strategy depends on the specific research question, the desired level of multiplexing, the sensitivity required for analysis, and the type of sequencing platform used.
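The way unique dual indexes expose index hopping can be sketched as a lookup: only the expected i7/i5 pairs are assigned to samples, and a read whose two indexes are each valid but do not belong together is treated as a hopped read and discarded. The index sequences and sample names below are made up, and exact matching is assumed.

```python
def assign_read(i7, i5, expected_pairs):
    """Classify a read by its index pair under a unique dual indexing scheme."""
    sample = expected_pairs.get((i7, i5))
    if sample is not None:
        return sample
    known_i7 = {pair[0] for pair in expected_pairs}
    known_i5 = {pair[1] for pair in expected_pairs}
    if i7 in known_i7 and i5 in known_i5:
        return "index_hop"  # both indexes are in use, but not as a pair
    return "undetermined"


# Hypothetical UDI set: each i7 and each i5 is used by exactly one sample.
udi = {("AAGGTTCC", "CCTTGGAA"): "sample_1", ("GGAACCTT", "TTCCAAGG"): "sample_2"}
print(assign_read("AAGGTTCC", "CCTTGGAA", udi))
print(assign_read("AAGGTTCC", "TTCCAAGG", udi))
```

With combinatorial indexing the second read would look like a legitimate sample whenever that i7/i5 combination is in use, which is why hopped reads are harder to detect there.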
a. Which of these components are mandatory vs optional?
Understanding the components of an Illumina sequencing library
Illumina sequencing libraries are composed of several key elements that allow for the efficient and accurate sequencing of DNA fragments. Here’s a breakdown of the parts, their order, and purpose, along with which are mandatory and optional:
Essential components
* P5 and P7 Adapters: These are critical for the library fragments to bind to the flow cell surface during the sequencing process. They also act as primers for the initial amplification steps on the flow cell. All Illumina sequencing requires full-length P5 and P7 sequences.
* Sequencing Primer Binding Sites: These are short DNA sequences where the sequencing primers anneal to initiate the sequencing reaction itself. They are located within the P5 and P7 adapter structures.
* Insert: This refers to the actual DNA fragment being sequenced. It’s the region of interest that is inserted between the adapters during library preparation.
Optional components
* i7 and i5 Indices (Barcodes): These short sequences are unique identifiers that are attached to each sample during library preparation. They allow multiple samples to be pooled and sequenced together, with the indices used to demultiplex the data and identify the origin of each read. Libraries can be single-indexed (only i7) or dual-indexed (i5 and i7). Using indices is optional, particularly when sequencing a single sample per flow cell lane, but highly recommended for multiplexing to save costs and increase throughput.
Typical order of elements in an Illumina library
While the exact order can vary slightly based on the library preparation method (e.g., single vs. dual indexing), a common structure is:
P5 Adapter – i5 Index (optional) – Read 1 Primer Binding Site – Insert – Read 2 Primer Binding Site – i7 Index (optional) – P7 Adapter
* The P5 and P7 adapters are always on the ends of the fragment.
* The Read 1 primer binding site is typically located near the P5 adapter.
* The i5 index (when used) is usually positioned between the P5 adapter and the Read 1 primer binding site.
* The insert containing the DNA sequence of interest is in the middle.
* The Read 2 primer binding site follows the insert.
* The i7 index is typically situated between the Read 2 primer binding site and the P7 adapter.
In summary
The P5 and P7 adapters and the sequencing primer binding sites are mandatory elements of an Illumina library, forming the foundation for flow cell binding and sequencing initiation. The insert represents the DNA being sequenced and is the core of the library. The i7 and i5 indices are optional but highly beneficial for multiplexing samples and efficiently utilizing sequencing runs.
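The element order described above can be written down as a small helper that lists the parts for an unindexed, single-indexed (i7 only), or dual-indexed library. This is only a restatement of the common structure given in these notes; the function name and mode labels are illustrative.

```python
def library_layout(indexing="dual"):
    """Ordered elements of a typical Illumina library for the given indexing mode."""
    if indexing not in ("none", "single", "dual"):
        raise ValueError("indexing must be 'none', 'single', or 'dual'")
    layout = ["P5 adapter"]
    if indexing == "dual":
        layout.append("i5 index")
    layout += ["Read 1 primer binding site", "Insert", "Read 2 primer binding site"]
    if indexing in ("single", "dual"):
        layout.append("i7 index")
    layout.append("P7 adapter")
    return layout


print(" - ".join(library_layout("dual")))
```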
17.18. What is plexity?
“Plexity” refers to the number of equivalent elements that make up something. It describes how a quantity is divided into individual, identical parts. For example, in the context of library preparation, plexity indicates how many pre-enriched libraries are combined in a single reaction.
Here’s a more detailed breakdown:
* Uniplex: A uniplex system or quantity has only one element.
* Multiplex: A multiplex system or quantity has more than one element.
* General Concept: “Plexity” is a way to categorize things based on how many equivalent parts they are composed of, similar to the grammatical concept of “number” (singular, plural) when discussing things like nouns.
* Example: In Illumina’s library preparation kits, a “12-plex” kit means you can combine up to 12 libraries in a single reaction.
18.19. When reviewing indices for plexity concerns – what color (and associated base(s)) should ideally be present in every cycle?
When reviewing indices for plexity concerns in Illumina sequencing, signal in both channels (colors) is ideal for every cycle to ensure proper image registration and accurate base calling.
Here’s why and what that means for different Illumina systems:
* Two-Channel Systems (like NovaSeq X Series, NextSeq 1000/2000): These systems use two colors or channels to detect the bases.
* XLEAP-SBS Chemistry: Combine index sequences so that signal is present in both channels for every cycle whenever possible. In XLEAP-SBS chemistry, C is the dual-color nucleotide (it generates signal in both channels), whereas in standard SBS chemistry A is the nucleotide that generates signal in both channels. Balanced signal across both channels gives the best sequencing results.
* Standard SBS Chemistry: Select index sequences that provide signal in at least one channel, preferably both, for every cycle. This means avoiding cycles with only G bases, as G produces no signal.
* HiSeq/MiSeq: These systems use red for A and C, and green for G and T. For proper image registration, both red (A or C) and green (G or T) channels need to be read in each cycle.
In summary, for reliable sequencing and demultiplexing, especially with low plexity pools, it’s crucial to select index combinations that ensure color diversity in every cycle. This prevents potential issues like registration failures and increased undetermined reads.
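A pool can be checked for color balance cycle by cycle before submission. The sketch below uses a channel map for XLEAP-SBS in which C lights both channels and G lights neither, as described above; which single channel A and T use (blue and green here) is an assumption made for the example, so confirm it against Illumina's pooling guidance. The index sequences are made up.

```python
# Assumed XLEAP-SBS channel map: C = both channels, G = no signal.
# The single-channel assignments for A and T are an assumption.
XLEAP_CHANNELS = {"A": {"blue"}, "C": {"blue", "green"}, "T": {"green"}, "G": set()}


def unbalanced_cycles(indexes, channels=XLEAP_CHANNELS):
    """Return 1-based cycles where the pool lacks signal in at least one channel."""
    if len({len(ix) for ix in indexes}) != 1:
        raise ValueError("all index sequences must have the same length")
    flagged = []
    for cycle, bases in enumerate(zip(*indexes), start=1):
        lit = set().union(*(channels[base] for base in bases))
        if lit != {"blue", "green"}:
            flagged.append(cycle)
    return flagged


# Hypothetical two-plex pool: cycle 1 is all G, cycle 2 has green signal only.
print(unbalanced_cycles(["GGAC", "GTCA"]))
```

Low-plexity pools are the ones most likely to be flagged, since fewer indexes means fewer chances to cover both channels in a given cycle.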
19.20. What is the index N-call issue on the X Plus?
The search results did not define an “index N-call issue” for the X Plus. An N-call is a position where the instrument could not assign a base and reports N instead, so the question most plausibly concerns N bases in the index reads on the NovaSeq X Plus. Part of the returned results described an unrelated phone calling app named “Index” (caller ID display, call forwarding, and carrier spam filtering) and does not apply to sequencing. The sequencing-related result covered the following:
* The NovaSeq X Plus is a high-throughput sequencing system, and errors about “multiple entries with identical indices” can occur during run setup, preventing users from proceeding with the run, according to Illumina.
* These errors can arise when a single index or index pair is present multiple times in a lane (unique indexing is required for proper demultiplexing) or when the sample sheet is incorrectly formatted.
Neither result explains the N-call issue itself, so the details should be confirmed with Novogene or Illumina Technical Support.
20.21. What is the index miscall issue on the X Plus?
In the context of Illumina’s NovaSeq X/X Plus instrument, an “index miscall issue” typically refers to an error where the sequencing instrument incorrectly identifies or assigns index sequences during the demultiplexing process.
Causes of index miscall issues can include:
* Duplicate index sequences: When a single lane within the run contains multiple samples with identical index sequences, the instrument cannot uniquely identify each sample during demultiplexing, leading to an error.
* Incorrect sample sheet formatting: Errors in the sample sheet, such as missing headers or incorrect index entries, can prevent the instrument from properly assigning indexes to samples.
* Signal registration issues: Certain index sequences (e.g., starting with two G bases) may not generate sufficient signal intensity for accurate detection by the instrument, causing registration problems.
* Suboptimal color balance: Poorly color-balanced index pools can lead to base calling errors and misassignments, especially when running lower plexity pools.
* Library overloading or underloading: Sequencing libraries at concentrations outside the optimal range can negatively impact demultiplexing results, even with optimally color-balanced pools.
To troubleshoot and resolve index miscall issues:
* Verify index uniqueness: Ensure each sample within a lane has a unique index sequence.
* Check sample sheet format: Confirm the sample sheet has the correct headers and is formatted according to guidelines.
* Optimize index selection: Choose index sequences that provide good color balance and avoid sequences with known registration issues.
* Adjust loading concentration: Sequence libraries at their recommended optimal loading concentrations.
* Correct index errors in BaseSpace Sequence Hub: If using BaseSpace, you can fix index errors and regenerate FASTQ files up to five times.
It’s important to note that these guidelines are specific to Illumina sequencing instruments like the NovaSeq X/X Plus. If you encounter these errors, it is recommended to consult the Illumina knowledge base or contact Illumina Technical Support for further assistance.
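The index uniqueness check can be automated before a sample sheet is submitted. The sketch below assumes each sample is a dictionary with the hypothetical keys `sample_id`, `lane`, `index`, and optional `index2`; these names are illustrative and are not the column names of any particular sample sheet version.

```python
from collections import defaultdict


def duplicate_indexes(samples):
    """Return {(lane, index, index2): [sample IDs]} for index combinations
    that are used by more than one sample in the same lane."""
    seen = defaultdict(list)
    for sample in samples:
        key = (
            sample["lane"],
            sample["index"].upper(),
            sample.get("index2", "").upper(),
        )
        seen[key].append(sample["sample_id"])
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


# Hypothetical samples for illustration.
samples = [
    {"sample_id": "S1", "lane": 1, "index": "ACGTACGT", "index2": "TTGGCCAA"},
    {"sample_id": "S2", "lane": 1, "index": "acgtacgt", "index2": "TTGGCCAA"},
    {"sample_id": "S3", "lane": 2, "index": "ACGTACGT", "index2": "TTGGCCAA"},
]
print(duplicate_indexes(samples))
# {(1, 'ACGTACGT', 'TTGGCCAA'): ['S1', 'S2']}
```

The same index pair in a different lane (S3) is not reported, since uniqueness is only required within a lane.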
21.22. The different disclaimers that are provided to clients (NXP and partial lane), and when to provide them
1. NXP Sequencing (Full Lane Sequencing on NovaSeq X Plus)
* Reliability and Precision: The full lane sequencing service offers entire lanes on the NovaSeq X Plus, aiming to minimize cross-contamination and ensure reliable and precise results.
* Base Diversity and Index Adapters: To maintain high-quality data, it’s crucial to preserve base diversity on the lane. Clients are recommended to select index adapters with diverse index sequences that optimize color balance within pooled libraries. This strategy is essential for successful demultiplexing and subsequent data analysis.
When to provide NXP sequencing disclaimers
* Project Commencement: These points should be clearly communicated to the client when discussing and initiating a full lane sequencing project, particularly during the quote or agreement phase.
* Sample Submission and Preparation: Ensuring the client understands the importance of index adapter diversity is crucial before sample submission.
2. Partial Lane Sequencing
* Economical Flexibility: Partial lane sequencing allows multiple projects to share a lane, offering a more affordable option for smaller data volume needs.
* Library Grouping and Balancing: Novogene will group and balance client libraries with other samples to optimize base diversity on the lane, potentially eliminating the need for PhiX spike-in libraries.
* Index Information and Responsibility: It is crucial for clients to provide accurate and correct index information in the proper orientation. Failure to do so may result in additional charges. In the event of sequencing failure due to incorrect information (including index information or library type), the client will be liable to pay Novogene a fee for the incurred loss.
* Sample Pooling Recommendations: Clients are advised to send libraries unpooled. If pooling is necessary, no more than 10 sub-libraries should be pooled in one tube.
* Turnaround Time: Express turnaround times for partial lane sequencing are also available, with the data QC report delivered within 7 working days for projects with large sample sizes (after confirmation of the library QC report).
When to provide partial lane sequencing disclaimers
* Project Discussion and Quote Phase: The benefits and limitations of partial lane sequencing, particularly regarding index information and potential charges for errors, should be discussed thoroughly with the client when they are considering this option.
* Sample Submission and Guidelines: Clear guidelines on sample pooling and index submission should be provided to the client before or during the sample submission process.
General disclaimers applicable to all projects
* Data Confidentiality: Novogene prioritizes the safety and confidentiality of project data, guaranteeing it will not be disclosed to other parties unless specified by the client. Upon request, a written confidentiality agreement can be provided.
* Data Deletion Policy: Data will be deleted from Novogene’s databases within a specified period (90 days) following data delivery. Clients are responsible for storing their data promptly and carefully.
* Sample Quality and Preparation: Clients are responsible for ensuring their samples meet Novogene’s quality standards and are prepared and packaged according to their guidelines.
* QC Analysis and Data Quality Guarantee: Novogene will perform stringent QC on raw data to ensure accuracy and reliability, including analyzing quality, error rate, Q20, Q30, and adapter contamination. They guarantee that ≥ 80% of bases will have a sequencing quality score ≥ Q30.
* Third-Party Links: Novogene’s services may contain links to third-party websites or services. Clients should be aware that Novogene is not responsible for the privacy practices of these third parties.
* Product Change Notices (PCNs): This item does not appear to concern sequencing. In this document NXP abbreviates NovaSeq X Plus, whereas PCNs issued 90 days before a change to product fit, form, function, quality, or reliability (with summary information, effective date, impacted part numbers, and contacts) describe a component supplier’s change-notification process. Confirm against Novogene’s project documentation before presenting it to clients.
When to provide general disclaimers
* Initial Client Onboarding and Agreement: These disclaimers should be provided to clients during the initial onboarding process, ideally included within the service agreement or terms and conditions.
* Service Support and Communication: Clear communication channels and accessible support resources should be in place to address client inquiries about data management, sample submission, or any other aspect of the service.
* Data Delivery and Deletion: Clients should be reminded about the data deletion policy at the time of data delivery.
Note: It’s important to consult Novogene’s official website and specific project documentation for the most up-to-date and complete list of disclaimers and terms and conditions.
22.23. What is SWIFT-Seq?
SWIFT-Seq (Swift Normalase Amplicon Panels) is a Next-Generation Sequencing (NGS) method used for library preparation and sequencing, particularly for applications like amplicon-based sequencing of pathogens such as SARS-CoV-2. It leverages the patented Adaptase® technology by Swift Biosciences.
Here’s a breakdown:
* Mechanism: SWIFT-Seq uses Adaptase technology for efficient library preparation, converting cDNA into libraries for sequencers like Illumina®. This technology has also been adapted for single-cell methyl-seq, improving read-mapping rates and reducing costs for low-input DNA.
* Applications:
* SARS-CoV-2 Sequencing: It’s been used for high-quality genome recovery from clinical specimens to study the virus.
* Various NGS Applications: Swift Biosciences offers solutions for whole-genome sequencing, targeted DNA sequencing, and epigenetic analysis.
* Single-Cell RNA Sequencing: A method called SWIFT-Seq is used for single-cell RNA sequencing of circulating tumor cells in multiple myeloma patients, allowing for profiling and potentially consolidating multiple tests.
* Other Applications: It’s also used for ChIP-Seq, metagenomics, and various sample types.
* Advantages:
* Faster turnaround times: Streamlined workflows contribute to quicker results.
* Increased Read-Mapping Rates: Improved read-mapping rates have been observed in single-cell methyl-seq.
* Reduced Costs: Efficiency improvements can lead to cost savings.
* Versatility: It’s compatible with various library types and applications, including challenging samples.
In essence, SWIFT-Seq offers a suite of sequencing technologies aimed at providing faster, more efficient, and cost-effective solutions for various research and clinical applications in next-generation sequencing.
23.24. What type of client is an ideal candidate for SWIFT Seq?
A key strength of NGS, including SWIFT Seq, is its high-throughput capability, allowing the sequencing of millions of DNA fragments simultaneously. This makes it particularly well-suited for large-scale sequencing projects, such as whole-genome sequencing, whole-exome sequencing, and transcriptome analysis, which would be time-consuming and expensive using traditional methods like Sanger sequencing.
Based on the information available, an ideal client for SWIFT Seq NGS would likely be:
* Researchers and institutions with large-scale sequencing needs: Projects involving the sequencing of numerous samples or entire genomes can greatly benefit from the speed and efficiency of NGS.
* Researchers focused on discovering new mutations or analyzing large panels of genes: NGS allows for the efficient scanning of large genomic regions to detect novel variants or assess a broad range of genes simultaneously.
* Researchers working with limited or degraded samples: NGS can produce high-quality data from small amounts of material, such as FFPE samples or liquid biopsies.
It’s important to note that while NGS offers numerous advantages, it can be more expensive than Sanger sequencing for targeted sequencing of a small number of genes, and requires more advanced instrumentation and bioinformatics expertise. Therefore, the choice of sequencing technique depends on the specific needs of the project.
24.25. What are the different ways data can be released to a client (including the relevant costs)?
Novogene primarily delivers sequencing data to clients through their Customer Service System (CSS), a cloud-based platform allowing researchers to track their projects and access data online. Data can also be delivered via FTP or other command-line tools like rsync or wget for large files.
Regarding costs associated with data release:
* Access to the Customer Service System (CSS) and downloading data through it is generally free for Novogene clients with active projects.
* Downloading through FTP or command-line tools doesn’t incur direct costs from Novogene, but users may have to consider internet service provider data charges, especially for large datasets.
* Novogene explicitly states they do not charge for faster turnaround time for data release.
* However, Novogene may charge for returning remaining samples after sequencing, with the price depending on package details.
It’s important to note that Novogene deletes data from their servers within a specified period (typically 90 days for Illumina data and 60 days for PacBio and Nanopore data) after delivery, so clients are advised to download and verify their data promptly.
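Verifying a download usually means comparing checksums. The following is a minimal sketch that assumes the delivery includes an `md5sum`-style manifest (one `checksum  filename` pair per line) stored next to the data files; whether a given delivery contains such a file, and what it is called, is an assumption here.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path):
    """Return {file name: True/False} for every entry in the manifest.

    A file counts as verified only if it exists next to the manifest and its
    MD5 digest equals the listed checksum.
    """
    manifest = Path(manifest_path)
    results = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # md5sum marks binary mode with '*'
        target = manifest.parent / name
        results[name] = target.exists() and md5_of(target) == expected.lower()
    return results
```

Calling `verify_manifest("MD5.txt")` (a hypothetical file name) in the download directory returns a dictionary in which any `False` value marks a file to download again before the retention period ends.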
25.26. What is included in the data QC report for FASTQ files?
A FastQC report, generated for quality control of sequencing data in FASTQ files, typically includes the following modules, each describing one aspect of read quality or composition:
* Basic Statistics: Provides fundamental file information like file name, quality score encoding, number of reads, sequence length, and GC content.
* Per Base Sequence Quality: A plot showing the distribution of quality scores at each position across all reads using a box-and-whisker plot and average line.
* Per Sequence Quality Scores: Displays the number of reads with a specific mean quality score.
* Per Base Sequence Content: Shows the proportion of each base (A, T, C, G) at each position.
* Per Sequence GC Content: Displays the GC distribution over all sequences and compares it to a normal distribution.
* Per Base N Content: Shows the percentage of ‘N’ calls at each position.
* Sequence Length Distribution: Displays the distribution of sequence lengths.
* Per Tile Sequence Quality: Assesses quality across different regions on the flow cell.
* Sequence Duplication Levels: Reports the percentage of duplicate sequences.
* Overrepresented Sequences: Identifies sequences present more often than expected, which could indicate issues like contamination or low library diversity.
* Adapter Content: Checks for the presence and abundance of common adapter sequences.
Each module is flagged as “Pass”, “Warn”, or “Fail” based on preset thresholds. The thresholds rest on general assumptions about a typical library, so the flags should be interpreted in the context of the specific experiment.
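The module flags can be collected programmatically. This sketch assumes the tab-separated `summary.txt` that FastQC writes alongside its report, with the status in the first column and the module name in the second; the function name and the sample text are illustrative.

```python
def flagged_modules(summary_text, levels=("WARN", "FAIL")):
    """Return (module, status) pairs whose status is in `levels`."""
    flagged = []
    for line in summary_text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        status, module = parts[0].strip().upper(), parts[1].strip()
        if status in levels:
            flagged.append((module, status))
    return flagged


# Hypothetical summary content for illustration.
example = (
    "PASS\tBasic Statistics\tsample_1.fq.gz\n"
    "WARN\tPer base sequence content\tsample_1.fq.gz\n"
    "FAIL\tSequence Duplication Levels\tsample_1.fq.gz\n"
)
print(flagged_modules(example))
# [('Per base sequence content', 'WARN'), ('Sequence Duplication Levels', 'FAIL')]
```

A flagged module is a prompt to look at the plot, not a verdict; for example, a duplication warning can be expected for some library types.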
26.27. What is included in the data QC report for BCL files?
A data quality control (QC) report for BCL files typically includes metrics and analyses that assess the quality of raw sequencing data generated by Illumina sequencers. While BCL files themselves are binary and not directly interpretable by most analysis software, they are converted into FASTQ files, which form the basis for QC analysis.
Here’s a breakdown of common elements in such a report:
* FastQC Reports: These reports provide comprehensive quality metrics, often in the form of HTML documents or text files. Key elements include:
* Per base sequence quality: A plot showing the distribution of quality scores at each position within the reads, across all reads.
* Per sequence quality scores: A histogram of the mean quality scores for individual reads.
* Per base sequence content: Displays the frequency of each nucleotide (A, C, G, T) at every position in the read.
* Per sequence GC content: A histogram showing the distribution of GC content across all reads.
* Sequence length distribution: A histogram illustrating the lengths of the sequenced reads.
* Sequence duplication levels: Indicates the number of duplicate reads found.
* Overrepresented sequences: Identifies sequences that occur more frequently than expected by chance, potentially indicating adapter contamination or PCR artifacts.
* Adapter content: Specifically checks for the presence and frequency of known sequencing adapter sequences within the reads.
* Summary statistics: Beyond FastQC plots, reports often include overall summaries of the sequencing run, such as:
* Total number of reads
* Total number of bases sequenced
* Mean, median, and maximum read length
* N50, a metric indicating the length of the shortest sequence for which at least 50% of the total length of the sequences is contained in sequences of that length or longer
* Percentage of mapped reads (post-demultiplexing)
* Mean quality score across all bases
* Number of bases with quality scores above a specific threshold (e.g., Q30)
* Demultiplexing statistics: When samples are pooled and indexed before sequencing, the demultiplexing process separates the data into individual samples. A QC report will include metrics related to this, such as:
* Number of reads whose barcodes match a sample index, optionally allowing a set number of mismatches
* Number of reads assigned to each sample
* Information about index reads, including the number of mismatches allowed
* Adapter trimming and masking information: Reports can detail the parameters used for adapter trimming and masking, such as the adapter sequences used and the minimum adapter overlap allowed before trimming.
Essentially, the data QC report aims to provide a comprehensive overview of the sequencing run’s quality, identifying potential issues like low quality reads, adapter contamination, or biases that could impact downstream analysis. This information is crucial for determining if the data meets the required standards for subsequent bioinformatics pipelines.
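The summary statistics listed above, including N50 as defined there, can be computed from a list of read lengths. The sketch below is illustrative only; the function names and the example lengths are assumptions, not the output of any specific QC tool.

```python
from statistics import mean, median


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def read_length_summary(lengths):
    """Collect the run-level length statistics into one dictionary."""
    return {
        "total_reads": len(lengths),
        "total_bases": sum(lengths),
        "mean_length": mean(lengths),
        "median_length": median(lengths),
        "max_length": max(lengths),
        "n50": n50(lengths),
    }


# Hypothetical read lengths for illustration.
print(read_length_summary([2, 3, 4, 5, 6])["n50"])  # 5
```

In the example, the two longest reads (6 and 5) hold 11 of 20 bases, which is the first point at which half of the total is reached, so N50 is 5.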
27.28. What are Phred-scores?
In DNA sequencing, a Phred quality score (also known as a Q score) is a measure of the quality or accuracy of a nucleotide base call. It quantifies the probability that a base was identified incorrectly by the sequencing instrument.
Here’s a breakdown of Phred scores:
* Logarithmic scale: Phred quality scores are expressed on a negative logarithmic scale. This means a higher Q score indicates a lower probability of error and thus a more confident base call.
* Calculating the score: The score is calculated using the formula: Q = -10 log₁₀(P), where P is the probability of an incorrect base call.
* Interpretation: For example:
* A Phred score of 10 means there is a 1 in 10 chance of an incorrect base call (90% accuracy).
* A Phred score of 20 means there is a 1 in 100 chance of an incorrect base call (99% accuracy).
* A Phred score of 30 means there is a 1 in 1000 chance of an incorrect base call (99.9% accuracy).
* Encoding in FASTQ files: Phred scores are typically stored alongside DNA sequences in the FASTQ format, encoded as ASCII characters.
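The formula and the ASCII encoding can be illustrated in a few lines. This sketch assumes the Phred+33 offset that is common in current FASTQ files; older files may use a different offset, and the function names are illustrative.

```python
import math


def phred_to_error_prob(q):
    """P = 10^(-Q/10): probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Q = -10 * log10(P), the formula given above."""
    return -10 * math.log10(p)


def decode_quality(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores.

    Assumption: Phred+33 encoding (score = ASCII code - 33).
    """
    return [ord(char) - offset for char in quality_string]


print(decode_quality("I5+"))  # [40, 20, 10]
```

Here `I` has ASCII code 73, `5` has 53, and `+` has 43, which gives scores of 40, 20, and 10 after subtracting the offset.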
Applications of Phred scores
* Sequence Quality Assessment: Phred scores are used to assess the overall quality of sequencing data and identify regions of low quality within a sequence.
* Filtering and Trimming: Low-quality bases (those with low Phred scores) can be removed or trimmed from sequencing reads to improve the accuracy of downstream analysis.
* Consensus Sequence Determination: Phred scores are crucial in DNA sequence assembly, where multiple overlapping reads are combined to create a single, more accurate consensus sequence. Higher Phred scores contribute to higher confidence in the consensus sequence.
* Variant Calling: Phred scores also play a role in variant calling, helping to estimate the confidence that a particular variation (like a single nucleotide polymorphism or SNP) detected in the sequenced DNA is real and not a sequencing error.
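Two of these applications, quality assessment and trimming, can be sketched directly on quality strings. The code assumes Phred+33 encoding and uses a deliberately simple trailing trim; it is not the algorithm of any particular trimming tool, and the function names are hypothetical.

```python
def fraction_at_least(quality_strings, threshold=30, offset=33):
    """Fraction of all bases whose Phred score is >= threshold (e.g. Q30)."""
    total = passing = 0
    for quality in quality_strings:
        for char in quality:
            total += 1
            if ord(char) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


def trim_low_quality_tail(sequence, quality, threshold=20, offset=33):
    """Remove bases from the 3' end while their score is below threshold."""
    end = len(quality)
    while end > 0 and ord(quality[end - 1]) - offset < threshold:
        end -= 1
    return sequence[:end], quality[:end]


# "II5+" decodes to scores 40, 40, 20, 10 under Phred+33.
print(fraction_at_least(["II5+"]))            # 0.5
print(trim_low_quality_tail("ACGT", "II5+"))  # ('ACG', 'II5')
```

Comparing the result of `fraction_at_least` with a target such as 0.80 is one way to check a Q30 percentage across a set of reads.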