Conventional DNA sequencing is getting more powerful and time-effective. However, it is still based on amplification, which involves using a DNA polymerase such as Taq to produce numerous copies of the sequence to be analysed. The problem with amplification is that copying errors introduced along the way can be mistaken for genuine mutations, which increases the risk of false-positive results.
Other methods, such as short-sequence high-throughput screening, are also very accurate, but are not applicable to scans of larger portions of DNA, such as haplotyping. A new sequencing technique, developed at UCSD, can sequence strands that encompass haplotypes with greater accuracy than PCR, according to the team behind it.
Modern-day DNA sequencing methods, such as those based on the polymerase chain reaction (PCR), are accurate enough to detect single-nucleotide polymorphisms (SNPs), which are among the smallest variations the human genome can present. However, they can be associated with ‘false calls’, which are erroneous SNP detections that can considerably exaggerate the apparent risk of a somatic mutation.
‘False calls’ can occur in relatively high numbers (e.g. in five figures) per sequencing analysis. In addition, clinical medicine also benefits from the study of genetic features at a larger scale. These include haplotypes, which are sets of alleles that tend to be inherited together and are often closely related by chromosomal location and the functions of the proteins they express. For example, the HLA haplotype, which spans several million base-pairs on chromosome 6, codes for important immune-system proteins and plays a role in organ donor compatibility. Haplotype analysis may require whole-genome sequencing (WGS), in which all chromosomal DNA in a cell is amplified and scanned for anomalies. However, this increases the risk of false calls.
Therefore, a team from the Departments of Bioengineering, Computer Science & Engineering, Electrical & Computer Engineering and Paediatrics at the University of California (San Diego) proposed a new DNA sequencing system, involving the use of a microfluidic processor. This device has multiple compartments, one of which isolates a single cell for DNA extraction. Other chambers extract the genome from the cell and separate the two complementary strands of DNA. One of these strands is then broken down into 24 fragments, each of which is sent into its own individual container. They are then amplified using multiple displacement amplification (MDA).
New amplification technique
The team behind this new amplification technique call it SISSOR, or single-stranded sequencing using microfluidic reactors. They claim that it results in a considerably reduced error rate (of about 1×10⁻⁸, compared to about 1×10⁻⁵ following conventional MDA) due to a number of factors and optimisations. These include the reduced risk of DNA contamination and the fine-tuning of the steps involved in DNA purification and amplification. Examples were fine-tuning the concentration of alkaline solution and the temperature in the strand-separation chamber, and optimising the MDA process with the high-accuracy phi29 polymerase and longer primers (short strands of custom-generated DNA that start the amplification process).
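To put the two quoted error rates in perspective, the short Python sketch below converts a per-base error rate into an expected number of erroneous calls across a genome. The genome size of roughly 3 billion bases and the assumption that errors occur independently at each base are ours, not figures from the study, and the function name is purely illustrative.

```python
# Per-base error rates quoted in the text.
MDA_ERROR_RATE = 1e-5      # conventional MDA
SISSOR_ERROR_RATE = 1e-8   # SISSOR

# Assumption (not from the study): a haploid human genome of ~3 billion bases.
GENOME_SIZE_BP = 3_000_000_000


def expected_false_calls(error_rate: float, bases: int = GENOME_SIZE_BP) -> float:
    """Expected number of erroneous base calls, assuming independent errors per base."""
    return error_rate * bases


if __name__ == "__main__":
    for name, rate in [("conventional MDA", MDA_ERROR_RATE),
                       ("SISSOR", SISSOR_ERROR_RATE)]:
        print(f"{name}: ~{expected_false_calls(rate):,.0f} expected false calls")
```

Under these assumptions, conventional MDA would be expected to produce on the order of 30,000 erroneous calls per genome, which is in line with the five-figure numbers of false calls mentioned earlier, while the SISSOR rate would correspond to about 30.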
The team tested the SISSOR method on the well-known PGP1 human cell line. They isolated three single cells, amplified the DNA from each as described, and then converted the resulting 24 fragments for each cell into barcoded sequence-data libraries, which were then converted into full DNA sequences using the conventional Illumina method. These were mapped onto a human reference genome, and up to 98 percent of the sequenced bases could be mapped. The combined reads from all three cells resulted in 94.9 percent coverage of the reference genome. However, each individual cell generated approximately 63 percent coverage, indicating that certain fragments were lost in the course of each individual SISSOR process.
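The relationship between the per-cell and combined coverage figures can be checked with a line of arithmetic. The sketch below assumes, purely for illustration, that each cell loses regions of the genome independently of the others; the study does not state this, and the function name is hypothetical.

```python
def combined_coverage(per_cell: float, n_cells: int) -> float:
    """Fraction of the genome covered by at least one of n_cells cells.

    Assumption: each cell misses a given region independently, so the
    chance that all cells miss it is (1 - per_cell) ** n_cells.
    """
    return 1.0 - (1.0 - per_cell) ** n_cells


if __name__ == "__main__":
    # Per-cell coverage of ~63 percent across three cells, as quoted in the text.
    print(f"{combined_coverage(0.63, 3):.1%}")
```

With 63 percent coverage per cell and three cells, this simple model gives about 94.9 percent, which agrees with the combined coverage reported by the team.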
The team reconstructed the PGP1 genome from the fragments at the analysis stage using a hidden Markov model (HMM). This allowed fragment boundaries to be detected consistently, which in turn allowed the team to detect one or more alleles in a haplotype with high accuracy. The team were able to assemble haplotypes in blocks of about 500 kilobases at a time, roughly ten times the length achieved using conventional assembly methods. The team’s analysis method improved the chances that each allele is only ‘called’ once, regardless of how many separate chambers across multiple cells it is found in. At the same time, variations within these alleles could be called with greater confidence than variations within an allele analysed using conventional sequencing processes.
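As a rough illustration of how an HMM can locate fragment boundaries, the sketch below decodes a two-state model ("absent" or "present") over genomic bins in a single chamber, where each bin is marked 1 if any reads mapped to it and 0 otherwise. This is a generic Viterbi decoder, not the team's model: the states, the probabilities and the function names are all hypothetical and chosen only to make the example run.

```python
import math

STATES = ("absent", "present")

# Hypothetical parameters, chosen only for illustration.
START = {"absent": 0.5, "present": 0.5}
TRANS = {
    "absent": {"absent": 0.9, "present": 0.1},
    "present": {"absent": 0.1, "present": 0.9},
}
# Observation per genomic bin: 1 if any reads mapped there, else 0.
EMIT = {
    "absent": {0: 0.9, 1: 0.1},
    "present": {0: 0.2, 1: 0.8},
}


def viterbi(observations):
    """Return the most likely state sequence for a list of 0/1 observations."""
    if not observations:
        return []
    # scores[s] = log-probability of the best path so far that ends in state s
    scores = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
              for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (scores[best_prev] + math.log(TRANS[best_prev][s])
                             + math.log(EMIT[s][obs]))
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def fragment_boundaries(path):
    """Return (start, end) bin indices (end exclusive) of each 'present' run."""
    runs, start = [], None
    for i, state in enumerate(path):
        if state == "present" and start is None:
            start = i
        elif state != "present" and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(path)))
    return runs


if __name__ == "__main__":
    bins = [0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0]
    print(fragment_boundaries(viterbi(bins)))
```

With these illustrative parameters, the single empty bin inside the covered stretch is treated as a coverage dropout rather than a fragment end, so the decoder reports one fragment instead of two. A real model would work on read depth rather than a 0/1 flag and would estimate its parameters from data.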
The SISSOR technique may produce multiple copies of the same allele (especially if a number of cells are used, as in this study), which can be compared for each variation detected. In other words, if a variation is found in all copies of the same allele from three identical cells, it is far more likely to be genuine than an artefact of amplification. The team developed a variant-calling algorithm that reflected this and also corrected for potential errors such as those caused by MDA. The allele sequences were assembled into full haplotypes using HapCUT2, a popular algorithm for this purpose, which generated contiguous haplotype sequences averaging 7 megabases (Mb) in length (compared to about 3 Mb for pre-existing PGP1 haplotypes). In all, the team detected 1.2 million SNPs in the genome, with a 99.3 percent rate of agreement across all fragments. This compares well with pre-existing PGP1 sequencing data.
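The idea of trusting a variant only when independent copies agree can be sketched in a few lines of Python. The example below is a simple support-counting heuristic, not the team's variant-calling algorithm: the input format, the chamber labels, the threshold of two chambers and the function name are all assumptions made for illustration.

```python
from collections import defaultdict


def consensus_calls(observations, reference, min_chambers=2):
    """Call a variant only if the same non-reference base appears in at
    least `min_chambers` distinct chambers.

    observations: iterable of (position, chamber_id, base) tuples.
    reference: dict mapping position -> reference base.
    Returns a sorted list of (position, base) calls.
    """
    support = defaultdict(set)  # (position, base) -> chambers supporting it
    for position, chamber, base in observations:
        if base != reference[position]:
            support[(position, base)].add(chamber)
    return sorted(key for key, chambers in support.items()
                  if len(chambers) >= min_chambers)


if __name__ == "__main__":
    # Hypothetical data: positions, chamber labels and bases are invented.
    reference = {100: "C", 250: "A"}
    observations = [
        (100, "cell1_chamber3", "T"),
        (100, "cell2_chamber7", "T"),   # same change seen independently
        (250, "cell1_chamber5", "G"),   # seen once only, so not called
        (250, "cell3_chamber1", "A"),
    ]
    print(consensus_calls(observations, reference))
```

Here the change at position 100 is supported by two independent chambers and is called, whereas the change at position 250 appears in a single chamber and is left uncalled as a possible amplification error. A probabilistic caller would weigh base qualities and strand information rather than apply a fixed threshold.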
SISSOR, as described in an advance publication of the journal PNAS, is a novel sequencing technique that takes a strand of chromosomal DNA and breaks it down into 24 relatively large fragments, which are then amplified separately using an optimised microfluidic processor for high-fidelity sequencing using conventional techniques. This allows for the re-assembly of this data into ultra-long haplotype sequences that possess highly accurate information on the location of SNPs and how likely they are to cause a dangerous mutation in living cells. Therefore, the SISSOR method need not rely on existing sequencing libraries of the haplotype or genome to be analysed, but instead has the option of comparing multiple copies of the same allele from a small number of donor cells. The team of UCSD researchers behind this new sequencing technique believe it may have a role in applications such as IVF, in which the cells available for testing are severely limited.
Chu WK, Edge P, Lee HS, Bansal V, Bafna V, Huang X, et al. Ultraaccurate genome sequencing and haplotyping of single human cells. Proceedings of the National Academy of Sciences. 2017. Available at: http://www.pnas.org/content/early/2017/10/23/1707609114.full.pdf
Ramsey D. UC San Diego Scientists Create Device for Ultra-Accurate Genome Sequencing of Single Human Cells. UCSD News Center. 2017. Available at: http://ucsdnews.ucsd.edu/pressrelease/uc_san_diego_scientists_create_device_for_ultra_accurate_genome_sequencing