DNA nanoball sequencing
DNA nanoball sequencing is a high-throughput sequencing technology that is used to determine the entire genomic sequence of an organism. The method uses rolling circle replication to amplify small fragments of genomic DNA into DNA nanoballs. Fluorescent probes bind to complementary DNA, and the probes are then ligated to anchor sequences bound to known sequences on the DNA template. The base order is determined from the fluorescence of the ligated and bound probes. This DNA sequencing method allows large numbers of DNA nanoballs to be sequenced per run at lower reagent costs than other next-generation sequencing platforms. However, a limitation of this method is that it generates only short sequences of DNA, which makes mapping its reads to a reference genome challenging. The company Complete Genomics uses DNA nanoball sequencing to sequence samples submitted by researchers.

Procedure

DNA nanoball sequencing involves isolating the DNA that is to be sequenced, shearing it into small 400 – 500 base pair (bp) fragments, ligating adapter sequences to the fragments, and circularizing the fragments. The circular fragments are copied by rolling circle replication, resulting in many single-stranded copies of each fragment. The DNA copies concatenate head to tail in a long strand and are compacted into a DNA nanoball. The nanoballs are then adsorbed onto a sequencing flow cell. Unchained sequencing reactions interrogate specific nucleotide locations in the nanoball by ligating fluorescent probes to the DNA. The color of the fluorescence at each interrogated position is recorded by a high-resolution camera. Bioinformatics tools are used to analyze the fluorescence data, make base calls, and map the 35-bp mate pair reads to a reference genome. The genome is assembled, and any polymorphisms present in the sequence are identified.

DNA isolation, fragmentation, and size capture

Cells are lysed, and DNA is extracted from the cell lysate. The high-molecular-weight DNA, often several megabase pairs long, is sonicated to break the double-stranded DNA at random positions. Bioinformatic mapping of the sequencing reads is most efficient when the sample DNA falls within a narrow length range. Therefore, to select the ideal fragment lengths for sequencing, the fragments are size-separated by polyacrylamide gel electrophoresis (PAGE). DNA of the suitable size range is purified by gel extraction, yielding fragments within a narrow length range (typically 400 – 500 base pairs).
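The size-capture step can be pictured with a small simulation. This is a minimal sketch, not a model of sonication physics. It assumes that breaks fall uniformly at random. The function names `shear` and `size_select`, the 1 Mb molecule, and the break count are hypothetical choices for illustration. Only the 400 – 500 bp window comes from the text above.

```python
import random

def shear(molecule_length: int, n_breaks: int, rng: random.Random) -> list[int]:
    """Break a linear molecule at `n_breaks` distinct random positions and
    return the resulting fragment lengths (a stand-in for sonication)."""
    cuts = sorted(rng.sample(range(1, molecule_length), n_breaks))
    bounds = [0] + cuts + [molecule_length]
    return [end - start for start, end in zip(bounds, bounds[1:])]

def size_select(fragments: list[int], low: int = 400, high: int = 500) -> list[int]:
    """Keep only fragments inside the gel-extraction window [low, high] bp."""
    return [length for length in fragments if low <= length <= high]

rng = random.Random(42)
fragments = shear(1_000_000, 2_000, rng)  # hypothetical 1 Mb molecule, 2,000 breaks
kept = size_select(fragments)
print(f"{len(kept)} of {len(fragments)} fragments fall in the 400-500 bp window")
```

Most random fragments fall outside the window. This is why a gel purification step follows shearing rather than relying on shearing alone.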

Attaching adapter sequences

Adapter DNA sequences must be attached to the unknown DNA fragment so that DNA of known sequence flanks the unknown DNA. In the first round of adapter ligation, right and left adapters (Ad1) are attached to the right and left flanks of the fragmented DNA, and the DNA is PCR amplified (Figure 3). The right and left Ad1 are modified to create complementary single-stranded ends that bind to each other and form circular DNA. A restriction enzyme is then added, which cleaves the DNA 13 bp to the right of the right adapter. This results in linear double-stranded DNA (Figure 3). Right and left adapter sequences (Ad2) are ligated onto the ends of the linear DNA, and the product is PCR amplified. The Ad2 sequences are modified to allow them to bind to each other and form circular DNA. The restriction enzyme is used again to cleave the circular DNA 13 bp to the left of Ad1. The result is a linear DNA fragment (Figure 3). Right and left adapter sequences (Ad3) are ligated to the right and left flanks of the linear DNA, and the product is PCR amplified. The adapters are modified so that they bind to each other and form circular DNA. The type III restriction enzyme EcoP15I is added, which cleaves the DNA 26 bp to the left of Ad3 and 26 bp to the right of Ad2. This step removes a large segment of DNA and linearizes the DNA once again. Right and left adapters (Ad4) are ligated to the DNA, the product is PCR amplified, and the Ad4 sequences are modified so that they bind to each other. The result is the completed circular DNA template.
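The repeated "circularize, then cut a fixed distance from an adapter" steps can be modelled as string operations. The sketch below is a simplification under stated assumptions. DNA is treated as a single string rather than a double strand, and staggered ends are ignored. The enzyme is assumed to recognize a site inside an adapter and to cut a fixed number of bases away from it, as the 13 bp and 26 bp cleavages above describe. The function `cut_circle` and the toy sequences are hypothetical.

```python
def cut_circle(circle: str, site: str, offset: int) -> str:
    """Linearize a circular sequence by cutting `offset` bases to the right of
    the first occurrence of `site`, and return the linear product starting at
    the cut. The circle is written as a string whose last base is joined to
    its first base."""
    n = len(circle)
    doubled = circle + circle  # also finds a site that spans the junction
    start = doubled.find(site)
    if start == -1:
        raise ValueError("recognition site not found in circle")
    cut = (start + len(site) + offset) % n
    return circle[cut:] + circle[:cut]

# Hypothetical toy circle: unknown DNA (A and C runs) around an adapter whose
# recognition site is written "GGGG"; cut 3 bases to the right of it.
linear = cut_circle("AAAAGGGGCCCCCCCC", "GGGG", 3)
print(linear)  # CCCCCAAAAGGGGCCC
```

Calling `cut_circle(circle, site, 13)` would mimic the 13 bp cleavage described above, with `site` standing in for a hypothetical recognition sequence carried by the adapter.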

Rolling circle replication

Once a circular DNA template containing sample DNA ligated to four unique adapter sequences has been generated, the full sequence is amplified into a long string of DNA. This is accomplished by rolling circle replication with the Phi 29 DNA polymerase, which binds and replicates the DNA template (Figure 4). The newly synthesized strand is released from the circular template, resulting in a long single-stranded DNA comprising several head-to-tail copies of the circular template. The four adapter sequences contain palindromic sequences that hybridize and cause the single strand to fold onto itself, resulting in a tight ball of DNA approximately 300 nanometers (nm) across. This allows the nanoballs to remain separated from each other and reduces tangling between different single-stranded DNA molecules.
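The head-to-tail structure of the rolling circle product can be sketched in a few lines. This is an idealized model, not a simulation of Phi 29 polymerase behavior. It assumes error-free synthesis and a fixed starting point. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def rolling_circle(template: str, product_length: int) -> str:
    """Return a single-stranded product of `product_length` bases made of
    head-to-tail copies of the strand complementary to a circular template.
    Assumes synthesis is primed at the 3' end of `template` as written
    (5'->3'), so each repeat unit of the product is the reverse complement."""
    unit = reverse_complement(template)
    copies_needed = product_length // len(unit) + 1
    return (unit * copies_needed)[:product_length]

def full_copies(template: str, product_length: int) -> int:
    """Number of complete template copies contained in the product."""
    return product_length // len(template)

product = rolling_circle("AACG", 10)
print(product, full_copies("AACG", 10))  # CGTTCGTTCG 2
```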

DNA nanoball microarray

To obtain DNA sequence, the DNA nanoballs are attached to a microarray flow cell (Figure 5). The flow cell is a 25 mm by 75 mm silicon wafer coated with silicon dioxide, titanium, hexamethyldisilazane (HMDS), and a photoresist material. The DNA nanoballs are added to the flow cell and selectively bind to patterned aminosilane sites in a highly ordered pattern, allowing a very high density of DNA nanoballs to be sequenced.
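Because the nanoballs bind in an ordered pattern, the number of sites on a flow cell follows directly from its dimensions and the site spacing. This sketch assumes a square grid with no unused margins. The pitch value is hypothetical, because the actual spacing is not given here. Only the 25 mm by 75 mm dimensions come from the text.

```python
def array_sites(width_mm: float, length_mm: float, pitch_um: float) -> int:
    """Count binding sites on a square grid with center-to-center spacing
    `pitch_um` (micrometers) over a width_mm x length_mm surface."""
    per_row = int(width_mm * 1000 // pitch_um)
    per_column = int(length_mm * 1000 // pitch_um)
    return per_row * per_column

# 25 mm x 75 mm from the text; the 1 um pitch is a hypothetical value.
print(array_sites(25, 75, 1.0))  # 1875000000
```

Halving the pitch quadruples the number of sites. This is why an ordered, tightly spaced pattern translates into a very high nanoball density.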

Unchained sequencing by ligation

The order of the DNA bases between the adapter sequences is determined after the nanoballs are arrayed onto the flow cell (Figure 6). First, oligonucleotide anchor DNA that is complementary to either the right or left end of one of the adapters is added to the flow cell. Next, T4 DNA ligase and a pool of four 10-mer DNA probes are added to the flow cell. The probes are degenerate at every position except one, the interrogative position (for example, position 1 next to the anchor; see figure). At the interrogative position, the probe contains an "A" nucleotide with a red fluorophore attached, a "C" with a yellow fluorophore attached, a "G" with a green fluorophore attached, or a "T" with a blue fluorophore attached. Only the probe that has a complementary nucleotide at the interrogative position will bind. The T4 DNA ligase attaches the probe to the anchor, the unbound probes are washed away, and the fluorescence is detected. The probe-anchor complex is then removed from the DNA nanoball, and another anchor is added. A new pool of probes with a different interrogative position is added. The correct probe hybridizes and is ligated, the flow cell is rinsed, and the fluorescence is read and recorded. This process is repeated for all ten interrogation positions next to an anchor sequence. Once all ten positions are recorded, an anchor that binds to a different adapter is added, and the process is repeated to identify the ten nucleotides next to that adapter.
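The logic of unchained sequencing by ligation can be expressed as a toy simulation. It assumes perfect hybridization and ligation, ignores strand orientation, and treats the bases next to the anchor as a plain string `target`. The function names are hypothetical. The base-to-color mapping is the one described above. Each cycle depends only on the anchor and the interrogated position, not on the outcome of earlier cycles.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PROBE_COLOR = {"A": "red", "C": "yellow", "G": "green", "T": "blue"}
COLOR_TO_PROBE_BASE = {color: base for base, color in PROBE_COLOR.items()}

def ligation_cycle(target: str, position: int) -> str:
    """One cycle: a fresh anchor is hybridized, the probe pool interrogates
    `position` (1-based, counted from the anchor), and only the probe whose
    interrogative base is complementary to the target base is ligated.
    Returns the color that would be imaged."""
    probe_base = COMPLEMENT[target[position - 1]]
    return PROBE_COLOR[probe_base]

def read_next_to_anchor(target: str, n_positions: int = 10) -> str:
    """Read up to ten bases next to one anchor, one independent cycle each."""
    calls = []
    for position in range(1, n_positions + 1):
        color = ligation_cycle(target, position)  # anchor and probe removed afterwards
        probe_base = COLOR_TO_PROBE_BASE[color]
        calls.append(COMPLEMENT[probe_base])  # target base pairs with the probe base
    return "".join(calls)

print(ligation_cycle("ACGTTGCAAC", 1))    # blue: target A pairs with a T probe
print(read_next_to_anchor("ACGTTGCAAC"))  # ACGTTGCAAC
```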

Imaging

After each DNA probe ligation step, the flow cell is imaged to determine which nucleotide base bound to the DNA nanoball. The fluorophore is excited with an arc lamp that radiates specific wavelengths of light toward the flow cell. The fluorescence emitted by each DNA nanoball is captured by a high-resolution CCD camera. The image is then processed to remove background noise and assess the intensity of each spot. The color of each DNA nanoball corresponds to a base at the interrogative position, and a computer records the base position information. The order of 70 nucleotides per DNA nanoball is determined using 80 images (8 rounds of 10 interrogative positions per round).
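The image-processing step reduces each spot to one base call per cycle. The sketch below assumes four separately measured channel intensities per spot and a single flat background value. Real pipelines use more elaborate background and crosstalk correction. The function `call_spot` and its purity score are hypothetical. The color mapping follows the probe labels described earlier, and the called probe base is the complement of the template base.

```python
CHANNEL_TO_PROBE_BASE = {"red": "A", "yellow": "C", "green": "G", "blue": "T"}

def call_spot(intensities: dict[str, float], background: float) -> tuple[str, float]:
    """Call the probe base for one nanoball spot from four channel intensities.

    Subtracts a flat background, picks the brightest channel, and reports a
    purity score (brightest signal / total signal). Returns ("N", 0.0) when
    no channel rises above background."""
    signal = {ch: max(value - background, 0.0) for ch, value in intensities.items()}
    total = sum(signal.values())
    if total == 0:
        return "N", 0.0
    best = max(signal, key=signal.get)
    return CHANNEL_TO_PROBE_BASE[best], signal[best] / total

base, purity = call_spot({"red": 900, "yellow": 120, "green": 100, "blue": 110}, background=100)
print(base, round(purity, 3))  # A 0.964
```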

Genome assembly

In generating the circular template, a large segment of the original 400 – 500 bp fragment was replaced with the adapter Ad4. The 70 bp that are sequenced are therefore the first 35 bp and the last 35 bp of the original 400 – 500 bp fragment. The sequence is thus identified for two 35 bp reads of DNA separated by about 330 – 430 bp. These 35 bp reads are compared, using bioinformatics, to a reference genome and assigned to a genetic locus (Figure 7). Massively parallel genome mapping is accomplished through high coverage of the reference nucleotide positions, and the complete genome of the DNA sample is assembled. Single base-pair mismatches between the sequenced reads and the reference sequence are used to identify possible single nucleotide polymorphisms (SNPs). In addition, mapping of one 35-bp portion of a mate pair may reveal DNA insertions and deletions (indels), using bioinformatics algorithms that detect discrepancies between the two mates.
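A toy version of this mapping step makes the mate-pair logic concrete. It is a sketch, not a description of production aligners. It scans the reference naively instead of using an index, ignores reverse-strand matches and gapped alignment, and allows at most one mismatch per read. All function names are hypothetical. The 330 – 430 bp gap window comes from the spacing stated above (400 - 70 = 330 and 500 - 70 = 430).

```python
def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def map_read(read: str, reference: str, max_mismatches: int = 1) -> list[int]:
    """Naive scan: every reference position where `read` aligns with at most
    `max_mismatches` substitutions (no gaps, forward strand only)."""
    k = len(read)
    return [i for i in range(len(reference) - k + 1)
            if mismatches(read, reference[i:i + k]) <= max_mismatches]

def map_mate_pair(left: str, right: str, reference: str,
                  min_gap: int = 330, max_gap: int = 430) -> list[tuple[int, int]]:
    """Pairs of hits whose gap (bases between the two reads) fits the expected
    spacing. An empty result while each read maps on its own may hint at a
    structural difference such as an indel."""
    pairs = []
    for left_pos in map_read(left, reference):
        for right_pos in map_read(right, reference):
            gap = right_pos - (left_pos + len(left))
            if min_gap <= gap <= max_gap:
                pairs.append((left_pos, right_pos))
    return pairs

def snp_candidates(read: str, reference: str, pos: int) -> list[tuple[int, str, str]]:
    """(reference position, reference base, read base) for each mismatch of a
    read placed at `pos`: the single-base differences flagged as possible SNPs."""
    return [(pos + k, reference[pos + k], base)
            for k, base in enumerate(read) if reference[pos + k] != base]
```

For example, if two 35 bp reads are taken from positions 50 and 465 of a toy reference, the result of `map_mate_pair` would include the pair `(50, 465)`, a gap of 380 bp.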

Advantages

DNA nanoball sequencing technology offers several advantages over other sequencing platforms. One major advantage is the use of very high-density arrays. The array design permits one DNA nanoball to attach to each pit of an ordered array, so a higher concentration of DNA can be added. This allows a high percentage of the pits to be occupied by a DNA nanoball, maximizing the number of reads per flow cell (Figure, top). Other sequencing arrays, in contrast, have DNA molecules added to the flow cell in a random arrangement (Figure, bottom).
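A simple occupancy model illustrates why an ordered array tolerates higher loading. This is an illustrative assumption, not a description of any specific instrument. Arrival counts per site are taken as Poisson with mean λ (`mean_per_site`). Random arrays are assumed usable only where a site holds exactly one molecule, and ordered pits are assumed to accept at most one nanoball. Under these assumptions, the useful fraction for random loading peaks at 1/e (about 37%) at λ = 1, whereas the ordered model keeps rising as more DNA is added.

```python
import math

def random_single_occupancy(mean_per_site: float) -> float:
    """Fraction of sites holding exactly one molecule when molecules land at
    random (Poisson loading): lambda * exp(-lambda)."""
    return mean_per_site * math.exp(-mean_per_site)

def ordered_occupancy(mean_per_site: float) -> float:
    """Fraction of pits occupied when each pit accepts at most one nanoball,
    so any pit that receives at least one ends up with exactly one:
    1 - exp(-lambda)."""
    return 1 - math.exp(-mean_per_site)

for lam in (0.5, 1.0, 3.0):
    print(lam, round(random_single_occupancy(lam), 3), round(ordered_occupancy(lam), 3))
```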

Another important advantage is that the sequencing reactions are non-progressive. After each probe is read, the probe and anchor are removed, and a new anchor and probe set are added. Therefore, if a probe did not bind in the previous reaction, this has no effect on the next probe ligation. This eliminates a major source of reading error that may occur in other next-generation sequencing platforms. It also reduces the use of expensive probes, since DNA nanoball sequencing does not require the probe ligation reaction to run to completion.
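The benefit of non-progressive reads can be seen in a deliberately simplified error model. Assume, hypothetically, that each cycle fails independently with probability `p_fail`. Assume also that in a chained (progressive) chemistry, a failed extension leaves the molecule permanently out of step. Neither assumption comes from the text above, and the function names are hypothetical.

```python
def chained_in_phase(p_fail: float, cycle: int) -> float:
    """Progressive (chained) reads: a molecule that misses any earlier cycle
    stays out of step, so the in-phase fraction decays as (1 - p)^cycle."""
    return (1 - p_fail) ** cycle

def unchained_in_phase(p_fail: float, cycle: int) -> float:
    """Unchained reads: every cycle restarts from a fresh anchor, so only the
    current cycle's failure matters, whatever the cycle number."""
    return 1 - p_fail

for cycle in (1, 5, 10):
    print(cycle, round(chained_in_phase(0.02, cycle), 3), unchained_in_phase(0.02, cycle))
```

Under this model, the chained signal degrades with every cycle, while the unchained signal stays flat. This matches the qualitative argument above.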

Other advantages of DNA nanoball sequencing include the use of the high-fidelity Phi 29 DNA polymerase to ensure accurate amplification of the circular template. In addition, several hundred copies of the circular template are compacted into a small area, resulting in an intense signal. Finally, the fluorophore is attached to the probe at a long distance from the ligation point, which results in improved ligation.

Disadvantages

The main disadvantage of DNA nanoball sequencing is the short read length of the DNA sequences obtained with this method. Short reads, especially from DNA rich in repeats, may map to two or more regions of the reference genome. A second disadvantage of this method is that multiple rounds of PCR are required. This can introduce PCR bias and possibly amplify contaminants during template construction.
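The repeat problem can be shown with exact string matching on a toy reference. The sequences are hypothetical, and the matcher ignores mismatches and reverse complements. The point is only that a read lying entirely inside a repeated segment matches every copy, while a read that extends into unique flanking sequence matches once.

```python
def exact_hits(read: str, reference: str) -> list[int]:
    """All start positions (overlaps allowed) where `read` occurs exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits

# Hypothetical toy reference: unique flanks around two copies of the repeat "ACGT".
reference = "AAAA" + "ACGT" + "CCCC" + "ACGT" + "GGGG"
print(exact_hits("ACGT", reference))    # [4, 12]  ambiguous: two loci
print(exact_hits("AACGTC", reference))  # [3]      unique: spans a flank
```

By the same logic, a 35 bp read that falls inside any repeat longer than 35 bp cannot be placed uniquely.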

Applications

DNA nanoball sequencing has been used in recent studies. Lee et al. used this technology to find mutations present in a lung cancer and compared them with normal lung tissue. They were able to identify over 50,000 single nucleotide variants. Roach et al. used DNA nanoball sequencing to sequence the genomes of a family of four relatives. They identified SNPs that may be responsible for a Mendelian disorder and estimated the inter-generation mutation rate. The Institute for Systems Biology now uses this technology to sequence 615 complete human genome samples as part of a survey studying neurodegenerative diseases. The National Cancer Institute is using DNA nanoball sequencing to sequence 50 tumors and matched normal tissues from pediatric cancers.

Significance

Massively parallel next-generation sequencing platforms like DNA nanoball sequencing may contribute to the diagnosis and treatment of many genetic diseases. The cost of sequencing an entire human genome fell from about one million dollars in 2008 to $4,400 in 2010 with DNA nanoball technology. Sequencing the entire genomes of patients with heritable diseases or cancer has identified mutations associated with these diseases. This opens up strategies such as targeted therapeutics and genetic counseling for at-risk people. As the price of sequencing an entire human genome approaches the $1,000 mark, sequencing the genomes of the entire population may become feasible as part of routine preventive medicine.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 