Contig
A contig is a set of overlapping DNA segments that together represent a consensus region of DNA.
In bottom-up sequencing projects, a contig refers to overlapping sequence data (reads); in top-down sequencing projects, contig refers to the overlapping clones that form a physical map of the genome that is used to guide sequencing and assembly. Contigs can thus refer both to overlapping DNA sequence and to overlapping physical segments (fragments) contained in clones depending on the context.

Sequence contigs

Sequence contigs consist of contiguous, overlapping sequence reads resulting from the reassembly of the small DNA fragments generated by bottom-up sequencing strategies such as shotgun sequencing. This meaning of contig is consistent with the original definition by Rodger Staden (1979).
The bottom-up DNA sequencing strategy involves shearing genomic DNA into many small fragments (“bottom”), sequencing these fragments, and reassembling them into contigs and eventually the entire genome (“up”). Because current technology allows for the direct sequencing of only relatively short DNA fragments (300-1000 nucleotides), genomic DNA must be fragmented into small pieces prior to sequencing. In bottom-up sequencing projects, amplified DNA is sheared randomly into fragments appropriately sized for sequencing. The resulting sequence reads, which are the data that contain the sequence of each fragment, are assembled into contigs, which are finally connected by sequencing the gaps between them, resulting in a sequenced genome.
The ability to assemble contigs depends on the overlap of reads. Because shearing is random and performed on multiple copies of DNA, each portion of the genome should be represented multiple times in different fragment frames. In other words, the sequences of the fragments (and thus the reads) should overlap. After sequencing, the overlapping reads are assembled into contigs by assembly software.
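The role of read overlap can be illustrated with a toy greedy assembler. This is only a sketch under simplifying assumptions: reads are error-free, all come from the same strand, and the function names (`overlap_length`, `greedy_assemble`) and the `min_overlap` threshold are hypothetical choices for illustration. Real assembly software must also handle sequencing errors, reverse complements, and repeats.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for length in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest suffix/prefix overlap.

    Stops when no remaining pair overlaps by at least `min_overlap` bases;
    whatever sequences are left are the contigs.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap_length(a, b, min_overlap)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs
        # Join the pair, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGGCGT", "GCGTACC", "TACCTTA", "GGGAAAC"]
print(sorted(greedy_assemble(reads)))
```

In this example the first three reads overlap and collapse into one contig, while the fourth read shares no overlap with the others and remains a separate contig, leaving a gap between the two.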
Today, it is common to use paired-end sequencing technology, in which both ends of consistently sized longer DNA fragments are sequenced. Here, a contig still refers to any contiguous stretch of sequence data created by read overlap. Because the fragments are of known length, the distance between the two end reads from each fragment is known. This gives additional information about the orientation of contigs constructed from these reads and allows for their assembly into scaffolds. Scaffolds consist of ordered and oriented contigs separated by gaps of known length. The new constraints placed on the orientation of the contigs allow for the placement of highly repeated sequences in the genome. If one end read has a repetitive sequence, its placement is known as long as its mate pair is located within a contig. The remaining gaps between the contigs in the scaffolds can then be sequenced by a variety of methods, including PCR amplification followed by sequencing (for smaller gaps) and BAC cloning methods followed by sequencing (for larger gaps).
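The way a known fragment length constrains the gap between two contigs can be written out as simple arithmetic. The sketch below assumes an idealized case: every fragment has exactly the nominal length, the forward read lies on the left contig, and its mate lies on the right contig. The `MatePair` type, its field names, and `estimate_gap` are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class MatePair:
    """One fragment whose two end reads land on different contigs."""
    start_in_left: int  # 0-based start of the forward read on the left contig
    end_in_right: int   # 0-based exclusive end of the mate on the right contig


def estimate_gap(left_contig_len: int, fragment_len: int,
                 pairs: list[MatePair]) -> float:
    """Estimate the gap between two contigs from mate pairs that span it.

    Each fragment covers (left_contig_len - start_in_left) bases of the left
    contig, then the gap, then end_in_right bases of the right contig, so
        gap = fragment_len - (left_contig_len - start_in_left) - end_in_right.
    The median over all spanning pairs damps the effect of outliers.
    """
    estimates = [
        fragment_len - (left_contig_len - p.start_in_left) - p.end_in_right
        for p in pairs
    ]
    return median(estimates)


pairs = [MatePair(800, 100), MatePair(900, 210), MatePair(850, 140)]
print(estimate_gap(left_contig_len=1000, fragment_len=500, pairs=pairs))
```

Real fragment lengths vary around the nominal size, so in practice such an estimate carries an uncertainty that this sketch does not model.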

BAC contigs

Contig can also refer to the overlapping clones that form a physical map of a chromosome when the top-down or hierarchical sequencing strategy is used. In this sequencing method, a low-resolution map is made prior to sequencing in order to provide a framework to guide the later assembly of the sequence reads of the genome. This map identifies the relative positions and overlap of the clones used for sequencing. Sets of overlapping clones that form a contiguous stretch of DNA are called contigs; the minimal set of clones that forms a contig covering the entire chromosome constitutes the tiling path that is used for sequencing. Once a tiling path has been selected, its component BACs are sheared into smaller fragments and sequenced. Contigs therefore provide the framework for hierarchical sequencing.
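Selecting a minimal tiling path can be viewed as an interval-covering problem. The sketch below assumes each clone has already been placed on the physical map as a `(name, start, end)` tuple in map coordinates, and it uses the standard greedy strategy of always choosing the clone that reaches furthest while still touching the region covered so far. The function name `tiling_path` and the coordinate representation are illustrative assumptions, not a description of any particular mapping tool.

```python
Clone = tuple[str, int, int]  # (name, start, end) in map coordinates


def tiling_path(clones: list[Clone], region_start: int,
                region_end: int) -> list[str]:
    """Return the names of a minimum set of clones covering the region.

    Greedy interval cover: among the clones that start at or before the
    position covered so far, pick the one that extends furthest right.
    Raises ValueError if the clones leave a gap in the region.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path: list[str] = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        # Consider every clone that starts within the covered stretch.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


clones = [
    ("A", 0, 150), ("B", 50, 180), ("C", 120, 300),
    ("D", 160, 320), ("E", 290, 450), ("F", 310, 500),
]
print(tiling_path(clones, 0, 500))
```

The `ValueError` branch corresponds to the situation in which the clones do not form a single contig across the region, so that no complete tiling path exists.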
The assembly of a contig map involves several steps. First, DNA is sheared into larger (50-200 kb) pieces, which are cloned into BACs or PACs to form a BAC library. Since these clones should cover the entire genome or chromosome, it is theoretically possible to assemble a contig of BACs that covers the entire chromosome. Reality, however, is not always ideal. Gaps often remain, and a scaffold, consisting of contigs and gaps, that covers the map region is often the first result. The gaps between contigs can be closed by various methods outlined below.

Construction of BAC contigs

BAC contigs are constructed by aligning BAC regions of known overlap via a variety of methods. One common strategy is to use sequence-tagged site (STS) content mapping to detect unique DNA sites in common between BACs. The degree of overlap is roughly estimated by the number of STS markers in common between two clones, with more markers in common signifying a greater overlap. Because this strategy provides only a very rough estimate of overlap, restriction digest fragment analysis, which provides a more precise measurement of clone overlap, is often used. In this strategy, clones are treated with one or two restriction enzymes and the resulting fragments are separated by gel electrophoresis. If two clones overlap, they will likely have restriction sites in common, and will thus share several fragments. Because the number of fragments in common and the lengths of these fragments are known (the length is judged by comparison to a size standard), the degree of overlap can be deduced to a high degree of precision.
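The comparison of restriction fingerprints can be sketched as counting fragments of matching size in two digests. The code below is a simplified illustration: the 2% size tolerance, the function names, and the use of the shared-fragment fraction as an overlap score are all assumptions made for this example, not the scoring method of any specific fingerprinting program.

```python
def shared_fragments(sizes_a: list[int], sizes_b: list[int],
                     tolerance: float = 0.02) -> int:
    """Count fragments whose sizes match between two digests.

    Two fragments match when their sizes (in base pairs) differ by at most
    `tolerance` relative to the larger one, which stands in for the limited
    resolution of a gel. Each fragment is matched at most once.
    """
    a, b = sorted(sizes_a), sorted(sizes_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance * max(a[i], b[j]):
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def overlap_fraction(sizes_a: list[int], sizes_b: list[int],
                     tolerance: float = 0.02) -> float:
    """Shared fragments as a fraction of the smaller fingerprint."""
    if not sizes_a or not sizes_b:
        return 0.0
    shared = shared_fragments(sizes_a, sizes_b, tolerance)
    return shared / min(len(sizes_a), len(sizes_b))


clone_1 = [1200, 3400, 5100, 800, 2500]
clone_2 = [3410, 5080, 9000, 800, 4300]
print(shared_fragments(clone_1, clone_2), overlap_fraction(clone_1, clone_2))
```

A higher shared fraction suggests a larger overlap, although unrelated fragments can match in size by chance, so a real analysis would also need to assess how likely a given number of matches is to be coincidental.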

Gaps between contigs

Gaps often remain after initial BAC contig construction. These gaps occur if the BAC library screened has low complexity, meaning it does not contain a high number of STS or restriction sites, or if certain regions were less stable in cloning hosts and thus underrepresented in the library. If gaps between contigs remain after STS landmark mapping and restriction fingerprinting have been performed, the sequencing of contig ends can be used to close these gaps. This end-sequencing strategy essentially creates a novel STS with which to screen the other contigs. Alternatively, the end sequence of a contig can be used as a primer for primer walking across the gap.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 