Paired-end Tags
Paired-end tags (PETs) are the short sequences at the 5’ and 3’ ends of a DNA fragment of interest, which can be a piece of genomic DNA or cDNA. These short sequences are called tags or signatures because, in theory, they contain enough sequence information to be mapped uniquely to the genome and thus represent the whole DNA fragment of interest. It was shown conceptually that 13 bp is sufficient to map tags uniquely. However, longer sequences are more practical for mapping reads uniquely. The endonucleases (discussed below) used in PET produce longer tags (18/20 bp and 25/27 bp), but sequences of 50-100 base pairs would be optimal for both mapping and cost efficiency. After the PETs are extracted from many DNA fragments, they are linked (concatenated) together for efficient sequencing. On average, 20-30 tags can be sequenced with the Sanger method, which has a longer read length. Because the tag sequences are short, individual PETs are also well suited to next-generation sequencing, which has short read lengths and higher throughput. The main advantages of PET sequencing are reduced cost, because only short fragments are sequenced; detection of structural variants in the genome; and increased specificity when aligning back to the genome compared with single tags, which involve only one end of the DNA fragment.
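The effect of tag length on unique mapping can be illustrated with a toy calculation. The sketch below is only an illustration under simplifying assumptions: it considers a single strand, uses a made-up 10-base sequence rather than real genomic data, and the function name `unique_fraction` is hypothetical. It reports the fraction of tag positions whose sequence occurs exactly once.

```python
from collections import Counter


def unique_fraction(genome: str, k: int) -> float:
    """Fraction of k-mer start positions whose k-mer occurs exactly once.

    Single-strand, exact-match simplification; real mappers also consider
    the reverse strand and sequencing errors.
    """
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / len(kmers)


# Hypothetical toy sequence with a repeated "ACGT" motif (not real data).
toy_genome = "ACGTACGTTT"
for k in (2, 3, 5):
    print(k, round(unique_fraction(toy_genome, k), 2))
```

In this toy sequence the unique fraction rises as the tag gets longer, which is the same trade-off the paragraph describes between very short tags and longer, more mappable ones.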

Constructing the PET library

PET libraries are typically prepared by one of two general methods: cloning-based and cloning-free.

Cloning based

Fragmented genomic DNA or complementary DNA (cDNA) of interest is cloned into plasmid vectors. The cloning sites are flanked with adaptor sequences that contain restriction sites for endonucleases (discussed below). Inserts are ligated to the plasmid vectors, and the individual vectors are then transformed into E. coli to make the PET library. PET sequences are obtained by purifying the plasmids and digesting them with a specific endonuclease, which leaves two short sequences on the ends of the vectors. Under intramolecular (dilute) conditions, the vectors are re-circularized and ligated, leaving only the ditags in the vector. The sequences unique to the clone are now paired together. Depending on the next-generation sequencing technique, PET sequences can be left singular, dimerized, or concatenated into long chains.

Cloning-free based

Instead of cloning, adaptors containing the endonuclease recognition sequence are ligated to the ends of fragmented genomic DNA or cDNA. The molecules are then self-circularized and digested with the endonuclease, releasing the PET. Before sequencing, these PETs are ligated to adaptors to which PCR primers anneal for amplification.
The advantage of cloning-based library construction is that it keeps the fragments or cDNA intact for future use. However, the construction process is much longer than with the cloning-free method. Next-generation sequencing companies have produced variations on library construction to suit their respective technologies.

Endonucleases

Unlike restriction endonucleases that cut within their recognition sites, the MmeI (type IIS) and EcoP15I (type III) restriction endonucleases cut downstream of their target binding sites. MmeI cuts 18/20 base pairs downstream and EcoP15I cuts 25/27 base pairs downstream. Because these restriction enzymes bind at their target sequences located in the adaptors, they cut and release vectors that contain short sequences of the fragment or cDNA ligated to them, producing PETs.
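The tag-release step can be sketched in silico. The following is a minimal illustration, not a model of any specific library protocol. It assumes the commonly documented recognition sequences (TCCRAC for MmeI, CAGCAG for EcoP15I), which the text above does not state; it takes the shorter of the two strand distances (18 and 25) as the double-stranded tag length; and it assumes each adaptor places the recognition site directly next to the insert. All function names and the toy sequences are hypothetical.

```python
import re

# Recognition sites are an assumption (commonly documented sequences, not
# given in the text). tag_len is the shorter of the two strand cut distances
# quoted in the text, a simplification that ignores the 2-nt overhang.
ENZYMES = {
    "MmeI": {"site": "TCC[AG]AC", "tag_len": 18},
    "EcoP15I": {"site": "CAGCAG", "tag_len": 25},
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tag_downstream(seq: str, site_regex: str, tag_len: int):
    """Bases immediately downstream of the first recognition site, or None."""
    match = re.search(site_regex, seq)
    if match is None or match.end() + tag_len > len(seq):
        return None
    return seq[match.end():match.end() + tag_len]


def extract_pet(construct: str, enzyme: str):
    """Join the 5' tag and the 3' tag of an adaptor-flanked insert into a ditag.

    The 3' tag is found by scanning the reverse strand, because the right-hand
    adaptor carries the site in the opposite orientation.
    """
    spec = ENZYMES[enzyme]
    five = tag_downstream(construct, spec["site"], spec["tag_len"])
    three_rc = tag_downstream(reverse_complement(construct),
                              spec["site"], spec["tag_len"])
    if five is None or three_rc is None:
        return None
    return five + reverse_complement(three_rc)


# Hypothetical toy insert flanked by adaptors that end in an MmeI site.
insert = "ATGGCGTACCTTAGGCATTCAGGCTTAAGCCGATTGCAATGCCGTTAACGTAGC"
construct = "GGTCCGAC" + insert + "GTCGGACC"
print(extract_pet(construct, "MmeI"))
```

With this layout the ditag is simply the first and last 18 bases of the insert joined together, which is what makes the pair a compact signature of the whole fragment.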

PET Applications

  1. DNA-PET: Because PETs represent connectivity between the tags, using PETs in genome re-sequencing has advantages over using single reads. Anchoring one half of the pair uniquely to a single location in the genome allows the other half to be mapped even when it is ambiguous, that is, when it maps to more than one location. This increased efficiency reduces the cost of sequencing, as these ambiguous reads would normally be discarded. The connectivity of PET sequences also allows detection of structural variations: insertions, deletions, duplications, inversions, and translocations. During construction of the PET library, the fragments can be selected to all be of a certain size. After mapping, the two tags of a PET are thus expected to lie a consistent distance from each other. A discrepancy from this distance indicates a structural variation between the PET sequences. For example (figure on the right), a deletion in the sequenced genome will yield reads that map further apart than expected in the reference genome, because the reference genome contains a segment of DNA that is not present in the sequenced genome.
  2. ChIP-PET: The combined use of chromatin immunoprecipitation (ChIP) and PET is used to detect regions of DNA bound by a protein of interest. ChIP-PET has the advantage over single-read sequencing of reducing the ambiguity of the reads generated. Its advantage over chip hybridization (ChIP-chip) is that hybridization tiling arrays do not have the statistical sensitivity that sequence reads have. However, ChIP-PET, ChIP-Seq, and ChIP-chip have all been highly successful.
  3. ChIA-PET: The application of PET sequencing to chromatin interaction analysis. It is a genome-wide strategy for finding de novo long-range interactions between DNA elements bound by protein factors. The first ChIA-PET was developed by Fullwood et al. (2009) to generate a map of the interactions between chromatin bound by oestrogen receptor α (ER-α) in oestrogen-treated human breast adenocarcinoma cells. ChIA-PET is an unbiased way to analyze interactions and higher-order chromatin structures because it can detect interactions between unknown DNA elements. In contrast, 3C and 4C methods are used to detect interactions involving a specific target region in the genome. ChIA-PET is similar to finding fusion genes through RNA-PET in that the paired tags map to different regions in the genome. However, ChIA-PET involves artificial ligations between different DNA fragments located at different genomic regions, rather than a naturally occurring fusion between two genomic regions as in RNA-PET.
  4. RNA-PET: This application is used for studying the transcriptome: transcripts, gene structures, and gene expression. The PET library is generated from full-length cDNAs, so the ditags represent the 5’ cap and 3’ polyA tail signatures of individual transcripts. Therefore, RNA-PET is especially useful for demarcating the boundaries of transcription units, which helps identify alternative transcription start sites and polyadenylation sites of genes. RNA-PET could also be used to detect fusion genes and trans-splicing, but further experiments are needed to distinguish between them. Other methods of finding the boundaries of transcripts include the single-tag strategies CAGE, SAGE, and the most recent SuperSAGE, with CAGE and 5’ SAGE defining the transcription start sites and 3’ SAGE defining the polyadenylation sites. The advantages of PET sequencing over these methods are that PETs identify both ends of the transcripts and, at the same time, provide more specificity when mapping back to the genome. Sequencing the cDNAs can reveal the structures of transcripts in great detail, but this approach is much more expensive than RNA-PET sequencing, especially for characterizing the whole transcriptome. The major limitation of RNA-PET is the lack of information about the organization of the internal exons of transcripts. Therefore, RNA-PET is not suitable for detecting alternative splicing. In addition, if a cloning procedure is used to construct the cDNA library before generating the PETs, cDNAs that are difficult to clone (for example, because of long transcripts) would have lower coverage. Similarly, transcripts (or transcript isoforms) with low expression levels would likely be under-represented as well.
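The distance-based reasoning described under DNA-PET can be sketched as a simple classifier. This is a minimal illustration under stated assumptions, not a production structural-variant caller. Only the deletion rule (tags mapping further apart than expected) comes directly from the text. The other rules are common conventions assumed here: tags closer than expected suggest an insertion, tags on different chromosomes suggest a translocation, and the two tags of a concordant PET are assumed to map to the same strand, so a strand mismatch suggests an inversion. The `MappedPET` record and the `classify_pet` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MappedPET:
    """Mapped coordinates of the two tags of one PET (hypothetical record)."""
    chrom5: str
    pos5: int
    strand5: str
    chrom3: str
    pos3: int
    strand3: str


def classify_pet(pet: MappedPET, expected_span: int, tolerance: int) -> str:
    """Label a PET as concordant or as a candidate structural variation.

    expected_span is the size selected during library construction;
    tolerance is the accepted deviation from it (both in base pairs).
    """
    if pet.chrom5 != pet.chrom3:
        return "translocation candidate"
    # Assumption: tags of a concordant PET map to the same strand.
    if pet.strand5 != pet.strand3:
        return "inversion candidate"
    span = abs(pet.pos3 - pet.pos5)
    if span > expected_span + tolerance:
        # Reference holds extra DNA between the tags: deletion in the sample.
        return "deletion candidate"
    if span < expected_span - tolerance:
        return "insertion candidate"
    return "concordant"


# Hypothetical mappings for a library size-selected at about 1,000 bp.
examples = [
    MappedPET("chr1", 10_000, "+", "chr1", 11_020, "+"),
    MappedPET("chr1", 10_000, "+", "chr1", 14_000, "+"),
    MappedPET("chr1", 10_000, "+", "chr7", 52_000, "+"),
]
for pet in examples:
    print(classify_pet(pet, expected_span=1_000, tolerance=100))
```

In practice a single discordant PET is weak evidence; callers usually require several PETs supporting the same breakpoint before reporting a variant.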
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 