Chargaff's rules
Chargaff's rules state that DNA from any cell of any organism should have a 1:1 ratio of pyrimidine to purine bases (the base pair rule) and, more specifically, that the amount of guanine is equal to the amount of cytosine and the amount of adenine is equal to the amount of thymine. This pattern is found in both strands of the DNA. The rules were discovered by the Austrian-born chemist Erwin Chargaff.

Chargaff Parity Rule 1

The first rule holds that a double-stranded DNA molecule globally has percentage base pair equality: %A = %T and %G = %C. The rigorous validation of the rule constitutes the basis of Watson-Crick pairs in the DNA double helix.

Chargaff Parity Rule 2

The second rule holds that both %A ~ %T and %G ~ %C are valid for each of the two DNA strands. This describes only a global feature of the base composition in a single DNA strand.
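
As a minimal sketch of the two parity rules, the following Python snippet counts bases in a single strand and in the duplex formed by that strand and its reverse complement. The function names and the toy sequence are illustrative assumptions and are not taken from any source data.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand):
    """Return the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

def base_percentages(seq):
    """Percentage of each of A, C, G and T in seq."""
    counts = Counter(seq)
    return {base: 100.0 * counts[base] / len(seq) for base in "ACGT"}

# Hypothetical toy strand, far too short to be expected to obey the second rule.
strand = "ATGCGCGTTAAGGCATTTGACC"
duplex = strand + reverse_complement(strand)

single = base_percentages(strand)   # rule 2 is about these values
double = base_percentages(duplex)   # rule 1 is about these values
print(single)
print(double)
```

In the duplex, %A = %T and %G = %C hold exactly by construction, because every base is paired with its complement. In the single strand the equalities are at best approximate, which is why the second rule is an empirical observation about long genomes rather than a consequence of base pairing.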

Research

A further finding of Chargaff's, sometimes itself presented as his second rule, is that the composition of DNA varies from one species to another, in particular in the relative amounts of A, G, T, and C bases. Such evidence of molecular diversity, which had been presumed absent from DNA, made DNA a more credible candidate for the genetic material than protein.

In 2006, it was shown that the second parity rule (%A ~ %T and %G ~ %C within a single strand) applies to four of the five types of double-stranded genomes: eukaryotic chromosomes, bacterial chromosomes, double-stranded DNA viral genomes, and archaeal chromosomes. It does not apply to small organellar genomes (mitochondria and plastids), although it does hold for many organellar genomes longer than about 20-30 kbp. Nor does it apply to single-stranded DNA (viral) genomes or to any type of RNA genome, probably because all such genomes are very small. The basis for this rule is still under investigation.

The rule itself has consequences. In most bacterial genomes (which are generally 80-90% coding), genes are arranged in such a fashion that approximately 50% of the coding sequence lies on either strand. Wacław Szybalski, in the 1960s, showed that in bacteriophage coding sequences purines (A and G) exceed pyrimidines (C and T). This rule has since been confirmed in other organisms and is referred to below as "Szybalski's rule". While Szybalski's rule generally holds, exceptions are known to exist. The biological basis for Szybalski's rule, like that of Chargaff's, is not yet known.
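
A rough way to check Szybalski's rule for a given coding sequence is to measure its purine fraction. The sketch below assumes the sequence is supplied as the coding (sense) strand; the function name and the example sequence are hypothetical.

```python
def purine_fraction(coding_sequence):
    """Fraction of bases in the sequence that are purines (A or G)."""
    seq = coding_sequence.upper()
    purines = sum(1 for base in seq if base in "AG")
    return purines / len(seq)

# Hypothetical coding sequence, written as the coding (sense) strand.
cds = "ATGGCAGAAGGTAAAGCGTGA"
fraction = purine_fraction(cds)
print(f"purine fraction: {fraction:.2f}")
print("purine-rich" if fraction > 0.5 else "not purine-rich")
```

A fraction above 0.5 means purines exceed pyrimidines in that sequence, which is the pattern the rule describes.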

The combined effect of Chargaff's second rule and Szybalski's rule can be seen in bacterial genomes where the coding sequences are not equally distributed. The genetic code has 64 codons, of which 3 function as termination codons, while only 20 amino acids are normally present in proteins. (Two uncommon amino acids, selenocysteine and pyrrolysine, are found in a limited number of proteins and are encoded by the stop codons TGA and TAG respectively.) The mismatch between the number of codons and the number of amino acids allows several codons to code for a single amino acid. These codons normally differ in the third codon base position.
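
The codon arithmetic above can be reproduced with a short sketch. It builds the standard genetic code from the conventional 64-letter amino acid string in T, C, A, G order, then counts stop codons, distinct amino acids, and the two-base prefixes whose third base does not change the amino acid. The variable names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a termination codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codon_table = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in codon_table.items() if aa == "*")
amino_acids = {aa for aa in codon_table.values() if aa != "*"}

# Two-base prefixes for which all four third bases give the same amino acid.
fourfold = [
    b1 + b2
    for b1, b2 in product(BASES, repeat=2)
    if len({codon_table[b1 + b2 + b3] for b3 in BASES}) == 1
]

print(len(codon_table), stop_codons, len(amino_acids), len(fourfold))
```

Sixty-one sense codons shared among 20 amino acids is what makes the code degenerate, and the prefixes collected in `fourfold` are the cases where the third base is entirely free to vary.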

Multivariate statistical analysis of codon use within genomes with unequal quantities of coding sequences on the two strands has shown that codon use in the third position depends on the strand on which the gene is located. This seems likely to be the result of Szybalski's and Chargaff's rules. Because of the asymmetry in pyrimidine and purine use in coding sequences, the strand with the greater coding content will tend to have the greater number of purine bases (Szybalski's rule). Because the number of purine bases will, to a very good approximation, equal the number of their complementary pyrimidines within the same strand, and because the coding sequences occupy 80-90% of the strand, there appears to be a selective pressure on the third base to minimize the number of purine bases in the strand with the greater coding content. This pressure appears to be proportional to the mismatch in the length of the coding sequences between the two strands.

The origin of the deviation from Chargaff's rule in the organelles has been suggested to be a consequence of the mechanism of replication. During replication the DNA strands separate. In single-stranded DNA, cytosine slowly and spontaneously deaminates to uracil, which is read as thymine (a C to T transition). The longer the strands are separated, the greater the quantity of deamination. For reasons that are not yet clear, the strands tend to exist longer in single-stranded form in mitochondria than in chromosomal DNA. This process tends to yield one strand that is enriched in guanine (G) and thymine (T), with its complement enriched in cytosine (C) and adenine (A), and this process may have given rise to the deviations found in the mitochondria.

Chargaff's second rule appears to be the consequence of a more complex parity rule: within a single strand of DNA, any oligonucleotide is present in equal numbers to its reverse complementary oligonucleotide. Because of the computational requirements, this has not been verified in all genomes for all oligonucleotides. It has been verified for triplet oligonucleotides for a large data set. Albrecht-Buehler has suggested that this rule is the consequence of genomes evolving by a process of inversion and transposition. This process does not appear to have acted on the mitochondrial genomes. Chargaff's second parity rule appears to extend from the nucleotide level to populations of codon triplets in the case of whole single-stranded human genome DNA.
A kind of "codon-level second Chargaff's parity rule" is proposed as follows:
Codon populations where the 1st base position is T approximately match codon populations where the 3rd base position is A:
« % codons Twx ~ % codons yzA » (where Twx and yzA are mirror codons, e.g. TCG and CGA).
Codon populations where the 1st base position is C approximately match codon populations where the 3rd base position is G:
« % codons Cwx ~ % codons yzG » (where Cwx and yzG are mirror codons, e.g. CTA and TAG).
Codon populations where the 2nd base position is T approximately match codon populations where the 2nd base position is A:
« % codons wTx ~ % codons yAz » (where wTx and yAz are mirror codons, e.g. CTG and CAG).
Codon populations where the 2nd base position is C approximately match codon populations where the 2nd base position is G:
« % codons wCx ~ % codons yGz » (where wCx and yGz are mirror codons, e.g. TCT and AGA).
Codon populations where the 3rd base position is T approximately match codon populations where the 1st base position is A:
« % codons wxT ~ % codons Ayz » (where wxT and Ayz are mirror codons, e.g. CTT and AAG).
Codon populations where the 3rd base position is C approximately match codon populations where the 1st base position is G:
« % codons wxC ~ % codons Gyz » (where wxC and Gyz are mirror codons, e.g. GGC and GCC).

Examples: counting codons across the whole human genome in the first reading frame gives 36530115 TTT and 36381293 AAA (ratio = 1.00409), 2087242 TCG and 2085226 CGA (ratio = 1.00096), and so on.
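
The mirror-codon comparison can be sketched as follows. Each "mirror" pair in the list above is a codon and its reverse complement, so the check reduces to counting triplets in one reading frame of a single strand and dividing each count by that of its reverse complement. The helper names are hypothetical, and this is not the procedure used to obtain the quoted genome-wide counts, only an illustration of the same idea.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(oligo):
    """Reverse complement of an oligonucleotide, e.g. TCG -> CGA."""
    return oligo.translate(COMPLEMENT)[::-1]

def codon_counts(strand, frame=0):
    """Count non-overlapping triplets of one strand in the given reading frame."""
    return Counter(strand[i:i + 3] for i in range(frame, len(strand) - 2, 3))

def mirror_ratio(counts, codon):
    """Ratio of a codon's count to that of its reverse complement, or None."""
    mirror = counts[reverse_complement(codon)]
    return counts[codon] / mirror if mirror else None

# Ratios recomputed from the whole-genome counts quoted above.
print(round(36530115 / 36381293, 5))   # TTT / AAA
print(2087242 / 2085226)               # TCG / CGA
```

For a real genome, `codon_counts` would be applied to each chromosome's sequence and the counters summed before calling `mirror_ratio`.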

Relative proportions (%) of bases in DNA

The following table is a representative sample of Erwin Chargaff's 1952 data, listing the base composition of DNA from various organisms in support of both of Chargaff's rules.
Organism      %A    %G    %C    %T    A/T   G/C   %GC   %AT
φX174         24.0  23.3  21.5  31.2  0.77  1.08  44.8  55.2
Maize         26.8  22.8  23.2  27.2  0.99  0.98  46.1  54.0
Octopus       33.2  17.6  17.6  31.6  1.05  1.00  35.2  64.8
Chicken       28.0  22.0  21.6  28.4  0.99  1.02  43.7  56.4
Rat           28.6  21.4  20.5  28.4  1.01  1.00  42.9  57.0
Human         29.3  20.7  20.0  30.0  0.98  1.04  40.7  59.3
Grasshopper   29.3  20.5  20.7  29.3  1.00  0.99  41.2  58.6
Sea urchin    32.8  17.7  17.3  32.1  1.02  1.02  35.0  64.9
Wheat         27.3  22.7  22.8  27.1  1.01  1.00  45.5  54.4
Yeast         31.3  18.7  17.1  32.9  0.95  1.09  35.8  64.4
E. coli       24.7  26.0  25.7  23.6  1.05  1.01  51.7  48.3
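
The A/T and G/C columns follow directly from the percentage columns. As a small check, the sketch below recomputes them for three rows of the table; the dictionary layout and function name are illustrative choices.

```python
# (%A, %G, %C, %T) for three rows of the table above.
rows = {
    "phiX174": (24.0, 23.3, 21.5, 31.2),
    "Octopus": (33.2, 17.6, 17.6, 31.6),
    "E. coli": (24.7, 26.0, 25.7, 23.6),
}

def parity_ratios(a, g, c, t):
    """Return (A/T, G/C), each rounded to two decimals."""
    return round(a / t, 2), round(g / c, 2)

for organism, composition in rows.items():
    at, gc = parity_ratios(*composition)
    print(f"{organism:8s} A/T = {at:.2f}  G/C = {gc:.2f}")
```

A ratio close to 1.00 indicates agreement with the parity rules. The φX174 row departs furthest from 1, which fits the statement earlier in the article that single-stranded DNA viral genomes do not follow the rule.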

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 