GC-content
Encyclopedia
In molecular biology
Molecular biology
Molecular biology is the branch of biology that deals with the molecular basis of biological activity. This field overlaps with other areas of biology and chemistry, particularly genetics and biochemistry...

 and genetics
Genetics
Genetics , a discipline of biology, is the science of genes, heredity, and variation in living organisms....

, GC-content (or guanine-cytosine content) is the percentage of nitrogenous bases on a DNA
DNA
Deoxyribonucleic acid is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms . The DNA segments that carry this genetic information are called genes, but other DNA sequences have structural purposes, or are involved in...

 molecule that are either guanine
Guanine
Guanine is one of the four main nucleobases found in the nucleic acids DNA and RNA, the others being adenine, cytosine, and thymine . In DNA, guanine is paired with cytosine. With the formula C5H5N5O, guanine is a derivative of purine, consisting of a fused pyrimidine-imidazole ring system with...

 or cytosine
Cytosine
Cytosine is one of the four main bases found in DNA and RNA, along with adenine, guanine, and thymine . It is a pyrimidine derivative, with a heterocyclic aromatic ring and two substituents attached . The nucleoside of cytosine is cytidine...

 (from a possibility of four different ones, also including adenine
Adenine
Adenine is a nucleobase with a variety of roles in biochemistry including cellular respiration, in the form of both the energy-rich adenosine triphosphate and the cofactors nicotinamide adenine dinucleotide and flavin adenine dinucleotide , and protein synthesis, as a chemical component of DNA...

 and thymine
Thymine
Thymine is one of the four nucleobases in the nucleic acid of DNA that are represented by the letters G–C–A–T. The others are adenine, guanine, and cytosine. Thymine is also known as 5-methyluracil, a pyrimidine nucleobase. As the name suggests, thymine may be derived by methylation of uracil at...

). This may refer to a specific fragment of DNA or RNA
RNA
Ribonucleic acid , or RNA, is one of the three major macromolecules that are essential for all known forms of life....

, or that of the whole genome
Genome
In modern molecular biology and genetics, the genome is the entirety of an organism's hereditary information. It is encoded either in DNA or, for many types of virus, in RNA. The genome includes both the genes and the non-coding sequences of the DNA/RNA....

. When it refers to a fragment of the genetic material, it may denote the GC-content of part of a gene (domain), single gene, group of genes (or gene clusters), or even a non-coding region. G (guanine) and C (cytosine) undergo a specific hydrogen bonding, whereas A (adenine) bonds specifically with T (thymine).

The GC pair is bound by three hydrogen bond
Hydrogen bond
A hydrogen bond is the attractive interaction of a hydrogen atom with an electronegative atom, such as nitrogen, oxygen or fluorine, that comes from another molecule or chemical group. The hydrogen must be covalently bonded to another electronegative atom to create the bond...

s, while AT pairs are bound by two hydrogen bonds. DNA with high GC-content is more stable than DNA with low GC-content; however, the hydrogen bonds do not stabilize the DNA significantly, and stabilization is due mainly to stacking interactions.
In spite of the higher thermostability
Thermostability
Thermostability is the quality of a substance to resist irreversible change in its chemical or physical structure at a high relative temperature.Thermostable materials may be used industrially as fire retardants...

 conferred to the genetic material, it is envisaged that cells with DNA of high GC-content undergo autolysis, thereby reducing the longevity of the cell per se. Due to the robustness endowed to the genetic materials in high GC organisms, it was commonly believed that the GC content played a vital part in adaptation temperatures, a hypothesis that has recently been refuted.
However, the same study showed a strong correlation between higher temperatures and the GC content of structured RNA
RNA
Ribonucleic acid , or RNA, is one of the three major macromolecules that are essential for all known forms of life....

s (such as ribosomal RNA
Ribosomal RNA
Ribosomal ribonucleic acid is the RNA component of the ribosome, the enzyme that is the site of protein synthesis in all living cells. Ribosomal RNA provides a mechanism for decoding mRNA into amino acids and interacts with tRNAs during translation by providing peptidyl transferase activity...

, transfer RNA
Transfer RNA
Transfer RNA is an adaptor molecule composed of RNA, typically 73 to 93 nucleotides in length, that is used in biology to bridge the three-letter genetic code in messenger RNA with the twenty-letter code of amino acids in proteins. The role of tRNA as an adaptor is best understood by...

, and many other non-coding RNA
Non-coding RNA
A non-coding RNA is a functional RNA molecule that is not translated into a protein. Less-frequently used synonyms are non-protein-coding RNA , non-messenger RNA and functional RNA . The term small RNA is often used for short bacterial ncRNAs...

s); GC base pair
Base pair
In molecular biology and genetics, the linking between two nitrogenous bases on opposite complementary DNA or certain types of RNA strands that are connected via hydrogen bonds is called a base pair...

s are more stable than AU base pairs, which makes high-GC-content RNA structures more tolerant of high temperatures.
More recently, the first large-scale systematic gene-centric association analysis demonstrated the correlation between GC content and temperature for certain genomic regions while not for others.

In PCR experiments, the GC-content of primers
Primer (molecular biology)
A primer is a strand of nucleic acid that serves as a starting point for DNA synthesis. They are required for DNA replication because the enzymes that catalyze this process, DNA polymerases, can only add new nucleotides to an existing strand of DNA...

 are used to predict their annealing temperature
Annealing temperature
Annealing temperature may refer to:*Annealing *Annealing *Polymerase chain reaction...

 to the template DNA. A higher GC-content level indicates a higher melting temperature.

Determination of GC content

GC content is usually expressed as a percentage value, but sometimes as a ratio (called G+C ratio or GC-ratio). GC-content percentage is calculated as


whereas the AT/GC ratio is calculated as
.


The GC-content percentages as well as GC-ratio can be measured by several means, but one of the simplest methods is to measure what is called the melting temperature of the DNA
DNA
Deoxyribonucleic acid is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms . The DNA segments that carry this genetic information are called genes, but other DNA sequences have structural purposes, or are involved in...

 double helix using spectrophotometry
Spectrophotometry
In chemistry, spectrophotometry is the quantitative measurement of the reflection or transmission properties of a material as a function of wavelength...

. The absorbance of DNA
DNA
Deoxyribonucleic acid is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms . The DNA segments that carry this genetic information are called genes, but other DNA sequences have structural purposes, or are involved in...

 at a wavelength
Wavelength
In physics, the wavelength of a sinusoidal wave is the spatial period of the wave—the distance over which the wave's shape repeats.It is usually determined by considering the distance between consecutive corresponding points of the same phase, such as crests, troughs, or zero crossings, and is a...

 of 260 nm increases fairly sharply when the double-stranded DNA
DNA
Deoxyribonucleic acid is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms . The DNA segments that carry this genetic information are called genes, but other DNA sequences have structural purposes, or are involved in...

 separates into two single strands when sufficiently heated. The most commonly used protocol for determining GC ratios uses flow cytometry
Flow cytometry
Flow cytometry is a technique for counting and examining microscopic particles, such as cells and chromosomes, by suspending them in a stream of fluid and passing them by an electronic detection apparatus. It allows simultaneous multiparametric analysis of the physical and/or chemical...

 for large number of samples.

In alternative manner, if the DNA or RNA molecule under investigation has been sequenced then the GC-content can be accurately calculated by simple arithmetic or by using the free online GC calculator.

GC ratio of genomes

GC ratios within a genome is found to be markedly variable. These variations in GC ratio within the genomes of more complex organisms result in a mosaic-like formation with islet regions called isochores
Isochore (genetics)
In genetics, an isochore is a large region of DNA with a high degree uniformity in G-C and C-G which tends to have more genes, higher local melting or denaturation temperatures, and different flexibility...

. This results in the variations in staining intensity in the chromosomes. GC-rich isochores include in them many protein coding genes, and thus determination of ratio of these specific regions contributes in mapping gene-rich regions of the genome.

GC ratios and coding sequence

Within a long region of genomic sequence, genes are often characterised by having a higher GC-content in contrast to the background GC-content for the entire genome. Evidence of GC ratio with that of length of the coding region
Coding region
The coding region of a gene, also known as the coding sequence or CDS, is that portion of a gene's DNA or RNA, composed of exons, that codes for protein. The region is bounded nearer the 5' end by a start codon and nearer the 3' end with a stop codon...

 of a gene
Gene
A gene is a molecular unit of heredity of a living organism. It is a name given to some stretches of DNA and RNA that code for a type of protein or for an RNA chain that has a function in the organism. Living beings depend on genes, as they specify all proteins and functional RNA chains...

 has shown that the length of the coding sequence is directly proportional to higher G+C content. This has been pointed to the fact that the stop codon
Stop codon
In the genetic code, a stop codon is a nucleotide triplet within messenger RNA that signals a termination of translation. Proteins are based on polypeptides, which are unique sequences of amino acids. Most codons in messenger RNA correspond to the addition of an amino acid to a growing polypeptide...

 has a bias towards A and T nucleotides, and, thus, the shorter the sequence the higher the AT bias.

Application in systematics

GC content is found to be variable with different organisms, the process of which is envisaged to be contributed to by variation in selection
Gene-centered view of evolution
The gene-centered view of evolution, gene selection theory or selfish gene theory holds that evolution occurs through the differential survival of competing genes, increasing the frequency of those alleles whose phenotypic effects successfully promote their own propagation, with gene defined as...

, mutational bias, and biased recombination-associated DNA repair
DNA repair
DNA repair refers to a collection of processes by which a cell identifies and corrects damage to the DNA molecules that encode its genome. In human cells, both normal metabolic activities and environmental factors such as UV light and radiation can cause DNA damage, resulting in as many as 1...

. The species problem
Species problem
The species problem or species concept is a mixture of difficult, related questions that often come up when biologists identify species and when they define the word "species"....

 in prokaryotic taxonomy has led to various suggestions in classifying bacteria, and the ad hoc committee on reconciliation of approaches to bacterial systematics has recommended use of GC ratios in higher level hierarchical classification. For example, the Actinobacteria
Actinobacteria
Actinobacteria are a group of Gram-positive bacteria with high guanine and cytosine content. They can be terrestrial or aquatic. Actinobacteria is one of the dominant phyla of the bacteria....

 are characterised as "high GC-content bacteria". In Streptomyces coelicolor
Streptomyces coelicolor
Streptomyces coelicolor is a soil-dwelling Gram-positive bacterium that belongs to the genus Streptomyces.-Usage in biotechnology:Strains of S. coelicolor produce various antibiotics, including actinorhodin, methylenomycin, undecylprodigiosin, and perimycin. Certain strains of S. coelicolor can...

A3(2), GC content is 72%. The GC-content of Yeast
Yeast
Yeasts are eukaryotic micro-organisms classified in the kingdom Fungi, with 1,500 species currently described estimated to be only 1% of all fungal species. Most reproduce asexually by mitosis, and many do so by an asymmetric division process called budding...

 (Saccharomyces cerevisiae
Saccharomyces cerevisiae
Saccharomyces cerevisiae is a species of yeast. It is perhaps the most useful yeast, having been instrumental to baking and brewing since ancient times. It is believed that it was originally isolated from the skin of grapes...

) is 38%, and that of another common model organism
Model organism
A model organism is a non-human species that is extensively studied to understand particular biological phenomena, with the expectation that discoveries made in the organism model will provide insight into the workings of other organisms. Model organisms are in vivo models and are widely used to...

 Thale Cress (Arabidopsis thaliana
Arabidopsis thaliana
Arabidopsis thaliana is a small flowering plant native to Europe, Asia, and northwestern Africa. A spring annual with a relatively short life cycle, arabidopsis is popular as a model organism in plant biology and genetics...

) is 36%. Because of the nature of the genetic code
Genetic code
The genetic code is the set of rules by which information encoded in genetic material is translated into proteins by living cells....

, it is virtually impossible for an organism to have a genome with a GC-content approaching either 0% or 100%. A species with an extremely low GC-content is Plasmodium falciparum
Plasmodium falciparum
Plasmodium falciparum is a protozoan parasite, one of the species of Plasmodium that cause malaria in humans. It is transmitted by the female Anopheles mosquito. Malaria caused by this species is the most dangerous form of malaria, with the highest rates of complications and mortality...

(GC% = ~20%), and it is usually common to refer to such examples as being AT-rich instead of GC-poor.

External links

  1. Table with GC-content of all sequenced prokaryotes
  2. Taxonomic browser of bacteria based on GC ratio on NCBI website.
  3. GC ratio in diverse species.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK