Codon usage bias
Codon usage bias refers to differences in the frequency of occurrence of synonymous codons in coding DNA. A codon is a series of three nucleotides (a triplet) that encodes a specific amino acid residue in a polypeptide chain or signals the termination of translation (stop codons).
There are 64 different codons (61 codons encoding amino acids plus 3 stop codons) but only 20 different translated amino acids. The overabundance in the number of codons allows many amino acids to be encoded by more than one codon. Because of this redundancy, the genetic code is said to be degenerate. Different organisms often show particular preferences for one of the several codons that encode the same amino acid; that is, one codon is found at a greater frequency than expected by chance. How such preferences arise is a much debated area of molecular evolution.
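The codon counts above can be checked with a short script. The following is a minimal Python sketch, assuming the standard genetic code (NCBI translation table 1) written in its usual compact form; the variable names are illustrative and not taken from any particular package.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, in the order
# TTT, TTC, TTA, TTG, TCT, ... ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 triplets to its amino acid (or '*').
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]

# Degeneracy: how many synonymous codons encode each amino acid.
degeneracy = Counter(CODON_TABLE[c] for c in sense_codons)

print(len(CODON_TABLE), "codons,", len(sense_codons), "sense,", stop_codons)
for aa, n in sorted(degeneracy.items(), key=lambda kv: (kv[1], kv[0])):
    print(aa, n)
```

In the standard code only methionine (M) and tryptophan (W) have a single codon, while leucine, serine and arginine have six each, so codon preference can only be measured for the other 18 amino acids.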
It is generally acknowledged that codon preferences reflect a balance between mutational biases and natural selection for translational optimization. Optimal codons in fast-growing microorganisms, like Escherichia coli or Saccharomyces cerevisiae (baker's yeast), reflect the composition of their respective genomic tRNA pool. It is thought that optimal codons help to achieve faster translation rates and high accuracy. As a result of these factors, translational selection is expected to be stronger in highly expressed genes, as is indeed the case for the above-mentioned organisms. In other organisms that do not show high growth rates or that have small genomes, codon usage optimization is normally absent, and codon preferences are determined by the characteristic mutational biases seen in that particular genome. Examples of this are Homo sapiens (human) and Helicobacter pylori. Organisms that show an intermediate level of codon usage optimization include Drosophila melanogaster (fruit fly), Caenorhabditis elegans (nematode worm) and Arabidopsis thaliana (thale cress).
The nature of the codon usage-tRNA optimization has been fiercely debated. It is not clear whether codon usage drives tRNA evolution or vice versa. At least one mathematical model has been developed in which codon usage and tRNA expression co-evolve in feedback fashion (i.e., codons already present at high frequencies drive up the expression of their corresponding tRNAs, and tRNAs normally expressed at high levels drive up the frequency of their corresponding codons); however, this model does not yet seem to have experimental confirmation. Another problem is that the evolution of tRNA genes has been a very inactive area of research.
Factors contributing to codon usage bias
Different factors have been proposed to be related to codon usage bias, including gene expression level (reflecting selection for optimizing the translation process by tRNA abundance), %G+C composition (reflecting horizontal gene transfer or mutational bias), GC skew (reflecting strand-specific mutational bias), amino acid conservation, protein hydropathy, transcriptional selection, RNA stability, optimal growth temperature and hypersaline adaptation.
Methods of analyzing codon usage bias
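Two of the compositional factors listed above, %G+C and GC skew, can be computed directly from a nucleotide sequence. The following Python sketch uses the common definitions G+C fraction = (G + C) / length and GC skew = (G - C) / (G + C). The function names are illustrative, and the third-codon-position variant (often called GC3) is an addition not named in the text.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """Strand asymmetry (G - C) / (G + C); 0.0 if the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def gc3_content(cds: str) -> float:
    """G+C fraction at third codon positions of an in-frame coding sequence."""
    return gc_content(cds.upper()[2::3])


example = "ATGGCGTGCGGA"  # made-up in-frame sequence, for illustration only
print(gc_content(example), gc_skew(example), gc3_content(example))
```

Third codon positions are where most synonymous changes occur, so GC3 tends to track mutational bias more closely than overall G+C content does.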
In the field of bioinformatics and computational biology, many statistical methods have been proposed and used to analyze codon usage bias. Methods such as the 'frequency of optimal codons' (Fop), the Relative Codon Adaptation (RCA) or the 'Codon Adaptation Index' (CAI) are used to predict gene expression levels, while methods such as the 'effective number of codons' (Nc) and Shannon entropy from information theory are used to measure codon usage evenness. Multivariate statistical methods, such as correspondence analysis and principal component analysis, are widely used to analyze variations in codon usage among genes. Many computer programs implement the statistical analyses enumerated above, including CodonW, GCUA and INCA.
Codon optimization has applications in designing synthetic genes and DNA vaccines. Several software packages are available online for this purpose (refer to external links). Optimizing the occurrence of desired and undesired motifs and the sequence composition over all possible reverse-translated gene sequences is difficult because the search space grows exponentially with gene length. For this reason, the problem can be addressed using optimization algorithms such as genetic algorithms (Sandhu et al., In Silico Biol. 2008;8(2):187-92).
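As an illustration of the measures and the search-space argument above, the following Python sketch computes a Codon Adaptation Index in its usual form: the geometric mean of per-codon weights, where each weight is a codon's count in a reference gene set divided by the count of the most used synonymous codon. It also counts how many DNA sequences encode a given protein. It is a simplified sketch, not the implementation used by CodonW or any other named tool. The function names, the toy reference sequence, and the 0.5 floor for unobserved codons are assumptions made for the example.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(t): aa for t, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Sense codons grouped by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS.setdefault(aa, []).append(codon)


def split_codons(cds):
    """Split an in-frame coding sequence into triplets, dropping a partial tail."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def relative_adaptiveness(reference_cds, zero_floor=0.5):
    """Weight of each codon: its count over the count of the most used synonym.

    Unobserved codons get `zero_floor` (an assumption of this sketch) so that
    logarithms stay finite. Single-codon amino acids (M, W) are skipped.
    """
    counts = Counter(c for gene in reference_cds for c in split_codons(gene))
    weights = {}
    for syn in SYNONYMS.values():
        if len(syn) == 1:
            continue
        adjusted = {c: max(counts[c], zero_floor) for c in syn}
        best = max(adjusted.values())
        for c in syn:
            weights[c] = adjusted[c] / best
    return weights


def cai(cds, weights):
    """Geometric mean of the weights of the scored codons in `cds`."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


def n_synonymous_sequences(protein):
    """Number of distinct coding sequences that translate to `protein`."""
    return math.prod(len(SYNONYMS[aa]) for aa in protein.upper())


# Toy reference set (made up): prefers CTG for Leu, AAA for Lys, GAA for Glu.
weights = relative_adaptiveness(["ATGCTGCTGCTGAAAAAAGAATAA"])
print(cai("ATGCTGAAAGAA", weights))   # uses only the preferred codons
print(cai("ATGCTTAAGGAG", weights))   # same peptide, non-preferred codons
print(n_synonymous_sequences("L" * 10))
```

In practice the reference set would be a collection of highly expressed genes from the organism of interest. The last line shows the exponential growth mentioned above: a run of ten leucines alone has 6^10 possible encodings, which is why exhaustive search is impractical for whole genes.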
External links
- CAT - Composition Analysis Toolkit: estimating codon usage bias and its statistical significance
- Codon Usage Database
- CodonW
- GCUA - General Codon Usage Analysis
- Graphical Codon Usage Analyser
- JCat - Java Codon Usage Adaptation Tool
- INCA - Interactive Codon Analysis software
- ACUA - Automated Codon Usage Analysis Tool
- OPTIMIZER - Codon usage optimization
- HEG-DB - Highly Expressed Genes Database
- E-CAI - Expected value of Codon Adaptation Index
- CAIcal - Set of tools to assess codon usage adaptation
- scRCA - Automatic determination of translational codon usage bias
- Online Synonymous Codon Usage Analyses with the ade4 and seqinR packages
- Genetic Algorithm Simulation for Codon Optimization