Genetic code
Overview
The genetic code is the set of rules by which information encoded in genetic material (DNA or mRNA sequences) is translated into proteins (amino acid sequences) by living cells.

The code defines how sequences of three nucleotides, called codons, specify which amino acid will be added next during protein synthesis. With some exceptions, a three-nucleotide codon in a nucleic acid sequence specifies a single amino acid.
Because the vast majority of genes are encoded with exactly the same code (see the RNA codon table), this particular code is often referred to as the canonical or standard genetic code, or simply the genetic code, though in fact some variant codes have evolved. For example, protein synthesis in human mitochondria relies on a genetic code that differs from the standard genetic code.

Not all genetic information is stored using the genetic code. All organisms' DNA contains regulatory sequences, intergenic segments, chromosomal structural areas, and other non-coding DNA that can contribute greatly to phenotype. Those elements operate under sets of rules that are distinct from the codon-to-amino acid paradigm underlying the genetic code.

Discovery

After the structure of DNA was discovered by James Watson and Francis Crick, who used the experimental evidence of Maurice Wilkins and Rosalind Franklin (among others), serious efforts to understand the nature of the encoding of proteins began. George Gamow postulated that a three-letter code must be employed to encode the 20 standard amino acids used by living cells to build proteins. With four different nucleotides, a code of two nucleotides could specify at most 4² = 16 amino acids, which is too few, whereas a code of three nucleotides could specify up to 4³ = 64.
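
Gamow's counting argument can be checked in a few lines of Python. This is only an illustrative sketch; the function name `min_codon_length` is hypothetical and does not come from the original account.

```python
def min_codon_length(n_amino_acids: int = 20, n_bases: int = 4) -> int:
    """Smallest codon length whose combinations cover every amino acid."""
    length = 1
    while n_bases ** length < n_amino_acids:
        length += 1
    return length


# Number of distinct codons available for each candidate codon length.
for length in (1, 2, 3):
    print(length, 4 ** length)  # 1 4, then 2 16, then 3 64

print(min_codon_length())  # 3
```

A doublet code falls four combinations short of 20, so three is the minimum length under this simple counting assumption.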

The fact that codons consist of three DNA bases was first demonstrated in the Crick, Brenner et al. experiment. The first elucidation of a codon was done by Marshall Nirenberg and Heinrich J. Matthaei in 1961 at the National Institutes of Health. They used a cell-free system to translate a poly-uracil RNA sequence (i.e., UUUUU...) and discovered that the polypeptide that they had synthesized consisted of only the amino acid phenylalanine. They thereby deduced that the codon UUU specified the amino acid phenylalanine. This was followed by experiments in the laboratory of Severo Ochoa demonstrating that the poly-adenine RNA sequence (AAAAA...) coded for the polypeptide poly-lysine and that the poly-cytosine RNA sequence (CCCCC...) coded for the polypeptide poly-proline. Therefore the codon AAA specified the amino acid lysine, and the codon CCC specified the amino acid proline. Using different copolymers, most of the remaining codons were then determined. Extending this work, experiments by Nirenberg and Philip Leder revealed the triplet nature of the genetic code and allowed the codons of the standard genetic code to be deciphered. In these experiments, various combinations of mRNA were passed through a filter that contained ribosomes, the components of cells that translate RNA into protein. Unique triplets promoted the binding of specific tRNAs to the ribosome. Leder and Nirenberg were able to determine the sequences of 54 out of 64 codons in their experiments.

Subsequent work by Har Gobind Khorana identified the rest of the genetic code. Shortly after, Robert W. Holley determined the structure of transfer RNA (tRNA), the adapter molecule that facilitates the process of translating RNA into protein. This work was based upon earlier studies by Severo Ochoa, who received the Nobel Prize in 1959 for his work on the enzymology of RNA synthesis. In 1968, Khorana, Holley and Nirenberg received the Nobel Prize in Physiology or Medicine for their work.

Over forty years after elucidation of the entire codon table, some of the subtle features of the table were linked to a single evolutionary origin in an early aminoacylated RNA world in which nucleotides were also used to bind amino acids. These features include: (1) the absence of any codons for D-amino acids; (2) the presence of alternate codon patterns for some amino acids, such as 5'-CGN and 5'-AGR for L-Arg; (3) the confinement of synonymous positions to a codon's third nucleotide; and (4) the specification of only 20 amino acids as opposed to a number closer to 64. Thus, these seemingly idiosyncratic features of the codon table were discovered to be related to the universal homochirality of amino acids (L-amino acids only) and nucleotides (D-ribose only).

Transfer of information via the genetic code

The genome of an organism is inscribed in DNA, or, in the case of some viruses, RNA. The portion of the genome that codes for a protein or an RNA is called a gene. Those genes that code for proteins are composed of tri-nucleotide units called codons, each coding for a single amino acid. Each nucleotide sub-unit consists of a phosphate, a deoxyribose sugar, and one of the four nitrogenous nucleobases. The purine bases adenine (A) and guanine (G) are larger and consist of two aromatic rings. The pyrimidine bases cytosine (C) and thymine (T) are smaller and consist of only one aromatic ring. In the double-helix configuration, two strands of DNA are joined to each other by hydrogen bonds in an arrangement known as base pairing. These bonds almost always form between an adenine base on one strand and a thymine base on the other strand, or between a cytosine base on one strand and a guanine base on the other. This means that, in a given double helix, the number of A bases equals the number of T bases and the number of G bases equals the number of C bases. In RNA, thymine (T) is replaced by uracil (U), and the deoxyribose is substituted by ribose.
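
The pairing rules above can be illustrated with a short Python sketch. The function names and the example sequence are assumptions chosen for illustration only; the code treats strands as plain strings and ignores everything except the base letters.

```python
# Watson-Crick pairing in DNA: A pairs with T, and G pairs with C.
DNA_PAIRS = str.maketrans("ATGC", "TACG")


def complement_strand(strand: str) -> str:
    """Return the partner strand, written in its own 5'-to-3' direction."""
    return strand.translate(DNA_PAIRS)[::-1]


def to_rna(strand: str) -> str:
    """Rewrite a DNA sequence in the RNA alphabet (T becomes U)."""
    return strand.replace("T", "U")


def base_counts(sequence: str) -> dict:
    """Count each of the four DNA bases in a sequence."""
    return {base: sequence.count(base) for base in "ATGC"}


strand = "ATGGCATTC"  # arbitrary example sequence (an assumption)
partner = complement_strand(strand)
counts = base_counts(strand + partner)  # both strands of the double helix

print(partner)         # GAATGCCAT
print(to_rna(strand))  # AUGGCAUUC
print(counts)          # {'A': 5, 'T': 5, 'G': 4, 'C': 4}
```

Counting over both strands shows equal numbers of A and T and equal numbers of G and C, as the pairing rule implies.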

Each protein-coding gene is transcribed into a molecule of the related polymer RNA. In prokaryotes, this RNA functions as messenger RNA or mRNA; in eukaryotes, the transcript needs to be processed to produce a mature mRNA. The mRNA is, in turn, translated on the ribosome into an amino acid chain or polypeptide. The process of translation requires transfer RNAs specific for individual amino acids with the amino acids covalently attached to them, guanosine triphosphate as an energy source, and a number of translation factors. tRNAs have anticodons complementary to the codons in mRNA and can be "charged" covalently with amino acids at their 3' terminal CCA ends. Individual tRNAs are charged with specific amino acids by enzymes known as aminoacyl tRNA synthetases, which have high specificity for both their cognate amino acids and tRNAs. The high specificity of these enzymes is a major reason why the fidelity of protein translation is maintained.

There are 4³ = 64 different codon combinations possible with a triplet codon of three nucleotides; all 64 codons are assigned to either amino acids or stop signals during translation. If, for example, an RNA sequence UUUAAACCC is considered and the reading frame starts with the first U (by convention, 5' to 3'), there are three codons, namely, UUU, AAA, and CCC, each of which specifies one amino acid. This RNA sequence will be translated into an amino acid sequence, three amino acids long. A given amino acid may be encoded by between one and six different codon sequences. A comparison may be made with computer science, where the codon is similar to a word, which is the standard "chunk" for handling data (like one amino acid of a protein), and a nucleotide is similar to a bit, in that it is the smallest unit.
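
The UUUAAACCC example can be traced with a small Python sketch. It is a minimal illustration, not a full translator: the dictionary holds only the three codon assignments named in the text, and the helper names `split_codons` and `translate` are hypothetical.

```python
# Only the three assignments named in the text; not the full 64-codon table.
CODON_TO_AMINO_ACID = {"UUU": "Phe", "AAA": "Lys", "CCC": "Pro"}


def split_codons(rna: str, frame: int = 0) -> list[str]:
    """Split an RNA string into complete triplets, starting at offset `frame`."""
    return [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]


def translate(rna: str, frame: int = 0) -> list[str]:
    """Look up each codon; codons missing from the small table map to '?'."""
    return [CODON_TO_AMINO_ACID.get(codon, "?") for codon in split_codons(rna, frame)]


print(split_codons("UUUAAACCC"))  # ['UUU', 'AAA', 'CCC']
print(translate("UUUAAACCC"))     # ['Phe', 'Lys', 'Pro']
```

Passing `frame=1` or `frame=2` starts reading one or two nucleotides later and drops any incomplete triplet at the end.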

The standard genetic code is shown in the following tables. Table 1 shows what amino acid each of the 64 codons specifies. Table 2 shows what codons specify each of the 20 standard amino acids involved in translation. These are called forward and reverse codon tables, respectively. For example, the codon AAU represents the amino acid asparagine, and UGU and UGC represent cysteine (standard three-letter designations, Asn and Cys, respectively).

RNA codon table

1st   2nd base
base  U                          C                      A                          G
U     UUU (Phe/F) Phenylalanine  UCU (Ser/S) Serine     UAU (Tyr/Y) Tyrosine       UGU (Cys/C) Cysteine
      UUC (Phe/F) Phenylalanine  UCC (Ser/S) Serine     UAC (Tyr/Y) Tyrosine       UGC (Cys/C) Cysteine
      UUA (Leu/L) Leucine        UCA (Ser/S) Serine     UAA Stop (Ochre)           UGA Stop (Opal)
      UUG (Leu/L) Leucine        UCG (Ser/S) Serine     UAG Stop (Amber)           UGG (Trp/W) Tryptophan
C     CUU (Leu/L) Leucine        CCU (Pro/P) Proline    CAU (His/H) Histidine      CGU (Arg/R) Arginine
      CUC (Leu/L) Leucine        CCC (Pro/P) Proline    CAC (His/H) Histidine      CGC (Arg/R) Arginine
      CUA (Leu/L) Leucine        CCA (Pro/P) Proline    CAA (Gln/Q) Glutamine      CGA (Arg/R) Arginine
      CUG (Leu/L) Leucine        CCG (Pro/P) Proline    CAG (Gln/Q) Glutamine      CGG (Arg/R) Arginine
A     AUU (Ile/I) Isoleucine     ACU (Thr/T) Threonine  AAU (Asn/N) Asparagine     AGU (Ser/S) Serine
      AUC (Ile/I) Isoleucine     ACC (Thr/T) Threonine  AAC (Asn/N) Asparagine     AGC (Ser/S) Serine
      AUA (Ile/I) Isoleucine     ACA (Thr/T) Threonine  AAA (Lys/K) Lysine         AGA (Arg/R) Arginine
      AUG (Met/M) Methionine     ACG (Thr/T) Threonine  AAG (Lys/K) Lysine         AGG (Arg/R) Arginine
G     GUU (Val/V) Valine         GCU (Ala/A) Alanine    GAU (Asp/D) Aspartic acid  GGU (Gly/G) Glycine
      GUC (Val/V) Valine         GCC (Ala/A) Alanine    GAC (Asp/D) Aspartic acid  GGC (Gly/G) Glycine
      GUA (Val/V) Valine         GCA (Ala/A) Alanine    GAA (Glu/E) Glutamic acid  GGA (Gly/G) Glycine
      GUG (Val/V) Valine         GCG (Ala/A) Alanine    GAG (Glu/E) Glutamic acid  GGG (Gly/G) Glycine

The codon AUG both codes for methionine and serves as an initiation site: the first AUG in an mRNA's coding region is where translation into protein begins.


Inverse table
Ala/A GCU, GCC, GCA, GCG Leu/L UUA, UUG, CUU, CUC, CUA, CUG
Arg/R CGU, CGC, CGA, CGG, AGA, AGG Lys/K AAA, AAG
Asn/N AAU, AAC Met/M AUG
Asp/D GAU, GAC Phe/F UUU, UUC
Cys/C UGU, UGC Pro/P CCU, CCC, CCA, CCG
Gln/Q CAA, CAG Ser/S UCU, UCC, UCA, UCG, AGU, AGC
Glu/E GAA, GAG Thr/T ACU, ACC, ACA, ACG
Gly/G GGU, GGC, GGA, GGG Trp/W UGG
His/H CAU, CAC Tyr/Y UAU, UAC
Ile/I AUU, AUC, AUA Val/V GUU, GUC, GUA, GUG
START AUG STOP UAA, UGA, UAG
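
The inverse table can be derived mechanically from the forward (codon to amino acid) table. The following is a minimal Python sketch, not a reference implementation: the names `BASES`, `AMINO`, `CODON_TABLE` and `inverse_table` are illustrative, the standard code is written as a 64-letter string of one-letter amino acid symbols in U, C, A, G order, and `*` is assumed to stand for a stop codon.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, one letter per codon, codons ordered UUU, UUC, UUA, UUG, UCU, ...
# "*" marks a stop codon (an illustrative convention, not part of the article).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def inverse_table(table):
    """Group codons by the amino acid (or stop signal) they specify."""
    inverse = defaultdict(list)
    for codon, amino_acid in table.items():
        inverse[amino_acid].append(codon)
    return dict(inverse)


INVERSE = inverse_table(CODON_TABLE)

for amino_acid in sorted(INVERSE):
    print(amino_acid, ", ".join(INVERSE[amino_acid]))
```

The grouping yields 21 entries (20 amino acids plus stop), with between one and six codons each, matching the rows listed above.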

DNA codon table

The DNA codon table is essentially identical to that for RNA, but with U replaced by T.

Sequence reading frame

A codon is defined by the initial nucleotide from which translation starts. For example, the string GGGAAACCC, if read from the first position, contains the codons GGG, AAA, and CCC; if read from the second position, it contains the codons GGA and AAC; and if read from the third position, GAA and ACC. Every sequence can thus be read in three reading frames, each of which will produce a different amino acid sequence (in the given example, Gly-Lys-Pro, Gly-Asn, or Glu-Thr, respectively). With double-stranded DNA, there are six possible reading frames, three in the forward orientation on one strand and three in the reverse orientation on the opposite strand. The actual frame in which a protein sequence is translated is defined by a start codon, usually the first AUG codon in the mRNA sequence.
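
The GGGAAACCC example can be reproduced with a short Python sketch. The helper names (`translate`, `reverse_complement`, `reading_frames`) are illustrative, sequences are assumed to be written in the RNA alphabet, one-letter amino acid symbols are used instead of three-letter ones, and incomplete trailing codons are simply dropped.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; "*" marks a stop codon (illustrative convention).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def translate(rna):
    """Translate every complete codon, starting at position 0 (no start/stop handling)."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


def reverse_complement(rna):
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def reading_frames(rna):
    """Three forward frames followed by three frames on the opposite strand."""
    opposite = reverse_complement(rna)
    return [translate(rna[o:]) for o in range(3)] + [translate(opposite[o:]) for o in range(3)]


print(reading_frames("GGGAAACCC"))
# ['GKP', 'GN', 'ET', 'GFP', 'GF', 'VS']
```

The first three entries correspond to Gly-Lys-Pro, Gly-Asn and Glu-Thr from the text; the last three are the frames of the complementary strand.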

Start/stop codons

Translation starts with a chain initiation codon (start codon). Unlike stop codons, the codon alone is not sufficient to begin the process. Nearby sequences (such as the Shine-Dalgarno sequence in E. coli) and initiation factors are also required to start translation. The most common start codon is AUG, which is read as methionine or, in bacteria, as formylmethionine. Alternative start codons, depending on the organism, include GUG and UUG; these codons normally represent valine and leucine, respectively, but as start codons they are translated as methionine or formylmethionine.

The three stop codons have been given names: UAG is amber, UGA is opal (sometimes also called umber), and UAA is ochre. "Amber" was named by discoverers Richard Epstein and Charles Steinberg after their friend Harris Bernstein, whose last name means "amber" in German. The other two stop codons were named "ochre" and "opal" in order to keep the "color names" theme. Stop codons are also called "termination" or "nonsense" codons. They signal release of the nascent polypeptide from the ribosome because there is no cognate tRNA that has anticodons complementary to these stop signals, and so a release factor binds to the ribosome instead.
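
As a rough illustration of how start and stop codons delimit a coding region, the sketch below translates from the first AUG to the first in-frame stop codon. It is deliberately simplified: the function name `translate_orf` is hypothetical, and the sketch ignores the nearby sequences, initiation factors and alternative start codons that the text describes as also playing a role.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; "*" marks a stop codon (illustrative convention).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
STOP_NAMES = {"UAG": "amber", "UGA": "opal", "UAA": "ochre"}


def translate_orf(mrna):
    """Return (protein, stop_name) for the frame set by the first AUG.

    stop_name is None if no AUG is found or no in-frame stop codon is reached.
    """
    start = mrna.find("AUG")
    if start == -1:
        return "", None
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if CODON_TABLE[codon] == "*":
            return "".join(protein), STOP_NAMES[codon]
        protein.append(CODON_TABLE[codon])
    return "".join(protein), None


print(translate_orf("GGAUGGGGAAAUAGCC"))
# ('MGK', 'amber')
```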

Effect of mutations

During the process of DNA replication, errors occasionally occur in the polymerization of the second strand. These errors, called mutations, can have an impact on the phenotype of an organism, especially if they occur within the protein coding sequence of a gene. Error rates are usually very low (1 error in every 10–100 million bases) due to the "proofreading" ability of DNA polymerases.

Missense mutations and nonsense mutations are examples of point mutations, which can cause genetic diseases such as sickle-cell disease and thalassemia respectively. Clinically important missense mutations generally change the properties of the coded amino acid residue between being basic, acidic, polar or non-polar, whereas nonsense mutations result in a stop codon.
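
The distinction between these kinds of point mutation can be expressed as a lookup in the codon table. The sketch below is illustrative only: the function name `classify_substitution` and the label "silent" for a synonymous change are choices made here, and the input is assumed to be a sense codon (a codon that specifies an amino acid).

```python
from itertools import product

# Standard genetic code in U, C, A, G order; "*" marks a stop codon (illustrative convention).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(codon, position, new_base):
    """Classify a single-base substitution at a 0-based position of a sense codon."""
    if CODON_TABLE[codon] == "*":
        raise ValueError("expected a codon that specifies an amino acid")
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if after == before:
        return "silent"      # same amino acid
    if after == "*":
        return "nonsense"    # premature stop codon
    return "missense"        # different amino acid


print(classify_substitution("GAG", 1, "U"))  # GAG (Glu) -> GUG (Val): missense
print(classify_substitution("GAA", 2, "G"))  # GAA (Glu) -> GAG (Glu): silent
print(classify_substitution("UGG", 2, "A"))  # UGG (Trp) -> UGA (stop): nonsense
```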

Mutations that disrupt the reading frame by indels (insertions or deletions) of a number of nucleotide bases that is not a multiple of three are known as frameshift mutations. These mutations usually result in a completely different translation from the original, and are also very likely to cause a stop codon to be read, which truncates the protein. These mutations may impair the function of the resulting protein, and are thus rare in in vivo protein-coding sequences. One reason inheritance of frameshift mutations is rare is that, if the protein being translated is essential for growth under the selective pressures the organism faces, absence of a functional protein may cause death before the organism is viable. Frameshift mutations may result in severe genetic diseases such as Tay-Sachs disease.
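
The effect of a frameshift can be seen on a toy sequence. In the sketch below, the sequence and the helper names `translate` and `insert_bases` are made up for illustration; the sequence is assumed to begin at its start codon.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; "*" marks a stop codon (illustrative convention).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(rna):
    """Translate from position 0 until the first stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def insert_bases(rna, index, bases):
    """Return a copy of rna with `bases` inserted before position `index`."""
    return rna[:index] + bases + rna[index:]


original = "AUGGCUGAUAAGUGA"  # AUG GCU GAU AAG UGA
print(translate(original))                          # MADK
print(translate(insert_bases(original, 3, "A")))    # MS (frame shifted, early UGA stop)
print(translate(insert_bases(original, 3, "AAA")))  # MKADK (frame preserved, one extra Lys)
```

Inserting a single base changes every downstream codon and here exposes a premature stop, whereas inserting three bases adds one amino acid and leaves the rest of the translation intact.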

Although most mutations that change protein sequences are harmful or neutral, some mutations have a positive effect on an organism. These mutations may enable the mutant organism to withstand particular environmental stresses better than wild-type organisms, or reproduce more quickly. In these cases a mutation will tend to become more common in a population through natural selection. Viruses that use RNA as their genetic material have rapid mutation rates, which can be an advantage, since these viruses will evolve constantly and rapidly, and thus evade the defensive responses of e.g. the human immune system. In large populations of asexually reproducing organisms, for example, E. coli, multiple beneficial mutations may co-occur. This phenomenon is called clonal interference and causes competition among the mutations.

Degeneracy

Degeneracy is the redundancy of the genetic code. The genetic code has redundancy but no ambiguity (see the codon tables above for the full correlation). For example, although codons GAA and GAG both specify glutamic acid (redundancy), neither of them specifies any other amino acid (no ambiguity). The codons encoding one amino acid may differ in any of their three positions. For example, the amino acid glutamic acid is specified by GAA and GAG codons (difference in the third position), the amino acid leucine is specified by UUA, UUG, CUU, CUC, CUA, CUG codons (difference in the first or third position), while the amino acid serine is specified by UCA, UCG, UCC, UCU, AGU, AGC (difference in the first, second, or third position).

A position of a codon is said to be a fourfold degenerate site if any nucleotide at this position specifies the same amino acid. For example, the third position of the glycine codons (GGA, GGG, GGC, GGU) is a fourfold degenerate site, because all nucleotide substitutions at this site are synonymous; i.e., they do not change the amino acid. Only the third positions of some codons may be fourfold degenerate.
A position of a codon is said to be a twofold degenerate site if only two of four possible nucleotides at this position specify the same amino acid. For example, the third position of the glutamic acid codons (GAA, GAG) is a twofold degenerate site. In twofold degenerate sites, the equivalent nucleotides are always either two purines (A/G) or two pyrimidines (C/U), so only transversional substitutions (purine to pyrimidine or pyrimidine to purine) in twofold degenerate sites are nonsynonymous.
A position of a codon is said to be a non-degenerate site if any mutation at this position results in amino acid substitution. There is only one threefold degenerate site, where three of the four possible nucleotides specify the same amino acid and the fourth specifies a different one. This is the third position of an isoleucine codon: AUU, AUC, or AUA all encode isoleucine, but AUG encodes methionine. In computation this position is often treated as a twofold degenerate site.
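
The degeneracy of a third codon position can be computed directly from the codon table. The sketch below is an illustration under the definitions given above; the function name `third_position_degeneracy` is hypothetical, and stop is treated as its own meaning when counting.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; "*" marks a stop codon (illustrative convention).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def third_position_degeneracy(codon):
    """Count the third-position bases that keep the meaning of `codon` unchanged."""
    meaning = CODON_TABLE[codon]
    return sum(CODON_TABLE[codon[:2] + base] == meaning for base in BASES)


print(third_position_degeneracy("GGA"))  # 4: fourfold degenerate (glycine)
print(third_position_degeneracy("GAA"))  # 2: twofold degenerate (glutamic acid)
print(third_position_degeneracy("AUU"))  # 3: threefold degenerate (isoleucine)
print(third_position_degeneracy("AUG"))  # 1: non-degenerate (methionine)

# Codon prefixes whose third position is threefold degenerate.
print({c[:2] for c in CODON_TABLE if third_position_degeneracy(c) == 3})  # {'AU'}
```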

There are three amino acids encoded by six different codons: serine, leucine, and arginine. Only two amino acids are specified by a single codon. One of these is the amino acid methionine, specified by the codon AUG, which also specifies the start of translation; the other is tryptophan, specified by the codon UGG.
The degeneracy of the genetic code is what accounts for the existence of synonymous mutations.

Degeneracy results because there are more codons than encodable amino acids. For example, if there were two bases per codon, then only 16 amino acids could be coded for (4² = 16). Because at least 21 codes are required (20 amino acids plus stop), the smallest sufficient codon length is three bases, and 4³ gives 64 possible codons, meaning that some degeneracy must exist.
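
The arithmetic behind this argument can be checked in a few lines of Python. The function name `min_codon_length` is illustrative, and the sketch assumes a four-letter alphabet and fixed-length codons.

```python
def min_codon_length(n_meanings, n_bases=4):
    """Smallest codon length whose number of possible codons covers n_meanings."""
    length = 1
    while n_bases ** length < n_meanings:
        length += 1
    return length


print(4 ** 2, 4 ** 3)        # 16 64
print(min_codon_length(21))  # 3
print(4 ** 3 - 21)           # 43 codons beyond the minimum of one per meaning
```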

These properties of the genetic code make it more fault-tolerant for point mutations. For example, in theory, fourfold degenerate codons can tolerate any point mutation at the third position, although codon usage bias restricts this in practice in many organisms; twofold degenerate codons can tolerate one out of the three possible point mutations at the third position. Since transition mutations (purine to purine or pyrimidine to pyrimidine mutations) are more likely than transversion (purine to pyrimidine or vice-versa) mutations, the equivalence of purines or that of pyrimidines at twofold degenerate sites adds a further fault-tolerance.
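
The claim that synonymous bases at twofold degenerate third positions are always related by a transition can be checked against the standard table. The following Python sketch is illustrative; the names `is_transition` and `twofold_pairs` are hypothetical, and pairs of stop codons that share their first two bases are counted alongside amino acids.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; "*" marks a stop codon (illustrative convention).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = set("AG")
PYRIMIDINES = set("CU")


def is_transition(a, b):
    """True if a -> b is purine to purine or pyrimidine to pyrimidine."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def twofold_pairs():
    """Map (first two bases, meaning) to the two equivalent third-position bases."""
    pairs = {}
    for first, second in product(BASES, repeat=2):
        by_meaning = {}
        for third in BASES:
            by_meaning.setdefault(CODON_TABLE[first + second + third], []).append(third)
        for meaning, thirds in by_meaning.items():
            if len(thirds) == 2:
                pairs[(first + second, meaning)] = tuple(thirds)
    return pairs


pairs = twofold_pairs()
print(pairs[("GA", "E")])                              # ('A', 'G')
print(all(is_transition(a, b) for a, b in pairs.values()))  # True
```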

A practical consequence of redundancy is that some errors in the genetic code cause only a silent mutation or an error that would not affect the protein because the hydrophilicity or hydrophobicity is maintained by equivalent substitution of amino acids; for example, a codon of NUN (where N = any nucleotide) tends to code for hydrophobic amino acids. NCN yields amino acid residues that are small in size and moderate in hydropathy; NAN encodes average size hydrophilic residues. These tendencies may result from the shared ancestry of the aminoacyl tRNA synthetases related to these codons.

Despite the redundancy of the genetic code, single-point mutations can still cause dysfunctional proteins. For example, a mutated hemoglobin gene causes sickle-cell disease. In the mutant hemoglobin, a hydrophilic glutamate (Glu) is substituted by the hydrophobic valine (Val); that is, GAA or GAG becomes GUA or GUG. The substitution of glutamate by valine reduces the solubility of β-globin, which causes hemoglobin to form linear polymers linked by the hydrophobic interaction between the valine groups, causing sickle-cell deformation of erythrocytes. In general, sickle-cell disease is not caused by a de novo mutation. It is, rather, selected for in geographic regions where malaria is common (in a way similar to thalassemia), as heterozygous people have some resistance to the malarial Plasmodium parasite (heterozygote advantage).

These variable codes for amino acids are allowed because of modified bases in the first base of the anticodon of the tRNA, and the base pair formed is called a wobble base pair. Examples include pairs formed by inosine and the non-Watson-Crick U-G base pair.

Variations to the standard genetic code

While slight variations on the standard code had been predicted earlier, none were discovered until 1979, when researchers studying human mitochondrial genes discovered they used an alternative code. Many slight variants have been discovered since then, including various alternative mitochondrial codes, and small variants such as translation of the codon UGA as tryptophan in Mycoplasma species and translation of CUG as a serine rather than a leucine in some members of the genus Candida (see the article on Candida albicans). In bacteria and archaea, GUG and UUG are common start codons, but in rare cases, certain proteins may use alternative start codons not normally used by that species.

In certain proteins, non-standard amino acids are substituted for standard stop codons, depending on associated signal sequences in the messenger RNA. For example, UGA can code for selenocysteine, and UAG can code for pyrrolysine. Selenocysteine is now viewed as the 21st amino acid, and pyrrolysine is viewed as the 22nd.

Despite these differences, all known naturally-occurring codes are very similar to each other, and the coding mechanism is the same for all organisms: three-base codons, tRNA, ribosomes, reading the code in the same direction and translating the code three letters at a time into sequences of amino acids.

Expanded genetic code

Since 2001, 40 non-natural amino acids with diverse physicochemical and biological properties have been added into protein by creating a unique codon (recoding) and a corresponding transfer-RNA:aminoacyl-tRNA-synthetase pair to encode each of them. They are used as a tool for exploring protein structure and function or to create novel or enhanced proteins.

H. Murakami and M. Sisido have extended some codons to have four and five bases. Steven A. Benner constructed a functional 65th (in vivo) codon.

Origin

Prior to the elegant unification of the codon table's features with universal homochirality, many ideas on the table's origins have been proposed. Despite the minor variations that exist, the genetic code used by all known forms of life is nearly universal. However, there is a huge number of possible genetic codes. If amino acids are randomly associated with triplet codons, there will be 1.5 × 10⁸⁴ possible genetic codes.
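
The text does not state how this number is counted. One counting convention that appears to give a figure of this size, offered here only as an assumption, is the number of ways to assign 64 codons to 21 meanings (20 amino acids plus stop) such that every meaning receives at least one codon. That count can be computed by inclusion-exclusion; the function name `surjective_codes` is illustrative.

```python
from math import comb


def surjective_codes(n_codons=64, n_meanings=21):
    """Number of assignments of codons to meanings in which every meaning is used.

    Inclusion-exclusion over the set of meanings that are left unused.
    """
    return sum(
        (-1) ** k * comb(n_meanings, k) * (n_meanings - k) ** n_codons
        for k in range(n_meanings + 1)
    )


count = surjective_codes()
print(f"{count:.2e}")            # codes that use all 21 meanings
print(f"{21 ** 64:.2e}")         # all assignments, including ones that skip a meaning
```

Under this assumption the result is on the order of 10⁸⁴, the same order of magnitude as the figure quoted above, and somewhat smaller than the unrestricted count 21⁶⁴.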

Phylogenetic analysis of transfer RNA suggests that tRNA molecules evolved before the present set of aminoacyl-tRNA synthetases.

In theory, the genetic code could be completely random (a "frozen accident"), completely non-random (optimal) or a combination of random and nonrandom. There are enough data to refute the first possibility. For a start, a quick view on the table of the genetic code shows a clustering of amino acid assignments. Furthermore, amino acids that share the same biosynthetic pathway tend to have the same first base in their codons, and amino acids with similar physical properties tend to have similar codons.

There are four themes running through the many theories about the evolution of the genetic code (and hence the origin of these patterns):
  • Chemical principles govern specific RNA interaction with amino acids. Experiments with aptamers showed that some amino acids have a selective chemical affinity for the base triplets that code for them. Recent experiments show that of the 8 amino acids tested, 6 show some RNA triplet-amino acid association. This has been called the stereochemical code. The stereochemical code could have created an ancient core of assignments. The current complex translation mechanism involving tRNA and associated enzymes may be a later development, and maybe protein sequences were directly templated on base sequences.
  • Biosynthetic expansion. The standard modern genetic code grew from a simpler earlier code through a process of "biosynthetic expansion". Here the idea is that primordial life "discovered" new amino acids (for example, as by-products of metabolism) and later incorporated some of these into the machinery of genetic coding. Although much circumstantial evidence suggests that fewer different amino acids were used in the past than today, precise and detailed hypotheses about which amino acids entered the code in what order have proved far more controversial.
  • Natural selection has led to codon assignments of the genetic code that minimize the effects of mutations. A recent hypothesis suggests that the triplet code was derived from codes that used codons longer than triplets (such as quadruplet codons). Longer-than-triplet decoding would have a higher degree of codon redundancy and would be more error resistant than triplet decoding. This feature could have allowed accurate decoding in the absence of highly complex translational machinery such as the ribosome, before cells began making ribosomes.
  • Information channels: Information-theoretic approaches see the genetic code as an error-prone information channel. The inherent noise (that is, errors) in the channel confronts the organism with a fundamental question: how to construct a genetic code that can withstand the impact of noise while accurately and efficiently translating information? These “rate-distortion” models suggest that the genetic code originated as a result of the interplay of three conflicting evolutionary forces: the needs for diverse amino acids, for error tolerance, and for minimal cost of resources. The code emerges at a coding transition when the mapping of codons to amino acids becomes nonrandom. The emergence of the code is governed by the topology defined by the probable errors and is related to the map coloring problem.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.