Primary structure
The primary structure of peptides and proteins refers to the linear sequence of their amino acid structural units. The term "primary structure" was first coined by Linderstrøm-Lang in 1951. By convention, the primary structure of a protein is reported starting from the amino-terminal (N) end to the carboxyl-terminal (C) end.

Primary structure of polypeptides

In general, polypeptides are unbranched polymers, so their primary structure can often be specified by the sequence of amino acids along their backbone. However, proteins can become cross-linked, most commonly by disulfide bonds, and the primary structure also requires specifying the cross-linking atoms, e.g., specifying the cysteines involved in the protein's disulfide bonds. Other crosslinks include desmosine...
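
As a rough illustration of what "specifying the primary structure" can involve, the sketch below stores a sequence written from the N-terminus to the C-terminus together with its disulfide pairs. The class name PrimaryStructure, its fields, and the 12-residue peptide are all hypothetical and chosen only for this example; other crosslink types are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimaryStructure:
    """Hypothetical record: one-letter sequence (N -> C) plus disulfide pairs."""

    sequence: str
    disulfides: tuple = ()  # pairs of 1-based residue positions

    def __post_init__(self):
        # Every cross-linked position must exist and must be a cysteine.
        for pair in self.disulfides:
            for pos in pair:
                if not 1 <= pos <= len(self.sequence):
                    raise ValueError(f"position {pos} is outside the chain")
                if self.sequence[pos - 1] != "C":
                    raise ValueError(f"residue {pos} is not a cysteine")

    def free_cysteines(self):
        """Return 1-based positions of cysteines not listed in any disulfide."""
        bonded = {pos for pair in self.disulfides for pos in pair}
        return [
            i
            for i, aa in enumerate(self.sequence, start=1)
            if aa == "C" and i not in bonded
        ]


# Made-up peptide, not a real protein: Cys2 and Cys6 are linked, Cys11 is free.
toy = PrimaryStructure("ACDEFCGHIKCL", disulfides=((2, 6),))
print(toy.free_cysteines())
```

Two chains with the same letters but different disulfide pairings would compare as different records here, which is the point of including the cross-links in the description.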

The chiral centers of a polypeptide chain can undergo racemization. In particular, the L-amino acids normally found in proteins can spontaneously isomerize at the Cα atom to form D-amino acids, which cannot be cleaved by most proteases.


Finally, the protein can undergo a variety of posttranslational modifications, which are briefly summarized here.

The N-terminal amino group of a polypeptide can be modified covalently, e.g.,
  • acetylation

The positive charge on the N-terminal amino group may be eliminated by acetylating it (N-terminal blocking).

  • formylation

The N-terminal methionine usually found after translation has an N-terminus blocked with a formyl group. This formyl group (and sometimes the methionine residue itself, if followed by Gly or Ser) is removed by the enzyme deformylase.

  • pyroglutamate

An N-terminal glutamine can attack itself, forming a cyclic pyroglutamate group.

  • myristoylation

Similar to acetylation. Instead of a simple methyl group, the myristoyl group has a tail of 14 hydrophobic carbons, which makes it ideal for anchoring proteins to cellular membranes.

The C-terminal carboxylate group of a polypeptide can also be modified, e.g.,

  • amidation (see Figure)
The C-terminus can also be blocked (thus, neutralizing its negative charge) by amidation.

  • glycosyl phosphatidylinositol (GPI) attachment
Glycosyl phosphatidylinositol is a large, hydrophobic phospholipid prosthetic group that anchors proteins to cellular membranes. It is attached to the polypeptide C-terminus through an amide linkage that then connects to ethanolamine, thence to sundry sugars and finally to the phosphatidylinositol lipid moiety.

Finally, the peptide side chains can also be modified covalently, e.g.,
  • phosphorylation
Aside from cleavage, phosphorylation is perhaps the most important chemical modification of proteins. A phosphate group can be attached to the sidechain hydroxyl group of serine, threonine and tyrosine residues, adding a negative charge at that site and producing a non-standard amino acid. Such reactions are catalyzed by kinases, and the reverse reaction is catalyzed by phosphatases. The phosphorylated tyrosines are often used as "handles" by which proteins can bind to one another, whereas phosphorylation of Ser/Thr often induces conformational changes, presumably because of the introduced negative charge. The effects of phosphorylating Ser/Thr can sometimes be simulated by mutating the Ser/Thr residue to glutamate.

  • glycosylation
A catch-all name for a set of very common and very heterogeneous chemical modifications. Sugar moieties can be attached to the sidechain hydroxyl groups of Ser/Thr or to the sidechain amide groups of Asn. Such attachments can serve many functions, ranging from increasing solubility to complex recognition. N-linked glycosylation (attachment to Asn) can be blocked with certain inhibitors, such as tunicamycin.

  • deamidation (succinimide formation)
In this modification, an asparagine or aspartate side chain attacks the following peptide bond, forming a symmetrical succinimide intermediate. Hydrolysis of the intermediate produces either aspartate or the β-amino acid, iso(Asp). For asparagine, either product results in the loss of the amide group, hence "deamidation".

  • hydroxylation
Proline residues may be hydroxylated at either of two atoms, as can lysine (at one atom). Hydroxyproline is a critical component of collagen, which becomes unstable upon its loss. The hydroxylation reaction is catalyzed by an enzyme that requires ascorbic acid (vitamin C), deficiencies in which lead to many connective-tissue diseases such as scurvy.

  • methylation
Several protein residues can be methylated, most notably the positive groups of lysine and arginine. Methylation at these sites is used to regulate the binding of proteins to nucleic acids. Lysine residues can be singly, doubly and even triply methylated. Methylation does not alter the positive charge on the side chain, however.

  • acetylation
Acetylation of the lysine amino groups is chemically analogous to the acetylation of the N-terminus. Functionally, however, the acetylation of lysine residues is used to regulate the binding of proteins to nucleic acids. The cancellation of the positive charge on the lysine weakens the electrostatic attraction for the (negatively charged) nucleic acids.

  • sulfation

Tyrosines may become sulfated on their Oη atom. Somewhat unusually, this modification occurs in the Golgi apparatus, not in the endoplasmic reticulum. Similar to phosphorylated tyrosines, sulfated tyrosines are used for specific recognition, e.g., in chemokine receptors on the cell surface. As with phosphorylation, sulfation adds a negative charge to a previously neutral site.
  • prenylation and palmitoylation
The hydrophobic isoprene (e.g., farnesyl, geranyl, and geranylgeranyl groups) and palmitoyl groups may be added to the Sγ atom of cysteine residues to anchor proteins to cellular membranes. Unlike the GPI and myristoyl anchors, these groups are not necessarily added at the termini.
  • carboxylation
A relatively rare modification that adds an extra carboxylate group (and, hence, a double negative charge) to a glutamate side chain, producing a Gla residue. This is used to strengthen the binding to "hard" metal ions such as calcium.

  • ADP-ribosylation

The large ADP-ribosyl group can be transferred to several types of side chains within proteins, with heterogeneous effects. This modification is a target for the powerful toxins of disparate bacteria, e.g., Vibrio cholerae, Corynebacterium diphtheriae and Bordetella pertussis.
  • ubiquitination and SUMOylation
Various full-length, folded proteins can be attached at their C-termini to the sidechain ammonium groups of lysines of other proteins. Ubiquitin is the most common of these, and usually signals that the ubiquitin-tagged protein should be degraded.
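
The phosphorylation entry in the list above notes that the effect of phosphorylating Ser/Thr can sometimes be simulated by mutating the residue to glutamate. At the sequence level that substitution can be sketched as follows; the function name phosphomimetic and the five-residue peptide are hypothetical, and the sketch says nothing about whether the mutant actually mimics the phosphorylated protein in any given case.

```python
def phosphomimetic(sequence: str, position: int) -> str:
    """Return a copy of `sequence` with the Ser/Thr at 1-based `position` replaced by Glu (E)."""
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the chain")
    residue = sequence[position - 1]
    if residue not in ("S", "T"):
        raise ValueError(f"residue {position} is {residue}, not Ser or Thr")
    # Strings are immutable, so build the mutant from the two flanking pieces.
    return sequence[: position - 1] + "E" + sequence[position:]


# Made-up peptide used only for illustration.
print(phosphomimetic("MKSAT", 3))  # MKEAT
```

Refusing any residue other than Ser or Thr keeps the helper tied to the case described in the text; tyrosine is left out on purpose because the text makes the glutamate claim only for Ser/Thr.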

Most of the polypeptide modifications listed above occur post-translationally, i.e., after the protein has been synthesized on the ribosome, typically occurring in the endoplasmic reticulum, a subcellular organelle of the eukaryotic cell.

Many other chemical reactions (e.g., cyanylation) have been applied to proteins by chemists, although they are not found in biological systems.
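
Several of the modifications summarized above are described in terms of the charge they add or remove. The sketch below simply tallies those statements for a list of modifications. The table CHARGE_CHANGE and the function net_charge_change are hypothetical names, and the whole-number values are bookkeeping taken from the wording of this section (one unit per stated charge), not measured charges at any particular pH.

```python
# Integer charge changes as stated in the text; an assumption-laden simplification.
CHARGE_CHANGE = {
    "N-terminal acetylation": -1,   # positive charge of the amino group eliminated
    "C-terminal amidation": +1,     # negative charge of the carboxylate neutralized
    "phosphorylation": -1,          # adds a negative charge at the site
    "sulfation": -1,                # adds a negative charge to a neutral site
    "lysine acetylation": -1,       # cancels the positive charge on lysine
    "lysine methylation": 0,        # positive charge is not altered
    "glutamate carboxylation": -1,  # one extra carboxylate on the side chain
}


def net_charge_change(modifications):
    """Sum the tabulated charge changes for an iterable of modification names."""
    total = 0
    for name in modifications:
        if name not in CHARGE_CHANGE:
            raise KeyError(f"no charge entry for {name!r}")
        total += CHARGE_CHANGE[name]
    return total


# Two phosphorylations, one lysine acetylation and a C-terminal amidation.
example = ["phosphorylation", "phosphorylation", "lysine acetylation", "C-terminal amidation"]
print(net_charge_change(example))  # -2
```

Modifications whose charge effect the text does not state (for example glycosylation or ADP-ribosylation) are deliberately absent from the table, so the function raises an error rather than guessing.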

Modifications of primary structure

In addition to those listed above, the most important modification of primary structure is peptide cleavage (See: Protease). Proteins are often synthesized in an inactive precursor form; typically, an N-terminal or C-terminal segment blocks the active site of the protein, inhibiting its function. The protein is activated by cleaving off the inhibitory peptide.

Some proteins even have the power to cleave themselves. Typically, the hydroxyl group of a serine (rarely, threonine) or the thiol group of a cysteine residue will attack the carbonyl carbon of the preceding peptide bond, forming a tetrahedrally bonded intermediate [classified as a hydroxyoxazolidine (Ser/Thr) or hydroxythiazolidine (Cys) intermediate]. This intermediate tends to revert to the amide form, expelling the attacking group, since the amide form is usually favored by free energy (presumably due to the strong resonance stabilization of the peptide group). However, additional molecular interactions may render the amide form less stable; in that case the amino group is expelled instead, resulting in an ester (Ser/Thr) or thioester (Cys) bond in place of the peptide bond. This chemical reaction is called an N-O acyl shift.

The ester/thioester bond can be resolved in several ways:
  • Simple hydrolysis will split the polypeptide chain, where the displaced amino group becomes the new N-terminus. This is seen in the maturation of glycosylasparaginase.

  • A β-elimination reaction also splits the chain, but results in a pyruvoyl group at the new N-terminus. This pyruvoyl group may be used as a covalently attached catalytic cofactor in some enzymes, especially decarboxylases such as S-adenosylmethionine decarboxylase (SAMDC) that exploit the electron-withdrawing power of the pyruvoyl group.

  • Intramolecular transesterification results in a branched polypeptide. In inteins, the new ester bond is broken by an intramolecular attack by the soon-to-be C-terminal asparagine.

  • Intermolecular transesterification can transfer a whole segment from one polypeptide to another, as is seen in the Hedgehog protein autoprocessing.

History of protein primary structure

The proposal that proteins were linear chains of α-amino acids was made nearly simultaneously by two scientists at the same conference in 1902, the 74th meeting of the Society of German Scientists and Physicians, held in Karlsbad. Franz Hofmeister made the proposal in the morning, based on his observations of the biuret reaction in proteins. Hofmeister was followed a few hours later by Emil Fischer, who had amassed a wealth of chemical details supporting the peptide-bond model. For completeness, the proposal that proteins contained amide linkages was made as early as 1882 by the French chemist E. Grimaux.

Despite these data and later evidence that proteolytically digested proteins yielded only oligopeptides, the idea that proteins were linear, unbranched polymers of amino acids was not accepted immediately. Some well-respected scientists such as William Astbury doubted that covalent bonds were strong enough to hold such long molecules together; they feared that thermal agitations would shake such long molecules asunder. Hermann Staudinger faced similar prejudices in the 1920s when he argued that rubber was composed of macromolecules.

Thus, several alternative hypotheses arose. The colloidal protein hypothesis stated that proteins were colloidal assemblies of smaller molecules. This hypothesis was disproved in the 1920s by ultracentrifugation measurements by Theodor Svedberg that showed that proteins had a well-defined, reproducible molecular weight and by electrophoretic measurements by Arne Tiselius that indicated that proteins were single molecules. A second hypothesis, the cyclol hypothesis advanced by Dorothy Wrinch, proposed that the linear polypeptide underwent a chemical cyclol rearrangement C=O + HN → C(OH)-N that crosslinked its backbone amide groups, forming a two-dimensional fabric. Other primary structures of proteins were proposed by various researchers, such as the diketopiperazine model of Emil Abderhalden and the pyrrol/piperidine model of Troensegaard in 1942. Although never given much credence, these alternative models were finally disproved when Frederick Sanger successfully sequenced insulin and by the crystallographic determination of myoglobin and hemoglobin by Max Perutz and John Kendrew.

Primary structure in other molecules

Any linear-chain heteropolymer can be said to have a "primary structure" by analogy to the usage of the term for proteins, but this usage is rare compared to the extremely common usage in reference to proteins. In RNA, which also has extensive secondary structure, the linear chain of bases is generally just referred to as the "sequence" as it is in DNA (which usually forms a linear double helix with little secondary structure). Other biological polymers such as polysaccharides can also be considered to have a primary structure, although the usage is not standard.

Relation to secondary and tertiary structure

The primary structure of a biological polymer to a large extent determines the three-dimensional shape known as the tertiary structure, but nucleic acid and protein folding are so complex that knowing the primary structure often does not help either to deduce the shape or to predict localized secondary structure, such as the formation of loops or helices. However, knowing the structure of a similar homologous sequence (for example a member of the same protein family) can unambiguously identify the tertiary structure of the given sequence. Sequence families are often determined by sequence clustering, and structural genomics projects aim to produce a set of representative structures to cover the sequence space of possible non-redundant sequences.
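
To make the idea of sequence clustering concrete, here is a deliberately simplified sketch. It assumes the sequences are already aligned and of equal length, scores similarity as the fraction of identical positions, and groups sequences greedily around the first member of each cluster. The function names, the 0.7 threshold, and the toy sequences are all hypothetical; real clustering tools compute alignments and use more careful criteria.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(sequences, threshold=0.7):
    """Assign each sequence to the first cluster whose representative it resembles."""
    clusters = []  # each cluster is a list; its first element is the representative
    for seq in sequences:
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            # No existing representative was similar enough: start a new cluster.
            clusters.append([seq])
    return clusters


# Made-up sequences: the first two differ at one of six positions.
toy_sequences = ["MKVLAA", "MKVLAG", "GHTWPP"]
print(greedy_cluster(toy_sequences))
```

The representatives of the resulting clusters play the role of the "representative structures" mentioned above: one structure per cluster would cover every sequence assigned to it, under the assumptions of this sketch.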

See also

  • Protein sequencing
  • translation
  • Pseudo amino acid composition

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.