Intein
An intein is a segment of a protein that is able to excise itself and rejoin the remaining portions (the exteins) with a peptide bond. Inteins have also been called "protein introns".
Intein-mediated protein splicing occurs after mRNA has been translated into a protein. This precursor protein contains three segments: an N-extein, followed by the intein, followed by a C-extein. After splicing has taken place, the resulting ligated protein is also called an extein.
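The three-segment layout described above can be sketched in a few lines of Python. This is only an illustration at the level of sequence strings, not a model of the chemistry; the `Precursor` type, the `splice` function, and the one-letter sequences are hypothetical and invented for this example.

```python
from typing import NamedTuple


class Precursor(NamedTuple):
    """Hypothetical container for the three segments of a precursor protein."""
    n_extein: str
    intein: str
    c_extein: str


def splice(precursor: Precursor) -> tuple[str, str]:
    """Return (ligated extein, excised intein) for a precursor protein."""
    # The intein is removed and the two exteins are joined end to end.
    ligated = precursor.n_extein + precursor.c_extein
    return ligated, precursor.intein


# Made-up one-letter amino acid sequences, for illustration only.
p = Precursor(n_extein="MKT", intein="CLAGHN", c_extein="SGE")
ligated, intein = splice(p)
print(ligated)  # MKTSGE
print(intein)   # CLAGHN
```

The sketch only captures the bookkeeping: what goes in as one chain comes out as a ligated extein plus a free intein.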
History
The first intein was discovered in 1988 through sequence comparison between the Neurospora crassa and carrot vacuolar ATPase (without intein) and the homologous gene in yeast (with intein) that was first described as a putative calcium ion transporter. In 1990 Hirata et al. demonstrated that the extra sequence in the yeast gene was transcribed into mRNA and removed itself from the host protein only after translation. Since then, inteins have been found in all three domains of life (eukaryotes, bacteria, and archaea) and in viruses.
Most reported inteins also contain an endonuclease domain that plays a role in intein propagation. In fact, many genes have unrelated intein-coding segments inserted at different positions. For these and other reasons, inteins (or more properly, the gene segments coding for inteins) are sometimes called selfish genetic elements, but it may be more accurate to call them parasitic. The difference is that "selfish genes" are "selfish" only insofar as they compete with other genes or alleles but usually fulfill a function, whereas "parasitic genes" are always functionless.
Mechanism
The splicing mechanism is a naturally occurring analogue of native chemical ligation, a technique for chemically generating medium-sized proteins that was developed at the same time as inteins were discovered.
The process begins with an N-O or N-S shift when the side chain of the first residue (a serine, threonine, or cysteine) of the intein portion of the precursor protein nucleophilically attacks the peptide bond of the residue immediately upstream (that is, the final residue of the N-extein) to form a linear ester (or thioester) intermediate. A transesterification occurs when the side chain of the first residue of the C-extein attacks the newly formed (thio)ester to free the N-terminal end of the intein. This forms a branched intermediate in which the N-extein and C-extein are attached, albeit not through a peptide bond. The last residue of the intein is typically an asparagine, and the amide nitrogen atom of this side chain cleaves the peptide bond between the intein and the C-extein, resulting in a free intein segment with a terminal cyclic imide. Finally, the free amino group of the C-extein attacks the (thio)ester linking the N- and C-exteins together. An O-N or S-N shift produces a peptide bond and the functional, ligated protein.
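The residue requirements in the steps above can be turned into a simple check. The following Python sketch is illustrative only: the function name and the test sequences are hypothetical, and the assumption that the first C-extein residue must also be a serine, threonine, or cysteine is not stated in the text above and is labeled as such in the code.

```python
# One-letter codes for serine, threonine, and cysteine.
NUCLEOPHILIC = {"S", "T", "C"}


def check_splice_residues(intein: str, c_extein: str) -> dict[str, bool]:
    """Report which residues used by the splicing steps are present."""
    return {
        # Step 1: the first intein residue performs the N-O or N-S shift.
        "intein_first_is_S_T_or_C": intein[:1] in NUCLEOPHILIC,
        # Step 3: the last intein residue (asparagine) forms the cyclic imide.
        "intein_last_is_N": intein[-1:] == "N",
        # Step 2 (assumption): the attacking C-extein residue is S, T, or C.
        "c_extein_first_is_S_T_or_C": c_extein[:1] in NUCLEOPHILIC,
    }


# Made-up sequences, for illustration only.
print(check_splice_residues("CLAGHN", "SGE"))
print(check_splice_residues("ALAGHQ", "GGE"))
```

With these made-up inputs, the first call reports all three checks as true and the second reports all three as false. A real intein may deviate from these rules, so a failed check is a hint rather than proof that splicing cannot occur.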
Inteins in biotechnology
Inteins are very efficient at protein splicing, and they have accordingly found an important role in biotechnology. More than 200 inteins have been identified to date, ranging in size from 100 to 800 amino acids (AA). Inteins have been engineered for particular applications such as protein semisynthesis and the selective labeling of protein segments, which is useful for NMR studies of large proteins.
Pharmaceutical inhibition of intein excision may be a useful tool for drug development: if the intein does not excise, the structure of the protein that contains it remains disrupted and the protein cannot carry out its normal function.
It has been suggested that inteins could prove useful for achieving allotopic expression of certain highly hydrophobic proteins normally encoded by the mitochondrial genome, for example in gene therapy (de Grey 2000). The hydrophobicity of these proteins is an obstacle to their import into mitochondria, so inserting a non-hydrophobic intein may allow this import to proceed. Excision of the intein after import would then restore the protein to wild-type.
Intein naming conventions
The first part of an intein name is based on the scientific name of the organism in which it is found, and the second part is based on the name of the corresponding gene or extein. For example, the intein found in Thermoplasma acidophilum and associated with Vacuolar ATPase subunit A (VMA) is called "Tac VMA".
Normally, as in this example, just three letters suffice to specify the organism, but there are variations. For example, additional letters may be added to indicate a strain. If more than one intein is encoded in the corresponding gene, the inteins are given a numerical suffix, numbered either from 5′ to 3′ or in order of their identification (for example, "Msm dnaB-1").
The segment of the gene that encodes the intein is usually given the same name as the intein, but to avoid confusion the name of the intein proper is usually capitalized (e.g., Pfu RIR1-1), whereas the name of the corresponding gene segment is italicized (e.g., Pfu rir1-1).
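The naming convention can be sketched as a small Python helper. The rule used here for the three-letter prefix (first letter of the genus plus the first two letters of the species) is an assumption inferred from the examples above, not a stated rule, and the function names are hypothetical. Italics cannot be represented in a plain string, so the gene-segment name is only lowercased.

```python
from typing import Optional


def organism_prefix(genus: str, species: str) -> str:
    """Three-letter prefix (assumed rule: genus initial + two species letters)."""
    return genus[0].upper() + species[:2].lower()


def intein_name(genus: str, species: str, gene: str,
                index: Optional[int] = None) -> str:
    """Build an intein name such as 'Pfu RIR1-1'."""
    name = f"{organism_prefix(genus, species)} {gene}"
    if index is not None:
        # The numerical suffix distinguishes several inteins in one gene.
        name += f"-{index}"
    return name


def gene_segment_name(genus: str, species: str, gene: str,
                      index: Optional[int] = None) -> str:
    """Build the name of the intein-coding gene segment, e.g. 'Pfu rir1-1'."""
    suffix = f"-{index}" if index is not None else ""
    return f"{organism_prefix(genus, species)} {gene.lower()}{suffix}"


print(intein_name("Thermoplasma", "acidophilum", "VMA"))    # Tac VMA
print(intein_name("Pyrococcus", "furiosus", "RIR1", 1))     # Pfu RIR1-1
print(gene_segment_name("Pyrococcus", "furiosus", "RIR1", 1))  # Pfu rir1-1
```

Strain letters and other variations mentioned above are deliberately left out, since the text does not specify how they are formed.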
Full and mini inteins
Inteins can contain a homing endonuclease gene (HEG) domain in addition to the splicing domains. This domain is responsible for the spread of the intein by cleaving DNA at an intein-free allele on the homologous chromosome, triggering the DNA double-stranded break repair (DSBR) system, which then repairs the break, thus copying the intein-coding DNA into a previously intein-free site. The HEG domain is not necessary for intein splicing, and so it can be lost, forming a minimal, or mini, intein. Several studies have demonstrated the modular nature of inteins by adding or removing HEG domains and determining the activity of the new construct.
Split inteins
Sometimes, the intein of the precursor protein comes from two genes. In this case, the intein is said to be a split intein. For example, in cyanobacteria, DnaE, the catalytic subunit α of DNA polymerase III, is encoded by two separate genes, dnaE-n and dnaE-c. The dnaE-n product consists of an N-extein sequence followed by a 123-AA intein sequence, whereas the dnaE-c product consists of a 36-AA intein sequence followed by a C-extein sequence.
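The DnaE example can be sketched as two separate polypeptides whose intein halves combine. This Python sketch is only a bookkeeping illustration: the `SplitHalf` type, the `trans_splice` function, and all sequences are hypothetical placeholders, with only the lengths 123 and 36 taken from the text above.

```python
from typing import NamedTuple


class SplitHalf(NamedTuple):
    """Hypothetical container for one product of a split-intein gene pair."""
    extein: str
    intein_part: str


def trans_splice(n_product: SplitHalf, c_product: SplitHalf) -> tuple[str, str]:
    """Return (ligated extein, reassembled intein) from the two products."""
    ligated = n_product.extein + c_product.extein
    intein = n_product.intein_part + c_product.intein_part
    return ligated, intein


# Placeholder sequences; only the intein lengths (123 and 36) follow the text.
n_product = SplitHalf(extein="MKT", intein_part="C" + "A" * 122)  # dnaE-n
c_product = SplitHalf(extein="SGE", intein_part="G" * 35 + "N")   # dnaE-c
ligated, intein = trans_splice(n_product, c_product)
print(ligated)      # MKTSGE
print(len(intein))  # 159
```

The two intein parts together span 123 + 36 = 159 residues, while the ligated product contains only the N-extein and C-extein sequences.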