Intrinsically unstructured proteins
Intrinsically unstructured proteins, often referred to as naturally unfolded proteins or disordered proteins, are proteins characterized by a lack of stable tertiary structure when the protein exists as an isolated polypeptide chain (a subunit) under physiological conditions in vitro. The discovery of intrinsically unfolded proteins challenged the traditional protein structure paradigm, which held that a specific, well-defined structure is required for the correct function of a protein and that the structure defines the function. This does not hold for intrinsically unfolded proteins, which remain functional despite lacking a well-defined structure. In some cases, such proteins adopt a fixed three-dimensional structure after binding to other macromolecules.
Biological role of intrinsic disorder
Many disordered proteins have a binding affinity for their receptors that is regulated by post-translational modification; it has therefore been proposed that the flexibility of disordered proteins accommodates the different conformational requirements for binding the modifying enzymes as well as the receptors. Intrinsic disorder is particularly enriched in proteins implicated in cell signaling, transcription and chromatin remodeling.
Flexible linkers
Disordered regions are often found as flexible linkers (or loops) connecting two globular or transmembrane domains. Linker sequences vary greatly in length and amino acid sequence, but are similar in amino acid composition (rich in polar uncharged amino acids). Flexible linkers allow the connected domains to twist and rotate freely through space to recruit their binding partners, or allow those binding partners to induce larger-scale interdomain conformational changes.
Coupled folding and binding
Many unstructured proteins undergo transitions to more ordered states upon binding to their targets. The coupled folding and binding may be local, involving only a few interacting residues, or it may involve an entire protein domain. Coupled folding and binding has been shown to allow the burial of a large surface area, which would be possible for fully structured proteins only if they were much larger. Moreover, certain disordered regions may serve as "molecular switches" that regulate biological function by switching to an ordered conformation upon molecular recognition, such as small-molecule binding, DNA/RNA binding or ion interactions.
The ability of disordered proteins to bind, and thus to exert a function, shows that stability is not a required condition. Many short functional sites, for example Short Linear Motifs, are over-represented in disordered proteins.
Sequence signatures of disorder
Intrinsically unstructured proteins are characterized by a low content of bulky hydrophobic amino acids and a high proportion of polar and charged amino acids. Disordered sequences therefore cannot bury a sufficient hydrophobic core to fold like stable globular proteins. In some cases, hydrophobic clusters in disordered sequences provide clues for identifying the regions that undergo coupled folding and binding. Such signatures are the basis of the prediction methods below.
Many disordered proteins also contain low complexity sequences, i.e. sequences in which a few residues are overrepresented. While low complexity sequences are a strong indication of disorder, the reverse is not necessarily true: not all disordered proteins have low complexity sequences. Disordered proteins also have a low content of predicted secondary structure.
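The composition signatures described above can be made concrete with a small sketch. The residue groupings (`BULKY_HYDROPHOBIC`, `POLAR_OR_CHARGED`) and the function name `composition_signature` are illustrative assumptions, not definitions taken from this article, and Shannon entropy is used here only as one simple proxy for sequence complexity.

```python
import math
from collections import Counter

# Assumed residue classes (one-letter codes); other groupings are equally defensible.
BULKY_HYDROPHOBIC = set("ILVFWYM")
POLAR_OR_CHARGED = set("STNQDEKRH")


def composition_signature(seq: str) -> dict:
    """Return simple composition features for an amino acid sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    # Fraction of residues in each assumed class.
    hydrophobic = sum(counts[a] for a in BULKY_HYDROPHOBIC) / n
    polar_charged = sum(counts[a] for a in POLAR_OR_CHARGED) / n
    # Shannon entropy in bits; low values indicate few residue types dominate.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "hydrophobic": hydrophobic,
        "polar_charged": polar_charged,
        "entropy_bits": entropy,
    }


# Hypothetical low-complexity, polar/charged segment.
features = composition_signature("SSSSEEEEKKKK")
```

A sequence with a low hydrophobic fraction, a high polar/charged fraction and low entropy would match the signatures described here, although, as noted above, many disordered proteins are not low in complexity.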
Identification of intrinsically unstructured proteins
Intrinsically unfolded proteins, once purified, can be identified by various experimental methods. Folded proteins have a high density (partial specific volume of 0.72-0.74 mL/g) and a commensurately small radius of gyration. Hence, unfolded proteins can be detected by methods that are sensitive to molecular size, density or hydrodynamic drag, such as size exclusion chromatography, analytical ultracentrifugation, small-angle X-ray scattering (SAXS), and measurements of the diffusion constant. Unfolded proteins are also characterized by their lack of secondary structure, as assessed by far-UV (170-250 nm) circular dichroism (especially a pronounced minimum at ~200 nm) or infrared spectroscopy.
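To illustrate why size-sensitive methods can separate compact from extended chains, the following sketch computes an unweighted radius of gyration (root mean square distance of points from their centroid) for two toy bead models. The bead spacing of 3.8 Å and both geometries are assumptions chosen for illustration; they are not models of any real protein.

```python
import math


def radius_of_gyration(coords):
    """Unweighted RMS distance of 3D points from their centroid."""
    n = len(coords)
    if n == 0:
        raise ValueError("no coordinates")
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords)
    return math.sqrt(sq / n)


BOND = 3.8  # assumed bead spacing in angstroms (toy value)

# 27 beads arranged as a fully extended rod versus a compact 3x3x3 lattice.
rod = [(i * BOND, 0.0, 0.0) for i in range(27)]
cube = [(i * BOND, j * BOND, k * BOND)
        for i in range(3) for j in range(3) for k in range(3)]

rg_rod = radius_of_gyration(rod)
rg_cube = radius_of_gyration(cube)
```

With the same number of beads, the extended arrangement has a several-fold larger radius of gyration than the compact one, which is the kind of difference the methods above detect.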
Unfolded proteins have backbone peptide groups exposed to solvent, so they are readily cleaved by proteases, undergo rapid hydrogen-deuterium exchange and exhibit a small dispersion (<1 ppm) in their 1H amide chemical shifts as measured by NMR. (Folded proteins typically show dispersions as large as 5 ppm for the amide protons.)
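The dispersion criterion can be expressed as a simple check on a list of assigned amide proton shifts. This is a minimal sketch: the function names and the example shift values are hypothetical, and the 1 ppm cutoff is simply the figure quoted above rather than a validated classifier.

```python
def amide_dispersion(shifts_ppm):
    """Spread (max minus min) of 1H amide chemical shifts, in ppm."""
    if not shifts_ppm:
        raise ValueError("no chemical shifts given")
    return max(shifts_ppm) - min(shifts_ppm)


def looks_unfolded(shifts_ppm, threshold_ppm=1.0):
    """True if the amide dispersion is below the assumed 1 ppm cutoff."""
    return amide_dispersion(shifts_ppm) < threshold_ppm


# Hypothetical shift lists, for illustration only.
narrow = [8.1, 8.3, 8.25, 8.4, 8.0]   # clustered, as expected for a disordered chain
broad = [6.8, 7.5, 8.2, 9.1, 10.3]    # spread out, as expected for a folded protein
```

In practice such a check would be only one line of evidence alongside the other measurements described in this section.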
The primary method for obtaining information on disordered regions of a protein is NMR spectroscopy. A lack of electron density in X-ray crystallographic studies may also be a sign of disorder.
Disorder and disease
Intrinsically unstructured proteins have been implicated in a number of diseases. Aggregation of misfolded proteins is the cause of many synucleinopathies, and the aggregation of the intrinsically unstructured protein α-synuclein is thought to be responsible. The structural flexibility of this protein, together with its susceptibility to modification in the cell, leads to misfolding and aggregation.
Many key cancer-associated proteins have large intrinsically unstructured regions, for example the tumor suppressors p53 and BRCA1. These regions are responsible for mediating many of the proteins' interactions.
De novo prediction of intrinsically unstructured proteins
Computational methods exploit the sequence signatures of disorder to predict whether a protein is disordered, given its amino acid sequence. The table below, which was adapted from an earlier source and has since been updated, shows the main features of software for disorder prediction. Note that different programs use different definitions of disorder.
Predictor | What is predicted | Based on | Generates and uses multiple sequence alignment? |
---|---|---|---|
PONDR | All regions that are not rigid including random coils, partially unstructured regions, and molten globules | Local aa composition, flexibility, hydropathy, etc | No |
GlobPlot | Regions with high propensity for globularity on the Russell/Linding scale (propensities for secondary structures and random coils) | Russell/Linding scale of disorder | No |
DisEMBL | LOOPS (regions devoid of regular secondary structure); HOT LOOPS (highly mobile loops); REMARK465 (regions lacking electron density in crystal structure) | Neural networks trained on X-ray structure data | No |
SEG | Low-complexity segments, that is, “simple sequences” or “compositionally biased regions” | Locally optimized low-complexity segments are produced at defined levels of stringency and then refined according to the equations of Wootton and Federhen | No |
Disopred2 | Regions devoid of ordered regular secondary structure | Cascaded support vector machine classifiers trained on PSI-BLAST profiles | Yes |
OnD-CRF | The transition between structurally ordered and mobile or disordered amino acids intervals under native conditions. | OnD-CRF applies Conditional Random Fields, CRFs, which rely on features generated from the amino acid sequence and from secondary structure prediction. | No |
NORSp | Regions with No Ordered Regular Secondary Structure (NORS). Most, but not all, are highly flexible. | Secondary structure and solvent accessibility | Yes |
FoldIndex | Regions that have a low hydrophobicity and high net charge (either loops or unstructured regions) | Charge/hydropathy analyzed locally using a sliding window | No |
Charge/hydropathy method | Fully unstructured domains (random coils) | Global sequence composition | No |
HCA (Hydrophobic Cluster Analysis) | Hydrophobic clusters, which tend to form secondary structure elements | Helical visualization of amino acid sequence | No |
PreLink | Regions that are expected to be unstructured in all conditions, regardless of the presence of a binding partner | Compositional bias and low hydrophobic cluster content. | No |
IUPred | Regions that lack a well-defined 3D-structure under native conditions | Energy resulting from inter-residue interactions, estimated from local amino acid composition | No |
RONN | Regions that lack a well-defined 3D structure under native conditions | Bio-basis function neural network trained on disordered proteins | No |
MD (Meta-Disorder predictor) | Regions of different "types"; for example, unstructured loops and regions containing few stable intra-chain contacts | A neural-network based meta-predictor that uses different sources of information predominantly obtained from orthogonal approaches | Yes |
GeneSilico Metadisorder | Regions that lack a well-defined 3D structure under native conditions (REMARK-465) | Meta-method that combines other disorder predictors (such as RONN, IUPred, POODLE, and many more). A consensus is calculated according to each method's accuracy (optimized using ANN, filtering and other techniques). Reported as the best available method at the time of writing (first two places in the then most recent CASP experiment, a blind test) | Yes |
IUPforest-L | Long disordered regions in a set of proteins | Moreau-Broto auto-correlation function of amino acid indices (AAIs) | No |
MFDp | Different types of disorder including random coils, unstructured regions, molten globules, and REMARK-465-based regions | An ensemble of 3 SVMs specialized for the prediction of short, long and generic disordered regions, which combines three complementary disorder predictors with sequence, sequence profiles, predicted secondary structure, solvent accessibility, backbone dihedral torsion angles, residue flexibility and B-factors. MFDp (unofficially) secured 3rd place in the then most recent CASP experiment | Yes |
Since the methods above use different definitions of disorder and were trained on different datasets, it is difficult to estimate their relative accuracy. However, a disorder prediction category is part of the biennial CASP (Critical Assessment of Techniques for Protein Structure Prediction) experiment, which is designed to test methods according to their accuracy in finding regions with missing 3D structure.
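As an illustration of the "charge/hydropathy analyzed locally using a sliding window" idea listed in the table, the sketch below scores each window of a sequence. It is not the code of FoldIndex or any other listed tool. The Kyte-Doolittle hydropathy scale, its rescaling to 0-1, and the coefficients 2.785 and 1.151 are recalled from the published charge-hydropathy literature and should be treated as assumptions to verify before use; the function names and the window size are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale, not given in this article).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified side-chain charges at neutral pH (histidine treated as neutral).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def fold_score(segment: str) -> float:
    """Positive suggests folded-like, negative suggests disordered-like."""
    n = len(segment)
    if n == 0:
        raise ValueError("empty segment")
    # Mean hydropathy rescaled from [-4.5, 4.5] to [0, 1].
    mean_h = sum((KD[a] + 4.5) / 9.0 for a in segment) / n
    # Absolute mean net charge per residue.
    mean_r = abs(sum(CHARGE.get(a, 0) for a in segment)) / n
    # Assumed boundary coefficients from the charge-hydropathy relationship.
    return 2.785 * mean_h - mean_r - 1.151


def sliding_fold_scores(seq: str, window: int = 11) -> list:
    """Score every full-length window along the sequence."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    return [fold_score(seq[i:i + window]) for i in range(len(seq) - window + 1)]
```

Residues outside the 20 standard one-letter codes raise a `KeyError` in this sketch. Because the listed predictors use different definitions of disorder, a score of this kind is best read as one signal among several rather than as a definitive call.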