Protein design
Protein design is the design of new protein molecules, either from scratch or by making calculated variations on a known structure. The use of rational design techniques for proteins is a major aspect of protein engineering.

The design of minimalist computer models of proteins (lattice proteins), and the secondary structural modification of real proteins, began in the mid-1990s. The de novo design of real proteins became possible shortly afterwards, and in the 21st century it has become a productive field of research. There is great hope that the design of new proteins, small and large, will have applications in medicine and bioengineering (see examples below).

Overview

The number of possible amino acid sequences is enormous, but only a subset of them will fold reliably and quickly to a single native state. Protein design involves identifying novel sequences within this subset, in particular those with a physiologically active native state. Physically, the native state of a protein is the conformational free energy minimum for the chain. Therefore, protein design is the search for sequences which have the chosen structure as a free energy minimum. In a sense it is the reverse of structure prediction: in design, a tertiary structure is specified, and a sequence is identified which will fold to it. Hence it is also referred to as inverse folding.
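The search described above can be written compactly. The notation here is an assumption introduced for illustration and does not come from the article: S is a sequence, C a conformation, C* the chosen target structure, and E(S, C) an effective free energy of sequence S held in conformation C.

```latex
% Ground-state condition: the target must be the unique free energy minimum.
E(S, C^{*}) < E(S, C) \quad \text{for all } C \neq C^{*}

% A softer, commonly used formulation: maximise the Boltzmann weight of the target.
S^{\mathrm{design}} = \arg\max_{S} P(C^{*} \mid S),
\qquad
P(C^{*} \mid S) = \frac{e^{-E(S, C^{*})/k_{B}T}}{\sum_{C} e^{-E(S, C)/k_{B}T}}
```

Structure prediction fixes S and searches over C; design fixes C* and searches over S, which is why the two problems are described as inverses of each other.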

Protein design requires an understanding of the molecular interactions that stabilize proteins in specific folded configurations; experience has shown, however, that it does not require an understanding of the dynamical process by which proteins fold.

Prion diseases like mad-cow disease illustrate how important it is that designer proteins possess only one stable conformation. In mad-cow disease, there exists a healthy protein with a fatal weakness: there is another conformation that it can "comfortably" take; the abnormally folded shape has a very low free energy and is therefore very stable. For reasons that are not yet fully understood, this mis-folded prion protein can catalyze the conversion of other proteins of its type to the mis-folded shape, causing a disease-generating cascade in which previously functional proteins quickly mis-fold. In the new conformation they lose the ability to perform their intended function and tend to form aggregates called plaques. The buildup of these aggregates in the brain leads to progressive neuronal death, and eventually to the death of the entire organism. It is therefore important both that a designer protein have only one possible stable tertiary structure and that researchers exercise extreme diligence to ensure that this remains the case in all environments, especially in vivo.

Examples of designed proteins

The early 21st century saw the creation of small proteins with real biological functions, including chiroselective catalysis, ion detection, and antiviral behaviour. Using computational methods, a protein with a novel fold (Top7) was designed in 2003, as well as sensors for unnatural molecules. More recently, a computational redesign, confirmed by experiment, switched the cofactor specificity of Candida boidinii xylose reductase from NADPH to NADH.

On the other hand, it is widely believed that not all possible protein structures are designable, which means that there are compact configurations of the chain which no sequences can fold to. In particular, conformations which are poor in secondary structures are unlikely to be designable. The designability of given structures is still poorly understood.
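Designability can be explored in lattice models of the kind mentioned in the introduction. The following is a minimal sketch, assuming a two-letter hydrophobic/polar (HP) chain on a square lattice with an energy of -1 per non-bonded H-H contact; the model choice and all function names are illustrative assumptions, not taken from the article. It counts, for each structure, how many sequences have that structure as their unique lowest-energy conformation.

```python
from collections import Counter
from itertools import product

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def walks(n):
    """Yield self-avoiding walks of n beads, one per rotation/reflection class."""
    def extend(path):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt not in path:
                path.append(nxt)
                yield from extend(path)
                path.pop()
    # Fixing the first step removes rotations; requiring the first turn to
    # go towards +y removes mirror images.
    for walk in extend([(0, 0), (1, 0)]):
        first_off_axis = next((y for _, y in walk if y != 0), 1)
        if first_off_axis > 0:
            yield walk

def contacts(walk):
    """Return non-bonded pairs of beads that sit on adjacent lattice sites."""
    index = {site: i for i, site in enumerate(walk)}
    pairs = []
    for i, (x, y) in enumerate(walk):
        for dx, dy in ((1, 0), (0, 1)):  # look right and up so each pair is seen once
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1:
                pairs.append((min(i, j), max(i, j)))
    return pairs

def energy(seq, contact_list):
    """HP energy: -1 for every H-H contact."""
    return -sum(1 for i, j in contact_list if seq[i] == "H" and seq[j] == "H")

def ground_state(seq, structures):
    """Index of the unique lowest-energy structure, or None if it is degenerate."""
    energies = [energy(seq, c) for c in structures]
    best = min(energies)
    return energies.index(best) if energies.count(best) == 1 else None

def designability(n):
    """Count, per structure, the sequences that fold uniquely to it."""
    structures = [contacts(w) for w in walks(n)]
    counts = Counter()
    for seq in product("HP", repeat=n):
        idx = ground_state(seq, structures)
        if idx is not None:
            counts[idx] += 1
    return structures, counts
```

In this toy model a structure with no contacts can never be a unique ground state, because several contact-free conformations always tie at zero energy. That is a small-scale analogue of the statement that some compact or open configurations have no sequence that folds to them.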

Models of protein structure and function used in protein design

Protein design can be accomplished using computer models, which, while simplifying the problem, are able to generate sequences that fold to the desired structure. Computational protein design algorithms search the sequence-conformation space for sequences that are low in energy when folded to the target structure. This search space is large; currently the most challenging requirement for computational protein design is a fast, yet accurate, energy function that can distinguish optimal sequences from similar suboptimal ones.
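The kind of search described above can be illustrated with simulated annealing over sequences threaded onto a fixed target. This is a minimal sketch under stated assumptions: the two-letter H/P alphabet, the toy energy (a reward for H-H contacts in the target, a penalty for exposed H), and the names `target_energy` and `anneal_sequence` are all hypothetical and are not the method of any particular package.

```python
import math
import random

def target_energy(seq, contacts, exposed):
    """Toy energy of a sequence threaded onto a fixed target structure."""
    hh_contacts = sum(1 for i, j in contacts if seq[i] == "H" and seq[j] == "H")
    exposed_h = sum(1 for i in exposed if seq[i] == "H")
    return -hh_contacts + exposed_h

def anneal_sequence(length, contacts, exposed, steps=2000,
                    t_start=2.0, t_end=0.05, seed=0):
    """Search sequence space with single-site mutations and Metropolis acceptance."""
    rng = random.Random(seed)
    seq = [rng.choice("HP") for _ in range(length)]
    energy = target_energy(seq, contacts, exposed)
    best_seq, best_energy = list(seq), energy
    for step in range(steps):
        # Geometric cooling from t_start to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = "P" if old == "H" else "H"
        new_energy = target_energy(seq, contacts, exposed)
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
        else:
            seq[pos] = old  # reject the mutation
    return "".join(best_seq), best_energy
```

A call such as `anneal_sequence(6, [(0, 5), (1, 4)], [2, 3])` searches for a six-residue sequence suited to a U-shaped target. Note that this sketch scores sequences only against the target; it does not check that competing conformations are higher in energy, which a real design calculation must also address.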

Computational protein design algorithms use rotamer libraries and models of protein energetics to evaluate how mutations would affect a protein's structure and function. These energy functions typically include a combination of molecular mechanics, statistical (i.e. knowledge-based), and other empirical terms. However, the trend has been towards using more physically based potential energy functions.
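A combined energy function of this kind can be sketched as a weighted sum of terms. The example below assumes a Lennard-Jones 12-6 term as the molecular mechanics contribution and an inverse-Boltzmann term as the statistical contribution; the function names, the weights, and any numbers passed in are illustrative assumptions, not parameters of a published force field.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)

def lennard_jones(r, epsilon, sigma):
    """Molecular mechanics term: 12-6 van der Waals energy at distance r."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 ** 2 - ratio6)

def statistical_term(observed_freq, reference_freq, temperature=298.15):
    """Knowledge-based term: -RT ln(observed / reference) from structure statistics."""
    return -GAS_CONSTANT * temperature * math.log(observed_freq / reference_freq)

def combined_energy(terms, weights):
    """Weighted sum of named energy terms."""
    return sum(weights[name] * value for name, value in terms.items())
```

For instance, `combined_energy({"vdw": lennard_jones(4.0, 0.1, 3.5), "pair": statistical_term(0.4, 0.2)}, {"vdw": 1.0, "pair": 0.5})` mixes the two contributions. In practice the weights are fitted empirically, which is one reason fully physical energy functions are attractive.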

Ancestral sequence reconstruction

Ancestral reconstruction techniques have been used to design proteins with putative ancient functions.

Software

IPRO (Iterative Protein Redesign and Optimization). IPRO redesigns proteins to increase or confer specificity for native or novel substrates and cofactors. It does this by repeatedly applying random perturbations to the protein backbone around specified design positions, identifying the lowest-energy combination of rotamers, and determining whether the new design has a lower binding energy than previous ones. The iterative nature of this process allows IPRO to accumulate mutations in the protein sequence that collectively improve specificity towards the desired substrates or cofactors. Experimental testing of IPRO predictions successfully switched the cofactor preference of Candida boidinii xylose reductase from NADPH to NADH.
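The perturb, repack, and compare cycle described above can be sketched as follows. This is not IPRO's implementation: the backbone is reduced to one angle per design position, rotamers are `(residue, chi)` pairs, the rotamer search is brute force, and both energy functions and the `affinity` table are hypothetical placeholders chosen only to make the loop runnable.

```python
import random
from itertools import product

def perturb_backbone(backbone, rng, scale=5.0):
    """Return a copy of the backbone angles with small random changes."""
    return [angle + rng.uniform(-scale, scale) for angle in backbone]

def complex_energy(backbone, rotamers):
    """Toy total energy: each chi angle prefers to match its backbone angle."""
    return sum((chi - phi) ** 2 for (_, chi), phi in zip(rotamers, backbone))

def binding_energy(backbone, rotamers):
    """Toy binding energy: made-up per-residue affinity plus a small strain term."""
    affinity = {"ALA": -0.5, "ASP": -2.0, "ARG": -1.0}  # hypothetical values
    strain = 0.001 * complex_energy(backbone, rotamers)
    return sum(affinity[name] for name, _ in rotamers) + strain

def best_rotamers(backbone, rotamer_library):
    """Lowest-energy rotamer combination, found by exhaustive enumeration."""
    return min(product(*rotamer_library),
               key=lambda combo: complex_energy(backbone, combo))

def iterative_redesign(backbone, rotamer_library, n_iter=50, seed=0):
    """Keep a perturbed design only if its binding energy improves."""
    rng = random.Random(seed)
    current_backbone = list(backbone)
    current_rotamers = best_rotamers(current_backbone, rotamer_library)
    current_binding = binding_energy(current_backbone, current_rotamers)
    history = [current_binding]
    for _ in range(n_iter):
        trial_backbone = perturb_backbone(current_backbone, rng)
        trial_rotamers = best_rotamers(trial_backbone, rotamer_library)
        trial_binding = binding_energy(trial_backbone, trial_rotamers)
        if trial_binding < current_binding:
            current_backbone = trial_backbone
            current_rotamers = trial_rotamers
            current_binding = trial_binding
        history.append(current_binding)
    return current_rotamers, current_binding, history
```

Here `rotamer_library` is a list with one entry per design position, each entry being the list of candidate rotamers at that position. The returned `history` never increases, reflecting the rule that only designs with lower binding energy are kept.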

EGAD: A Genetic Algorithm for protein Design. A free, open-source software package for protein design and prediction of mutation effects on protein folding stabilities and binding affinities. EGAD can also consider multiple structures simultaneously for designing specific binding proteins or locking proteins into specific conformational states. In addition to natural protein residues, EGAD can also consider free-moving ligands with or without rotatable bonds. EGAD can be used with single or multiple processors.

RosettaDesign. A software package, under active development and free for academic use, that has seen extensive successful use. RosettaDesign is accessible via a web server.

SHARPEN. A permissive open-source library for protein design and structure prediction. SHARPEN offers a variety of combinatorial optimization methods (e.g. Monte Carlo, Simulated Annealing, FASTER) and can score proteins using the successful Rosetta all-atom force field or molecular mechanics force fields (OPLSaa). In addition to the protein modeling library, SHARPEN includes tools for scalable distributed computing.

WHAT IF software for protein modelling, design, validation, and visualisation.

Abalone software for protein modelling and visualisation.

See also

  • Ancestral reconstruction
  • Molecular design software
  • PEGylation
  • Protein engineering
  • Protein structure prediction software
  • Software for molecular modeling
  • Meganucleases

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 