CASP
CASP, which stands for Critical Assessment of Techniques for Protein Structure Prediction, is a community-wide, worldwide experiment in protein structure prediction that has taken place every two years since 1994. CASP gives research groups an opportunity to test their structure prediction methods objectively and delivers an independent assessment of the state of the art in protein structure modeling to the research community and to software users. Although the primary goal of CASP is to help advance methods for predicting a protein's three-dimensional structure from its amino acid sequence, many view the experiment as a "world championship" in this field of science. More than 100 research groups from all over the world participate in CASP regularly, and it is not uncommon for entire groups to suspend their other research for months while they prepare their servers for the experiment and carry out detailed predictions.
Selection of target proteins
To ensure that no predictor has prior information about a protein's structure that could give them an advantage, the experiment is conducted in a double-blind fashion: neither the predictors nor the organizers and assessors know the structures of the target proteins at the time the predictions are made. Targets for structure prediction are either structures soon to be solved by X-ray crystallography or NMR spectroscopy, or structures that have just been solved (mainly by one of the structural genomics centers) and are kept on hold by the Protein Data Bank.
If a target sequence is found to be related by common descent to a protein sequence of known structure (called a template), comparative protein modeling may be used to predict its tertiary structure. Templates can be found using sequence alignment methods such as BLAST or FASTA, or using protein threading methods, which are better at finding distantly related templates. Otherwise, de novo protein structure prediction must be applied. It is much less reliable but can sometimes yield models with the correct fold. Truly new folds are becoming quite rare among the targets, which makes that category smaller than desirable.
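Database-search tools such as BLAST and FASTA are fast heuristics that approximate local sequence alignment. The sketch below illustrates only the underlying idea, not either tool's actual algorithm. It computes an optimal Smith-Waterman local alignment score between a target sequence and a candidate template. Several choices are illustrative assumptions: the scoring scheme (match +2, mismatch -1, linear gap penalty -2) and the function name. Practical template searches use substitution matrices such as BLOSUM62, affine gap penalties, and statistical significance estimates.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Hypothetical helper; the default scores are illustrative assumptions,
    not the defaults of BLAST, FASTA, or any other tool.
    """
    # Only the previous DP row is kept; local alignment scores never drop below 0.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # residue of a aligned against a gap
            left = curr[j - 1] + gap    # residue of b aligned against a gap
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


# Two toy sequences differing at one position (I vs L).
print(smith_waterman("MKTAYIAKQR", "MKTAYLAKQR"))  # 17
```

A high score relative to unrelated sequences suggests a possible template. Distant relationships that such sequence scores miss are where threading methods help.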
Evaluation
The primary method of evaluation is a comparison of the predicted model's α-carbon positions with those in the target structure. The comparison is shown visually by cumulative plots of distances between pairs of equivalent α-carbons in the alignment of the model and the structure, such as the one shown in the figure (a perfect model would stay at zero all the way across). It is also summarized by a numerical score, GDT-TS (Global Distance Test Total Score), which describes the percentage of well-modeled residues in the model with respect to the target. Free modeling (template-free, or de novo, modeling) is also evaluated visually by the assessors, since the numerical scores do not work as well for detecting loose resemblances in the most difficult cases. High-accuracy template-based predictions were evaluated in two ways. In CASP7, the test was whether they were good enough for molecular-replacement phasing of the target crystal structure, with successes followed up later. In CASP8, the test was full-model (not just α-carbon) model quality and full-model match to the target.
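GDT-TS is commonly defined as the average of the percentages of model α-carbons lying within 1, 2, 4 and 8 Å of their equivalents in the target. The sketch below rests on two simplifying assumptions.

- The model and target α-carbons are already paired residue by residue.
- The two structures have already been superimposed.

The full GDT procedure searches many superpositions to maximize each percentage separately, and this sketch omits that search. The function names are hypothetical. The same distances also give the points of the cumulative plot described above.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_distances(model: Sequence[Coord], target: Sequence[Coord]) -> List[float]:
    """Distances between equivalent alpha-carbons (assumes pre-paired, pre-superimposed)."""
    if len(model) != len(target):
        raise ValueError("model and target must list the same equivalent residues")
    return [math.dist(m, t) for m, t in zip(model, target)]


def percent_within(distances: Sequence[float], cutoff: float) -> float:
    """Percentage of residues whose alpha-carbon deviation is at most `cutoff` angstroms."""
    return 100.0 * sum(d <= cutoff for d in distances) / len(distances)


def gdt_ts(distances: Sequence[float]) -> float:
    """GDT-TS for one fixed superposition: the mean of P1, P2, P4 and P8."""
    return sum(percent_within(distances, c) for c in (1.0, 2.0, 4.0, 8.0)) / 4.0


def cumulative_curve(distances: Sequence[float]) -> List[Tuple[float, float]]:
    """(percent of residues, distance cutoff) points; a perfect model stays at 0 A."""
    ordered = sorted(distances)
    n = len(ordered)
    return [(100.0 * (k + 1) / n, d) for k, d in enumerate(ordered)]


# Toy example: four residues displaced by 0.5, 1.5, 3.0 and 10.0 A.
target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
print(gdt_ts(ca_distances(model, target)))  # 56.25
```

In the toy example, P1 = 25, P2 = 50, P4 = 75 and P8 = 75. Their mean is 56.25, so a single badly placed residue lowers the score without dominating it as it would in an RMSD.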
Evaluation of the results is carried out in the following prediction categories:
- tertiary structure prediction (all CASPs)
- secondary structure prediction (dropped after CASP5)
- prediction of structure complexes (CASP2 only; a separate experiment, CAPRI, carries on this subject)
- residue-residue contact prediction (starting CASP4)
- disordered regions prediction (starting CASP5)
- domain boundary prediction (CASP6–CASP8)
- function prediction (starting CASP6)
- model quality assessment (starting CASP7)
- model refinement (starting CASP7)
- high-accuracy template-based prediction (starting CASP7)
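Residue-residue contact prediction is scored against contacts derived from the experimental structure. The sketch below makes three assumptions that should be checked against the rules of a given CASP round.

- Two residues are in contact when their Cβ atoms (Cα for glycine) are closer than 8 Å.
- Pairs fewer than 6 positions apart in sequence are ignored.
- A ranked list of predicted pairs is scored by the precision of its top N entries. N = L/5, where L is the sequence length, is a common choice.

The function names are hypothetical.

```python
import math
from typing import Sequence, Set, Tuple

Coord = Tuple[float, float, float]
Pair = Tuple[int, int]


def true_contacts(cb: Sequence[Coord], cutoff: float = 8.0, min_sep: int = 6) -> Set[Pair]:
    """Pairs (i, j), i < j, at least `min_sep` apart in sequence and closer than `cutoff` A."""
    contacts = set()
    for i in range(len(cb)):
        for j in range(i + min_sep, len(cb)):
            if math.dist(cb[i], cb[j]) < cutoff:
                contacts.add((i, j))
    return contacts


def top_precision(predicted: Sequence[Pair], native: Set[Pair], top_n: int) -> float:
    """Fraction of the top_n highest-ranked predicted pairs that are native contacts."""
    selected = list(predicted)[:top_n]
    if not selected:
        return 0.0
    # Normalize each pair to (smaller, larger) index before looking it up.
    hits = sum((min(i, j), max(i, j)) in native for i, j in selected)
    return hits / len(selected)


# Toy chain: residue 7 folds back next to residue 0; all others are far apart.
cb = [(0.0, 0.0, 0.0)] + [(100.0 * k, 0.0, 0.0) for k in range(1, 7)] + [(1.0, 0.0, 0.0)]
native = true_contacts(cb)
print(native)                                     # {(0, 7)}
print(top_precision([(7, 0), (1, 7)], native, 2))  # 0.5
```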
The tertiary structure prediction category was further subdivided into:
- homology modeling
- fold recognition (also called protein threading, although strictly speaking threading is a method rather than a category)
- de novo structure prediction, now referred to as 'New Fold' because many methods apply evaluation (scoring) functions that are biased by knowledge of native protein structures, such as artificial neural networks
Starting with CASP7, the categories were redefined to reflect developments in methods. The 'Template based modeling' category includes all former comparative modeling, homologous-fold-based models, and some analogous-fold-based models. The 'Template free modeling' category includes models of proteins with previously unseen folds and hard analogous-fold-based models.
The CASP results are published in special supplement issues of the scientific journal Proteins, all of which are accessible through the CASP website. A lead article in each supplement describes the specifics of the experiment, while a closing article evaluates progress in the field.
Result ranking
Automated assessments for CASP9 (2010)
- Official ranking for servers only (147 targets)
- Official ranking for humans and servers (78 targets)
- Ranking by Grishin Lab (for server only)
- Ranking by Grishin Lab (for human and servers)
- Ranking by Zhang Lab
- Ranking by Cheng Lab
Automated assessments for CASP8 (2008)
- Official ranking for servers only
- Official ranking for humans and servers
- Ranking by Zhang Lab
- Ranking by Grishin Lab
- Ranking by McGuffin Lab
- Ranking by Cheng Lab
Automated assessments for CASP7 (2006)
See also
- Critical Assessment of Prediction of Interactions (CAPRI)
- Protein structure prediction