CASP15 (2022) showed enormous progress in modeling multimolecular protein complexes.
Assembly modeling (a.k.a. quaternary structure modeling, oligomeric modeling, multimeric modeling) has been assessed in CASP since 2016 (CASP12). Typically, models were of good accuracy when templates were available for the structure of the whole target complex. After the success of AlphaFold2 in CASP14 (2020), it was expected that the deep learning methodology that brought monomeric modeling to a qualitatively new level would be extended to multimeric modeling. Indeed, CASP15 showed that newly developed methods are capable of accurately reproducing structures of oligomeric complexes and outperform CASP14 methods by a large margin. In particular, the accuracy of models almost doubled in terms of the Interface Contact Score (ICS, a.k.a. F1) and increased by one third in terms of the overall fold similarity score LDDTo (left panel). An impressive example of multimeric modeling is shown in the right panel below.
CASP15: T1113o
model 239_2: F1=92.2; LDDTo=0.913
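The Interface Contact Score (ICS, reported as F1) used above is the harmonic mean of the precision and recall of predicted inter-chain contacts. The following is a minimal sketch, assuming the contacts have already been extracted as unordered residue pairs. The function names, the (chain, residue number) representation, and the toy data are illustrative assumptions, not the CASP evaluation code.

```python
def contact(res_a, res_b):
    """Represent an inter-chain contact as an unordered pair of residues."""
    return frozenset((res_a, res_b))


def interface_contact_score(predicted, native):
    """Return the F1 score (in percent) of predicted vs. native interface contacts."""
    predicted, native = set(predicted), set(native)
    true_positives = len(predicted & native)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # fraction of predicted contacts that are correct
    recall = true_positives / len(native)        # fraction of native contacts that were recovered
    return 100.0 * 2.0 * precision * recall / (precision + recall)


# Hypothetical toy interface between chains A and B (not real CASP data).
native_contacts = [
    contact(("A", 10), ("B", 25)),
    contact(("A", 11), ("B", 25)),
    contact(("A", 14), ("B", 29)),
    contact(("A", 18), ("B", 33)),
]
model_contacts = [
    contact(("B", 25), ("A", 10)),  # same contact as in the native, listed in reverse order
    contact(("A", 11), ("B", 25)),
    contact(("A", 14), ("B", 29)),
    contact(("A", 20), ("B", 40)),  # false positive
]

ics = interface_contact_score(model_contacts, native_contacts)
```

In this toy case three of the four predicted contacts are correct and three of the four native contacts are recovered, so precision and recall are both 0.75. The percent scale is chosen to match the caption above (F1=92.2).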
template-based modeling
Models based on templates identified by sequence similarity remain the most accurate. Over the course of the CASP experiments there have been enormous improvements in this area. However, the overall accuracy improvements that we have seen in the first 10 years of CASP remained unmatched until CASP12 (2016), when a new burst of progress happened [Kryshtafovych et al, 2018]. In the two years from 2014 to 2016, the backbone accuracy of the submitted models improved more than in the preceding 10 years. The next CASP continued the trend [Croll et al, 2019], and the 2014-2018 model accuracy improvement was double that of 2004-2014 (see left plot). Several factors contributed to this, including more accurate alignment of the target sequence to that of available templates, combining multiple templates, improved accuracy of regions not covered by templates, successful refinement of models, and better selection of models from decoy sets due to improved methods for estimation of model accuracy.
CASP14 marked an extraordinary increase in the accuracy of the computed three-dimensional protein structures with the emergence of the advanced deep learning method AlphaFold2. Models built with this method proved to be competitive with experimental accuracy (GDT_TS>90) for ~2/3 of the targets and of high accuracy (GDT_TS>80) for almost 90% of the targets (middle plot). The accuracy of CASP14 models for TBM targets significantly exceeded the accuracy of models that can be built by simply transcribing information from templates, and reached GDT_TS=92 on average, which is significantly higher than the corresponding averages in the previous two CASPs (right plot).
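GDT_TS, the score behind the thresholds quoted above, is conventionally the average of the percentages of residues whose C-alpha atoms lie within 1, 2, 4 and 8 Å of their positions in the target. The sketch below is a simplified illustration that assumes the model has already been superimposed on the target. The official score searches over many superpositions to maximize each percentage, so this single-superposition version can only underestimate it. The function name and coordinates are hypothetical.

```python
import math


def gdt_ts_single_superposition(model_ca, target_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Approximate GDT_TS (0-100) from pre-superimposed C-alpha coordinates.

    Assumption: both lists hold (x, y, z) tuples for the same residues in the
    same order, already in a common frame. No superposition search is done here.
    """
    if len(model_ca) != len(target_ca) or not target_ca:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    deviations = [math.dist(m, t) for m, t in zip(model_ca, target_ca)]
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= c for d in deviations) / len(deviations) for c in cutoffs]
    return 100.0 * sum(fractions) / len(fractions)


# Hypothetical four-residue example with deviations of 0.5, 1.5, 3.0 and 10.0 Å.
target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

score = gdt_ts_single_superposition(model, target)
```

Here the fractions within 1, 2, 4 and 8 Å are 1/4, 2/4, 3/4 and 3/4, giving a score of 56.25.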
ab initio modeling
Modeling proteins with no or marginal similarity to existing structures (ab initio, new fold, non-template or free modeling) is the most challenging task in tertiary structure prediction. Probably the first ab initio model of reasonable accuracy was built in CASP4. Since then CASP has witnessed sustained progress in ab initio prediction, but mainly for small proteins (120 residues or less, panel 1; model is in blue, target in orange). In CASP11, for the first time, a larger new fold protein (256 residues, sequence identity to known structures <5%) was built with accuracy unprecedented for targets of this size. The CASP11 and CASP12 experiments (2014, 2016) also showed a new trend in building better non-template models by successfully utilizing predicted contacts (panel 3) [Abriata et al, 2018]. CASP13 witnessed yet another substantial improvement in the accuracy of template-free models, mainly due to employing advanced deep learning techniques coupled with prediction of inter-residue distances at a range of thresholds [Senior et al, 2019], [Xu and Wang, 2019]. The best models submitted on the free modeling targets showed a more than 20% increase in backbone accuracy, with the average GDT_TS scores going up from 52.9 in CASP12 to 65.7 in CASP13.
CASP14 marked an extraordinary increase in the accuracy of the computed three-dimensional protein structures with the emergence of the advanced deep learning method AlphaFold2. The CASP14 trend line in the historical progress plot (panel 2, black trendline) starts at a GDT_TS of about 95 and finishes at about 85 for difficult targets. Because of experimental errors and artifacts, a GDT_TS of 100 is highly unlikely even for an essentially correct model. In CASP14, about 2/3 of the 96 targets reached GDT_TS values greater than 90, and so are considered competitive with experiment in backbone accuracy.
CASP7: T0283-D1
model 321_1: GDT_TS=75
CASP12: T0866-D1
model 325_5: GDT_TS=81
contact prediction
The most notable progress in recent CASPs (2014, 2016) resulted from sustained improvement in methods for predicting three-dimensional contacts between pairs of residues in structures. The average precision of the best CASP12 contact predictor almost doubled compared with that of the best CASP11 predictor (from 27% to 47%; see the plot). Advances in the field as a whole are no less impressive: 26 methods in CASP12 showed better results than the best method in CASP11 [Schaarschmidt et al, 2018].
Theoretical advances in contact prediction led to improved accuracy of 3D models, especially for the hardest template-free modeling cases (see models for CASP12 target T0915 below).
CASP13 (2018) registered yet another leap in the accuracy of contact prediction, with the average precision of the best contact prediction group increasing by 23 percentage points (compared to CASP12) and reaching 70%. There has been no noticeable increase in the accuracy of predicted contacts between CASP13 and CASP14 (left graph).
modeling without constraints
modeling using predicted contacts as constraints
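The precision figures quoted in this section measure the share of a predictor's most confident contacts that are present in the experimental structure. The sketch below illustrates that calculation, assuming the evaluated list is the top L/5 predictions for a protein of length L. That list size, the function name, and the toy data are assumptions for illustration, since the text above does not state the exact evaluation settings.

```python
def top_contact_precision(predictions, native_contacts, seq_length, divisor=5):
    """Precision (in percent) of the top seq_length // divisor predicted contacts.

    predictions: iterable of (i, j, probability) residue-pair predictions.
    native_contacts: iterable of (i, j) pairs observed in the experimental structure.
    """
    native = {(min(i, j), max(i, j)) for i, j in native_contacts}
    n_top = max(1, seq_length // divisor)
    # Rank predictions by confidence and keep only the most confident ones.
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)[:n_top]
    if not ranked:
        return 0.0
    hits = sum(1 for i, j, _ in ranked if (min(i, j), max(i, j)) in native)
    return 100.0 * hits / len(ranked)


# Hypothetical 20-residue protein, so the top 20 // 5 = 4 predictions are evaluated.
predicted = [
    (1, 15, 0.95),
    (3, 18, 0.90),
    (2, 12, 0.85),   # not a native contact
    (5, 20, 0.80),
    (4, 16, 0.30),   # ranked below the cut, ignored
    (6, 19, 0.10),   # ranked below the cut, ignored
]
native = [(1, 15), (18, 3), (5, 20), (4, 16)]

precision = top_contact_precision(predicted, native, seq_length=20)
```

Three of the four top-ranked pairs are native contacts, so the precision in this toy case is 75%.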
predictors help structural biologists
In early CASPs, generated models have occasionally helped solve structures. For example,
the crystal structure of Sla2 ANTH domain of Chaetomium thermophilum (CASP11 target
T0839 - see the image below) was determined by molecular replacement using CASP models, but these have been
exceptions.
In CASP14, four structures were solved with the aid of AlphaFold2 models. A post-CASP
analysis has shown that models from other groups would also have been effective in
some cases. These are all hard targets with limited or no homology information available for at least
some domains, demonstrating the power of the new methods for all classes of modeling
difficulty. For one other target, provision of the models resulted in correction of a local
experimental error. A detailed account of these cases is provided in the Proteins paper
[Kryshtafovych et al, 2021].
T0839-D1
model: TS184_1 (GDT_TS: 62.8)
refinement
The refinement category assesses the ability of methods to refine available models towards a more accurate representation of the experimental structure. CASP10-14 assessments showed two trends in methods development. First, some molecular dynamics methods can consistently, even though very modestly, improve over the starting models. Second, a group of more aggressive refinement methods proved able to provide very impressive examples of substantial improvement, though at the price of consistency (occasionally models move away from the experimental structure rather than towards it).
Below are some examples of notable refinement in CASP12. The target structure is shown in orange, the starting model in green and the refined model in blue.
[Hovan et al, 2018]
target TR884; model 118_1
starting GDT_TS=66
refined GDT_TS=76
target TR894; model 118_5
starting GDT_TS=75
refined GDT_TS=96
target TR896; model 220_1
starting GDT_TS=61
refined GDT_TS=77
data-assisted modeling
Data-assisted or hybrid modeling, in which low-resolution experimental
data are combined with computational methods, is becoming increasingly
important for a range of experimental data, including NMR, chemical
cross-linking and surface labeling, X-ray and neutron scattering,
electron microscopy and FRET. The CASP11-CASP13 experiments included a special
sub-category of modeling proteins using such data (CASP14 did not include
a data-assisted category due to the COVID-19-associated difficulties in obtaining
experimental data).
A description of the CASP12 data-assisted experiment and the data is provided in
[Ogorzalek et al, 2018].
Examples of a non-assisted model and a cross-linking-assisted model from the same predictor (CASP12 group 220) are shown below, demonstrating the increased accuracy of the assisted prediction.
target T0894
original model 220_1
GDT_TS=24
target Tx894
cross-linking-assisted model 220_1
GDT_TS=52
Welcome to the Protein Structure Prediction Center!
Our goal is to help advance the methods of identifying protein
structure from sequence. The Center has been organized to provide the
means of objective testing of these methods via the process of blind
prediction. The Critical Assessment of protein Structure Prediction (CASP)
experiments aim at establishing the current state of the art in
protein structure prediction, identifying what progress has been made,
and highlighting where future effort may be most productively focused.
There have been fifteen previous CASP experiments.
The sixteenth experiment is planned to start in May 2024.
Descriptions of these
experiments and the full data (targets, predictions, interactive tables with
numerical evaluation results, dynamic graphs and prediction visualization tools)
can be accessed by following the links:
Raw data for the experiments held so far are archived and stored in our
data archive.
Details of the experiments have been published in the scientific journal
Proteins: Structure, Function and Bioinformatics.
CASP proceedings include papers describing
the structure and conduct of the experiments,
the numerical evaluation measures,
reports from the assessment teams highlighting state of the art in different prediction categories,
methods from some of the most successful prediction teams,
and progress in various aspects of the modeling.
Prediction methods are assessed on the basis of the analysis of a large
number of blind predictions of protein structure. A summary of the numerical
evaluation of the tertiary structure prediction methods tested in the
latest CASP experiment can be found
on this web page.
The main numerical measures used in evaluations, data handling procedures,
and guidelines for navigating the data presented on this website
are described in
[1].
Some of the best performing methods are implemented as
fully automated servers
and can therefore be used by the public for protein structure modeling.
To proceed to the latest CASP
experiment click on the logo below: