I identified a list of template-based modeling cases in which our servers (e.g. MULTICOM-CLUSTER, MULTICOM-NOVEL) encountered significant difficulties during the CASP9 experiment.
Our servers’ failures on these cases may be due to various reasons, including those listed by John. However, I noticed that, no matter how hard these cases were, there were almost always a few servers that managed to do well on them. What a remarkable achievement of the community!
I would appreciate any input from the community on how to solve these modeling problems. I also hope that these cases can be tried on other servers and will provide useful material for the community to improve its methods.
T0532: failed to select the best alignment or model
Our server MULTICOM-NOVEL generated its first two models from two alternative alignments to the same template. Model 1 has a score of 0.57 and model 2 a score of 0.71, but MULTICOM-NOVEL failed to put the better model (model 2) at the top. Several servers, such as FAMSD, BioSerf, Phyre2, ProQ2, ProfileCRF, and Zhang-Server, did well on this target. They were probably able to generate and select good alignments and models. Is it possible to distinguish our two models at the alignment level or at the model quality assessment level?
Model 1:
http://sysbio.rnet.missouri.edu/casp9_h ... 0532_1.pdb
Alignment file for model 1:
http://sysbio.rnet.missouri.edu/casp9_h ... 0532_1.pir
Model 2:
http://sysbio.rnet.missouri.edu/casp9_h ... 0532_2.pdb
Alignment file for model 2:
http://sysbio.rnet.missouri.edu/casp9_h ... 0532_2.pir
T0540: Failed to find the best single template
The set of local sequence/profile alignment tools used by our servers failed to identify the best template (2KD2?). However, servers such as HHPredA, RaptorX, and even the aging SAM-T02 got this template right. What might have contributed to their success?
T0549: Our servers (e.g., MULTICOM-NOVEL) found the best template (2KPM?) but failed to generate a good alignment with a number of tools, including PSI-BLAST and HHsearch. However, a few servers (e.g., Jiang_Assembly, RaptorX, Phyre2, ProfileCRF, SAM-T08, BioSerf) did well. How did these servers manage to generate a good alignment or model?
Our model 1:
http://sysbio.rnet.missouri.edu/casp9_h ... 0549_1.pdb
The alignment file for model 1:
http://sysbio.rnet.missouri.edu/casp9_h ... 0549_1.pir
T0550: Failed to select a good template (2DPK) for the first domain of this target
This target has two domains: a hard template-based domain and an ab initio domain. For the first domain, the locally installed HHsearch alignment tool was able to find the 2DPK template in our own template profile database, but with a high E-value (40). Moreover, it generated a very short alignment, which was not selected. My question is how to generate a longer, better alignment in this case. Here is the ranking and alignment file:
http://sysbio.rnet.missouri.edu/casp9_h ... /T0550.hhr
T0551: got the best template, but failed to generate a good alignment.
The local HHsearch tool identified the potentially best template (1PCF) in our template profile database but generated a short alignment that covers only half of the sequence. As a result, our modeling failed badly. I noticed that some servers, such as HHPred, Raptor, Phyre2, and GSMetaServer, did very well on this target. I would appreciate any input on how to make better use of HHsearch or other features to generate better alignments with very remotely homologous templates such as this one.
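The short alignments described for T0550 and T0551 raise a practical point about how alignment coverage enters template selection. As an illustration only, here is a minimal Python sketch of a query-coverage filter of the kind a pipeline might apply to template hits. The names (TemplateHit, query_coverage, filter_by_coverage) and the MIN_COVERAGE cutoff of 0.6 are hypothetical and do not describe the actual MULTICOM selection rules; each hit is assumed to be summarized by the 1-based, inclusive query range of its alignment.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoff (illustrative only): hits whose alignment spans less
# than this fraction of the query are discarded.
MIN_COVERAGE = 0.6


@dataclass
class TemplateHit:
    """A template hit summarized by its aligned query range (1-based, inclusive)."""
    template_id: str
    query_start: int
    query_end: int
    e_value: float


def query_coverage(hit: TemplateHit, query_length: int) -> float:
    """Fraction of the query spanned by the aligned region (gaps are not subtracted)."""
    aligned_span = hit.query_end - hit.query_start + 1
    return aligned_span / query_length


def filter_by_coverage(hits: List[TemplateHit], query_length: int,
                       min_coverage: float = MIN_COVERAGE) -> List[TemplateHit]:
    """Keep only hits whose alignment spans at least min_coverage of the query."""
    return [h for h in hits if query_coverage(h, query_length) >= min_coverage]
```

With a 200-residue query, a hit aligned to residues 1-100 has a coverage of 0.5 and would be dropped by this cutoff even if its template is the right one, which is one way a correct but short alignment can be discarded before modeling.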
Our alignment file:
http://sysbio.rnet.missouri.edu/casp9_h ... 0551_1.pir
Our model file:
http://sysbio.rnet.missouri.edu/casp9_h ... 0551_1.pdb
T0557: our servers (e.g., MULTICOM-CLUSTER) used the best template but possibly failed to generate a good alignment or to use multiple templates. They identified the best template, 3LMM, which was also used by other servers such as QUARK and BAKER-ROSETTASERVER, yet these two servers generated significantly better models. Was it because they used multiple templates, a better alignment, or both?
Our alignment file:
http://sysbio.rnet.missouri.edu/casp9_h ... 0557_1.pir
The model file:
http://sysbio.rnet.missouri.edu/casp9_h ... 0557_1.pdb
T0562: MULTICOM-NOVEL got the best template (3LWX) but failed to generate a good alignment, or mistakenly used other, less similar templates (1SU0, 2QQ4). Other servers, such as Bilab-ENABLE, used the single template and generated the best alignment for this target. I would be keen to learn how Bilab-ENABLE managed to generate a better alignment. What factors were taken into account? Did using multiple templates cause a problem in our modeling in this case?
Our alignment file:
http://sysbio.rnet.missouri.edu/casp9_h ... 0562_1.pir
The model file:
http://sysbio.rnet.missouri.edu/casp9_h ... 0562_1.pdb
T0564: all of our profile alignment tools failed to find the good template (1WJJ). I would be keen to learn how other servers, such as Raptor, HHpred, and Seok-Server, were able to select it. Was it due to high-quality profiles, a better alignment strategy, or other features?
T0568: our servers found good templates (2PN5 and 2P9R) but were not able to model the uncovered N-terminal region (54 residues) well. Our servers tried to refine this N-terminal tail, but it did not seem to help. Some servers, such as Phyre2, SAM-T08, Pcomb, GSmetaserver, QUARK, and BAKER-ROSETTASERVER, did well on this target. I wonder what made the difference: was it refinement of the N-terminal region or a better alignment?
Our alignment file:
http://sysbio.rnet.missouri.edu/casp9_h ... 0568_1.pir
Our model file:
http://sysbio.rnet.missouri.edu/casp9_h ... 0568_1.pdb
T0579: all our profile alignment tools failed to find the best template (2QQR), which is a two-domain protein. However, quite a few servers got it right. I wonder what approaches or information these servers used to identify this template.
T0588: our servers used some reasonable templates (1QAZ, 1RW9) but were not able to use one of the best templates (3EV1?), even though it was also identified. I wonder how other servers (e.g., RaptorX, Zhang-Server) chose a better template such as 3EV1.
T0598: MULTICOM-CLUSTER used two good templates (2OSO, 2OSD) but generated a worse model than other servers, such as Zhang-Server, pro-sp3-TASSER, and gws, which used the templates 2OSO, 2OSD, 2Z9F, 2C0J, and 3CUE. Was it because these servers generated better loops or tails with some refinement protocol?
Our alignment file:
http://sysbio.rnet.missouri.edu/casp9_h ... 0598_1.pir
The model file:
http://sysbio.rnet.missouri.edu/casp9_h ... 0598_1.pdb
T0602: MULTICOM-NOVEL used a good template (3A7M) but failed to generate a good alignment or model. I would be keen to learn how other servers, such as Seok-server, Zhang-Server, and chunk-TASSER, managed to generate some of the best models for this target using the same template. Which part (e.g., model generation, alignment, or loop modeling) contributed to their success?
The alignment file:
http://sysbio.rnet.missouri.edu/casp9_h ... 0602_1.pir
The model file:
http://sysbio.rnet.missouri.edu/casp9_h ... 0602_1.pdb
T0604: this is a three-domain protein, and most servers, including ours, failed on the first domain. I would be keen to learn how other servers, such as Zhang-server and pro-sp3-TASSER, found the best template for this domain.
T0612: our servers got the core of the two-layer beta sheets correct using templates (3FRP, 3FN9). However, none of these templates provides a good conformation for packing the first two strands against the rest of the beta sheets. A few servers, such as Zhang-Server, did very well on this target. I would be keen to learn how Zhang-Server managed to pack the first two strands. Was it due to refinement or better alignments?
T0628: our server did well on the first domain but failed on the second domain, for which it used a single template, 2E2O. Other servers did very well: Zhang-Server used multiple templates, including 2E2O, and BAKER-ROSETTASERVER used a single template, 1HUX. I wonder what went wrong in our case; the problem could have been caused by the alignment or by template ranking. I would be keen to learn what contributed to the success of these two servers on this target.
T0630: in this case it was very challenging both to select the best template and to generate a good alignment. Several templates are available, such as 2IF6, 2JYX, 2HBW, and 2EVR. The best template is 2IF6, which covers the entire target; the other templates cover only the beta-barrel region or the helical regions. Our pairwise model selection mistakenly chose models generated from 2JYX, 2HBW, and 2EVR because they were predominant in the model pool. Another challenge lay in the alignment with 2IF6, which could lead to a model with a long loop from residue 38 to residue 62. All these challenges confused our servers, which even predicted that the target had two domains. I would be keen to learn how other servers, such as RaptorX and Jiang_Assembly, managed to rank template 2IF6 at the top among many other templates and generate a good alignment.
Here are the five models predicted by MULTICOM-CLUSTER; model 5, based on 2IF6, is the best. The other models are based on other templates.
The five models are:
http://sysbio.rnet.missouri.edu/casp9_h ... 0630_1.pdb
http://sysbio.rnet.missouri.edu/casp9_h ... 0630_2.pdb
http://sysbio.rnet.missouri.edu/casp9_h ... 0630_3.pdb
http://sysbio.rnet.missouri.edu/casp9_h ... 0630_4.pdb
http://sysbio.rnet.missouri.edu/casp9_h ... 0630_5.pdb
The alignment file for model 5:
http://sysbio.rnet.missouri.edu/casp9_h ... 0630_5.pir
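The pairwise model selection problem on T0630 can be illustrated with a minimal Python sketch of consensus ranking, in which each model is scored by its average structural similarity to all other models in the pool. This is a generic sketch, not our actual selection code: it assumes a precomputed, symmetric matrix of pairwise similarity scores (for example GDT-TS-like values between 0 and 1), and the function names and toy numbers below are hypothetical, not our T0630 scores. It shows why a minority model, even the most accurate one, can rank last when most models in the pool come from templates that agree with each other.

```python
from typing import List, Sequence


def consensus_scores(similarity: Sequence[Sequence[float]]) -> List[float]:
    """Average similarity of each model to every other model in the pool.

    similarity[i][j] is a pairwise structural similarity score between
    models i and j (assumed symmetric); the diagonal is ignored.
    """
    n = len(similarity)
    if n < 2:
        raise ValueError("pairwise consensus needs at least two models")
    return [
        sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def rank_by_consensus(similarity: Sequence[Sequence[float]]) -> List[int]:
    """Model indices sorted from highest to lowest consensus score."""
    scores = consensus_scores(similarity)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy pool (hypothetical numbers): models 0-3 come from mutually similar
# templates (pairwise similarity 0.7); model 4 comes from a different
# template and shares only 0.3 with each of the others.
toy = [[1.0 if i == j else (0.3 if 4 in (i, j) else 0.7) for j in range(5)]
       for i in range(5)]
print(rank_by_consensus(toy))
```

In the toy pool, models 0-3 each get a consensus score of 0.6 and model 4 gets 0.3, so model 4 ranks last regardless of how close it is to the native structure; this is the same situation as our 2IF6-based model 5 being outvoted by the predominant 2JYX/2HBW/2EVR models.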