Evaluation of CASP9 Quality Assessment

by **M.Pawlowski** on Thu Dec 02, 2010 10:11 am

Evaluation of CASP9 Quality Assessment (QA) Predictors

The benchmark was made for 4 types of CASP9 target datasets:

Single-domain targets:
1) Human/Server and Server only
2) Human/Server
Single- and multiple-domain targets:
3) Human/Server and Server only
4) Human/Server

Marcin Pawlowski

by **arneelof** on Thu Dec 02, 2010 1:27 pm

Just a reminder: the Pearson correlation coefficient between different measures is on the order of 0.8 to 0.9, so it can be questioned whether any differences are really relevant (see http://bioinfo.se/papers/17894353.pdf).

Perhaps more relevant is the sum of GDT_TS for the highest ranked model.
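
As a rough illustration of the measure suggested here, the sketch below sums the observed GDT_TS of the model that a QA method ranks first for each target. The dictionary layout, the function name `top1_gdt_sum`, and the numbers are assumptions made up for the example; they are not CASP9 data.

```python
def top1_gdt_sum(predicted, observed):
    """Sum the observed GDT_TS of the model each target's QA score ranks first.

    predicted: {target: {model: QA score}}  (hypothetical layout)
    observed:  {target: {model: GDT_TS}}
    """
    total = 0.0
    for target, scores in predicted.items():
        best_model = max(scores, key=scores.get)  # highest predicted quality
        total += observed[target][best_model]
    return total


# Toy numbers for illustration only, not CASP9 data.
predicted = {
    "T1": {"m1": 0.9, "m2": 0.4},
    "T2": {"m1": 0.2, "m2": 0.7},
}
observed = {
    "T1": {"m1": 60.0, "m2": 75.0},
    "T2": {"m1": 30.0, "m2": 55.0},
}
print(top1_gdt_sum(predicted, observed))
```

In this toy case the method picks `m1` for T1 even though `m2` is the better model, so the sum falls short of the best achievable value; that gap is what the measure is meant to expose.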

by **M.Pawlowski** on Thu Dec 02, 2010 2:05 pm

Thanks for your reply.
I completely agree with you.
My next plan is to do a benchmark based on the sum of GDT_TS (or even the positive z-score of GDT_TS) of the highest-scored models. If I find some time tomorrow, I will do it.

Another quite interesting benchmark, and in my opinion perhaps the most informative one, would be a head-to-head comparison of QA predictors (e.g. based on either the GDT_TS of the highest-scored model or the correlation).
To the best of my knowledge, nobody has done such a test so far.
Marcin

by **M.Pawlowski** on Thu Dec 02, 2010 2:27 pm

Of course,
by a head-to-head comparison test I meant the total number of cases in which a QA predictor performed statistically significantly better than another QA predictor.
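
One possible way to count such head-to-head wins is sketched below. The post does not say which significance test would be used, so this example swaps in an exact two-sided sign test as a simple stand-in; the function names, the `alpha` threshold, and the per-target scores are all hypothetical.

```python
from itertools import combinations
from math import comb


def sign_test_p(wins, losses):
    """Two-sided exact sign test p-value (ties already dropped)."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def head_to_head(per_target, alpha=0.05):
    """Count, for each method, the rivals it beats significantly.

    per_target: {method: [score on target 1, score on target 2, ...]}
    (assumed layout; all lists cover the same targets in the same order)
    """
    wins = {method: 0 for method in per_target}
    for a, b in combinations(per_target, 2):
        a_better = sum(x > y for x, y in zip(per_target[a], per_target[b]))
        b_better = sum(y > x for x, y in zip(per_target[a], per_target[b]))
        if sign_test_p(a_better, b_better) < alpha:
            wins[a if a_better > b_better else b] += 1
    return wins


# Toy per-target scores (e.g. GDT_TS of the top-ranked model), not CASP9 data.
scores = {
    "A": [60, 62, 64, 66, 68, 70, 72, 74, 76, 78],
    "B": [50, 52, 54, 56, 58, 60, 62, 64, 66, 68],
    "C": [51, 51, 55, 55, 59, 59, 63, 63, 67, 67],
}
print(head_to_head(scores))
```

Here A beats both B and C on every target, while B and C split the targets evenly and neither is credited with a win.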

by **mcguffin** on Fri Dec 03, 2010 1:05 am

Yes, using Pearson correlation alone is a bit limited, particularly because the data is very often non-linear.

If we want to use "correlations" then Kendall's tau and/or Spearman's rho are better. These scores more accurately reflect the ability of methods to rank models correctly. You can have situations where Pearson's r is low but the ranking is still reasonably good. Conversely, you can have situations where Pearson's r is very high but the ranking is way off.
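
A small sketch of the first situation, with made-up numbers: the predicted scores order the models perfectly, but the relationship to the observed quality is strongly non-linear, so Pearson's r is pulled down while Spearman's rho stays at 1. The helper names and data are assumptions for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as Pearson's r on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: predicted QA scores vs. observed GDT_TS for five models.
predicted = [0.10, 0.20, 0.30, 0.40, 0.50]
observed = [10.0, 11.0, 12.0, 14.0, 90.0]
print(pearson(predicted, observed), spearman(predicted, observed))
```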

I agree that measuring the observed model quality of the top-ranked models is also very useful, but only if used along with an appropriate significance test, i.e. a paired Wilcoxon signed-rank test. If the GDT_TS sum of top models is used alone to select a "winning" method, then one or two incorrect models can distort the perceived difference in performance between methods, where there may actually be no significant difference.

I've been saying this for several years - I go into more detail here:
http://www.biomedcentral.com/1471-2105/8/345
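
To make the point about summed GDT_TS concrete, here is a minimal sketch of a paired Wilcoxon signed-rank test on toy data. It uses the large-sample normal approximation without tie or continuity corrections, which is only rough for so few targets; the function name and numbers are assumptions, not taken from the paper linked above.

```python
from math import erfc, sqrt


def wilcoxon_signed_rank(a, b):
    """Return (W, two-sided p) using the normal approximation.

    Zero differences are dropped; no tie or continuity correction.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    return w, erfc(abs(z) / sqrt(2))


# Toy GDT_TS of the top-ranked model per target for two methods.
method_a = [70, 65, 80, 55, 60, 75, 68, 72]
method_b = [71, 64, 81, 54, 61, 74, 69, 30]
print(sum(method_a), sum(method_b))
print(wilcoxon_signed_rank(method_a, method_b))
```

In this toy case method A leads the sum by 41 points, almost entirely because of one badly chosen model by method B, yet the paired test gives no indication of a significant difference.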

by **terashig** on Fri Dec 03, 2010 1:12 am

I have a technical question:

How did you treat the models containing multiple segments (multiple-models)?

like....
T0621TS035_1_1.lga:SUMMARY(GDT) 79 169 4.0 14 2.84 7.988 7.067 0.476
T0621TS035_1_2.lga:SUMMARY(GDT) 93 169 4.0 34 3.06 19.822 16.448 1.077
T0621TS060_1_1.lga:SUMMARY(GDT) 79 169 4.0 14 2.84 7.988 7.067 0.476
T0621TS060_1_2.lga:SUMMARY(GDT) 93 169 4.0 36 1.96 20.562 17.674 1.746
T0621TS060_2_1.lga:SUMMARY(GDT) 79 169 4.0 21 3.03 12.130 10.113 0.672
T0621TS060_2_2.lga:SUMMARY(GDT) 93 169 4.0 40 2.56 20.118 17.318 1.505
T0621TS060_3_1.lga:SUMMARY(GDT) 79 169 4.0 17 2.93 9.320 8.481 0.561
T0621TS060_3_2.lga:SUMMARY(GDT) 93 169 4.0 17 3.28 10.947 9.585 0.504
T0621TS060_4_1.lga:SUMMARY(GDT) 79 169 4.0 15 2.13 9.172 7.836 0.674
T0621TS060_4_2.lga:SUMMARY(GDT) 93 169 4.0 37 2.71 18.047 15.882 1.318

There are 3229 multiple-models.

by **arneelof** on Fri Dec 03, 2010 11:32 am

I would actually be even more interested in whether any method can achieve a significant per-target correlation among the top 10% of models.

I think the tests in CASPs are way too easy for MQAPs as there are too many really bad models.

Yours

Arne

by **M.Pawlowski** on Sun Dec 05, 2010 8:08 am

“How did you treat the models containing multiple segments?”

Good technical question, thank you.

I did not take this into account, and now I think I should have. There are at least three ways to treat such multiple-models:

remove them,
select one of them at random,
select one of them using some criterion.

I will certainly redo the benchmark using one of the above options; perhaps I will remove all such models from the database.

If anybody has any suggestions, please let me know.

by **M.Pawlowski** on Sun Dec 05, 2010 8:09 am

Yes,
It can be done for different definitions of the top 10% of models:
1) the top 10% according to official GDT_TS,
2) models submitted by the top 10% of server predictors.

The second test would be really important for biologists and modellers.

by **terashig** on Sun Dec 05, 2010 9:28 am

remove them,
select one of them at random,
select one of them using some criterion.

I think the multiple-models cause unnaturally low correlations, so it is better to either
select the segment with the highest GDT_TS for each model,
OR
remove them.

ex:
GROUP1_TS1_1 GDT_TS=70.0 <--select
GROUP1_TS1_2 GDT_TS=20.0
GROUP2_TS1_1 GDT_TS=10.0
GROUP2_TS1_2 GDT_TS=40.0 <--select
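
The selection rule in this example could be sketched as follows. The input values are the ones from the example above; the function name and the assumption that the text after the last underscore is the segment index are illustrative only.

```python
def best_segment_per_model(segment_gdt):
    """Keep, for each model, the segment with the highest GDT_TS.

    segment_gdt: {"<model>_<segment>": GDT_TS}; the text after the last
    underscore is taken to be the segment index (an assumption).
    """
    best = {}
    for name, gdt in segment_gdt.items():
        model, _, _segment = name.rpartition("_")
        if model not in best or gdt > best[model][1]:
            best[model] = (name, gdt)
    return best


segments = {
    "GROUP1_TS1_1": 70.0,
    "GROUP1_TS1_2": 20.0,
    "GROUP2_TS1_1": 10.0,
    "GROUP2_TS1_2": 40.0,
}
for model, (name, gdt) in best_segment_per_model(segments).items():
    print(model, name, gdt)
```

If two segments of a model have the same GDT_TS, this sketch keeps whichever was seen first.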
