The difficulty of assessing domain predictions - any ideas?

by **mcguffin** on Sun Dec 07, 2008 1:24 am

At the CASP business session I said that I felt the main problem with domain prediction was the assessment. To clarify, I certainly did not mean this as a criticism of the assessors, past or present! Rather, the point I was trying to make was that the assessment of domain predictions remains a very difficult problem, and we should continue to discuss how to assess this category if it is to be kept in the future.

The main problem seems to be the subjectivity of domain definition. Could this be fixed by agreeing on an automated tool for domain parsing (just for the domain assessment category), such as PUU or PDP, or on a consensus of several such tools? That way everyone would know where the goalposts are from the start. It would also be very helpful for developers benchmarking their methods, as it would give them a standard definition to aim for.
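To make the consensus idea concrete, here is a minimal sketch of one way it could work, not a description of any existing assessment procedure. It assumes each parser reports its domain boundaries as residue positions. Boundaries from different parsers that fall within a tolerance window are pooled, and a boundary is kept only if a majority of parsers support it. The function name `consensus_boundaries`, the 8-residue tolerance, the majority rule, and the parser names and positions in the usage note are all hypothetical choices for illustration.

```python
def consensus_boundaries(parser_boundaries, tolerance=8, min_support=None):
    """Return boundary positions supported by at least `min_support` parsers.

    parser_boundaries: dict mapping a parser name to a list of boundary
    residue positions (hypothetical input format).
    tolerance: maximum distance (in residues) from the first boundary of a
    cluster for another boundary to join it (assumed value).
    """
    n_parsers = len(parser_boundaries)
    if min_support is None:
        min_support = n_parsers // 2 + 1  # simple majority

    # Pool every boundary together with the parser that produced it.
    pooled = sorted(
        (pos, name)
        for name, positions in parser_boundaries.items()
        for pos in positions
    )

    # Group nearby boundaries into clusters, scanning left to right.
    clusters = []
    for pos, name in pooled:
        if clusters and pos - clusters[-1][0][0] <= tolerance:
            clusters[-1].append((pos, name))
        else:
            clusters.append([(pos, name)])

    # Keep clusters backed by enough distinct parsers; report the median position.
    consensus = []
    for cluster in clusters:
        supporters = {name for _, name in cluster}
        if len(supporters) >= min_support:
            positions = sorted(pos for pos, _ in cluster)
            consensus.append(positions[len(positions) // 2])
    return consensus
```

For example, `consensus_boundaries({"parser_a": [100, 210], "parser_b": [104, 300], "parser_c": [98]})` keeps only the boundary near residue 100, since the other two are each reported by a single parser. An empty result would correspond to a single-domain call.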

Another issue that remains is the treatment of single domains. In domain prediction I feel that there are two questions that methods should aim to answer:
1. How many domains does this protein have?
2. Where are they located in the sequence?

The first question is very important, and it seems to have been somewhat neglected. In my view it is unhelpful to throw away all single-domain targets in the assessment, as has often been done. Most proteins have a single domain, and it is reasonable to test methods for their ability to discriminate these from multi-domain proteins. The over-prediction of multiple domains may not be helpful, so should it be more heavily penalised in the future?
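As a starting point for discussion, a heavier penalty for over-prediction could be sketched as below. This is only an illustration under stated assumptions: the function names and the penalty weights (2.0 per extra domain, 1.0 per missed domain) are hypothetical and are not taken from any CASP assessment. The point is that single-domain targets stay in the evaluation and contribute to the score.

```python
def domain_count_score(predicted, actual, over_penalty=2.0, under_penalty=1.0):
    """Score a predicted domain count against the reference count.

    Returns 0.0 for an exact match and a negative value otherwise.
    Over-prediction costs `over_penalty` per extra domain and
    under-prediction costs `under_penalty` per missed domain
    (both weights are assumed values for illustration).
    """
    if predicted < 1 or actual < 1:
        raise ValueError("domain counts must be at least 1")
    diff = predicted - actual
    if diff == 0:
        return 0.0
    if diff > 0:
        return -over_penalty * diff
    return under_penalty * diff  # diff is negative here


def mean_domain_count_score(pairs, **weights):
    """Average the score over (predicted, actual) pairs, single domains included."""
    if not pairs:
        raise ValueError("no targets to score")
    return sum(domain_count_score(p, a, **weights) for p, a in pairs) / len(pairs)
```

With these weights, splitting a single-domain target into three domains scores -4.0, while missing one domain of a two-domain target scores -1.0. A method that always predicts multiple domains would therefore be penalised on the many single-domain targets instead of having them dropped from the assessment.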
