Discussion: Assembly (quaternary structure) prediction

Posted: Wed Apr 04, 2018 4:30 pm
by akryshtafovych
Details on the CASP12 Assembly evaluation are provided in . CASP12 assessors suggested a series of changes to the submission and evaluation procedures.

CASP13 assessor (Jose Duarte)
summarized these suggestions and provided his own opinions as follows:

Prediction submission
1. As is currently done, targets of unknown oligo state should be distinguished from known monomers.
2. Oligo state should be provided in a machine-parsable format.
3. Targets involved in heteromeric complexes have to be linked in a parsable manner (such as being released in a single FASTA file), and not only mentioned in the additional information.
4. Following on the last point, splitting into multiple model submissions requires assessors to guess whether an assembly prediction was intended (via contacts/clashes), which is clearly problematic (as demonstrated by the high-scoring T0884-T0885 TS239-1, which passes the thresholds but has no meaningful interface). It also is very confusing for cases like T0861-T0862-T0870, where the A2B2C2 stoichiometry does not imply that each subunit forms a dimer independently. Providing a single target specification and accepting a single PDB model of all chains would clarify the predictors' intent.

Assembly scoring
5. Using Jaccard distance (0 good, 1 bad) for one metric and F1 (0 bad, 100 good) for another was confusing. At a minimum, giving both the same range and direction would be clearer while still keeping the z-scores identical for comparison with CASP12. Switching both to Jaccard distance would be clearer still, but could impede score comparison between future editions and CASP12.
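To make the scale issue in point 5 concrete, here is a minimal Python sketch (not the assessors' code) that puts both measures on a common 0-1, higher-is-better scale. The function names and the toy contact sets are hypothetical; a contact is represented as a pair of (chain, residue number) tuples purely for illustration.

```python
def jaccard_similarity(predicted, native):
    """Jaccard coefficient |P & N| / |P | N| of two contact sets (0 bad, 1 good)."""
    union = predicted | native
    if not union:
        return 1.0  # assumption: two empty sets count as identical
    return len(predicted & native) / len(union)


def jaccard_distance(predicted, native):
    """Jaccard distance, the 0-good / 1-bad form mentioned in point 5."""
    return 1.0 - jaccard_similarity(predicted, native)


def f1_score(predicted, native):
    """F1 on a 0-1 scale, written as 2*TP / (|P| + |N|)."""
    total = len(predicted) + len(native)
    if total == 0:
        return 1.0  # same empty-set assumption as above
    return 2 * len(predicted & native) / total


# Hypothetical inter-chain contacts: ((chain, residue), (chain, residue)).
native = {
    (("A", 10), ("B", 20)),
    (("A", 11), ("B", 21)),
    (("A", 12), ("B", 22)),
    (("A", 13), ("B", 23)),
}
predicted = {
    (("A", 10), ("B", 20)),
    (("A", 11), ("B", 21)),
    (("A", 50), ("B", 60)),
}

j = jaccard_similarity(predicted, native)
f1 = f1_score(predicted, native)
print(f"Jaccard similarity {j:.3f}, Jaccard distance {1 - j:.3f}, F1 {f1:.3f}")
```

When both measures are computed on the same pair of sets they are related by F1 = 2J / (1 + J), so they rank predictions identically; reporting them on one scale and direction only changes how the numbers read.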

CASP organizers
in response to the recommendations of the Assembly assessor, are changing the assembly format and the procedure by which assembly targets are released and assembly predictions are accepted. These changes are described in detail at the CASP13 format page () and CASP13 registration page (). In brief, we will request a whole-structure submission for assembly targets (like a typical PDB multimer, with no separate submission for subunits). Assembly targets will be named differently (starting with 'H' for heteromers and 'O' for homomers), so there is no confusion about the predictors' intentions. We will send servers the sequences of heteromers as one FASTA file, and additionally provide the stoichiometry information (to the best of our knowledge at the time of the target release) as a separate parameter. This information will also be posted on the Target List page for regular groups.
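For server developers, a rough Python sketch of how the new naming and stoichiometry information might be consumed. It assumes the stoichiometry arrives as a compact string like the "A2B2C2" mentioned above and that a missing count means one copy; the actual parameter format is the one given on the format page, and the function names and target IDs below are hypothetical.

```python
import re


def assembly_kind(target_id):
    """Classify a target by its first letter: 'H' hetero, 'O' homo, else None."""
    if target_id.startswith("H"):
        return "hetero"
    if target_id.startswith("O"):
        return "homo"
    return None  # not named as an assembly target


def parse_stoichiometry(spec):
    """Parse a string such as 'A2B2C2' into {'A': 2, 'B': 2, 'C': 2}.

    Assumed format: one uppercase letter per distinct subunit, followed by an
    optional copy number (a missing number is read as 1).
    """
    if not re.fullmatch(r"(?:[A-Z]\d*)+", spec):
        raise ValueError(f"unrecognized stoichiometry string: {spec!r}")
    counts = {}
    for letter, number in re.findall(r"([A-Z])(\d*)", spec):
        if letter in counts:
            raise ValueError(f"subunit {letter!r} listed twice in {spec!r}")
        counts[letter] = int(number) if number else 1
    return counts


def expected_chain_count(spec):
    """Total number of chains a whole-structure model should contain."""
    return sum(parse_stoichiometry(spec).values())


# Hypothetical target ID, used only to exercise the functions.
print(assembly_kind("H0001"), parse_stoichiometry("A2B2C2"), expected_chain_count("A2B2C2"))
```

A server could use the chain count as a sanity check before submitting a single multi-chain model for an assembly target.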

Posted: Wed Apr 04, 2018 4:56 pm
by djones
I think this is an interesting idea, but possibly a bit of a major change in submission protocol with server tests just two weeks away.

Is that submission format change just for 3-D servers or are you planning to send those multisequence submissions to RR servers as well?
It's definitely a bit late in the day to start trying to implement automatic inter-chain contact prediction algorithms on those servers.

Posted: Wed Apr 04, 2018 5:22 pm
by akryshtafovych
No, we thought of doing that only for 3D assembly submissions, as the current RR format already allows submission of inter-chain contacts (I could trace this option back to at least CASP3, even though I don't remember seeing a single inter-chain RR prediction). Having said that, we can obviously send heteromeric sequences and stoichiometry info in machine-parsable format to RR servers too.

Posted: Thu Apr 19, 2018 3:14 am
by aleixlafita
About the assessor points:

1. Assigning different identifiers to assembly targets can solve this problem. However, will the oligomeric models submitted to the assembly targets also be assessed for tertiary structure? It should be made clear to predictors whether they have to submit separate models or not. My opinion is that they should also be included (only one oligomeric submission for both assessments). In that case, and considering a homo-oligomer, would only chain A be assessed in the tertiary structure category?

5. For the CASP12 assembly assessment paper we already used a 0-1 scale for both F1 and Jaccard measures, in order to improve the clarity of the assessment results.

About the RR format:

In the case of oligomers, intra-chain residue contacts (native fold contacts) and inter-chain contacts (interface) should be distinguished in the prediction. Some methods claim to be able to distinguish whether a residue contact is within a chain or across chains. An inter-chain contact may therefore involve the same residue numbers/identifiers as an intra-chain one, but from different chains of the same target. How would that be represented in the RR format, in case a predictor wants to include it?