Format - CASP

CASP_Commons

Prediction Center

Submission Rules and Format

Submission rules for all types of groups

Predictions in CASP_Commons may be submitted in 2 formats:

  TS    # Atomic coordinates 
  QA    # Model accuracy assessment

One team may make a prediction of a target by submitting up to five models in the TS category and two models in the QA category (see the QA format section for the timeline example of a typical QA prediction).
Each submission file should contain prediction for only one target.
Each submission file should contain only one of the allowed format categories.
Submission files in the QA category should contain only one model.
Submission files in TS category may contain either one or several models. Each model should begin with the MODEL record, end with the END record, and contain no target residue repetitions. You may specify only one set of required header fields (PFRMAT, TARGET, AUTHOR, METHOD) above the first MODEL record in the prediction file. A multiple-model file will be split into separate files (one model per file) and each model (up to 5) will be sent separately to the verification server.
Submission of a duplicate model (same target, format category, group, model index) will replace the previously accepted model, provided it is received before the deadline.
Each submission must begin with the PFRMAT, TARGET and AUTHOR records, contain the METHOD field and at least one block starting with the MODEL and ending with the END record.
Each submitted model is automatically verified by the format verification server. In case of successful submission no confirmation email will be sent. A unique model ACCESSION CODE is composed from the number of the target, prediction format category, prediction group number, and model index.
```
   Example:

   Accession code  C0001TS005_2  has the following components:
     C0001   target number
     TS      Tertiary Structure (PFRMAT TS)
     005     prediction group 5
     2       model index 2 
```
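As a rough illustration, an accession code can be split back into its components with a regular expression. The pattern below is only inferred from the single example above: the field widths (one letter plus four digits for the target, three digits for the group) are assumptions, and `parse_accession_code` is a hypothetical helper, not part of any CASP tool.

```python
import re

# Pattern inferred from the example C0001TS005_2; field widths are assumptions.
ACCESSION_RE = re.compile(
    r"^(?P<target>[A-Z]\d{4})(?P<category>TS|QA)(?P<group>\d{3})_(?P<model>\d+)$"
)


def parse_accession_code(code):
    """Split an accession code into target, category, group and model index."""
    match = ACCESSION_RE.match(code)
    if match is None:
        raise ValueError(f"not a recognised accession code: {code!r}")
    return {
        "target": match["target"],
        "category": match["category"],
        "group": int(match["group"]),   # "005" -> 5
        "model": int(match["model"]),
    }
```

For the example above, `parse_accession_code("C0001TS005_2")` yields target `C0001`, category `TS`, group 5 and model index 2.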
The accepted predictions can be viewed using the Model Viewer link on the CASP_Commons web page.
If the submission contains an error, the regular group leader or server contact person will be immediately notified by email. If your prediction is rejected for format inconsistency, you will be able to correct the problems and resend the prediction(s) within the target prediction time window.
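Before sending a file, the layout rules above can be checked locally. The following is a minimal sketch, not the verification server's actual logic: `check_submission_layout` and `keyword_of` are hypothetical names, blank lines are ignored, and only the ordering of PFRMAT, TARGET and AUTHOR, the presence of METHOD, and the pairing of MODEL and END records are tested.

```python
REQUIRED_HEADER = ("PFRMAT", "TARGET", "AUTHOR")


def keyword_of(line):
    """Return the keyword of a record; indented lines have no keyword."""
    if not line or line[0].isspace():
        return ""
    return line.split()[0]


def check_submission_layout(text):
    """Return a list of layout problems; an empty list means none were found."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    keywords = [keyword_of(ln) for ln in lines]
    problems = []
    # The first three records must be PFRMAT, TARGET, AUTHOR, in that order.
    for position, keyword in enumerate(REQUIRED_HEADER):
        if position >= len(keywords) or keywords[position] != keyword:
            problems.append(f"line {position + 1} should start with {keyword}")
    if "METHOD" not in keywords:
        problems.append("no METHOD record")
    # Every MODEL must be closed by END, and at least one block must exist.
    open_model, blocks = False, 0
    for keyword in keywords:
        if keyword == "MODEL":
            if open_model:
                problems.append("MODEL record before the previous END")
            open_model = True
        elif keyword == "END":
            if open_model:
                blocks += 1
            else:
                problems.append("END record without a MODEL record")
            open_model = False
    if open_model:
        problems.append("last MODEL block has no END record")
    if blocks == 0:
        problems.append("no complete MODEL ... END block")
    return problems
```

A file that passes this check can still be rejected by the server, which verifies more than the record layout.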

Submission rules for expert groups (usually, 3-week deadline in TS category, 3-day deadline for QA)

Predictions can be submitted by a group leader or a group member with submission privileges. The group leader can set the privileges (regular member or submitter) for every member of the group using the 'Review member status' option from the 'My CASP_Commons profile' link. Members of prediction groups who intend to submit predictions should first receive submission permission from the group leader and then use the 12-digit Registration Code of the group to submit predictions for that group.
Models for regular deadline groups should be submitted directly by e-mail to models AT predictioncenter.org or using the CASP_Commons model submission facility.
When sending predictions by email, please send them in the body of the message.
When sending predictions by email, please use only the email address registered with the Prediction Center as the point of origin (make sure we have your current email address on file; you can check it through the "My Personal Data" link in the menu). If you temporarily cannot use the registered email address for submission, please use the submission form instead.
Time for returning regular group predictions is set separately for each target. Usually regular deadline predictors have around 3 weeks from the date of target release to return a prediction.
Predictions in the TS category should contain sensible residue error estimates (in Angstroms) in the column reserved for the B-factor value in the PDB format.

Submission rules for server groups (3-day deadline)

CASP_Commons queries will NOT be sent to the registered servers from the CASP distribution server. Please submit queries to your server yourself and make sure that we receive your predictions at models AT predictioncenter.org by the server prediction deadline. TS servers are requested to return predictions within 5 days of the target release time. No additional time for corrections will be allotted, but corrections will be accepted within the original prediction window. Please send your corrections manually to the address specified in the REPLY-E-MAIL field of the original query. Remember that corrections can be submitted only by a group leader or a group member with submission privileges. The group leader can set the privileges (regular member or submitter) for every member of the group using the 'Review member status' option from the 'My CASP_Commons profile' link. Members of prediction groups who intend to submit predictions should receive submission permission from the group leader first.
Server models must be submitted in the body of the email as plain text. The subject of the email should preferably contain the target number and the group name.
Each submission may contain several models. If a server returns more than 5 models, the models numbered 6 and higher will be ignored. In the QA category, please designate your model as MODEL 2.

Format description

All submissions should contain records described below. Each of these records must begin with a standard keyword. In all submissions standard keywords must begin in the first column of a record. The keyword set is as follows:

PFRMAT     Format specification code:  TS , QA 
TARGET     Target identifier from the CASP_Commons target table
AUTHOR     XXXX-XXXX-XXXX   Registration code of the Group Leader or Server Group Name 
SCORE      Reliability of the model (optional) 
REMARK     Comment record (may appear anywhere after the first 3 required lines, optional)
METHOD     Records describing the methods used
MODEL      Beginning of the data section for the submitted model
PARENT     Specifies structure template used to generate the TS model 
TER        Terminates chain in the oligomeric TS model
END        End of the submitted model

Models should be submitted in Plain Text format.

Record PFRMAT should appear on the first line of the prediction and is used for all submissions.

   PFRMAT TS
     TS  indicates that submission contains 3D atomic coordinates
         in standard PDB format

   PFRMAT QA
     QA  indicates that submission contains estimates of model accuracy

Record TARGET should appear on the second line of the prediction and is used for all submissions.

   TARGET Txxxx
     Txxxx indicates id of the target predicted.

Record AUTHOR should appear on the third line of the prediction and is used for all submissions.

 For all groups:
   AUTHOR XXXX-XXXX-XXXX
          XXXX-XXXX-XXXX indicates the Group Registration code.
          This is the code obtained by the group leader upon registration.

          Note: Members of prediction groups who intend to submit predictions
          should receive submission permissions from the group leader and
          use the registration code of the Group for all predictions submitted by
          that group. If sending predictions by email, please send them from the
          registered emails of the group leader or group submitter.
          If you temporarily cannot use these emails for submission, please log in
          to our website and then use our web-based submission facility.

 Servers alternatively can be identified using their registered group names: 
   AUTHOR MY_SERVER_NAME     
      or 
   REMARK AUTHOR MY_SERVER_NAME
          where MY_SERVER_NAME is a name selected for the server group at registration

SCORE Optional. This record may be used to report a model reliability score. It will not influence the evaluation.

REMARK Optional. PDB style 'REMARK' records may be used anywhere in the submission. These records may contain any text and will in general not influence evaluation.

Records METHOD are used for all submissions.
These records describe the method used. Predictors are urged to provide a concise description of the method, including data libraries used, and values of default and non-default parameters.

Record MODEL is used for all submissions.
Signifies the beginning of model data.

   MODEL  n  
     n          Model index n is used to indicate predictor's ranking
                according to her/his belief which TS model is closest to the 
                target structure (1 <= n <= 5). Model index is included
                automatically in the ACCESSION CODE. All models with index
                higher than 5 will be discarded.
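The handling of model indices can be sketched as follows. This is an illustrative approximation of the splitting step described in the submission rules, not the server's code: `split_models` is a hypothetical name, the shared header is taken to be everything above the first MODEL record, and a repeated index simply overwrites the earlier block.

```python
def split_models(text, max_index=5):
    """Split a multi-model submission into one complete text per model index."""
    header, models = [], {}
    current = None       # lines of the model block being read, if any
    seen_model = False   # becomes True at the first MODEL record
    for line in text.splitlines():
        if line.startswith("MODEL"):
            seen_model = True
            current = [line]
            models[int(line.split()[1])] = current
        elif not seen_model:
            header.append(line)          # PFRMAT, TARGET, AUTHOR, METHOD, ...
        elif current is not None:
            current.append(line)
            if line.strip() == "END":
                current = None           # block closed; ignore lines until next MODEL
    # Models with an index outside 1..max_index are discarded.
    return {
        index: "\n".join(header + body) + "\n"
        for index, body in sorted(models.items())
        if 1 <= index <= max_index
    }
```

Each returned value repeats the shared header, so every model can be verified as a stand-alone file.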

In QA category, predictors are requested to use model index '2' for all submissions.

Record PARENT is required only for the submissions in the TS format.
PARENT record indicates structure templates used to generate the MODEL (see description of the TS format below). One PARENT record is required for every prediction.

   PARENT N/A
     Indicates that a prediction is not directly based on any known
     structure. Note that this is the only indication in the file that the
     prediction is ab initio, so is a critical piece of information.

   PARENT 1abc_A
     Indicates that the model is
     based on a single PDB entry 1abc chain A (use _A to indicate chain A).
     All template-based predictions should be submitted with this form 
     of the PARENT record. Note that, in order to be accepted, the code 
     must correspond to a current PDB entry.

   PARENT 1cdc 2def_g [3hij_k ...]
     Indicates that the model is based on more than one structural template. 
     Up to five PDB chains may be listed here with additional detailed information 
     included in the METHOD records.
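The three PARENT forms can be told apart with a few lines of code. The sketch below is an assumption-laden illustration: `parse_parent` is a hypothetical helper, and the template pattern (a four-character PDB ID starting with a digit, optionally followed by `_` and a one-character chain ID) is inferred from the examples above. It does not check that the code corresponds to a current PDB entry.

```python
import re

# Assumed shape of a template code such as 1abc or 1abc_A.
TEMPLATE_RE = re.compile(r"^[0-9][A-Za-z0-9]{3}(_[A-Za-z0-9])?$")


def parse_parent(record):
    """Return the list of templates in a PARENT record ([] for N/A)."""
    fields = record.split()
    if not fields or fields[0] != "PARENT":
        raise ValueError("not a PARENT record")
    templates = fields[1:]
    if templates == ["N/A"]:
        return []                        # prediction not based on a known structure
    if not 1 <= len(templates) <= 5:
        raise ValueError("expected N/A or one to five templates")
    for template in templates:
        if not TEMPLATE_RE.match(template):
            raise ValueError(f"malformed template code: {template!r}")
    return templates
```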

Record TER is used to terminate chains in TS predictions.

TER

Atomic coordinates (PFRMAT TS).
Standard PDB atom records are used for the atomic coordinates. The submission format requires 80-column records; pad shorter records with spaces when needed (see the target template PDB files provided in the specific target descriptions available through the CASP_Commons target table).

Coordinate section for each model should begin with a single PARENT record and terminate with a TER record.

It is requested that coordinate data be supplied for at least all non-hydrogen main chain atoms, i.e. the N, CA, C and O atoms of every residue.

For any given MODEL, no target residue may be repeated in the prediction.
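The two requirements above (backbone atoms present, nothing repeated) can be checked on the ATOM records of a single model. This is a minimal sketch using the standard PDB fixed columns (atom name in columns 13-16, chain ID in column 22, residue number in columns 23-26). `check_residues` is a hypothetical helper, and it treats an atom name that occurs twice in the same residue as a sign of repetition, which is only an approximation of the server's check.

```python
from collections import defaultdict

BACKBONE = {"N", "CA", "C", "O"}


def check_residues(atom_lines):
    """Report repeated atoms and residues that lack backbone atoms."""
    atoms = defaultdict(list)   # (chain, residue number) -> atom names
    for line in atom_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()      # columns 13-16
        chain = line[21]                # column 22
        number = int(line[22:26])       # columns 23-26
        atoms[(chain, number)].append(name)
    problems = []
    for (chain, number), names in atoms.items():
        label = f"residue {number} chain {chain!r}"
        if len(names) != len(set(names)):
            problems.append(f"{label}: repeated atoms")
        missing = BACKBONE - set(names)
        if missing:
            problems.append(f"{label}: missing {sorted(missing)}")
    return problems
```

The function expects the records of one MODEL at a time; mixing several models in one call would report every residue as repeated.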

For quaternary structure predictions, coordinates for all chains should be submitted in the same frame of reference. Chains should be labeled 'A', 'B', ... according to the provided template for each target.

Atoms for which a prediction has been made must contain a value between 0.01 and 1.00 (usually "1.00") in the occupancy field; those for which no prediction has been made must either contain "0.00" in that field or be skipped altogether.

In place of temperature factor field, the error estimates, in Angstroms, should be provided. We require all predictors to submit the error estimates as these will be used in the evaluation. Models with all residues having the same 'B-factor' will be rejected. If your software predicts per-residue B-factor-like score instead of distance in Angstroms - please convert your B-score to distance d inverting the formula B=(8pi^2*d^2)/3 (or indicate nature of your score in the REMARKS).
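Inverting the formula above gives d = sqrt(3B / (8 pi^2)). A small sketch of the conversion follows; `b_factor_to_distance` is a hypothetical helper name, and rejecting negative input is an added assumption.

```python
import math


def b_factor_to_distance(b_factor):
    """Invert B = 8*pi^2*d^2/3 to obtain the error estimate d in Angstroms."""
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(3.0 * b_factor / (8.0 * math.pi ** 2))
```

By construction, a B-factor of 8 pi^2 / 3 (about 26.3) corresponds to a distance of 1 Angstrom.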

Estimation of model accuracy (PFRMAT QA).

In QA category, predictors are requested to use model index '2' for all submissions.

Data are inserted between MODEL and END records of the submission file. You may submit your quality assessment prediction in one of the two different modes:
QMODE 1 : global model quality score (MQS - one number per model)
QMODE 2 : MQS and error estimates on per-residue basis.

The first line of data should specify mode identifier, i.e. QMODE (see Example 3).

In both modes, the first column in each line contains model identifier (file name of the accepted 3D prediction). The second column contains the accuracy score for a model as a whole (MQS). The accuracy score is a real number between 0.0 and 1.0 (1.0 being a perfect model). If you don't provide MQS for a model please put "X" in the corresponding place. If you don't want to additionally provide error estimates on per residue basis (QMODE 1), your data table will consist of these two columns only.

If you do additionally provide residue error estimates (QMODE 2), each consecutive column should contain the error estimate in Angstroms for the consecutive residues in the target (i.e., column 3 corresponds to residue 1 in the target, column 4 to residue 2, and so on). This way the data constitute a table (Number_of_models_for_the_target) BY (Number_of_residues_in_the_target + 1). Do not skip columns if you are not predicting error estimates for some residues; instead put "X" in the corresponding column.
Please specify in the REMARKS what you consider to be an error estimate for a residue (CA location error, geometrical center error, etc.).

Note 1. Please be advised that a QA record line may be very long and that some editors or mail programs may force line wraps, potentially causing unexpected parsing errors. To avoid this problem, we recommend that you split long lines into shorter sublines (50-100 columns of data) yourself. Our parser will treat consecutive sublines (starting with the line containing the evaluated model name and ending before the line containing the next model name or the END tag) as parts of the same logical line.
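The subline rule can be illustrated with a small reader for the data between MODEL and END. This is a sketch of one way to implement the rule, not the Prediction Center's parser. `parse_qa_table` is a hypothetical name, and a line is taken to start a new model whenever its first token is neither a number nor "X", which assumes that model names never look like numbers.

```python
def _is_value(token):
    """True for a score token: a number or the placeholder X."""
    if token == "X":
        return True
    try:
        float(token)
    except ValueError:
        return False
    return True


def parse_qa_table(data_lines):
    """Return (qmode, {model name: (global score, per-residue errors)})."""
    qmode, tokens_by_model, current = None, {}, None
    for line in data_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "QMODE":
            qmode = int(fields[1])
        elif fields[0] == "END":
            break
        elif _is_value(fields[0]):
            # Continuation subline: belongs to the most recent model name.
            if current is None:
                raise ValueError("data found before any model name")
            current.extend(fields)
        else:
            current = tokens_by_model.setdefault(fields[0], [])
            current.extend(fields[1:])
    table = {}
    for name, tokens in tokens_by_model.items():
        values = [None if tok == "X" else float(tok) for tok in tokens]
        table[name] = (values[0] if values else None, values[1:])
    return qmode, table
```

Values given as "X" are returned as `None`, so the per-residue list keeps one entry per target residue.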

Note 2. Please be advised that model quality predictions in CASP are evaluated by comparing submitted estimates of global reliability and per-residue accuracy of structural models with the values obtained from CASP model evaluation packages (LGA, LDDT, CAD-score and others). Since the evaluation score that is used across the categories in CASP is GDT_TS, predictors should strive to predict this score in QMODE 1 (QA1). Predicted per-residue distances in QMODE 2 should ideally reproduce those extracted from the LGA optimal model-target superpositions.

END record is used for all predictions and indicates the end of a single model submission.

Example 1. Atomic coordinates (Tertiary Structure)

The primary CASP_Commons format used for tertiary structure prediction

PFRMAT TS
TARGET C0001
AUTHOR 1234-5678-9000
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
MODEL  1 
PARENT 1abc 1def_A
ATOM      1  N   GLU     1      10.982  -9.774   1.377  1.00  0.50
ATOM      2  CA  GLU     1       9.623  -9.833   1.984  1.00  0.50
ATOM      3  C   GLU     1       8.913 -11.104   1.521  1.00  0.50
ATOM      4  O   GLU     1       9.187 -11.630   0.461  1.00  0.50
ATOM      5  CB  GLU     1       8.814  -8.614   1.546  1.00  0.50
ATOM      6  CG  GLU     1       7.372  -8.754   2.039  1.00  0.50
ATOM      7  CD  GLU     1       7.339  -8.625   3.562  1.00  0.50
ATOM      8  OE1 GLU     1       8.370  -8.307   4.131  1.00  0.50
ATOM      9  OE2 GLU     1       6.284  -8.846   4.132  1.00  0.50
ATOM     10  N   THR     2       7.998 -11.599   2.304  1.00  1.60
ATOM     11  CA  THR     2       7.266 -12.832   1.907  1.00  1.60
ATOM     12  C   THR     2       6.096 -12.456   1.005  1.00  1.60
ATOM     13  O   THR     2       5.008 -12.217   1.466  1.00  1.60
ATOM     14  CB  THR     2       6.731 -13.533   3.157  1.00  1.60
ATOM     15  OG1 THR     2       7.662 -13.379   4.220  1.00  1.60
ATOM     16  CG2 THR     2       6.526 -15.019   2.864  1.00  1.60
ATOM     17  N   VAL     3       6.308 -12.396  -0.278  1.00  1.70
ATOM     18  CA  VAL     3       5.190 -12.030  -1.187  1.00  1.70
ATOM     19  C   VAL     3       3.954 -12.870  -0.844  1.00  1.70
ATOM     20  O   VAL     3       2.834 -12.471  -1.090  1.00  1.70
ATOM     21  CB  VAL     3       5.608 -12.274  -2.641  1.00  1.70
ATOM     22  CG1 VAL     3       5.542 -13.771  -2.959  1.00  1.70
ATOM     23  CG2 VAL     3       4.664 -11.514  -3.573  1.00  1.70
ATOM     24  N   GLU     4       4.146 -14.029  -0.272  1.00  1.70
ATOM     25  CA  GLU     4       2.976 -14.882   0.086  1.00  1.60
ATOM     26  C   GLU     4       2.153 -14.190   1.175  1.00  1.50
ATOM     27  O   GLU     4       0.942 -14.141   1.109  1.00  1.40
ATOM     28  CB  GLU     4       3.465 -16.238   0.597  1.00  1.30
ATOM     29  CG  GLU     4       2.336 -17.264   0.479  1.00  1.20
ATOM     30  CD  GLU     4       2.929 -18.671   0.391  1.00  1.10
ATOM     31  OE1 GLU     4       4.056 -18.846   0.823  1.00  1.00
ATOM     32  OE2 GLU     4       2.246 -19.551  -0.108  1.00  0.90
TER
END

Example 2. Multichain predictions

PFRMAT TS
TARGET C0001
AUTHOR 1234-5678-9000
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
MODEL  1 
PARENT N/A
ATOM      1  N   GLU A   1      22.576  19.032  -5.026  1.00  0.00
ATOM      2  CA  GLU A   1      22.879  20.313  -4.321  1.00  0.00
ATOM      3  CB  GLU A   1      22.285  21.478  -5.449  1.00  0.00
ATOM      4  CG  GLU A   1      23.018  21.946  -6.707  1.00  0.00
ATOM      5  CD  GLU A   1      24.351  22.625  -6.434  1.00  0.00
ATOM      6  OE1 GLU A   1      25.379  21.908  -6.380  1.00  0.00
ATOM      7  OE2 GLU A   1      24.381  23.879  -6.291  1.00  0.00
ATOM      8  O   GLU A   1      22.237  20.962  -2.117  1.00  0.00
ATOM      9  C   GLU A   1      21.857  20.684  -3.261  1.00  0.00
ATOM     10  N   VAL A   2      20.585  20.675  -3.601  1.00  0.00
ATOM     11  CA  VAL A   2      19.530  21.006  -2.624  1.00  0.00
ATOM     12  CB  VAL A   2      18.277  21.590  -3.319  1.00  0.00
ATOM     13  CG1 VAL A   2      17.182  21.859  -2.270  1.00  0.00
ATOM     14  CG2 VAL A   2      18.656  22.833  -4.079  1.00  0.00
ATOM     15  O   VAL A   2      18.770  18.750  -2.603  1.00  0.00
ATOM     16  C   VAL A   2      19.096  19.721  -1.933  1.00  0.00
ATOM     17  N   HIS A   3      19.115  19.700  -0.603  1.00  0.00
ATOM     18  CA  HIS A   3      18.780  18.489   0.122  1.00  0.00
ATOM     19  CB  HIS A   3      19.559  18.445   1.410  1.00  0.00
ATOM     20  CG  HIS A   3      21.015  18.684   1.224  1.00  0.00
ATOM     21  CD2 HIS A   3      21.767  19.803   1.367  1.00  0.00
ATOM     22  ND1 HIS A   3      21.851  17.721   0.702  1.00  0.00
ATOM     23  CE1 HIS A   3      23.072  18.220   0.589  1.00  0.00
ATOM     24  NE2 HIS A   3      23.048  19.478   0.985  1.00  0.00
ATOM     25  O   HIS A   3      16.777  19.181   1.220  1.00  0.00
ATOM     26  C   HIS A   3      17.296  18.417   0.409  1.00  0.00
TER 
PARENT 1abc
ATOM   1321  N   GLU B   1     -22.603 -17.981  -4.847  1.00  0.00
ATOM   1322  CA  GLU B   1     -22.889 -19.285  -4.180  1.00  0.00
ATOM   1323  CB  GLU B   1     -22.342 -20.410  -5.372  1.00  0.00
ATOM   1324  CG  GLU B   1     -23.122 -20.828  -6.619  1.00  0.00
ATOM   1325  CD  GLU B   1     -24.447 -21.511  -6.324  1.00  0.00
ATOM   1326  OE1 GLU B   1     -25.468 -20.792  -6.207  1.00  0.00
ATOM   1327  OE2 GLU B   1     -24.479 -22.769  -6.227  1.00  0.00
ATOM   1328  O   GLU B   1     -22.172 -20.020  -2.026  1.00  0.00
ATOM   1329  C   GLU B   1     -21.830 -19.701  -3.172  1.00  0.00
ATOM   1330  N   VAL B   2     -20.572 -19.685  -3.557  1.00  0.00
ATOM   1331  CA  VAL B   2     -19.485 -20.056  -2.630  1.00  0.00
ATOM   1332  CB  VAL B   2     -18.260 -20.619  -3.392  1.00  0.00
ATOM   1333  CG1 VAL B   2     -17.131 -20.932  -2.393  1.00  0.00
ATOM   1334  CG2 VAL B   2     -18.674 -21.832  -4.184  1.00  0.00
ATOM   1335  O   VAL B   2     -18.711 -17.807  -2.553  1.00  0.00
ATOM   1336  C   VAL B   2     -19.020 -18.800  -1.909  1.00  0.00
ATOM   1337  N   HIS B   3     -18.990 -18.829  -0.580  1.00  0.00
ATOM   1338  CA  HIS B   3     -18.623 -17.648   0.178  1.00  0.00
ATOM   1339  CB  HIS B   3     -19.356 -17.649   1.494  1.00  0.00
ATOM   1340  CG  HIS B   3     -20.819 -17.875   1.353  1.00  0.00
ATOM   1341  CD2 HIS B   3     -21.571 -18.995   1.480  1.00  0.00
ATOM   1342  ND1 HIS B   3     -21.667 -16.890   0.896  1.00  0.00
ATOM   1343  CE1 HIS B   3     -22.894 -17.378   0.809  1.00  0.00
ATOM   1344  NE2 HIS B   3     -22.864 -18.650   1.156  1.00  0.00
ATOM   1345  O   HIS B   3     -16.586 -18.389   1.177  1.00  0.00
ATOM   1346  C   HIS B   3     -17.129 -17.592   0.414  1.00  0.00
TER
END

Example 3. Estimates of model accuracy prediction

(A) Global Model Quality Score

PFRMAT QA
TARGET T0999
AUTHOR 1234-5678-9000
METHOD Description of methods used
MODEL 2
QMODE 1
C1901TS001_1 0.8 
C1901TS003_1 0.4 
C1901TS005_1 0.2 
END

(B) Residue-based Quality Assessment (fragment of the table). Note that this case includes case (A), so there is no need to submit QMODE 1 predictions in addition to QMODE 2.

PFRMAT QA
TARGET T0999
AUTHOR 1234-5678-9000
REMARK Residue's error estimate is the CA-CA distance in Angstroms in the optimal model-target superposition
METHOD Description of methods used
MODEL 2
QMODE 2
C1901TS001_1 0.8 10.0 6.5 5.0 2.0 1.0  
5.0 4.3 4.6 ...
C1901TS003_1 0.7 8.0 5.5 4.5 X X 
4.5 4.2 5.0 ...
END

Protein Structure Prediction Center
Sponsored by the US National Institute of General Medical Sciences (NIH/NIGMS)
© 2007-2018, University of California, Davis