RR Results Table HELP


Contact

Contact definition: A pair of residues is considered in contact if the distance between their Cb atoms (Ca in case of GLY) is smaller than 8.0Å.
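
As an illustration of the definition above, here is a minimal sketch that lists contacts from representative-atom coordinates. The function names and the input layout (one coordinate tuple per residue, already chosen as Cb, or Ca for GLY) are assumptions made for this example, not part of the evaluation software.

```python
import math

CONTACT_CUTOFF = 8.0  # angstroms, taken from the contact definition


def in_contact(coord_a, coord_b, cutoff=CONTACT_CUTOFF):
    """Return True if two representative atoms are closer than the cutoff."""
    return math.dist(coord_a, coord_b) < cutoff


def list_contacts(rep_coords):
    """Return all residue index pairs (i, j), i < j, that are in contact.

    rep_coords: one (x, y, z) tuple per residue, assumed to hold the Cb
    coordinates (Ca for GLY). Indices are 0-based positions in the list.
    """
    contacts = []
    for i in range(len(rep_coords)):
        for j in range(i + 1, len(rep_coords)):
            if in_contact(rep_coords[i], rep_coords[j]):
                contacts.append((i, j))
    return contacts
```

The comparison is strict, so a pair at exactly 8.0Å is not counted as a contact in this sketch.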

Contact Range

The parameter indicates the separation along the sequence between two residues in contact. Three types of contacts are defined according to this separation. Contacts between residues separated by fewer than 6 residues are usually associated with secondary structure and are not evaluated.

List Size

To evaluate the performance of contact predictors on the same number of contacts, we trim each list to the N contacts with the highest probabilities. The List Size parameter N determines the length of the contact list and takes the values L/2, L/5 and 10, where L is the length of the sequence. The value FL corresponds to all predicted contacts (a number that differs between predictions).
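
The trimming step can be sketched as follows. The function name, the (i, j, probability) tuple layout, and the use of integer division for L/2 and L/5 are assumptions of this example; the text above does not say how fractional list sizes are rounded.

```python
def trim_contact_list(predictions, seq_length, list_size):
    """Keep the N most probable contacts from a prediction list.

    predictions: list of (i, j, probability) tuples (hypothetical layout).
    list_size:   one of "L/2", "L/5", "10" or "FL".
    """
    # Rank contacts from the most to the least probable.
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    if list_size == "FL":
        return ranked  # FL keeps every predicted contact
    sizes = {
        "L/2": seq_length // 2,  # assumption: round down
        "L/5": seq_length // 5,  # assumption: round down
        "10": 10,
    }
    return ranked[:sizes[list_size]]
```

For a 10-residue sequence, `trim_contact_list(preds, 10, "L/5")` keeps the two highest-probability contacts.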

F1

F1-score is the harmonic mean of precision and recall (see below) and is calculated according to the formula:
F1 = 2*precision*recall/(precision+recall).

Prec

Prec = TP/Np,
where Np=TP+FP is the number of predicted contacts,
TP and FP are the numbers of correctly and incorrectly predicted contacts, respectively.

Recall

Recall = TP/Nc,
where TP is the number of correctly predicted contacts,
Nc is the number of all contacts in the target structure.
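
A minimal sketch of the Prec, Recall and F1 formulas, assuming contacts are represented as sets of (i, j) residue pairs. The function name and the convention of returning 0.0 when a denominator is zero are assumptions of this example.

```python
def precision_recall_f1(predicted, native):
    """Return (precision, recall, F1) for a trimmed prediction list.

    predicted: iterable of (i, j) pairs in the prediction list.
    native:    iterable of (i, j) pairs in contact in the target structure.
    """
    predicted = set(predicted)
    native = set(native)
    tp = len(predicted & native)  # correctly predicted contacts
    n_p = len(predicted)          # Np = TP + FP
    n_c = len(native)             # Nc
    # Assumption: report 0.0 instead of dividing by zero.
    precision = tp / n_p if n_p else 0.0
    recall = tp / n_c if n_c else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

With four predicted pairs of which two are correct, and three contacts in the target, this gives Prec = 0.5, Recall = 2/3 and F1 = 4/7.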

AUC_PR

Area Under the precision-recall Curve.

MCC

Matthews Correlation Coefficient, calculated by the formula:
MCC = (TP*TN - FP*FN)/sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))
where TP and FP are the numbers of correctly and incorrectly predicted contacts, respectively,
TN is the number of non-contacts in the target structure not appearing in the prediction list,
FN is the number of contacts in the target structure missing in the prediction list.
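
The MCC formula translates directly into code. In this sketch the function name and the choice to return 0.0 when the denominator vanishes are assumptions; the formula itself does not define that case.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from the four confusion counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumption: undefined case reported as 0.0
    return (tp * tn - fp * fn) / denominator
```

A perfect prediction (FP = FN = 0) gives MCC = 1, and a prediction that is wrong on every pair (TP = TN = 0) gives MCC = -1.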

ES

Entropy Score:
The score measures the relative drop in entropy introduced by a set of distance constraints (in our case, correctly predicted residue-residue contacts) with respect to the reference entropy of a protein of the given length without constraints. The score is calculated by the formula (ref.):
ES = 100% * (Entropy|0 - Entropy|C) / Entropy|0 ,
where
Entropy|0 is the entropy value for the protein without constraints,
Entropy|C is the entropy value given a set of constraints C.
 
Entropy|x = AVERAGE_over_all_pairs_of_residues (LOG(UpperLimit - LowerLimit)),
where
x = '0' or 'C',
LowerLimit (both for contacts and non-contacts) = 3.2Å
UpperLimit for contacts = 8Å
UpperLimit for non-contacts = diameter of gyration (DG).
The diameter of gyration is calculated by the formula (ref.):
DG = 5.54 * L^0.34, where L is the length of the protein sequence.
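
The following sketch shows one literal reading of the formulas above and is not the reference implementation. It assumes that "all pairs of residues" means every pair i < j, that without constraints every pair takes the non-contact upper limit, and that under constraints C only the correctly predicted contacts switch to the 8Å upper limit. The cited reference may handle these details differently. The function names are hypothetical. The logarithm base does not matter, because it cancels in the ratio.

```python
import math

LOWER_LIMIT = 3.2    # angstroms, for contacts and non-contacts
CONTACT_UPPER = 8.0  # angstroms, upper limit for a contact


def diameter_of_gyration(seq_length):
    """DG = 5.54 * L^0.34, with L the sequence length."""
    return 5.54 * seq_length ** 0.34


def entropy_score(n_correct_contacts, seq_length, upper_noncontact=None):
    """ES in percent, under the assumptions stated in the lead-in."""
    if seq_length < 2:
        raise ValueError("need at least two residues to form a pair")
    if upper_noncontact is None:
        upper_noncontact = diameter_of_gyration(seq_length)
    n_pairs = seq_length * (seq_length - 1) // 2  # assumption: all i < j
    log_free = math.log(upper_noncontact - LOWER_LIMIT)
    log_contact = math.log(CONTACT_UPPER - LOWER_LIMIT)
    # Entropy|0: no constraints, every pair uses the non-contact limit.
    entropy_0 = log_free
    # Entropy|C: constrained pairs use the contact limit, the rest do not.
    entropy_c = (n_correct_contacts * log_contact
                 + (n_pairs - n_correct_contacts) * log_free) / n_pairs
    return 100.0 * (entropy_0 - entropy_c) / entropy_0
```

Passing a different `upper_noncontact` value changes only the non-contact upper limit and leaves the rest of the calculation as is.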

ES(ext)

Entropy Score (extended):
A version of the ES score (see above) with
UpperLimit for non-contacts = 3.8Å * N, where N is the number of residues in the protein.

Prec(prob), Recall(prob)

These statistics are calculated with the true positives in the numerator of the corresponding formulae weighted by the probabilities submitted with the predicted contact pairs:
Prec(prob) = TP_pw/Np,
Recall(prob) = TP_pw/Nc,
where
TP_pw is the sum of predicted probabilities of correctly predicted contacts in the selected list size,
Np is the number of predicted contacts,
Nc is the number of contacts in target structure.
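
A minimal sketch of Prec(prob) and Recall(prob), assuming the prediction list has already been trimmed to the selected list size and is given as (i, j, probability) tuples. The function name, the tuple layout and the zero-denominator convention are assumptions of this example.

```python
def prob_weighted_precision_recall(predictions, native):
    """Return (Prec(prob), Recall(prob)) for a trimmed prediction list.

    predictions: list of (i, j, probability) tuples (hypothetical layout).
    native:      iterable of (i, j) pairs in contact in the target.
    """
    native = set(native)
    # TP_pw: sum of probabilities over correctly predicted contacts.
    tp_pw = sum(prob for i, j, prob in predictions if (i, j) in native)
    n_p = len(predictions)  # Np, number of predicted contacts
    n_c = len(native)       # Nc, number of contacts in the target
    # Assumption: report 0.0 instead of dividing by zero.
    prec_prob = tp_pw / n_p if n_p else 0.0
    recall_prob = tp_pw / n_c if n_c else 0.0
    return prec_prob, recall_prob
```

For example, two correct contacts with probabilities 0.9 and 0.8 in a list of four give TP_pw = 1.7 and Prec(prob) = 0.425.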

F1(prob)

The harmonic mean of Prec(prob) and Recall(prob) (see above):
F1(prob) = 2*Prec(prob)*Recall(prob)/(Prec(prob)+Recall(prob)).

Prec-FDR(prob)

Difference between the probability-weighted precision and probability-weighted False Discovery Rate:
Prec-FDR(prob) = (TP_pw-FP_pw)/Np,
where
TP_pw is the sum of probabilities of correctly predicted contacts,
FP_pw is the sum of probabilities of wrongly predicted contacts,
Np is the number of predicted contacts.

Prec(pwa), Prec-FDR(pwa)

These statistics are calculated with both the true positives and the false positives in the corresponding formulae weighted by the probabilities submitted with the predicted contact pairs:
Prec(pwa) = TP_pw/(TP_pw+FP_pw),
Prec-FDR(pwa) = (TP_pw-FP_pw)/(TP_pw+FP_pw),
where
TP_pw is the sum of probabilities of correctly predicted contacts,
FP_pw is the sum of probabilities of wrongly predicted contacts.
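
The three statistics Prec-FDR(prob), Prec(pwa) and Prec-FDR(pwa) differ only in their denominators, which the sketch below makes explicit. The function name, the (i, j, probability) tuple layout and the zero-denominator convention are assumptions of this example.

```python
def probability_weighted_stats(predictions, native):
    """Return Prec-FDR(prob), Prec(pwa) and Prec-FDR(pwa) as a dict.

    predictions: trimmed list of (i, j, probability) tuples (hypothetical).
    native:      iterable of (i, j) pairs in contact in the target.
    """
    native = set(native)
    tp_pw = sum(p for i, j, p in predictions if (i, j) in native)
    fp_pw = sum(p for i, j, p in predictions if (i, j) not in native)
    n_p = len(predictions)  # Np, number of predicted contacts
    weight = tp_pw + fp_pw  # total submitted probability
    # Assumption: report 0.0 instead of dividing by zero.
    return {
        "Prec-FDR(prob)": (tp_pw - fp_pw) / n_p if n_p else 0.0,
        "Prec(pwa)": tp_pw / weight if weight else 0.0,
        "Prec-FDR(pwa)": (tp_pw - fp_pw) / weight if weight else 0.0,
    }
```

With TP_pw = 1.7 and FP_pw = 0.9 over four predicted contacts, Prec-FDR(prob) is 0.8/4 = 0.2, while Prec-FDR(pwa) is 0.8/2.6.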

Prec(SS), Recall(SS), MCC(SS), F1(SS)

The corresponding statistics are calculated for continuous residue stretches of helical or beta-stranded secondary structure (SS). Long secondary structure elements (>10 residues) are divided into subelements for better accuracy of the score. A contact between two SS elements is considered predicted if the list of residue-residue contacts contains at least one contact linking these SS elements.
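
The mapping from residue-residue contacts to contacts between SS elements can be sketched as below. The function name and the representation of SS elements as inclusive (start, end) residue ranges are assumptions of this example, and the subdivision of long elements is not implemented because its exact rule is not given here.

```python
def ss_element_contacts(residue_contacts, ss_elements):
    """Return the set of SS element pairs linked by at least one contact.

    residue_contacts: iterable of (i, j) residue pairs.
    ss_elements:      list of inclusive (start, end) residue ranges, one
                      per SS element (hypothetical layout).
    """
    def element_of(residue):
        # Index of the SS element containing the residue, or None.
        for index, (start, end) in enumerate(ss_elements):
            if start <= residue <= end:
                return index
        return None

    linked = set()
    for i, j in residue_contacts:
        a, b = element_of(i), element_of(j)
        # Keep only contacts that join two different SS elements.
        if a is not None and b is not None and a != b:
            linked.add((min(a, b), max(a, b)))
    return linked
```

The resulting sets of element pairs, one from the prediction list and one from the target structure, can then be scored with the same Prec, Recall, MCC and F1 formulas used for residue pairs.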

Recall(SS Tp)

The recall calculated for contacts between secondary structure elements (see above), provided the incorrect residue-residue contacts are filtered out first (only true positives are counted).

Count domains

The parameter indicates the number of domains predicted by a group.