RR Results Table HELP
Contact definition: A pair of residues is considered in contact if the distance
between their Cb atoms (Ca in case of GLY) is smaller than 8.0Å.
The parameter indicates the separation along the sequence of two residues in contact.
Three types of contacts are defined:
- long range contacts (separation >= 24);
- medium range contacts (12 <= separation <= 23);
- short range contacts (6 <= separation <= 11).
Contacts between residues separated by less than 6 residues are usually associated
with secondary structure and are not evaluated.
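The contact definition and the three separation ranges above can be sketched as follows. This is a minimal illustration, not the evaluation code used for the table; the function names are hypothetical, and the upper bound of 11 for short range contacts is an assumption inferred from the medium range starting at 12.

```python
from typing import Optional

CONTACT_CUTOFF = 8.0  # Angstroms, Cb-Cb distance (Ca for GLY)


def is_contact(distance: float) -> bool:
    """A pair is in contact if its Cb-Cb distance is smaller than 8.0 A."""
    return distance < CONTACT_CUTOFF


def contact_range(i: int, j: int) -> Optional[str]:
    """Classify a residue pair by its sequence separation |i - j|."""
    separation = abs(i - j)
    if separation >= 24:
        return "long"
    if separation >= 12:
        return "medium"
    if separation >= 6:
        return "short"
    return None  # separation < 6: not evaluated
```

For example, contact_range(10, 40) gives "long", while contact_range(10, 14) gives None because such pairs are not evaluated.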
In order to evaluate performance of contact predictors on the same number
of contacts, we trim the lists to N contacts with the highest probabilities.
The List Size parameter N determines the length of contact lists and
takes values: L/2, L/5 and 10, where L is the length of the sequence.
Parameter FL corresponds to all predicted contacts (different for
different predictions).
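Trimming a prediction to the N most probable contacts might look like the sketch below. The function name and the (i, j, probability) tuple layout are hypothetical, and rounding L/2 and L/5 down to an integer is an assumption, since the rounding rule is not stated here.

```python
def trim_contacts(predictions, seq_len, list_size):
    """Return the top-N contacts by probability.

    predictions: list of (i, j, probability) tuples.
    list_size: one of "L/2", "L/5", "10" or "FL" (full list).
    """
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    if list_size == "FL":
        return ranked  # all predicted contacts, no trimming
    # Assumption: fractional list sizes are rounded down.
    sizes = {"L/2": seq_len // 2, "L/5": seq_len // 5, "10": 10}
    return ranked[:sizes[list_size]]
```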
F1-score is the harmonic mean of precision and recall (see below) and is calculated according to the formula:
F1 = 2*precision*recall/(precision+recall).
Prec = TP/Np,
where Np=TP+FP is the number of predicted contacts,
TP and FP are the numbers of correctly and incorrectly predicted contacts, respectively.
Recall = TP/Nc,
where TP is the number of correctly predicted contacts,
Nc is the number of all contacts in the target structure.
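The precision, recall and F1 formulas above can be illustrated with a short sketch. The function name is hypothetical, contacts are assumed to be given as (i, j) pairs with i < j, and returning 0.0 for empty lists is an assumption made to avoid division by zero.

```python
def precision_recall_f1(predicted, native):
    """Compute Prec = TP/Np, Recall = TP/Nc and their harmonic mean."""
    predicted = set(predicted)
    native = set(native)
    tp = len(predicted & native)  # correctly predicted contacts
    prec = tp / len(predicted) if predicted else 0.0
    recall = tp / len(native) if native else 0.0
    if prec + recall == 0:
        return prec, recall, 0.0
    f1 = 2 * prec * recall / (prec + recall)
    return prec, recall, f1
```

With 4 predicted contacts of which 2 are correct, and 3 contacts in the target structure, this gives Prec = 2/4 and Recall = 2/3.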
Area Under the precision-recall Curve.
Matthews Correlation Coefficient, calculated by the formula:
MCC = (TP*TN - FP*FN)/sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))
where TP and FP are the numbers of correctly and incorrectly predicted contacts, respectively,
TN is the number of non-contacts in the target structure not appearing in the prediction list,
FN is the number of contacts in the target structure missing in the prediction list.
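A minimal sketch of the MCC formula above, taking the four counts as inputs. The function name is hypothetical, and returning 0.0 when the denominator vanishes is a common convention assumed here, not something stated in this help text.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from the four confusion counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention for the undefined case
    return (tp * tn - fp * fn) / denominator
```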
Entropy Score:
The score calculates the relative drop in entropy introduced
by a set of distance constraints (in our case, correctly predicted
residue-residue contacts) with respect to the reference value
of the entropy for a protein of the given length without constraints.
The score is calculated by the formula (ref.):
ES = 100% * (Entropy|0 - Entropy|C) / Entropy|0 ,
where
Entropy|0 is the entropy value for the protein without constraints,
Entropy|C is the entropy value given a set of constraints C.
Entropy|x = AVERAGE_over_all_pairs_of_residues (LOG(UpperLimit - LowerLimit)),
where
x = '0' or 'C',
LowerLimit (both for contacts and non-contacts) = 3.2Å
UpperLimit for contacts = 8Å
UpperLimit for non-contacts = diameter of gyration (DG).
The diameter of gyration is calculated by the formula (ref):
DG = 5.54 * L^0.34, where L is the length of the protein sequence.
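One possible reading of the Entropy Score definition is sketched below. Several points are assumptions rather than facts taken from this help text: "all pairs of residues" is taken to mean all pairs i < j, each correctly predicted contact is taken to constrain exactly one pair, and the natural logarithm is used (the base cancels in the ES ratio). All function names are hypothetical.

```python
import math

LOWER_LIMIT = 3.2    # Angstroms, for contacts and non-contacts
CONTACT_UPPER = 8.0  # Angstroms, upper limit for contacts


def diameter_of_gyration(seq_len):
    """DG = 5.54 * L^0.34."""
    return 5.54 * seq_len ** 0.34


def mean_log_range(seq_len, n_constrained, upper_noncontact):
    """Average of log(UpperLimit - LowerLimit) over all residue pairs."""
    n_pairs = seq_len * (seq_len - 1) // 2  # assumption: pairs with i < j
    constrained = n_constrained * math.log(CONTACT_UPPER - LOWER_LIMIT)
    free = (n_pairs - n_constrained) * math.log(upper_noncontact - LOWER_LIMIT)
    return (constrained + free) / n_pairs


def entropy_score(seq_len, n_correct_contacts, upper_noncontact=None):
    """ES = 100% * (Entropy|0 - Entropy|C) / Entropy|0."""
    if upper_noncontact is None:
        upper_noncontact = diameter_of_gyration(seq_len)
    e0 = mean_log_range(seq_len, 0, upper_noncontact)
    ec = mean_log_range(seq_len, n_correct_contacts, upper_noncontact)
    return 100.0 * (e0 - ec) / e0
```

Under these assumptions the score is 0 when no contacts are correctly predicted and grows with each additional correct contact. The upper_noncontact argument allows a different upper limit for non-contacts to be supplied.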
Entropy Score (extended):
A version of the ES score (see above) with
UpperLimit for non-contacts = 3.8Å * N, where N is the number of residues in the protein.
Statistics calculated such that the true positives
in the numerators of the corresponding formulae are weighted by the
probabilities submitted with the predicted contact pairs:
Prec(prob) = TP_pw/Np,
Recall(prob) = TP_pw/Nc,
where
TP_pw is the sum of predicted probabilities of correctly predicted contacts in the selected list size,
Np is the number of predicted contacts,
Nc is the number of contacts in target structure.
The harmonic mean of Prec(prob) and Recall(prob) (see above):
F1 = 2*Prec(prob)*Recall(prob)/(Prec(prob)+Recall(prob)).
Difference between the probability-weighted precision and probability-weighted False Discovery Rate:
Prec-FDR(prob) = (TP_pw-FP_pw)/Np,
where
TP_pw is the sum of probabilities of correctly predicted contacts,
FP_pw is the sum of probabilities of wrongly predicted contacts,
Np is the number of predicted contacts.
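The probability-weighted statistics defined above (Prec(prob), Recall(prob), their harmonic mean and Prec-FDR(prob)) can be sketched together. The function name, the (i, j, probability) tuple layout and the dictionary keys are hypothetical; the input is assumed to be already trimmed to the selected list size.

```python
def prob_weighted_stats(predictions, native):
    """Probability-weighted precision, recall, F1 and Prec-FDR.

    predictions: list of (i, j, probability) tuples.
    native: collection of (i, j) contacts in the target structure.
    """
    native = set(native)
    n_p = len(predictions)  # Np: number of predicted contacts
    n_c = len(native)       # Nc: number of contacts in the target
    tp_pw = sum(p for i, j, p in predictions if (i, j) in native)
    fp_pw = sum(p for i, j, p in predictions if (i, j) not in native)
    prec = tp_pw / n_p if n_p else 0.0
    recall = tp_pw / n_c if n_c else 0.0
    f1 = 2 * prec * recall / (prec + recall) if prec + recall > 0 else 0.0
    prec_fdr = (tp_pw - fp_pw) / n_p if n_p else 0.0
    return {"prec": prec, "recall": recall, "f1": f1, "prec_fdr": prec_fdr}
```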
Statistics calculated such that both the true positives and the
false positives in the corresponding formulae are weighted by the
probabilities submitted with the predicted contact pairs:
Prec(pwa) = TP_pw/(TP_pw+FP_pw),
Prec-FDR(pwa) = (TP_pw-FP_pw)/(TP_pw+FP_pw),
where
TP_pw is the sum of probabilities of correctly predicted contacts,
FP_pw is the sum of probabilities of wrongly predicted contacts.
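The (pwa) variants differ from the (prob) variants only in the denominator, which is the sum of all submitted probabilities rather than the number of predicted contacts. A minimal sketch, with a hypothetical function name and an assumed (i, j, probability) tuple layout:

```python
def pwa_stats(predictions, native):
    """Return (Prec(pwa), Prec-FDR(pwa)) for a list of (i, j, probability)."""
    native = set(native)
    tp_pw = sum(p for i, j, p in predictions if (i, j) in native)
    fp_pw = sum(p for i, j, p in predictions if (i, j) not in native)
    total = tp_pw + fp_pw
    if total == 0:
        return 0.0, 0.0  # assumed convention when all probabilities are zero
    return tp_pw / total, (tp_pw - fp_pw) / total
```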
The corresponding statistics are calculated for continuous residue stretches
of helical or beta-stranded secondary structure (SS). Long secondary
structure elements (>10 residues) are divided into subelements for better
accuracy of the score. A contact between two SS elements is considered
predicted if the list of residue-residue contacts contains at least
one contact linking these SS elements.
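Mapping residue-residue contacts onto contacts between SS elements might be sketched as below. The function name and the representation of elements as inclusive (start, end) residue ranges are hypothetical, and the subdivision of long elements into subelements is not implemented because the rule is not specified here.

```python
def element_contacts(residue_contacts, elements):
    """Return the set of SS element pairs linked by at least one contact.

    residue_contacts: iterable of (i, j) residue pairs.
    elements: list of inclusive (start, end) residue ranges, one per element.
    """
    def element_of(residue):
        for index, (start, end) in enumerate(elements):
            if start <= residue <= end:
                return index
        return None  # residue is outside every SS element

    pairs = set()
    for i, j in residue_contacts:
        a, b = element_of(i), element_of(j)
        if a is not None and b is not None and a != b:
            pairs.add((min(a, b), max(a, b)))
    return pairs
```

Applying the same function to the predicted list and to the contacts of the target structure gives two sets of element pairs, from which precision and recall can be computed as for residue-residue contacts.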
The recall is calculated for contacts between secondary structure elements
(see above) after the incorrect residue-residue contacts
are filtered out, so that only true positives are counted.
The parameter indicates the number of domains predicted by a group.
Protein Structure Prediction Center