LCS and GDT description
Longest Continuous Segments under specified CA RMSD cutoff (LCS).
The algorithm identifies all the longest continuous segments of residues
in the prediction that deviate from the target by no more than the specified
CA RMSD cutoff, using many different superpositions.
Each residue in a prediction is assigned to the longest such segment
that contains it. The overall longest continuous
segment in the prediction under the given RMSD cutoff is reported as well.
The results of the analysis are reported for three values of the CA RMSD
cutoff: 1.0 A, 2.0 A, and 5.0 A.
This measure can be used to evaluate ab initio 3D and comparative modeling
predictions.
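
The LCS search described above can be sketched in Python as follows. This is
only a minimal illustration under stated assumptions, not the program's actual
implementation: it assumes NumPy is available, that model and target are
(n, 3) arrays of paired CA coordinates, and that every window is judged by its
own optimal rigid superposition. The names superposed_rmsd, lcs and min_len
are hypothetical.

```python
import numpy as np


def superposed_rmsd(P, Q):
    """CA RMSD between two (n, 3) arrays after optimal rigid superposition."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)
    # Flip the smallest singular value if the best fit would need a reflection.
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[-1] = -S[-1]
    sq_err = (P ** 2).sum() + (Q ** 2).sum() - 2.0 * S.sum()
    return float(np.sqrt(max(sq_err, 0.0) / len(P)))


def lcs(model, target, cutoff, min_len=3):
    """Brute-force LCS over all continuous windows of at least min_len residues.

    Returns (longest, best): longest[i] is the length of the longest qualifying
    segment containing residue i (0 if none), and best is (start, length) of
    the overall longest segment. min_len=3 is an assumption of this sketch.
    """
    model = np.asarray(model, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(model)
    longest = np.zeros(n, dtype=int)
    best = (0, 0)
    for start in range(n - min_len + 1):
        for end in range(start + min_len, n + 1):
            if superposed_rmsd(model[start:end], target[start:end]) <= cutoff:
                length = end - start
                # Assign every residue of the window to the longest segment seen.
                longest[start:end] = np.maximum(longest[start:end], length)
                if length > best[1]:
                    best = (start, length)
    return longest, best
```

All windows are tested because the RMSD of a segment does not have to grow
monotonically as the segment is extended. Calling lcs once for each cutoff
(1.0, 2.0, and 5.0) would give the three sets of results mentioned above.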
Global Distance Test (GDT). The algorithm identifies the sets of residues
in the prediction that deviate from the target by no more than the
specified CA DISTANCE cutoff, using many different superpositions.
Each residue in a prediction is assigned to the largest set of residues
(not necessarily continuous) deviating from the target by no more than the
specified distance cutoff.
This measure can be used to evaluate ab initio 3D and comparative modeling
predictions.
For different values of the DISTANCE cutoff (0.5 A, 1.0 A, 1.5 A, ..., 10.0 A),
several measures are reported:
NUMBER_OF_CA_max - the number of CA's from the "largest set" that
  can fit under the specified distance cutoff
PERCENT_OF_CA_Tg - the percentage of CA's from the "largest set" relative
  to the total number of CA's in the target
FRAGMENT: Beg-End - beginning and end of the segment containing the
  "largest set" of CA's
RMS_LOCAL - RMSD (root mean square deviation) calculated on the
  "largest set" of CA's
RMS_ALL_CA - RMSD calculated on all CA's after superposition of
  the prediction structure onto the target structure
  based on the "largest set" of CA's
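
As an illustration of how the reported measures relate to each other, the
sketch below computes them for one distance cutoff. It is a simplified example
under stated assumptions: the model coordinates are already superposed onto the
target using the "largest set", the two coordinate lists are paired one-to-one,
and residues are identified by 0-based list positions. The function name
gdt_row and its parameters are hypothetical.

```python
import math


def gdt_row(model_xyz, target_xyz, largest_set, n_target_ca=None):
    """Summarize one distance cutoff from an already superposed model.

    model_xyz, target_xyz: paired lists of (x, y, z) CA coordinates.
    largest_set: positions of the residues in the "largest set".
    n_target_ca: total number of CA's in the target (defaults to the list length).
    """
    def rmsd(positions):
        sq = [
            sum((a - b) ** 2 for a, b in zip(model_xyz[i], target_xyz[i]))
            for i in positions
        ]
        return math.sqrt(sum(sq) / len(sq))

    members = sorted(largest_set)
    if n_target_ca is None:
        n_target_ca = len(target_xyz)
    return {
        "NUMBER_OF_CA_max": len(members),
        "PERCENT_OF_CA_Tg": 100.0 * len(members) / n_target_ca,
        "FRAGMENT": (members[0], members[-1]),  # Beg-End of the spanning segment
        "RMS_LOCAL": rmsd(members),             # only the "largest set"
        "RMS_ALL_CA": rmsd(range(len(target_xyz))),  # all paired CA's
    }
```

For example, with four paired residues of which the first three coincide and
the fourth is 4 A away, the largest set {0, 1, 2} gives NUMBER_OF_CA_max = 3,
PERCENT_OF_CA_Tg = 75, RMS_LOCAL = 0 and RMS_ALL_CA = 2.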
The goal of introducing these two measures (GDT and LCS) is to provide a
tool that can be used for better detection of relatively good or bad parts
of the model.
- Using LCS we can localize the "best" continuous (along
the sequence) parts of the model that can fit under
RMSD thresholds: 1A, 2A, and 5A.
Three blue lines represent the longest continuous sets
of residues that can fit under the 1A, 2A, and 5A cutoffs,
respectively.
- Using GDT we can localize the "best" sets of residues
(not necessarily continuous) that can fit under DISTANCE
thresholds: 0.5A, 1.0A, 1.5A, ..., 10.0A.
There are three blue lines on the GDT plot.
Each line represents the sets containing 5, 10, or 50 percent of the
target residues that can fit under a specific distance cutoff (Y axis).
The lowest line marks the residues (X axis) from the 5 percent sets,
the middle line those from the 10 percent sets, and the highest line
those from the 50 percent sets.
The differences between LCS and GDT are the following:
1) LCS (Longest Continuous Segment) is based on RMSD cutoff.
2) The goal of LCS is to localize the longest continuous segment
of residues that can fit under RMSD cutoff.
3) Each residue in a prediction is assigned to the longest continuous
segment that contains it.
4) The data provided in the result files contain the LCS calculated
under three selected values of the CA RMSD cutoff: 1A, 2A, and 5A.
5) GDT (Global Distance Test) is based on the DISTANCE cutoff.
6) The goal of GDT is to localize the largest set of residues
(not necessarily continuous) deviating from the target by no more than
a specified DISTANCE cutoff.
7) Each residue in a prediction is assigned to the largest set of
residues that contains it.
8) The data provided in the result files contain the GDT calculated under
several values of the DISTANCE cutoff: 0.5, 1.0, 1.5, ..., 10.0 Angstroms.
The results of the LCS analysis reflect mostly local features of
the model, while the residues considered in GDT come from the whole model
structure (they do not have to be continuous along the sequence).
The GDT procedure is as follows. Each three-residue segment and each
continuous segment found by LCS is used as a starting point that gives the
initial equivalences (model-target CA pairs) for a superposition.
The list of equivalences is iteratively extended to produce the largest
set of residues that can fit under the considered distance cutoff.
The iterative superposition procedure (ISP) is used to collect the data
about the largest sets of residues.
The goal of the ISP method is to exclude from the calculations those atoms
whose distance between the model and the target structure exceeds some
threshold (cutoff) after the transform is applied.
Starting from the initial set of atoms (C-alphas), the algorithm is as
follows:
a) obtain the transform
b) apply the transform
c) identify all atom pairs for which distance is larger than the
threshold
d) re-obtain the transform, excluding those atoms
e) repeat b) - d) until the set of atoms used in the calculations
is the same for two consecutive cycles
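
Steps a) - e) can be sketched in Python as shown below. This is a hedged
illustration rather than the program's own code. It assumes NumPy, paired
(n, 3) arrays of CA coordinates, and a least-squares (Kabsch) fit as "the
transform". It also assumes that the set of atoms under the threshold is
re-evaluated over all CA pairs in each cycle, so that the set can grow as well
as shrink. The names kabsch, isp, largest_set and max_cycles are hypothetical,
and largest_set uses only three-residue seeds (the LCS segments are omitted).

```python
import numpy as np


def kabsch(P, Q):
    """Rotation R and translation t that best map rows of P onto rows of Q."""
    Pc, Qc = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - Pc).T @ (Q - Qc))
    # Force a proper rotation (determinant +1), never a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, Qc - R @ Pc


def isp(model, target, start, cutoff, max_cycles=50):
    """Iterative superposition from an initial set of residue positions.

    Returns (idx, R, t): the final set of positions and the transform fitted
    on that set. max_cycles is a safety limit assumed for this sketch.
    """
    model = np.asarray(model, dtype=float)
    target = np.asarray(target, dtype=float)
    idx = np.array(sorted(start))
    R, t = kabsch(model[idx], target[idx])            # a) obtain the transform
    for _ in range(max_cycles):
        moved = model @ R.T + t                       # b) apply the transform
        dist = np.linalg.norm(moved - target, axis=1)
        new_idx = np.flatnonzero(dist <= cutoff)      # c) drop pairs over the threshold
        if len(new_idx) < 3 or np.array_equal(new_idx, idx):
            break                                     # e) same set for two cycles
        idx = new_idx
        R, t = kabsch(model[idx], target[idx])        # d) re-obtain the transform
    return idx, R, t


def largest_set(model, target, cutoff):
    """Run ISP from every three-residue seed and keep the largest final set."""
    best = None
    for s in range(len(model) - 2):
        result = isp(model, target, [s, s + 1, s + 2], cutoff)
        if best is None or len(result[0]) > len(best[0]):
            best = result
    return best
```

Calling largest_set once for each distance cutoff (0.5, 1.0, ..., 10.0) would
give one "largest set" per cutoff. The stop at fewer than three atoms is a
choice made for this sketch, since a rigid fit needs at least three points.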
-------