Citing LGA:
Zemla A., "LGA - a Method for Finding 3D Similarities in Protein Structures",
Nucleic Acids Research, 2003, Vol. 31, No. 13, pp. 3370-3374.
The LGA program is being developed for the comparative analysis of two selected 3D protein structures or fragments of 3D protein structures. The analysis can be run in two general modes: sequence dependent (see the -sda option) and sequence independent (see the -sia option).
The input for LGA processing should contain two sets of 3D structure coordinates (molecule1 and molecule2) in the format of standard PDB ATOM records. As a result of LGA processing, the user gets the rotated coordinates of the first structure (molecule1) and, optionally, the coordinates of the second structure (the target, molecule2, unchanged).
For structure similarity searches and for ordering models (Molecule1: templates, PDB files), the target (Molecule2, the frame of reference) should be kept fixed. The user may then sort models (see the SUMMARY line in the LGA output) by the number N of superimposed residues (under one selected DIST cutoff), by GDT_TS (the average over four fixed distance cutoffs), or by the LGA_S value (weighted results from the full set of distance cutoffs, see [1]).
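As an illustration of the sorting step, here is a minimal Python sketch that parses SUMMARY lines and ranks models by a chosen column. It assumes the column layout of the SUMMARY(LGA) and SUMMARY(GDT) sample lines shown later in this document; the function names and the model names are hypothetical and not part of LGA.

```python
def parse_summary(line):
    """Parse one SUMMARY(LGA) or SUMMARY(GDT) line into a dict of numbers."""
    fields = line.split()
    tag = fields[0]
    # Assumed layout: the sixth column is Seq_Id for SUMMARY(LGA)
    # and GDT_TS for SUMMARY(GDT), as in the sample output.
    sixth = "GDT_TS" if tag == "SUMMARY(GDT)" else "Seq_Id"
    names = ["N1", "N2", "DIST", "N", "RMSD", sixth, "LGA_S", "LGA_Q"]
    record = {name: float(value) for name, value in zip(names, fields[1:9])}
    record["tag"] = tag
    return record


def rank_models(summaries, key="LGA_S"):
    """Sort (model_name, summary_line) pairs, highest value of `key` first."""
    parsed = [(name, parse_summary(line)) for name, line in summaries]
    return sorted(parsed, key=lambda item: item[1][key], reverse=True)


# Hypothetical model names. The two lines are the samples from this document;
# in real use every line would come from a run against the same fixed target.
runs = [
    ("model_a", "SUMMARY(LGA) 13 18 2.7 8 1.65 12.50 15.123 0.456"),
    ("model_b", "SUMMARY(GDT) 261 255 4.0 230 1.49 80.000 81.425 14.460"),
]

for name, rec in rank_models(runs):
    print(name, int(rec["N"]), rec["LGA_S"])
```

Passing key="N" ranks by the number of superimposed residues instead.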
LGA accepts the following options:
-1            standard RMSD
-2            RMSD using ISP (Iterative Superposition Procedure)
-3            LCS and GDT analysis
-4            LGA structure alignment analysis
-d:f.f        distance cutoff DIST (f.f Angstroms, 0.10 <= f.f <= 10.0)
              (use with -2, -3, or -4 options)
              NOTE: lower value of DIST => tighter superposition
                    larger value of DIST => superposition can be more relaxed
-lw:n         2*n+1 (n>0) is the length of the residue window ("Lesk window",
              see [4]) on which the value of local RMSD is calculated
              (can be used with -1, -2, -3, -4 options)
-sda          facilitates the selection of residues for calculation: sequence
              dependent analysis (residue numbering and chain ID should be the
              same in both structures)
-sia          facilitates the selection of residues for calculation: sequence
              independent analysis
              NOTE: you can use the -sia option with -1, -2, or -3. In this
                    case the same number of the first residues from both
                    structures will be taken for LGA processing.
-atom:CA      CA atoms will be used for calculations. NOTE (special character
              in the PARAMETER-OPTIONS line): use , instead of '
              (for example: H5,1 to select H5'1 atom)
-ch1:A        chain A selected from molecule1
-ch2:B        chain B selected from molecule2
-ah:i         ATOM or HETATM records are used for calculations:
              i=0 both
              i=1 ATOM
              i=2 HETATM
-aa1:n1:n2    range of residues from molecule1 used for calculations
              -9999 < n1 < n2 < 9999
-aa2:n1:n2    range of residues from molecule2 used for calculations
              -9999 < n1 < n2 < 9999
-gap1:n1:n2   range of residues from molecule1 removed from calculations
              -9999 < n1 < n2 < 9999
-gap2:n1:n2   range of residues from molecule2 removed from calculations
              -9999 < n1 < n2 < 9999
-er1:s1:s2    exact range of residues from molecule1 used for calculations
              (s1 , s2 - strings: s1 = 13L_A < s2 = 45_B )
              Up to 10 er1 parameters are allowed (WARNING: no overlaps)
-er2:s1:s2    exact range of residues from molecule2 used for calculations
              (s1 , s2 - strings: s1 = 16 < s2 = 245A )
              Up to 10 er2 parameters are allowed (WARNING: no overlaps)
-aa           generates a list of all residues from molecule1 and
              molecule2 (AAMOL* records)
-o0           only summary results (no coordinates) are reported as
              output from the program
-o1           summary results and the coordinates of molecule_1 (rotated) are
              reported as a result of the analysis
-o2           summary results, the coordinates of molecule_1 (rotated) and
              molecule_2 (target, not changed) are both reported as a result
              of the analysis
The default set of parameters is: -4 -sia -o1 -d:5.0
If two structures from PDB have to be analyzed then please use the following notation:
1cpi_A     for PDB entry: 1cpi, chain: 'A'
1sip       for PDB entry: 1sip, chain: ' '
and, when specifying an NMR MODEL:
1bve_B_5   for PDB entry: 1bve, chain: 'B', model: 5
1awo___7   for PDB entry: 1awo, chain: ' ', model: 7
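The notation above can be decoded mechanically. The Python sketch below is only an illustration inferred from the four examples: it assumes a 4-character PDB entry code, an optional one-character chain after the first underscore (with "_" standing for a blank chain), and an optional model number after the second underscore. The function name is hypothetical.

```python
def parse_structure_id(name):
    """Split an identifier such as '1bve_B_5' into (entry, chain, model).

    Assumed layout (inferred from the examples, not from LGA source code):
    positions 0-3 entry code, position 5 chain ('_' means blank chain),
    position 7 onward the NMR model number.
    """
    entry = name[:4]
    chain = name[5] if len(name) > 5 else " "
    if chain == "_":
        chain = " "
    model = int(name[7:]) if len(name) > 7 else None
    return entry, chain, model


for example in ("1cpi_A", "1sip", "1bve_B_5", "1awo___7"):
    print(example, parse_structure_id(example))
```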
If your data (two structures) is already prepared as one file then please check if each one of the two 3D structures begins with MOLECULE and ends with END record:
MOLECULE name1
ATOM 1 N ILE 2 1.002 23.117 39.181 1.00 82.49 N
ATOM 2 CA ILE 2 1.295 23.768 40.454 1.00 83.70 C
---------
ATOM 400 CD1 LEU 54 14.696 9.978 30.085 1.00 56.40 C
ATOM 401 CD2 LEU 54 12.844 11.030 31.407 1.00 31.93 C
END
MOLECULE name2
ATOM 419 N LEU A 57 13.121 3.012 34.495 1.00 40.04 N
ATOM 420 CA LEU A 57 13.125 1.748 35.211 1.00 43.79 C
---------
ATOM 558 C GLU A 74 7.298 12.565 26.328 1.00 43.72 C
ATOM 559 O GLU A 74 6.545 13.347 26.910 1.00 49.34 O
END
# Molecule1: number of CA atoms 13 , all atoms 98 , name name1
# Molecule2: number of CA atoms 18 , all atoms 141 , name name2
# PARAMETERS: -4 -sia -o2 -d:2.7
# Sequence Independent Analysis
# Structure alignment analysis
--- residue-residue equivalences reported by LGA ---
# Molecule1 Molecule2 DISTANCE
LGA I 2 L 57_A -
LGA V 3 L 58_A -
LGA T 4 - - -
LGA Q 5 - - -
LGA L 46 Q 59_A 1.586
LGA K 47 K 60_A 0.967
LGA P 48 W 61_A 1.470
LGA T 49 E 62_A 0.595
LGA P 50 N 63_A 2.575
LGA E 51 G 64_A 2.643
LGA G 52 E 65_A 1.206
LGA D 53 C 66_A -
LGA L 54 A 67_A 0.911
LGA - - Q 68_A -
LGA - - K 69_A -
LGA - - K 70_A -
LGA - - I 71_A -
LGA - - I 72_A -
LGA - - A 73_A -
LGA - - E 74_A -
In the DISTANCE column, the distances in Angstroms between
corresponding residues are reported after the final global
superposition is applied ("-" is shown when residues
are not aligned under the selected distance cutoff DIST).
#             N1   N2   DIST   N   RMSD   Seq_Id   LGA_S    LGA_Q
SUMMARY(LGA)  13   18   2.7    8   1.65   12.50    15.123   0.456

N1      number of residues from mol1 (model)
N2      number of residues from mol2 (target)
DIST    selected distance cutoff DIST
N       number of residues superimposed under distance cutoff DIST
RMSD    RMSD calculated on N residues superimposed under distance cutoff DIST
Seq_Id  Sequence Identity. Percent of identical residues from the
        total of N aligned under distance DIST
LGA_S   LGA_S score (0.00 - 100.00) calculated with reference to the number
        of residues in target (mol2 name2 - here 18 residues)
LGA_Q   LGA_Q (quality) score calculated with use of the formula: Q=0.1*N/(0.1+RMSD)
        (Q below 2.0 indicates rather weak alignment)
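The Seq_Id and LGA_Q columns can be checked by hand from the sample above. The following Python sketch applies the stated formula Q=0.1*N/(0.1+RMSD) and counts identical residue pairs among the eight aligned pairs of the sample (those with a reported distance). The function names are hypothetical, and the small difference from the printed LGA_Q is assumed to come from the RMSD being rounded in the output.

```python
def lga_q(n, rmsd):
    """LGA_Q as given in the text: Q = 0.1 * N / (0.1 + RMSD)."""
    return 0.1 * n / (0.1 + rmsd)


def sequence_identity(pairs):
    """Percent of identical residues among the aligned residue pairs."""
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


# One-letter codes of the 8 residue pairs aligned under DIST = 2.7
# in the sample output (Molecule1 residue, Molecule2 residue).
aligned = [("L", "Q"), ("K", "K"), ("P", "W"), ("T", "E"),
           ("P", "N"), ("E", "G"), ("G", "E"), ("L", "A")]

print(sequence_identity(aligned))       # 12.5, matching Seq_Id 12.50
print(round(lga_q(8, 1.65), 3))         # close to the reported 0.456
```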
Unitary ROTATION matrix and the shift VECTOR superimpose MOLECULES (1=>2)
X_new = 0.727720 * X + 0.667720 * Y + 0.156762 * Z + -87.991600
Y_new = 0.113780 * X + -0.342916 * Y + 0.932450 * Z + 10.198357
Z_new = 0.676371 * X + -0.660726 * Y + -0.325520 * Z + 54.671837
Euler angles from the ROTATION matrix (XYZ convention, two solutions)
Psi = -2.028563 1.113030 [ DEG: -116.2281 63.7719 ]
Theta = -0.742825 -2.398768 [ DEG: -42.5607 -137.4393 ]
Phi = 0.155095 -2.986497 [ DEG: 8.8863 -171.1137 ]
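To reuse the reported transformation outside LGA, the three X_new, Y_new, Z_new equations can be applied to any coordinate triple. The Python sketch below copies the matrix and shift vector from the sample above; the names ROT, SHIFT and apply_transform are hypothetical, and the input points are arbitrary test points rather than atoms from the sample file.

```python
import math

# Rows correspond to the X_new, Y_new and Z_new lines of the sample output.
ROT = [
    [0.727720, 0.667720, 0.156762],
    [0.113780, -0.342916, 0.932450],
    [0.676371, -0.660726, -0.325520],
]
SHIFT = [-87.991600, 10.198357, 54.671837]


def apply_transform(xyz, rot=ROT, shift=SHIFT):
    """Return the rotated and shifted copy of one (x, y, z) point."""
    x, y, z = xyz
    return tuple(r[0] * x + r[1] * y + r[2] * z + s
                 for r, s in zip(rot, shift))


# A rigid transformation keeps distances between points unchanged
# (up to the rounding of the printed matrix elements).
a, b = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
print(math.dist(a, b), math.dist(apply_transform(a), apply_transform(b)))
```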
LGA-Parameters -4 -sia -o2 -d:2.7
REMARK ----------------------------------------------------------
REMARK Citing LGA:
REMARK Zemla A., LGA - a Method for Finding 3D Similarities in
REMARK Protein Structures, Nucleic Acids Research, 2003, V. 31,
REMARK No. 13, pp. 3370-3374.
REMARK ----------------------------------------------------------
REMARK Superimposed MOLECULES (1=>2)
REMARK 1: name1
REMARK 2: name2
REMARK Structure alignment analysis. DISTANCE 2.70
REMARK ----------------------------------------------------------
MOLECULE name1 (rotated coordinates)
ATOM 1 N ILE 2 27.152 -15.625 42.659 1.00 82.49 N
ATOM 2 CA ILE 2 28.468 -15.120 43.039 1.00 83.70 C
---------
ATOM 400 CD1 LEU 54 9.942 -3.565 43.787 1.00 56.40 C
ATOM 401 CD2 LEU 54 12.007 -4.798 43.080 1.00 31.93 C
END
MOLECULE name2 (unchanged coordinates)
ATOM 419 N LEU A 57 13.121 3.012 34.495 1.00 40.04 N
ATOM 420 CA LEU A 57 13.125 1.748 35.211 1.00 43.79 C
---------
ATOM 558 C GLU A 74 7.298 12.565 26.328 1.00 43.72 C
ATOM 559 O GLU A 74 6.545 13.347 26.910 1.00 49.34 O
END
# Sequence Dependent Analysis
# GDT and LCS analysis
LCS - RMSD CUTOFF 5.00 length segment l_RMS g_RMS
LONGEST_CONTINUOUS_SEGMENT: 255 7 - 261 3.12 3.12
LCS_AVERAGE: 100.00
LCS - RMSD CUTOFF 2.00 length segment l_RMS g_RMS
LONGEST_CONTINUOUS_SEGMENT: 117 145 - 261 1.99 3.47
LCS_AVERAGE: 37.19
LCS - RMSD CUTOFF 1.00 length segment l_RMS g_RMS
LONGEST_CONTINUOUS_SEGMENT: 44 163 - 206 0.97 3.29
LCS_AVERAGE: 12.60
LCS_GDT MOLECULE-1 MOLECULE-2 LENGTH_OF_THE
LCS_GDT RESIDUE RESIDUE CONTINUOUS
LCS_GDT NAME NUMBER NAME NUMBER SEGMENT GLOBAL DISTANCE TEST - GDT_DATA_COLUMNS NUMBER OF THE RESIDUES: 255
LCS_GDT S 7 S 7 37 96 255 27 121 185 206 216 224 226 230 234 237 239 241 242 243 246 248 249 251 252 253
LCS_GDT V 8 V 8 37 96 255 50 132 188 206 216 224 226 230 234 237 239 241 242 243 246 248 249 251 252 253
LCS_GDT K 9 K 9 37 96 255 36 126 188 206 216 224 226 230 234 237 239 241 242 243 246 248 249 251 252 253
LCS_GDT G 10 G 10 37 96 255 21 117 187 206 216 224 226 230 234 237 239 241 242 243 246 248 249 251 252 253
LCS_GDT L 11 L 11 37 96 255 50 132 188 206 216 224 226 230 234 237 239 241 242 243 246 248 249 251 252 253
LCS_GDT V 12 V 12 37 96 255 50 132 188 206 216 224 226 230 234 237 239 241 242 243 246 248 249 251 252 253
...........................................................................
LCS_GDT A 256 A 256 42 117 255 29 131 188 206 216 224 226 230 234 237 239 241 242 243 246 248 249 251 252 253
LCS_GDT I 257 I 257 42 117 255 20 126 188 206 216 224 226 230 234 237 239 241 242 243 246 248 249 251 252 253
LCS_GDT R 258 R 258 42 117 255 6 88 160 198 209 224 226 230 234 237 239 241 242 243 246 248 249 251 252 253
LCS_GDT M 259 M 259 14 117 255 3 17 53 90 176 204 217 228 234 237 239 241 242 243 246 248 249 251 252 253
LCS_GDT Q 260 Q 260 6 117 255 3 5 25 55 104 158 208 220 233 237 239 241 242 243 246 248 249 251 252 253
LCS_GDT P 261 P 261 6 117 255 4 13 25 36 96 125 179 207 222 234 238 241 242 243 246 248 249 251 252 253
LCS_AVERAGE LCS_A: 49.93 ( 12.60 37.19 100.00 )
GLOBAL_DISTANCE_TEST (the largest set of residues that can fit under specified DISTANCE_CUTOFF)
GDT DIST_CUTOFF 0.50 1.00 1.50 2.00 2.50 3.00 3.50 4.00 4.50 5.00 5.50 6.00 6.50 7.00 7.50 8.00 8.50 9.00 9.50 10.00
GDT NUMBER_CA 50 132 188 206 216 224 226 230 234 237 239 241 242 243 246 248 249 251 252 253
GDT PERCENT_CA 19.61 51.76 73.73 80.78 84.71 87.84 88.63 90.20 91.76 92.94 93.73 94.51 94.90 95.29 96.47 97.25 97.65 98.43 98.82 99.22
GDT RMS_LOCAL 0.36 0.68 0.89 1.01 1.14 1.27 1.32 1.49 1.58 1.68 1.74 1.89 1.92 2.00 2.25 2.39 2.47 2.67 2.72 2.86
GDT RMS_ALL_CA 3.25 3.24 3.23 3.23 3.24 3.25 3.26 3.30 3.24 3.25 3.26 3.22 3.22 3.21 3.16 3.16 3.16 3.14 3.14 3.13
# N1 N2 DIST N RMSD GDT_TS LGA_S LGA_Q
SUMMARY(GDT) 261 255 4.0 230 1.49 80.000 81.425 14.460
# Molecule1 Molecule2 DISTANCE
LGA S 7 S 7 2.026
LGA V 8 V 8 1.301
LGA K 9 K 9 1.719
LGA G 10 G 10 2.103
LGA L 11 L 11 1.277
LGA V 12 V 12 1.302
LGA A 13 A 13 1.558
LGA V 14 V 14 1.387
...........................................................................
With the option -lw:3 set,
the LGA records look like this:
# Molecule1 Molecule2 DISTANCE RMSD(lw:3)
LGA S 7 S 7 2.026 -
LGA V 8 V 8 1.301 -
LGA K 9 K 9 1.719 -
LGA G 10 G 10 2.103 0.586
LGA L 11 L 11 1.277 0.596
LGA V 12 V 12 1.302 0.652
LGA A 13 A 13 1.558 0.487
LGA V 14 V 14 1.387 0.412
where, in the last column, an RMSD value is calculated for each
residue on a window of 3+1+3=7 residues. This information can be
very helpful for detecting local similarity of structures when such
a similarity is difficult to capture from the global superposition.
-------------------------------------------------------------------------------
Running the program with the option -aa
generates the following list:
AAMOL1 I 2 1
AAMOL1 V 3 2
AAMOL1 T 4 3
AAMOL1 Q 5 4
AAMOL1 L 46 5
AAMOL1 K 47 6
AAMOL1 P 48 7
AAMOL1 T 49 8
AAMOL1 P 50 9
AAMOL1 E 51 10
AAMOL1 G 52 11
AAMOL1 D 53 12
AAMOL1 L 54 13
AAMOL2 L 57 1
AAMOL2 L 58 2
AAMOL2 Q 59 3
AAMOL2 K 60 4
AAMOL2 W 61 5
AAMOL2 E 62 6
AAMOL2 N 63 7
AAMOL2 G 64 8
AAMOL2 E 65 9
AAMOL2 C 66 10
AAMOL2 A 67 11
AAMOL2 Q 68 12
AAMOL2 K 69 13
AAMOL2 K 70 14
AAMOL2 I 71 15
AAMOL2 I 72 16
AAMOL2 A 73 17
AAMOL2 E 74 18
If you append the list above (AAMOL* records) to the end of your
"mol1.mol2" file, then only the residues from that list will be
used for calculations.
...........................................................................
To select the exact set of residues for calculations, the user may also use
the following options: -er1:s1:s2 , -er2:s1:s2
For example, suppose we would like to perform LCS and GDT analysis ("-3" option)
to compare two structures (Molecule1 and Molecule2) in selected regions.
Using the set of parameters below:
-3 -sia -o1 -d:5.0 -er1:10:23 -er2:45_B:50_B -er2:56_B:63_B
the following residue correspondence will be established:
Molecule1 Molecule2
10 45_B
11 46_B
12 47_B
13 48_B
14 49_B
15 50_B
16 56_B
17 57_B
18 58_B
19 59_B
20 60_B
21 61_B
22 62_B
23 63_B
and only these residue-pairs will be used for "-3" calculations.
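The correspondence above is simply the two selected residue lists paired in order. A minimal Python sketch of that pairing follows; it assumes plain integer residue numbers with no insertion codes and no missing residues inside a range, which holds for this example but not for every PDB file. The helper names are hypothetical.

```python
def expand_range(start, end):
    """Expand bounds like '45_B'..'50_B' (or '10'..'23') into residue labels.

    Assumption: integer residue numbers, same chain at both ends,
    and every residue in between present in the structure.
    """
    def split(label):
        number, _, chain = label.partition("_")
        return int(number), chain

    (first, chain_a), (last, chain_b) = split(start), split(end)
    if chain_a != chain_b:
        raise ValueError("this sketch handles single-chain ranges only")
    suffix = f"_{chain_a}" if chain_a else ""
    return [f"{i}{suffix}" for i in range(first, last + 1)]


# -er1:10:23 -er2:45_B:50_B -er2:56_B:63_B
mol1 = expand_range("10", "23")
mol2 = expand_range("45_B", "50_B") + expand_range("56_B", "63_B")

pairs = list(zip(mol1, mol2))
for r1, r2 in pairs:
    print(r1, r2)
```

Both selections contain 14 residues here; with selections of unequal length, zip would silently drop the excess, so a real script should compare the lengths first.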
Remember:
The options -1, -2, -3 work on already established residue-residue
correspondence. The residue-residue correspondence will not be changed
during calculations.
If user needs to find structure alignment (automatically establish the
residue-residue correspondence), then option "-4" has to be used.
...........................................................................
LCS and GDT description
Longest Continuous Segments under specified CA RMSD cutoff (LCS).
The algorithm identifies all the longest continuous segments of residues
in the model deviating from the target by not more than the specified
CA RMSD cutoff, using many different superpositions.
Each residue in a prediction is assigned to the longest of such segments,
provided it is part of that segment (see LCS_GDT records).
For different values of the CA RMSD cutoff (1.0 A, 2.0 A, and 5.0 A) the
absolutely longest continuous segment in the model is reported as well.
Global Distance Test (GDT). The algorithm identifies in the model
the sets of residues deviating from the target by not more than the
specified CA DISTANCE cutoff, using many different superpositions.
Each residue from the model is assigned to the largest set of residues
(not necessarily continuous) deviating from the target by no more than a
specified distance cutoff (see LCS_GDT records: GDT_DATA_COLUMNS).
For different values of the DISTANCE cutoff (0.5 A, 1.0 A, 1.5 A, ... 10.0 A)
several measures are reported:
NUMBER_CA - the number of CA's from the "largest set" that can fit
under specified distance cutoff
PERCENT_CA - percent of CA's from the "largest set" compared to the
total number of CA's in target (see GDT_Pn below)
RMS_LOCAL - RMSD (root mean square deviation) calculated on the
"largest set" of CA's
RMS_ALL_CA - RMSD calculated on all CA after superposition of the
prediction structure to the target structure based on
the "largest set" of CA's
GDT_TS = (GDT_P1 + GDT_P2 + GDT_P4 + GDT_P8)/4.0
where GDT_Pn is an estimation of the percent of residues that can
fit under distance cutoff <= n.0 Angstroms
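As a worked check of the GDT_TS formula, the Python sketch below recomputes the value from the NUMBER_CA row of the sample GDT output, taking the counts at the 1.0, 2.0, 4.0 and 8.0 A cutoffs and the 255 target residues. The function name and the dictionary layout are hypothetical.

```python
def gdt_ts(number_ca, n_target):
    """GDT_TS = (GDT_P1 + GDT_P2 + GDT_P4 + GDT_P8) / 4.0

    number_ca maps a distance cutoff in Angstroms to the number of CA atoms
    in the largest set found under that cutoff; n_target is the number of
    CA atoms in the target.
    """
    percents = [100.0 * number_ca[cutoff] / n_target
                for cutoff in (1.0, 2.0, 4.0, 8.0)]
    return sum(percents) / 4.0


# NUMBER_CA values taken from the sample GDT rows (target has 255 residues).
counts = {1.0: 132, 2.0: 206, 4.0: 230, 8.0: 248}
print(f"{gdt_ts(counts, 255):.3f}")  # the sample SUMMARY(GDT) line reports 80.000
```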
The GDT procedure is the following. Each three-residue segment and each
continuous segment found by LCS is used as a starting point to give
initial equivalences (model-target CA pairs) for a superposition.
The list of equivalences is iteratively extended to produce the largest
set of residues that can fit under the considered distance cutoff.
For collecting data about the largest sets of residues, the
Iterative Superposition Procedure (ISP) is used.
The goal of the ISP method is to exclude from the calculations atom
pairs whose distance between the model and the target structure exceeds
some threshold (cutoff) after the transform is applied.
Starting from the initial set of atoms (C-alphas) the algorithm is the
following:
a) obtain the transform
b) apply the transform
c) identify all atom pairs for which distance is larger than the
threshold
d) re-obtain the transform, excluding those atoms
e) repeat b) - d) until the set of atoms used in calculations
is the same for two cycles running
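Steps a) to e) can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions, not the LGA implementation: the text does not say how the transform is obtained, so a standard least-squares (Kabsch, SVD-based) superposition is used here; the set of excluded pairs is re-evaluated over all pairs in every cycle; and the max_cycles guard and the minimum of three pairs are additions for safety. All function names are hypothetical.

```python
import numpy as np


def kabsch(P, Q):
    """Least-squares rotation R and shift t so that P @ R.T + t fits Q."""
    pc, qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - pc).T @ (Q - qc)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # avoid an improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, qc - R @ pc


def isp(model, target, cutoff, max_cycles=50):
    """Iterative superposition on paired (N, 3) coordinate arrays.

    Returns a boolean mask of the pairs kept, plus the final R and t.
    """
    keep = np.ones(len(model), dtype=bool)
    R, t = kabsch(model, target)                        # a) obtain the transform
    for _ in range(max_cycles):
        moved = model @ R.T + t                         # b) apply the transform
        dist = np.linalg.norm(moved - target, axis=1)
        new_keep = dist <= cutoff                       # c) pairs beyond cutoff drop out
        if new_keep.sum() < 3 or np.array_equal(new_keep, keep):
            break                                       # e) same set twice: stop
        keep = new_keep
        R, t = kabsch(model[keep], target[keep])        # d) re-obtain the transform
    return keep, R, t
```

A call such as isp(model_ca, target_ca, cutoff=5.0) on two equally long arrays of C-alpha coordinates returns the pairs that survive under the cutoff together with the final transform.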
Results of the analysis given by LCS algorithm show rather local features of
the model, while the residues considered in GDT come from the whole model
structure (they do not have to maintain the continuity along the sequence).
From this point of view GDT can detect the kind of GLOBAL level of structure
similarity.
REFERENCES
[1] A. Zemla: "LGA - a Method for Finding 3D Similarities in Protein Structures",
Nucleic Acids Research, 2003, Vol. 31, No. 13, pp. 3370-3374.
[2] A. Zemla, C. Venclovas, A. Reinhardt, K. Fidelis, T. J. Hubbard: "Numerical
criteria for the evaluation of ab initio predictions of protein structure",
PROTEINS: Structure, Function, and Genetics, Suppl.1, 1997, pp. 140-150.
[3] A. Zemla, C. Venclovas, J. Moult, K. Fidelis: "Processing and Analysis
of CASP3 Protein Structure Predictions", PROTEINS: Structure, Function,
and Genetics, Suppl.3, 1999, pp. 22-29.
[4] Arthur M. Lesk: "CASP2: Report on ab initio predictions",
PROTEINS: Structure, Function, and Genetics, Suppl.1, 1997, pp. 151-166.
[5] A. Zemla, C. Venclovas, J. Moult, K. Fidelis: "Processing and evaluation of
predictions in CASP4", PROTEINS: Structure, Function, and Genetics,
Volume 45, Issue S5, 2001, pp. 13-21.
[6] S. Cristobal, A. Zemla, D. Fischer, L. Rychlewski, A. Elofsson: "A study
of quality measures for protein threading models", BMC Bioinformatics
2001 2: 5 (1 August 2001).
-------------------------------------------------------------------------------
Changes, improvements, development:
-------------------------------------------------------------------------------
### Date: 15 Oct 1999
First version of the LGA program was tested.
### Date: 21 Mar 2000
An extensive analysis of the structure comparison results from PROSUP and LGA programs
used to evaluate CASP3 models was performed. Evaluation results were compared with Alexey
Murzin's "Fold recognition" CASP3 assessment.
### Date: 10 May 2000
An analysis of the LGA performance and other structure comparison programs was
performed. Collaborative work with: S. Cristobal, D. Fischer, L. Rychlewski,
and A. Elofsson.
### Date: 29 Aug 2000
The results of the comparison of different measures used for the analysis of the
quality of protein structure predictions were prepared for the manuscript [6]:
S. Cristobal, A. Zemla, D. Fischer, L. Rychlewski, A. Elofsson: "A study
of quality measures for protein threading models", BMC Bioinformatics
2001 2: 5, 2001.
### Date: 20 Mar 2001
Thanks to the suggestion from Daniel Barsky (barsky@llnl.gov) an option to
perform calculation on selected CA atoms was included (AAMOL1 and AAMOL2 records).
### Date: 06 Sep 2001
"Lesk window" option was included to the program. RMSD value calculated
on length=2*n+1 residue window (-lw:n).
### Date: 15 Jul 2002
Thanks to the suggestion from Dat H. Nguyen (nguyend@gps01.llnl.gov) an option to
perform calculations on chosen atoms (NOT only CA) was included.
-atom:CB      CB atoms will be used for calculations. NOTE (special character
              in the PARAMETER-OPTIONS line): use , instead of '
              (for example: H5,1 to select H5'1 atom)
-ah:i         ATOM or HETATM records are used for calculations:
              i=0 both (default)
              i=1 ATOM
              i=2 HETATM
### Date: 05 Jan 2003
Thanks to the discussions with Michael Levitt (michael.levitt@stanford.edu) the
accuracy of LGA (GDT_TS) calculations was improved, and the problem with erroneous
calculations on "singular structures" (compressed coordinates, very small distances
between atoms) was reduced.
### Date: 02 Mar 2003
Thanks to the discussions with Nick Grishin (grishin@chop.swmed.edu)
LGA_S scoring function was improved.
### Date: 11 Oct 2003
Thanks to the suggestion from Bernhard Rupp (br@llnl.gov) the calculation of Euler
angles has been included:
The convention used (XYZ):
  psi is about the x-axis
  theta is about the y-axis
  phi is about the z-axis
and the conversion formulas are the following:
theta=-asin(r[1][3]);
psi=atan2(r[2][3],r[3][3]);
phi=atan2(r[1][2],r[1][1]);
c1 = cos(theta); s1 = sin(theta);
c2 = cos(psi); s2 = sin(psi);
c3 = cos(phi); s3 = sin(phi);
r[1][1] = c1 * c3;
r[2][1] = s1 * s2 * c3 - c2 * s3;
r[3][1] = s1 * c2 * c3 + s2 * s3;
r[1][2] = c1 * s3;
r[2][2] = s1 * s2 * s3 + c2 * c3;
r[3][2] = s1 * c2 * s3 - s2 * c3;
r[1][3] = -s1;
r[2][3] = c1 * s2;
r[3][3] = c1 * c2;
LGA reports ROTATION matrix, VECTOR and Euler angles in the following format:
Unitary ROTATION matrix and the shift VECTOR superimpose MOLECULES (1=>2)
X_new = -0.051329 * X + -0.215884 * Y + -0.975069 * Z + 6.470616
Y_new = 0.713412 * X + -0.691165 * Y + 0.115472 * Z + -6.793733
Z_new = -0.698862 * X + -0.689699 * Y + 0.189491 * Z + 65.934860
Euler angles from the ROTATION matrix (XYZ convention, two solutions):
Psi = -1.302667 1.838925 [ DEG: -74.6373 105.3627 ]
Theta = 0.773806 2.367787 [ DEG: 44.3358 135.6642 ]
Phi = 1.642622 -1.498971 [ DEG: 94.1153 -85.8847 ]
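The reported angles can be reproduced from the printed matrix. The Python sketch below is an illustration: judging from the two sample outputs in this document, the element r[i][j] in the formulas above corresponds to the coefficient in printed row j, column i (for example, r[1][3] is the X coefficient of the Z_new line), and the second solution is assumed to be (psi + pi, pi - theta, phi + pi) wrapped into (-pi, pi]. The function names are hypothetical.

```python
import math


def euler_xyz(m):
    """Euler angles (psi, theta, phi) from the matrix as printed by LGA.

    m[row][col], with rows in the order X_new, Y_new, Z_new.
    """
    theta = -math.asin(m[2][0])           # r[1][3] in the document's notation
    psi = math.atan2(m[2][1], m[2][2])    # r[2][3], r[3][3]
    phi = math.atan2(m[1][0], m[0][0])    # r[1][2], r[1][1]
    return psi, theta, phi


def wrap(angle):
    """Wrap an angle in radians into the interval (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def second_solution(psi, theta, phi):
    """The alternative angle triple describing the same rotation."""
    return wrap(psi + math.pi), wrap(math.pi - theta), wrap(phi + math.pi)


# Matrix copied from the sample output above.
m = [
    [-0.051329, -0.215884, -0.975069],
    [0.713412, -0.691165, 0.115472],
    [-0.698862, -0.689699, 0.189491],
]
first = euler_xyz(m)
print([round(a, 4) for a in first])
print([round(a, 4) for a in second_solution(*first)])
```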
### Date: 21 Dec 2003
Alignment verification module has been improved.
### Date: 11 Jan 2004
New options -er1:s1:s2 and -er2:s1:s2 have been included. They allow the user
to select the exact ranges of residues from molecule1 and molecule2 for calculations.
Example: -er1:10_A:16_A -er1:B:B -er2:8_A:20_A -er2:7S_B:7_C
where: -er1:10_A:16_A  selects in molecule1 the residues 10-16 (chain A)
       -er1:B:B        selects in molecule1 all residues from chain B
       -er2:8_A:20_A   selects in molecule2 the residues 8-20 (chain A)
       -er2:7S_B:7_C   selects in molecule2 the residues 7S_B (residue 7 insertion S
                       from chain B) up to 7_C (residue 7 from chain C)