PFRMAT AL 
TARGET T0078 
AUTHOR 3873-9906-1225 
REMARK Submission 1 
REMARK Work by Gidon Moont (1), Lawrence Kelley (1),
REMARK Bob MacCallum (1), Marcel Turcotte (1), Mansoor Saqi (2)
REMARK and Michael Sternberg (1) (m.sternberg@icrf.icnet.uk)
REMARK (1) Biomolecular Modelling Laboratory, 
REMARK Imperial Cancer Research Fund 
REMARK (1) Lincoln's Inn Fields, London WC2A 3PX, UK 
REMARK (2) Bioinformatics Group, GlaxoWellcome, Stevenage, UK 
METHOD 
METHOD Method outline
METHOD --------------
METHOD ******************************
METHOD *** SEE NEW METHOD 3D-PSSM ***
METHOD ******************************
METHOD Terminology: the unknown protein is the target; each
METHOD entry in the library of known folds is a template.
METHOD (0) Initial check for remote homology of target 
METHOD to templates of known structures using PSI-BLAST 
METHOD (1) Secondary structure and sequence of the target
METHOD matched against the fold template library using FOLDFIT
METHOD (2) Multiple structure / multiple sequence matching
METHOD against fold template library (3D-PSSM) *** NEW METHOD ***
METHOD (3) Local hydrophobicity and predicted secondary structure 
METHOD matched for target and template using SIVA (MacCallum & 
METHOD Thornton) 
METHOD (4) Filter top hits from above against topological rules
METHOD for folds derived by an artificial-intelligence-type
METHOD machine learning approach (PROGOL; Turcotte, Muggleton &
METHOD Sternberg)
METHOD (5) Evaluation of the above results in light of the
METHOD literature and the function of the target.
METHOD 
METHOD General features of approach 
METHOD ----------------------------- 
METHOD 
METHOD (i) The fold (template) library consists of non-redundant 
METHOD SCOP domains with <40% sequence identity per family (called 
METHOD SCOP40). 
METHOD 
METHOD (ii) Secondary structure is predicted from a multiple
METHOD alignment (homologues gathered with PSI-BLAST) using DSC
METHOD (King & Sternberg), PHD (Rost & Sander) and JPRED (Barton).
METHOD 
METHOD Method details 
METHOD -------------- 
METHOD 
METHOD (1) FOLDFIT (Russell, R.B., Saqi, M.A.S., Bates, P.A.,
METHOD Sayle, R.A. & Sternberg, M.J.E. (1998). Prot Eng 11, 1-9.)
METHOD The target is represented by its sequence and predicted
METHOD secondary structure and scanned against the known secondary
METHOD structure and sequence of each template in the fold library.
METHOD Different weights for secondary structure and sequence are
METHOD used to obtain different possible top hits.
METHOD 
METHOD (2) 3D-PSSM - Structures within the same SCOP fold family
METHOD are aligned in 3D and, if the structures can be superposed
METHOD well, each is used together with all its homologous
METHOD sequences found in the sequence database by PSI-BLAST.
METHOD A 3D-PSSM was generated in this way for each template,
METHOD and the target is matched against each template
METHOD (3D-PSSM; Kelley, MacCallum, Saqi & Sternberg, unpublished).
METHOD The matching NOW INCLUDES PREDICTED SECONDARY STRUCTURE,
METHOD as in FOLDFIT.
METHOD 
METHOD (3) Vector-based alignment of per-residue hydrophobicity
METHOD and DSC-predicted secondary structure probabilities for
METHOD both target and template. This approach could also
METHOD be used in the absence of known structures for library
METHOD sequences. The algorithm is SIVA (MacCallum & Thornton,
METHOD unpublished).
METHOD 
METHOD (4) Using an artificial-intelligence-based machine learning
METHOD algorithm (PROGOL, Muggleton et al.), we have obtained
METHOD expert-system-type rules governing protein folds (Turcotte,
METHOD Muggleton & Sternberg). These rules include data on the
METHOD patterns and types of secondary structures, including
METHOD length, loop length and hydrophobicity. Top hits from all
METHOD the above methods were screened against the rules for the
METHOD folds to assess their likelihood.
METHOD 
METHOD (5) Visual inspection of results. 
METHOD 
METHOD T0078
METHOD The top hit with 3D-PSSM was alpha/beta hydrolase.
METHOD A high score with SIVA was also alpha/beta hydrolase.
METHOD The alpha/beta hydrolase fold contains another
METHOD thioesterase (1tht_B), which was taken as the template.
METHOD Manual alignment identified conserved Ser, Asp and His
METHOD residues in the target that could be equivalenced to the
METHOD catalytic triad of the thioesterase.
METHOD Hence this model.
MODEL 1 
PARENT 1tht_B 
S   -2 I 8 
S   -1 A 9 
M    1 H 10 
G    2 V 11 
Q    3 L 12 
A    4 R 13 
L    5 V 14 
K    6 N 15 
N    7 N 16 
L    8 G 17 
L    9 Q 18 
T   10 E 19 
L   11 L 20 
L   12 H 21 
N   13 V 22 
L   14 W 23 
E   15 E 24 
K   16 T 25 
I   17 P 26 
E   18 P 27 
E   19 K 28 
G   20 E 29 
L   21 N 30 
F   22 V 31 
R   23 P 32 
G   24 F 33 
Q   25 K 34 
S   26 N 35 
E   27 N 36 
D   28 T 37 
L   29 I 38 
G   30 L 39 
L   31 I 40 
R   32 A 41 
Q   33 S 42 
V   34 G 43 
F   35 F 44 
Q   42 H 50 
A   43 F 51 
L   44 A 52 
Y   45 G 53 
A   46 L 54 
A   47 A 55 
K   48 E 56 
E   49 Y 57 
T   50 L 58 
V   51 S 59 
P   52 T 60 
E   53 N 61 
E   54 G 62 
R   55 F 63 
L   56 H 64 
V   57 V 65 
H   58 F 66 
S   59 R 67 
F   60 Y 68 
H   61 D 69 
S   62 S 70 
T   80 T 87 
L   81 T 88 
R   82 G 89 
D   83 K 90 
G   84 N 91 
N   85 S 92 
S   86 L 93 
F   87 C 94 
S   88 T 95 
A   89 V 96 
R   90 Y 97 
R   91 H 98 
V   92 W 99 
A   93 L 100 
A   94 Q 101 
I   95 T 102 
Q   96 K 103 
N   97 G 104 
G   98 T 105 
K   99 Q 106 
P  100 N 107 
I  101 I 108 
F  102 G 109 
Y  103 L 110 
M  104 I 111 
T  105 A 112 
A  106 A 113 
S  107 S 114 
F  108 L 115 
Q  109 S 116 
A  110 A 117 
P  111 R 118 
E  112 V 119 
A  113 A 120 
G  114 Y 121 
F  115 E 122 
E  116 V 123 
H  117 I 124 
Q  118 S 125 
S  139 F 131 
L  140 L 132 
A  141 I 133 
H  142 T 134 
L  143 A 135 
L  144 V 136 
L  169 L 189 
K  170 D 190 
G  171 S 191 
H  172 T 192 
V  173 L 193 
A  174 D 194 
E  175 K 195 
P  176 V 196 
H  177 A 197 
R  178 N 198 
Q  179 T 199 
V  180 S 200 
W  181 V 201 
I  182 P 202 
R  183 L 203 
A  184 I 204 
N  185 A 205 
G  186 F 206 
S  187 T 207 
V  188 A 208 
P  189 N 209 
D  190 N 210 
D  191 D 211 
L  192 D 212 
R  193 W 213 
V  194 V 214 
H  195 K 215 
Q  196 Q 216 
Y  197 E 217 
L  198 E 218 
L  199 V 219 
G  200 Y 220 
Y  201 D 221 
A  202 M 222 
S  203 L 223 
F  219 G 229 
L  220 H 230 
E  221 C 231 
P  222 K 232 
G  223 L 233 
I  224 Y 234 
Q  225 S 235 
I  226 L 236 
A  227 L 237 
T  228 G 238 
I  229 S 239 
D  230 S 240 
H  231 H 241 
S  232 D 242 
M  233 L 243 
W  234 G 244 
F  239 R 251 
N  240 N 252 
L  241 F 253 
N  242 Y 254 
E  243 Q 255 
W  244 S 256 
L  245 V 257 
L  246 T 258 
Y  247 K 259 
S  248 A 260 
V  249 A 261 
E  250 I 262 
S  251 A 263 
T  252 M 264 
S  253 D 265 
A  254 G 266 
S  255 G 267 
S  256 S 268 
G  259 L 269 
F  260 E 270 
V  261 I 271 
R  262 D 272 
G  263 V 273 
E  264 D 274 
F  265 F 275 
Y  266 I 276 
T  267 E 277 
Q  268 P 278 
D  269 D 279 
G  270 F 280 
V  271 E 281 
L  272 Q 282 
V  273 L 283 
A  274 T 284 
S  275 I 285 
T  276 A 286 
V  277 T 287 
Q  278 V 288 
E  279 N 289 
G  280 E 290 
V  281 R 291 
M  282 R 292 
R  283 L 293 
N  284 K 294 
H  285 A 295 
TER 
END 
