Preliminary results

by **test** on Sun Sep 07, 2008 11:17 am

Nice writing.

I'm not sure I care for this recent fad of trying to use hydrogen bonds for model assessment.

It's such a comprehensively flawed concept that I'm amazed we are still discussing it - but here are
some pertinent comments:

1. As someone has already pointed out, it is only useful for beta sheets - zero usefulness for all-alpha proteins. Even
in beta sheets it's no use for simple beta meanders where the same hydrogen bond pattern can be observed across
a wide range of sheet curvatures. Why use a method which can only be applied to a subset of protein fold types?
The argument should really just finish there, but to continue...

2. Hydrogen bonding is a complex quantum mechanical phenomenon - any purely geometric definition of a hydrogen
bond will be a crude approximation. Assuming we are not going to do semi-empirical quantum calculations, for example, which
crude approximation of a hydrogen bond do we opt to use? The old electrostatic-energy DSSP definition? Baker and Hubbard?
Dreiding/CHARMm potential? What cutoff do we set for the minimum energy permissible for a hydrogen bond? What about
steric hindrance, bifurcation or competition with surrounding solvent in accessible areas of the model?

3. What's so special about hydrogen bonds anyway? Why not also look at the similarity of accessible atomic surface area and that way
take the non-polar parts of the model into account? That could even be applied to all protein fold classes - not that I'm seriously
recommending this criterion, I hasten to add!

4. The only reason these hydrogen bond evaluation schemes have any perceived value is that they encompass geometric information
beyond the C-alpha trace. It's plainly daft to evaluate high-resolution models on just C-alpha positions, but why not just address that issue
directly rather than adding the fuzziness of hydrogen bond definitions into the mix? Use main chain RMSDs or even all-atom RMSDs if you want
more resolution than C-alphas can provide. A main chain atom RMSD of zero will by definition produce exactly the same main chain hydrogen bond list between two models (using simple geometric HB definitions at least). A C-alpha RMSD of zero will not necessarily produce the same main chain hydrogen bond list due to the inaccuracy inherent in building main chain coordinates from C-alpha traces.
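To make point 2 concrete, here is a minimal Python sketch that applies two common hydrogen-bond criteria to the same donor/acceptor geometry. The DSSP term follows the published Kabsch-Sander electrostatic model (partial charges 0.42e and 0.20e, factor 332, cutoff -0.5 kcal/mol). The Baker-Hubbard cutoffs (H...O at most 2.5 Å, N-H...O angle at least 120°) are commonly used values assumed here, and the coordinates and function names are hypothetical, chosen for illustration.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def angle_deg(a: Vec, b: Vec, c: Vec) -> float:
    """Angle a-b-c in degrees, with b at the vertex."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    cos_t = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def dssp_energy(n: Vec, h: Vec, c: Vec, o: Vec) -> float:
    """Kabsch-Sander electrostatic energy (kcal/mol) between donor N-H and acceptor C=O."""
    f = 0.42 * 0.20 * 332.0  # partial charges q1, q2 (in e) times the 332 conversion factor
    return f * (1 / math.dist(o, n) + 1 / math.dist(c, h)
                - 1 / math.dist(o, h) - 1 / math.dist(c, n))


def dssp_hbond(n: Vec, h: Vec, c: Vec, o: Vec, cutoff: float = -0.5) -> bool:
    """DSSP-style criterion: energy below the cutoff (kcal/mol)."""
    return dssp_energy(n, h, c, o) < cutoff


def baker_hubbard_hbond(n: Vec, h: Vec, o: Vec,
                        max_h_o: float = 2.5, min_angle: float = 120.0) -> bool:
    """Purely geometric criterion: short H...O distance and a near-linear N-H...O angle."""
    return math.dist(h, o) <= max_h_o and angle_deg(n, h, o) >= min_angle


if __name__ == "__main__":
    N, H = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    # Linear N-H...O=C geometry with H...O = 2.6 A, just beyond the 2.5 A geometric cutoff.
    O, C = (3.6, 0.0, 0.0), (4.83, 0.0, 0.0)
    print(round(dssp_energy(N, H, C, O), 2), dssp_hbond(N, H, C, O), baker_hubbard_hbond(N, H, O))
```

For this linear geometry the DSSP energy is about -1.47 kcal/mol, so DSSP calls it a hydrogen bond while the Baker-Hubbard test does not. That is the definitional ambiguity the post is pointing at.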

In my view we should be replacing GDT-HA with geometric definitions based on both main chain and side chain atom distances, not mixtures of C-alpha metrics combined with arbitrary hydrogen bond definitions.
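The point-4 argument, that a C-alpha-only comparison can hide main-chain differences, can be checked with a toy example. This is a minimal sketch that computes RMSD over a chosen atom subset. It assumes the two structures are already superposed (a real comparison would superpose first, for example with the Kabsch algorithm). The atom keys and coordinates are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

Vec = Tuple[float, float, float]
Model = Dict[Tuple[int, str], Vec]  # (residue number, atom name) -> coordinates

MAIN_CHAIN = ("N", "CA", "C", "O")


def rmsd(model: Model, ref: Model, atom_names: Iterable[str]) -> float:
    """RMSD over the named atoms present in both structures (no superposition is done here)."""
    names = set(atom_names)
    keys = [k for k in ref if k[1] in names and k in model]
    if not keys:
        raise ValueError("no atoms in common")
    return math.sqrt(sum(math.dist(model[k], ref[k]) ** 2 for k in keys) / len(keys))


if __name__ == "__main__":
    ref = {(1, "N"): (0.0, 0.0, 0.0), (1, "CA"): (1.46, 0.0, 0.0),
           (1, "C"): (2.0, 1.4, 0.0), (1, "O"): (1.4, 2.4, 0.0)}
    model = dict(ref)
    model[(1, "O")] = (1.4, 2.4, 1.0)  # same C-alpha, carbonyl oxygen moved by 1 A
    print(rmsd(model, ref, ["CA"]), rmsd(model, ref, MAIN_CHAIN))
```

Moving only the carbonyl oxygen by 1 Å leaves the C-alpha RMSD at 0.0, but the main-chain RMSD over the four backbone atoms comes out at 0.5 Å.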

For example, we could define something like this:

$$
\frac{\mathrm{GDT}_{\mathrm{C}\alpha}(2\,\text{Å}) + \mathrm{GDT}_{\mathrm{C}\alpha}(1\,\text{Å}) + \mathrm{GDT}_{\text{main chain}}(0.5\,\text{Å}) + \mathrm{GDT}_{\text{side chain}}(0.5\,\text{Å})}{4}
$$

This would produce a score that gives some credit for basic alignment accuracy (the C-alpha components), some credit
for main chain geometry (including main chain hydrogen bonds) and the last bit of credit for putting the side chain atoms in the
right places (which will even include side chain hydrogen bonding). Of course the selection of terms and distance cutoffs is something that
could (and no doubt should) be tuned.
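The four-term score above could be sketched as follows. This is a simplification under stated assumptions. The model and target are taken as already superposed. Each term is a plain fraction of atoms within the cutoff (true GDT maximises each count over many superpositions). "Side chain" means every heavy atom outside N, CA, C and O, including CB. The function names are hypothetical.

```python
import math
from typing import Callable, Dict, Tuple

Vec = Tuple[float, float, float]
Model = Dict[Tuple[int, str], Vec]  # (residue number, atom name) -> coordinates

MAIN_CHAIN = {"N", "CA", "C", "O"}


def fraction_within(model: Model, ref: Model,
                    selects: Callable[[str], bool], cutoff: float) -> float:
    """Fraction of selected reference atoms whose model position lies within `cutoff` A.

    Atoms missing from the model count as misses; an empty selection returns 0.0
    (a choice made for this sketch). Structures are assumed to be already superposed.
    """
    keys = [k for k in ref if selects(k[1])]
    if not keys:
        return 0.0
    hits = sum(1 for k in keys if k in model and math.dist(model[k], ref[k]) <= cutoff)
    return hits / len(keys)


def composite_score(model: Model, ref: Model) -> float:
    """Average of the four terms in the proposed score (heavy atoms only)."""
    terms = [
        fraction_within(model, ref, lambda a: a == "CA", 2.0),
        fraction_within(model, ref, lambda a: a == "CA", 1.0),
        fraction_within(model, ref, lambda a: a in MAIN_CHAIN, 0.5),
        fraction_within(model, ref, lambda a: a not in MAIN_CHAIN, 0.5),  # side chain, incl. CB
    ]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    ref = {(1, "N"): (0.0, 0.0, 0.0), (1, "CA"): (1.46, 0.0, 0.0), (1, "C"): (2.0, 1.4, 0.0),
           (1, "O"): (1.4, 2.4, 0.0), (1, "CB"): (2.0, -1.4, 0.0)}
    model = dict(ref)
    model[(1, "CA")] = (1.46, 0.0, 1.5)  # only the C-alpha is off, by 1.5 A
    print(composite_score(model, ref))
```

With the C-alpha displaced by 1.5 Å and everything else exact, the four terms are 1, 0, 0.75 and 1, giving 0.6875. Swapping the cutoffs or atom sets is a one-line change, which fits the point that they should be tuned.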

by **guest2** on Sun Sep 07, 2008 11:47 am

Agreed! Modeling of main-chain atoms other than CA is just a geometric problem, with little physics or
chemistry involved, so no insight can be gained from "bad" or "good" modeling of
them. For most easy targets, and especially extremely easy ones, a "copy and paste" from the template will do
a good job.

by **kevin_karplus** on Sun Sep 07, 2008 12:08 pm

> Given 100% accuracy of the CA trace, what other information can a main-chain H-bond give you? I guess only
> side-chain H-bond prediction is a relevant, challenging problem that CASP needs to address, this time or
> in the future.

It is actually fairly difficult to construct an accurate model from just a CA trace. Even for the backbone there are still n-1 degrees of freedom for the rotation of the peptide planes.

The H-bond test is more sensitive to proper peptide-plane orientation than an RMSD test, and less sensitive to small rotations of secondary and super-secondary units. An H-bond test measures the local quality of the model, rather than global superposition. I'm not going to argue that it is a better test than GDT; in fact, for first-approximation tests of quality, GDT does a much better job. But it does provide a nice complementary test that captures features of quality that GDT misses and that RMSD is not very good at either.
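The peptide-plane freedom described here can be illustrated with a minimal sketch. It rotates a peptide-plane atom about the axis through the two flanking C-alpha atoms (Rodrigues' rotation formula), which leaves the C-alpha trace untouched but can make or break a hydrogen bond. The coordinates, the 180° flip and the 3.5 Å N...O distance criterion are assumptions chosen for illustration. In a real plane the C, O, N and H atoms would all rotate together; only O is rotated here for brevity.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def rotate_about_axis(p: Vec, a: Vec, b: Vec, theta_deg: float) -> Vec:
    """Rotate point p by theta_deg around the line through a and b (Rodrigues' formula)."""
    k = [b[i] - a[i] for i in range(3)]
    norm = math.hypot(*k)
    k = [x / norm for x in k]                      # unit axis vector
    v = [p[i] - a[i] for i in range(3)]            # p relative to a point on the axis
    t = math.radians(theta_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    return tuple(a[i] + v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1 - cos_t)
                 for i in range(3))


def hbond_by_distance(o: Vec, n: Vec, cutoff: float = 3.5) -> bool:
    """Crude distance-only N...O criterion (the 3.5 A cutoff is an assumption)."""
    return math.dist(o, n) <= cutoff


if __name__ == "__main__":
    ca_i, ca_next = (0.0, 0.0, 0.0), (3.8, 0.0, 0.0)  # C-alpha trace, never changes
    o = (1.5, 1.0, 0.0)            # hypothetical carbonyl O of the peptide plane
    partner_n = (1.5, 3.9, 0.0)    # hypothetical N-H donor on a neighbouring strand
    flipped_o = rotate_about_axis(o, ca_i, ca_next, 180.0)
    print(hbond_by_distance(o, partner_n), hbond_by_distance(flipped_o, partner_n))
```

Here the original O sits 2.9 Å from the partner N, an H-bond by this crude criterion, while the flipped O sits about 4.9 Å away, even though neither C-alpha has moved.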

Both backbone and sidechain Hbond tests are relevant, but I suspect that we are all doing very badly on sidechain hbonds.
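One plausible form of such a test is sketched below, under assumptions. Each H-bond is keyed by a hypothetical (donor residue, donor atom, acceptor residue, acceptor atom) tuple, already computed for model and target by some chosen definition. The score is the fraction of target H-bonds reproduced in the model, reported separately for backbone and side-chain H-bonds. This is an illustration, not any official CASP implementation.

```python
from typing import Set, Tuple

# Hypothetical representation: (donor residue, donor atom, acceptor residue, acceptor atom)
HBond = Tuple[int, str, int, str]


def is_backbone(hb: HBond) -> bool:
    """Main-chain H-bond: backbone N-H donor and backbone C=O acceptor."""
    return hb[1] == "N" and hb[3] == "O"


def recovery(model_hbonds: Set[HBond], target_hbonds: Set[HBond]) -> float:
    """Fraction of the target's H-bonds that also appear in the model."""
    if not target_hbonds:
        raise ValueError("target has no H-bonds of this kind")
    return len(model_hbonds & target_hbonds) / len(target_hbonds)


def split_scores(model_hbonds: Set[HBond], target_hbonds: Set[HBond]) -> Tuple[float, float]:
    """Backbone and side-chain recovery, reported separately."""
    backbone = {hb for hb in target_hbonds if is_backbone(hb)}
    side_chain = target_hbonds - backbone
    return recovery(model_hbonds, backbone), recovery(model_hbonds, side_chain)


if __name__ == "__main__":
    target = {(5, "N", 1, "O"), (6, "N", 2, "O"), (10, "NZ", 3, "OD1")}
    model = {(5, "N", 1, "O"), (10, "NZ", 4, "OD1")}
    print(split_scores(model, target))
```

For this toy input the scores are 0.5 (backbone) and 0.0 (side chain). Being recall-only, the sketch does not penalise spurious H-bonds in the model; a real assessment might also count those.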

by **test** on Sun Sep 07, 2008 12:33 pm

If your purpose in using HB is only to test the local quality of a model, then it makes sense to apply HB only to very easy targets (say, GDT score > 0.8).
For hard targets, if the GDT score is bad, I don't see why we need a local test.

by **guest** on Sun Sep 07, 2008 12:47 pm

Regarding the main-chain HB score: it reflects only N-O contact correctness (plus some angle criteria). If one wants to
assess local properties, other (main-chain) contacts should get the same weight.
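The suggestion to weight all main-chain contacts equally could be sketched like this. A hypothetical score collects main-chain atom pairs within a distance cutoff in the target, skipping residues fewer than two positions apart in sequence, and reports the fraction still in contact in the model. The 4.0 Å cutoff and the sequence-separation rule are assumptions, not part of the original suggestion.

```python
import math
from itertools import combinations
from typing import Dict, Set, Tuple

Key = Tuple[int, str]              # (residue number, atom name)
Vec = Tuple[float, float, float]
MAIN_CHAIN = {"N", "CA", "C", "O"}


def main_chain_contacts(coords: Dict[Key, Vec], cutoff: float = 4.0,
                        min_sep: int = 2) -> Set[Tuple[Key, Key]]:
    """All main-chain atom pairs within `cutoff` A, from residues at least `min_sep` apart."""
    keys = sorted(k for k in coords if k[1] in MAIN_CHAIN)  # sorted so pairs are ordered consistently
    return {(a, b) for a, b in combinations(keys, 2)
            if abs(a[0] - b[0]) >= min_sep and math.dist(coords[a], coords[b]) <= cutoff}


def contact_recovery(model: Dict[Key, Vec], target: Dict[Key, Vec],
                     cutoff: float = 4.0, min_sep: int = 2) -> float:
    """Fraction of target main-chain contacts that are also contacts in the model (equal weights)."""
    target_contacts = main_chain_contacts(target, cutoff, min_sep)
    if not target_contacts:
        raise ValueError("target has no contacts at this cutoff")
    return len(target_contacts & main_chain_contacts(model, cutoff, min_sep)) / len(target_contacts)


if __name__ == "__main__":
    target = {(1, "O"): (0.0, 0.0, 0.0), (3, "N"): (2.9, 0.0, 0.0), (5, "CA"): (0.0, 3.0, 0.0)}
    model = dict(target)
    model[(5, "CA")] = (0.0, 5.0, 0.0)  # one C-alpha pulled away from its contact
    print(contact_recovery(model, target))
```

Here the N-O pair (the only one an H-bond score would see) and an O-CA pair count equally, so losing the O-CA contact halves the score to 0.5.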

by **djones** on Mon Sep 08, 2008 5:12 am

> Good idea. But the cutoffs of 2/1/0.5A are too small. Credit should also be given to those with an error of 3-4A or even 5A in TBM models, because they are indeed different from those with an error of 6A or 8A. In your equation, errors in the region of 0.5A are over-counted. Whether you count C-alpha, main-chain or side-chain atoms, they are highly correlated.

As I said, I would certainly consider the choice of terms and distance thresholds topics for a bit of research. Nevertheless I don't think we can easily hold onto the ideal of having one score which "rules them all" i.e. works equally well for lousy (template-free), fairly good (TBM) and excellent (refinement) models. The distances I chose, whilst they are fairly arbitrary guesses, are probably reasonable for assessing what it is that we expect high resolution refinement programs to do with high homology models.

Currently, GDT-TS works well for medium quality models - basically it's a good "fold recognition" evaluation score - to use the old CASP parlance. It works poorly for template-free modelling, where it cannot distinguish between random models and topologically correct models (hence the visual assessment efforts that previous assessors have been forced to use). For the very best template-free models it does allow them to be compared usefully with template-based models, but for the majority of template-free models, the scores are not very discriminative (although this may be more of a criticism of the models than the score). It goes without saying that it is useless for high resolution modelling because it only looks at the C-alpha positions.
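For reference, GDT-TS averages the percentages of C-alpha atoms within 1, 2, 4 and 8 Å of the target, while GDT-HA uses 0.5, 1, 2 and 4 Å. The minimal sketch below shows how the two cutoff sets weigh the same deviations differently. It assumes per-residue deviations from a single fixed superposition, whereas real GDT maximises each percentage over many superpositions; the function name is hypothetical.

```python
from typing import Sequence

GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)


def gdt(deviations: Sequence[float], cutoffs: Sequence[float]) -> float:
    """Mean over cutoffs of the percentage of residues whose C-alpha deviation is <= cutoff.

    `deviations` are per-residue C-alpha distances (A) under ONE fixed superposition;
    real GDT searches for the best superposition separately for each cutoff.
    """
    if not deviations:
        raise ValueError("no residues")
    n = len(deviations)
    per_cutoff = [100.0 * sum(1 for d in deviations if d <= c) / n for c in cutoffs]
    return sum(per_cutoff) / len(per_cutoff)


if __name__ == "__main__":
    devs = [0.3, 0.8, 1.5, 3.0, 6.0, 9.0]
    print(round(gdt(devs, GDT_TS_CUTOFFS), 2), round(gdt(devs, GDT_HA_CUTOFFS), 2))
```

For deviations of 0.3, 0.8, 1.5, 3.0, 6.0 and 9.0 Å, GDT-TS is about 58.3 and GDT-HA about 41.7. The 8 Å cutoff still rewards the residue that is 6 Å off, which GDT-HA ignores.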

by **kevin_karplus** on Mon Sep 08, 2008 8:10 am

Well said. I think that we do not have any measures that are much good for finding value in the trash: the low-quality models are not well sorted by GDT, and it probably does better than any other measure (other than closely related ones like TM-score). GDT is good in the mid-range, where templates or fragment-based modeling produce a model of the correct fold, but with considerable differences in the details. We need to look at many new measures for high-resolution models, since none of the current measures quite captures quality.

by **kevin_karplus** on Mon Sep 08, 2008 8:13 am

> Regarding the main-chain HB score: it reflects only N-O contact correctness (plus some angle criteria). If one wants to
> assess local properties, other (main-chain) contacts should get the same weight.

Backbone hbonds capture most of the backbone correctness that is missing from C-alpha traces, in a compact way.

Sidechain hbonds capture a lot of the interesting things going on with sidechains, but no one seems to be able to produce models that get them right (except by copying from very close templates).

There are many other properties of high-resolution models that are worth considering, but H-bonds are particularly important to proteins, so worth examining in their own right.

by **guest** on Tue Oct 28, 2008 7:14 pm

> Backbone hbonds capture most of the backbone correctness that is missing from C-alpha traces, in a compact way.

Not necessarily; examples were mentioned in DJones's and related posts. For helical proteins, and for common
beta-sheet patterns that do not define the overall topology, the main-chain HB score is misleading.

by **prash108nc** on Thu Apr 15, 2010 10:50 am

I feel that 8A is too large a distance to be
meaningful, even for FM targets (though 8A does give us a complacent feeling of good protein modeling).
