Nice writing.
I'm not sure I care for this recent fad of trying to use hydrogen bonds for model assessment.
It's such a comprehensively flawed concept that I'm amazed we are still discussing it, but here are
some pertinent comments:
1. As someone has already pointed out, it is only useful for beta sheets - zero usefulness for all-alpha proteins. Even
in beta sheets it's no use for simple beta meanders where the same hydrogen bond pattern can be observed across
a wide range of sheet curvatures. Why use a method which can only be applied to a subset of protein fold types?
The argument should really just finish there, but to continue...
2. Hydrogen bonding is a complex quantum mechanical phenomenon - any purely geometric definition of a hydrogen
bond will be a crude approximation. Assuming we are not going to do, say, semi-empirical quantum calculations, which
crude approximation of a hydrogen bond do we opt to use? The old DSSP electrostatic definition? Baker and Hubbard?
A Dreiding/CHARMm potential? What energy cutoff do we set before an interaction counts as a hydrogen bond? What about
steric hindrance, bifurcation or competition with surrounding solvent in accessible areas of the model?
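To show how much rides on that cutoff, here is a minimal Python sketch of a DSSP-style electrostatic hydrogen bond energy (partial charges of 0.42e and 0.20e, f = 332, bond counted below -0.5 kcal/mol, as in Kabsch & Sander). It is an illustration, not DSSP itself: the amide H position is assumed to be supplied directly, whereas DSSP derives it from the backbone, and the linear_geometry helper is a hypothetical toy arrangement.

```python
import math

# DSSP-style electrostatic hydrogen bond energy (Kabsch & Sander, 1983):
# partial charges of 0.42e on C=O and 0.20e on N-H, with f = 332 giving kcal/mol.
Q1_Q2_F = 0.42 * 0.20 * 332.0

def dssp_energy(n, h, c, o):
    """Energy (kcal/mol) between donor N-H and acceptor C=O; coordinates in Å.

    Assumption: H is supplied directly. DSSP itself places the amide H
    from the backbone geometry rather than reading it from the model.
    """
    return Q1_Q2_F * (1.0 / math.dist(o, n) + 1.0 / math.dist(c, h)
                      - 1.0 / math.dist(o, h) - 1.0 / math.dist(c, n))

def is_hbond(n, h, c, o, cutoff=-0.5):
    """Count a hydrogen bond when the energy is below the cutoff (kcal/mol)."""
    return dssp_energy(n, h, c, o) < cutoff

def linear_geometry(n_o_distance):
    """Hypothetical toy: a perfectly linear N-H...O=C arrangement along x."""
    n = (0.0, 0.0, 0.0)
    h = (1.0, 0.0, 0.0)                    # N-H bond of 1.0 Å
    o = (n_o_distance, 0.0, 0.0)
    c = (n_o_distance + 1.23, 0.0, 0.0)    # C=O bond of 1.23 Å
    return n, h, c, o

for d in (2.9, 5.0, 6.0):
    e = dssp_energy(*linear_geometry(d))
    print(f"N...O = {d} Å: E = {e:.2f} kcal/mol, H-bond: {e < -0.5}")
```

In this toy geometry the energies come out at about -2.90, -0.54 and -0.31 kcal/mol, so with this energy expression alone a perfectly linear arrangement with the N...O distance stretched to 5 Å still scrapes in under the -0.5 kcal/mol threshold.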
3. What's so special about hydrogen bonds anyway? Why not also compare the solvent-accessible surface areas of the models, and in that way
take the non-polar parts of the model into account? That could even be applied to all protein fold classes - not that I'm seriously
recommending this criterion, I hasten to add!
4. The only reason these hydrogen bond evaluation schemes have any perceived value is that they encompass geometric information
beyond the C-alpha trace. It's plainly daft to evaluate high-resolution models on C-alpha positions alone, but why not address that issue
directly rather than adding the fuzziness of hydrogen bond definitions into the mix? Use main chain RMSDs, or even all-atom RMSDs, if you want
more resolution than C-alphas can provide. A main chain atom RMSD of zero will by definition produce exactly the same main chain hydrogen bond list for both models (at least with simple geometric HB definitions). A C-alpha RMSD of zero will not necessarily produce the same main chain hydrogen bond list, because of the inaccuracy inherent in building main chain coordinates from C-alpha traces.
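A minimal Python sketch of the RMSD argument in point 4, assuming the two models are already superposed and their atoms matched one-to-one (a real comparison would first do a least-squares fit such as Kabsch). The hydrogen bond list uses a hypothetical distance-only N...O cutoff of 3.5 Å, chosen purely for illustration. Because the list is computed from main chain coordinates alone, identical main chain coordinates (RMSD zero) must give identical lists, while moving a single carbonyl oxygen changes both.

```python
import math

def rmsd(model_a, model_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the models are already superposed and atoms are matched by index.
    """
    if len(model_a) != len(model_b) or not model_a:
        raise ValueError("models must have the same, non-zero number of atoms")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(model_a, model_b))
    return math.sqrt(total / len(model_a))

def hbond_list(donors, acceptors, cutoff=3.5):
    """Hypothetical distance-only H-bond list: (donor N index, acceptor O index)
    pairs whose N...O distance is within the cutoff (Å)."""
    return [(i, j)
            for i, d in enumerate(donors)
            for j, a in enumerate(acceptors)
            if math.dist(d, a) <= cutoff]

# Toy main chain: two amide N atoms and two carbonyl O atoms (Å).
n_atoms = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
o_atoms = [(2.9, 0.0, 0.0), (20.0, 0.0, 0.0)]
# Same model with one carbonyl O moved 1.1 Å.
o_moved = [(4.0, 0.0, 0.0), (20.0, 0.0, 0.0)]

print(rmsd(n_atoms + o_atoms, n_atoms + o_atoms), hbond_list(n_atoms, o_atoms))
print(rmsd(n_atoms + o_atoms, n_atoms + o_moved), hbond_list(n_atoms, o_moved))
```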
In my view we should be replacing GDT-HA with geometric definitions based on both main chain and side chain atom distances, not mixtures of C-alpha metrics combined with arbitrary hydrogen bond definitions.
For example, we could define something like this:
$$\frac{\mathrm{GDT}(\mathrm{C}\alpha,\ 2\,\text{\AA}) + \mathrm{GDT}(\mathrm{C}\alpha,\ 1\,\text{\AA}) + \mathrm{GDT}(\text{main chain},\ 0.5\,\text{\AA}) + \mathrm{GDT}(\text{side chain atoms},\ 0.5\,\text{\AA})}{4}$$
This would produce a score that gives some credit for basic alignment accuracy (the C-alpha components), some credit
for main chain geometry (including main chain hydrogen bonds), and the last bit of credit for putting the side chain atoms in the
right places (which will even include side chain hydrogen bonding). Of course, the selection of terms and distance cutoffs is something that
could (and no doubt should) be tuned.
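A minimal Python sketch of the combined score, assuming the model and target are already superposed and atoms are matched by index within each atom class. Real GDT (as computed by LGA, for instance) searches many superpositions and keeps the best fraction for each cutoff, so this single-superposition version can understate each term. The dictionary keys ca, main_chain and side_chain are hypothetical names for the three atom sets.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]

def gdt_term(model: Sequence[Coord], target: Sequence[Coord], cutoff: float) -> float:
    """Percentage of matched atoms lying within `cutoff` Å of their target position.

    Simplification: uses only the superposition the coordinates are already in,
    whereas real GDT searches many superpositions and keeps the best result.
    """
    if len(model) != len(target) or not target:
        raise ValueError("atom lists must be matched and non-empty")
    within = sum(1 for m, t in zip(model, target) if math.dist(m, t) <= cutoff)
    return 100.0 * within / len(target)

def combined_score(model: Dict[str, List[Coord]], target: Dict[str, List[Coord]]) -> float:
    """Mean of the four proposed terms; the dictionary keys are hypothetical."""
    terms = [
        gdt_term(model["ca"], target["ca"], 2.0),
        gdt_term(model["ca"], target["ca"], 1.0),
        gdt_term(model["main_chain"], target["main_chain"], 0.5),
        gdt_term(model["side_chain"], target["side_chain"], 0.5),
    ]
    return sum(terms) / len(terms)

# Usage with toy coordinates (Å).
target = {"ca": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)],
          "main_chain": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
          "side_chain": [(5.0, 5.0, 5.0)]}
model = {"ca": [(0.0, 0.0, 0.5), (3.8, 0.0, 1.5)],
         "main_chain": [(0.0, 0.0, 0.2), (1.0, 0.0, 0.8)],
         "side_chain": [(5.0, 5.0, 5.0)]}
print(combined_score(model, target))
```

Here both C-alphas are within 2 Å but only one is within 1 Å, half the main chain atoms are within 0.5 Å and every side chain atom is within 0.5 Å, so the four terms are 100, 50, 50 and 100 and the combined score is 75.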