Abstract
The resonant frequencies, or chemical shifts, of nuclear magnetic resonance (NMR) active nuclei in proteins are determined by covalent and through-space interactions and, more generally, by the electronic environment surrounding each nucleus. However, the precise relationship between protein three-dimensional (3D) structure and chemical shift remains largely unresolved, so chemical shift prediction is a non-trivial task. This study tests the accuracy of three existing structure-based chemical shift prediction algorithms (SHIFTS, SHIFTX, PROSHIFT) against REFDB, a large database of experimentally determined and manually re-referenced ¹H, ¹³C, and ¹⁵N chemical shifts. We report that the accuracy of backbone chemical shift predictions for each program is lower than originally reported, which suggests that these programs over-fit the data used in their construction. We then compare two novel methods for chemical shift prediction, based on support vector machines (SVMs) and bagging, respectively. Each method was trained on REFDB using the predictions made by SHIFTS, SHIFTX, and PROSHIFT as features. In cross-validated experiments, bagging outperforms SVMs, and both methods are substantially more accurate than SHIFTS, SHIFTX, and PROSHIFT. Our results suggest that meta-methods, which combine the output of existing predictors, can increase the accuracy of chemical shift prediction.
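The meta-method idea can be illustrated with a minimal sketch, under several stated assumptions. The data below are synthetic (no REFDB values are used), the three feature columns are hypothetical stand-ins for the outputs of SHIFTS, SHIFTX, and PROSHIFT, and the base learner inside the bagging ensemble is an ordinary least-squares fit chosen for brevity; the abstract does not specify the base learner used in the study. A single hold-out split replaces cross-validation to keep the example short.

```python
import random

def fit_linear(X, y):
    """Least-squares fit with intercept, solved through the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    n = len(rows[0])
    # A = R^T R (plus a tiny diagonal term for numerical stability), b = R^T y
    A = [[sum(r[i] * r[j] for r in rows) + (1e-8 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

def predict_linear(w, x):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def fit_bagging(X, y, n_models=25, rng=None):
    """Fit one base learner per bootstrap resample of the training set."""
    rng = rng or random.Random(0)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in X]  # sample with replacement
        models.append(fit_linear([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagging(models, x):
    """Average the predictions of all bootstrap models."""
    return sum(predict_linear(w, x) for w in models) / len(models)

def rmse(pred, true):
    return (sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)) ** 0.5

# Synthetic "experimental" shifts (ppm) and three noisy, biased base predictors.
# These numbers are invented for illustration only.
rng = random.Random(42)
true = [rng.uniform(100.0, 130.0) for _ in range(400)]
X = [[t + rng.gauss(0.8, 2.5),    # hypothetical predictor 1
      t + rng.gauss(-0.5, 3.0),   # hypothetical predictor 2
      t + rng.gauss(0.0, 3.5)]    # hypothetical predictor 3
     for t in true]

train_X, test_X = X[:300], X[300:]
train_y, test_y = true[:300], true[300:]

models = fit_bagging(train_X, train_y, rng=rng)
meta_rmse = rmse([predict_bagging(models, x) for x in test_X], test_y)
base_rmse = [rmse([x[k] for x in test_X], test_y) for k in range(3)]

print("base predictor RMSE:", [round(e, 2) for e in base_rmse])
print("bagged meta-predictor RMSE:", round(meta_rmse, 2))
```

In this toy setting the three base predictors carry independent errors, so a learned combination can average part of that error away. Real predictors trained on overlapping data will have correlated errors, and the gain from a meta-method may be smaller.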