Abstract
Protein structure comparison algorithms can be used to identify distantly related proteins or to categorize differences in binding specificity. When they appear in different conformations, distantly related proteins can go unrecognized unless flexible representations of whole protein structures are used. Such representations offer a sophisticated description of backbone motion, but they do not incorporate the potential motion of every atom. Thus, existing representations, both rigid and flexible, cannot compensate for atomic motions that can make binding sites with similar binding preferences appear different. To bridge this gap, this paper presents a tool for comparing protein binding sites despite conformational changes in the binding site. Our method employs ensemble clustering techniques to incorporate the diversity of binding site variations observed in conformational samples of binding site motion. We applied the method to conformations of proteins from the serine protease and enolase superfamilies. Our results demonstrate that this approach can distinguish proteins with similar binding preferences even in the presence of considerable binding site flexibility.