Abstract
Sequence conservation related to protein function has previously been discovered through protein sequence alignment and pattern mining. In contrast, our aim is to mine structure conservation from a structural viewpoint using frequent itemset mining. To describe local structure, we propose the neighborhood residue sphere (NRS): a sphere of 10 Å radius centered on each residue that combines sequence and spatial information. So far, we have obtained 56,164 NRSs representing locally conserved regions for 456 of the 646 EC labels in total. In EC label prediction, our experiments yield 80.61% Confidence and 53% Accuracy on 1,000 proteins with sequence identity below 60%, selected from 13,373 enzymes covering 563 EC labels. Because our coverage rate is around 80% higher than that of CSA and Protemot, the Confidence is almost double that of CSA and Protemot. In this study, we take an alternative approach that identifies function-related local structure without using binding-site information from protein-ligand complexes.
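As an illustration of the NRS idea only, the following minimal sketch collects, for each residue, the types of all other residues lying within a 10 Å sphere. It is not the implementation used in this study: the function name `neighborhood_residue_spheres`, the representation of a residue as a single point (for example its C-alpha atom), and the toy coordinates are all assumptions made for the example.

```python
from math import dist

RADIUS = 10.0  # sphere radius in angstroms, as stated in the abstract


def neighborhood_residue_spheres(residues, radius=RADIUS):
    """Build one NRS per residue (hypothetical helper, for illustration).

    `residues` is a list of (residue_type, (x, y, z)) tuples, where each
    residue is reduced to one representative point (an assumption).
    Returns a list of (center_type, sorted tuple of neighbor types).
    """
    spheres = []
    for i, (center_type, center_xyz) in enumerate(residues):
        # Keep every other residue whose point lies inside the sphere.
        neighbors = sorted(
            res_type
            for j, (res_type, xyz) in enumerate(residues)
            if j != i and dist(center_xyz, xyz) <= radius
        )
        spheres.append((center_type, tuple(neighbors)))
    return spheres


# Toy coordinates, invented for the example (not real structure data).
toy_protein = [
    ("HIS", (0.0, 0.0, 0.0)),
    ("ASP", (3.0, 4.0, 0.0)),   # 5 angstroms from HIS
    ("SER", (0.0, 0.0, 9.0)),   # 9 angstroms from HIS
    ("GLY", (30.0, 0.0, 0.0)),  # outside every other sphere
]

spheres = neighborhood_residue_spheres(toy_protein)
for center, neighbors in spheres:
    print(center, neighbors)
```

Each resulting tuple of neighbor types can be treated as one transaction (itemset), so a collection of such spheres gathered over many proteins with the same EC label could in principle be passed to a frequent itemset miner. The pairwise loop is quadratic in the number of residues; a spatial index would be the usual choice for full-size structures.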