Abstract
Antimicrobial peptides may become crucial in fighting antibiotic-resistant bacteria and other infections. Next-generation sequencing technologies are generating large amounts of data in which peptides with antimicrobial activity could be found. Therefore, algorithms that can efficiently determine whether a short amino acid sequence is antimicrobial are needed. In this context, Quantitative Structure-Activity Relationship (QSAR) modeling has paved the way toward associating the physicochemical properties of peptides with their biological activity. Current algorithms can compute thousands of physicochemical properties, known as molecular descriptors. However, some of these descriptors are irrelevant, and some may even mislead the classification of peptide activity. To mitigate this problem, a descriptor selection step must be performed; it helps improve classification accuracy and reduces the computational time required for classification. A recent work proposed a general method to weight and select features, which models descriptor selection as a multi-objective optimization problem (MOOP). Its main idea is to optimize the intra- and inter-class distances simultaneously. We follow this approach and apply it to feature selection for the classification of antimicrobial peptides. To this end, we modify the original MOOP formulation to avoid bringing non-antimicrobial peptides closer together. Preliminary results indicate that our approach can substantially reduce the number of required molecular descriptors and improve classification performance with respect to the original formulation.
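The two competing objectives can be sketched as follows. This is a minimal illustration under stated assumptions, not the formulation of the cited work: it assumes a weighted Euclidean distance over descriptor vectors and uses mean pairwise distances as the objectives. The names `weighted_distance`, `objectives` and `dominates` are hypothetical. Following the modification described above, the intra-class term is computed only over antimicrobial peptides, so non-antimicrobial peptides are never pulled together. This reading of the modification is also an assumption.

```python
import math
from itertools import combinations


def weighted_distance(x, y, w):
    """Weighted Euclidean distance; w[k] >= 0 is the (assumed) weight of descriptor k."""
    return math.sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, x, y)))


def objectives(w, amp, non_amp):
    """Return (intra, inter) for a descriptor weight vector w (hypothetical objectives).

    intra: mean pairwise distance among antimicrobial peptides only (to minimize).
    inter: mean distance between antimicrobial and non-antimicrobial peptides (to maximize).
    Pairs of non-antimicrobial peptides are deliberately left out of intra, so the
    optimization never tries to bring them together (assumed modification).
    """
    intra_pairs = list(combinations(amp, 2))
    intra = sum(weighted_distance(a, b, w) for a, b in intra_pairs) / len(intra_pairs)
    inter = sum(weighted_distance(a, n, w) for a in amp for n in non_amp) / (
        len(amp) * len(non_amp)
    )
    return intra, inter


def dominates(f, g):
    """Pareto dominance for (intra, inter): minimize intra, maximize inter."""
    no_worse = f[0] <= g[0] and f[1] >= g[1]
    strictly_better = f[0] < g[0] or f[1] > g[1]
    return no_worse and strictly_better


if __name__ == "__main__":
    # Toy descriptor vectors (made up for illustration): descriptor 0 separates
    # the classes, descriptor 2 varies widely within each class.
    amp = [(1.0, 0.2, 5.0), (1.1, 0.3, -5.0)]
    non_amp = [(0.0, 0.2, 4.0), (0.1, 0.1, -4.0)]
    all_desc = objectives((1, 1, 1), amp, non_amp)
    selected = objectives((1, 1, 0), amp, non_amp)  # drop descriptor 2
    print("all descriptors:", all_desc)
    print("descriptor 2 dropped:", selected)
    print("selected dominates all:", dominates(selected, all_desc))
```

In this toy data, dropping the third descriptor shrinks the intra-class distance but also the inter-class distance, so neither weighting dominates the other. This trade-off is why the problem is posed as multi-objective: it yields a set of Pareto-optimal weightings rather than a single optimum.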