Abstract
Interactions between drugs (also known as drug-drug interactions or DDIs), which may cause adverse effects, are of much concern; predicting, anticipating, and avoiding them is key to improving patient safety and treatment outcomes. Knowledge of DDIs is important for physicians to avoid adverse effects when prescribing two drugs simultaneously. DDIs are often reported in the biomedical literature; however, gathering information about DDIs is time-consuming given the sheer volume of publications. Automatic text classification can speed up access to documents related to DDIs. However, the biomedical literature contains a relatively small number of publications relevant to DDIs compared to the vast number of irrelevant publications. This imbalance can lead to incorrect classification. While methods addressing class imbalance have been introduced to correctly identify items in the minority (relevant) class and thereby improve recall, they often misclassify items in the majority (irrelevant) class, which leads to low precision. To reduce the number of irrelevant documents misclassified as relevant (false positives), we develop a two-stage cascade classifier. In each stage, we separate publication abstracts that are DDI-relevant from those that are either drug-irrelevant or drug-relevant but DDI-irrelevant. We compare our classifier with other popular learning methods that aim to handle imbalance, applying the methods to a well-curated corpus consisting of DDI-relevant and DDI-irrelevant PubMed abstracts. Our method achieves higher precision and F1 measure than the other methods while maintaining similar recall.
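The cascade idea can be illustrated with a minimal Python sketch. It is not the classifier evaluated in this work: the keyword rules `stage_one` and `stage_two`, the helper names, and the five toy abstracts are all hypothetical stand-ins for trained models and the curated corpus. The sketch assumes only that an abstract is labeled relevant when every stage accepts it, and it shows how a second stage can remove false positives left by the first.

```python
from typing import Callable, Sequence, Tuple

# A stage maps an abstract to True ("predicted DDI-relevant") or False.
Classifier = Callable[[str], bool]


def cascade_predict(text: str, stages: Sequence[Classifier]) -> bool:
    """Label a text relevant only if every stage accepts it.

    all() short-circuits, so a text rejected by an early stage
    is never passed to the later stages.
    """
    return all(stage(text) for stage in stages)


def precision_recall_f1(
    y_true: Sequence[bool], y_pred: Sequence[bool]
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for the relevant (True) class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical keyword rules standing in for trained classifiers.
def stage_one(text: str) -> bool:
    t = text.lower()
    return "drug" in t or "inhibitor" in t


def stage_two(text: str) -> bool:
    t = text.lower()
    return "interaction" in t or "coadministration" in t


# Toy data (invented for illustration): (abstract text, is DDI-relevant).
ABSTRACTS = [
    ("Coadministration of drug A raised plasma levels of drug B.", True),
    ("A pharmacokinetic interaction between the two drugs was observed.", True),
    ("The drug reduced blood pressure in hypertensive patients.", False),
    ("Protein-protein interaction networks in yeast.", False),
    ("Soil moisture affects crop yield.", False),
]

texts = [text for text, _ in ABSTRACTS]
labels = [label for _, label in ABSTRACTS]

single_stage = [stage_one(t) for t in texts]
two_stage = [cascade_predict(t, [stage_one, stage_two]) for t in texts]

single_metrics = precision_recall_f1(labels, single_stage)
cascade_metrics = precision_recall_f1(labels, two_stage)

print("stage one only:", single_metrics)
print("two-stage cascade:", cascade_metrics)
```

On this toy data the single stage accepts one drug-relevant but DDI-irrelevant abstract, and the second stage rejects it, so precision rises while recall is unchanged. Real stages would be classifiers trained on labeled abstracts, and the gain on real data depends on how well each stage is trained.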