Abstract
Continued post-marketing surveillance of Adverse Drug Events (ADEs) is considered essential for patient safety, and Electronic Health Records (EHRs) serve as a critical source of relevant information. However, effective knowledge discovery and data mining on EHRs is not trivial, because the data involved usually differ significantly in their semantics. Semantic technologies are believed to be of great help in this regard; unfortunately, semantic technologies and conventional data mining remain largely separate disciplines, and their fusion is still in its infancy. This position paper explores two semantics-driven frequent data pattern mining algorithms for EHR knowledge discovery, aiming at more effective ADE monitoring in a population. By utilizing human knowledge formally encoded in EHR domain ontologies, our proposed algorithms will enhance the identification of drug-ADE causality from large, heterogeneous data sets. By mining a large corpus of representative EHRs at the semantic level, we will be able to compile a comprehensive list of ADE endpoints from critical frequent data patterns that were originally hidden and implicit. Ultimately, the software we plan to develop is expected to significantly facilitate ADE monitoring and prediction. Moreover, our research is expected to have a broader impact on the pharmaceutical industry, by reducing the R&D cost of new drug discovery, and on current pharmacovigilance methods, by transforming them to reduce adverse events and hence improve human health.