Abstract
Text classification is a fundamental task in text mining. In the medical domain, a number of studies have addressed text classification of modern medicine clinical notes written in English. However, very little text classification research has been conducted on clinical notes written in Chinese, especially traditional Chinese medicine (TCM) clinical records. The goal of this study was to investigate features and machine learning classification algorithms for TCM clinical text classification. We collected 7,037 TCM clinical records from famous TCM doctors as our dataset and investigated the effects of different types of features and classification algorithms. Additionally, we proposed a novel method that combines deep learning text representations with TCM domain knowledge, which achieved the best classification performance.