Abstract
In this paper, we investigate combining semi-tied covariance matrices with Random Forests (RFs) based phonetic decision trees (PDTs) for acoustic modeling in conversational speech recognition. We first use the RF method to train multiple PDTs for each phone-state unit and generate multiple sets of acoustic models accordingly. We then apply semi-tied covariance matrices to each set of acoustic models to improve its fit to the data. During decoding, we combine the likelihood scores from the multiple acoustic models for each speech frame. We study the viability of semi-tied covariance matrices with different tying classes through their effects on the diversity of the RF-based acoustic models and on word accuracy in our telehealth automatic captioning task. Experimental results indicate that semi-tied covariance matrices both enhance the diversity of the RF-PDT-based acoustic models and increase word accuracy.
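As an illustration of the frame-level score combination mentioned above, the following minimal sketch averages the log-likelihood scores that several acoustic models assign to each frame. The abstract does not state the combination rule, so the weighted average in the log domain (uniform weights by default) is an assumption, and the function name `combine_frame_scores` and its parameters are hypothetical.

```python
import numpy as np


def combine_frame_scores(log_likes, weights=None):
    """Combine per-model log-likelihood scores for each speech frame.

    log_likes: array-like of shape (K, T), where entry [k, t] is the
        log-likelihood of frame t under the k-th acoustic model.
    weights: optional length-K weights; uniform weights are assumed
        when omitted (an illustrative choice, not taken from the paper).
    Returns a length-T array of combined log-likelihood scores.
    """
    log_likes = np.asarray(log_likes, dtype=float)
    num_models = log_likes.shape[0]
    if weights is None:
        weights = np.full(num_models, 1.0 / num_models)
    weights = np.asarray(weights, dtype=float)
    # Weighted average in the log domain, i.e. a weighted geometric
    # mean of the per-model likelihoods for each frame.
    return weights @ log_likes


# Two hypothetical models scoring two frames.
scores = combine_frame_scores([[-1.0, -2.0], [-3.0, -4.0]])
```

Other rules, such as averaging in the linear likelihood domain or using tuned per-model weights, fit the same interface by changing the final line or the `weights` argument.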