Abstract
This paper describes a spoken document retrieval system based on a speech recognizer, the vector space model, and a phonemic transcription method that uses a high-order Markov chain. The relevance of a document to a query is estimated by a weighted cosine measure. Phonemic transcription converts the recognized text of the spoken content into phoneme sequences. These phoneme sequences are then used for retrieval against the user's text query, which is likewise converted into phoneme sequences. Word matching is performed at the phoneme level using the Kullback-Leibler divergence.