Abstract
Recorded meetings are useful only if people can find, access, and browse them easily. Key-frames and video skims are compact representations that allow quick previewing of the content without watching a meeting recording from beginning to end. This paper proposes a new method for creating meeting video skims based on audio and visual activity analysis together with text analysis. Audio activity analysis is performed by analyzing sound directions, which indicate different speakers, and audio amplitude. Important visual events in a meeting are detected by analyzing localized luminance variations, taking into account the omni-directional property of the video captured by our meeting recording system. Text analysis is based on the term frequency-inverse document frequency (TF-IDF) measure. The resulting video skims capture the important meeting content better than skims obtained by uniform sampling.
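As a rough illustration of the text analysis step, the following sketch scores transcript segments with a TF-IDF measure. The abstract does not give the details, so the segmentation into text segments, the whitespace tokenization, the specific TF-IDF variant (length-normalized term frequency with idf = log(N/df)), and the function names `tfidf_scores` and `segment_importance` are all assumptions made for illustration, not the paper's actual formulation.

```python
import math
from collections import Counter


def tfidf_scores(segments):
    """Return one {term: tf-idf weight} dict per transcript segment.

    Assumed variant: tf = count / segment length, idf = log(N / df).
    """
    tokenized = [seg.lower().split() for seg in segments]
    n_segments = len(tokenized)

    # Document frequency: number of segments in which each term occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty segments
        scores.append({
            term: (count / total) * math.log(n_segments / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def segment_importance(segments):
    """Hypothetical segment score: sum of the tf-idf weights of its terms."""
    return [sum(weights.values()) for weights in tfidf_scores(segments)]
```

Under this variant, a term that occurs in every segment receives a weight of zero, so segments dominated by distinctive terms rank higher. Such scores could then be combined with audio and visual activity measures when selecting skim segments, although how the paper fuses the three is not stated here.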