Abstract
Gaze and speech are two channels that humans naturally use to communicate with each other. However, both the eye-tracking and the speech-recognition techniques available today are still far from perfect. Our goal is to find out how to make effective use of the error-prone information from both modes, so that one mode can correct the errors of the other, compensating for the immaturity of the recognition techniques, resolving the ambiguity of the user's speech, and improving interaction speed. Our integration strategies and evaluation experiment demonstrate that these two modalities can be combined to improve the usability and efficiency of a user interface in ways that are not available to speech-only or gaze-only systems.