Abstract
This paper proposes a simultaneous modeling of spectrum and F0 for voice conversion based on MSD (Multi-Space Probability Distribution) models. As a conventional technique, a spectral conversion based on GMM (Gaussian Mixture Model) has been proposed. Although this technique converts spectral feature sequences nonlinearly based on GMM, F0 sequences are usually converted by a simple linear function. This is because F0 is undefined in unvoiced segments. To overcome this problem, we apply MSD models. The MSD-GMM allows to model continuous F0 values in voiced frames and a discrete symbol representing unvoiced frames within an unified framework. Furthermore, the MSD-HMM is adopted to model long term correlations in F0 sequences.