Voice conversion based on simultaneous modelling of spectrum and F0

Kaori Yutani; Yosuke Uto; Yoshihiko Nankaku; Akinobu Lee; Keiichi Tokuda

doi:10.1109/ICASSP.2009.4960479

Acoustics, Speech, and Signal Processing, IEEE International Conference on

Year: 2009, Pages: 3897-3900

Authors

Kaori Yutani, Department of Computer Science and Engineering, Nagoya Institute of Technology, Nagoya, Japan
Yosuke Uto, Department of Computer Science and Engineering, Nagoya Institute of Technology, Nagoya, Japan
Yoshihiko Nankaku, Department of Computer Science and Engineering, Nagoya Institute of Technology, Nagoya, Japan
Akinobu Lee, Department of Computer Science and Engineering, Nagoya Institute of Technology, Nagoya, Japan
Keiichi Tokuda, Department of Computer Science and Engineering, Nagoya Institute of Technology, Nagoya, Japan

Abstract

This paper proposes simultaneous modeling of spectrum and F0 for voice conversion based on MSD (Multi-Space Probability Distribution) models. As a conventional technique, spectral conversion based on a GMM (Gaussian Mixture Model) has been proposed. Although this technique converts spectral feature sequences nonlinearly, F0 sequences are usually converted by a simple linear function, because F0 is undefined in unvoiced segments. To overcome this problem, we apply MSD models. The MSD-GMM models continuous F0 values in voiced frames and a discrete symbol representing unvoiced frames within a unified framework. Furthermore, the MSD-HMM is adopted to model long-term correlations in F0 sequences.
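To make the contrast in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the "simple linear function" is the commonly used mean and variance transformation of log-F0, which the abstract does not specify. It also assumes a single-stream MSD-GMM over log-F0 in which each mixture component carries a voiced-space weight and a one-dimensional Gaussian. The names `UNVOICED`, `log_f0_stats`, `convert_f0_linear`, and `msd_gmm_likelihood` are hypothetical and chosen only for illustration.

```python
import math

# Assumption: unvoiced frames (F0 undefined) are marked with None.
UNVOICED = None


def log_f0_stats(f0_frames):
    """Mean and standard deviation of log-F0, computed over voiced frames only."""
    logs = [math.log(f) for f in f0_frames if f is not UNVOICED]
    mean = sum(logs) / len(logs)
    var = sum((v - mean) ** 2 for v in logs) / len(logs)
    return mean, math.sqrt(var)


def convert_f0_linear(f0_frames, src_stats, tgt_stats):
    """Baseline: linear transform of log-F0; unvoiced frames pass through unchanged."""
    src_mean, src_std = src_stats
    tgt_mean, tgt_std = tgt_stats
    converted = []
    for f in f0_frames:
        if f is UNVOICED:
            converted.append(UNVOICED)
        else:
            z = (math.log(f) - src_mean) / src_std
            converted.append(math.exp(tgt_mean + tgt_std * z))
    return converted


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def msd_gmm_likelihood(f0, components):
    """Likelihood of one F0 observation under an assumed MSD-GMM.

    components: list of (mixture_weight, voiced_weight, mean, var), where the
    Gaussian is over log-F0 and (1 - voiced_weight) is the probability mass
    assigned to the discrete unvoiced symbol.
    """
    total = 0.0
    for mix_w, voiced_w, mean, var in components:
        if f0 is UNVOICED:
            total += mix_w * (1.0 - voiced_w)
        else:
            total += mix_w * voiced_w * gaussian_pdf(math.log(f0), mean, var)
    return total
```

In this sketch the baseline must skip unvoiced frames entirely, whereas the MSD likelihood assigns a well-defined value to both voiced and unvoiced observations. That property is what lets F0 be modeled in the same probabilistic framework as the spectrum.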
