Abstract
This paper proposes a methodology for expressive speech synthesis by identifying, estimating and modelling the parameters responsible for the generation of vocal affect. Existing systems take into account one or more prosodic factors and attempt to model affect based on information in prosody, in the spectrum or, more rarely, both. This paper analyses the incompleteness of the representations used in the literature and puts forth a strategy to convert prosodic and spectral features simultaneously for effective emotion conversion. Initial studies on prosodic parameter modification show that certain characteristics are common across all archetypal emotions and can therefore be kept constant during model development. A prototype system is proposed that takes extracted speech parameters, models them according to the required target emotion and builds a generalized model for emotion incorporation that can be used in social environments.