Speaker and Style Adaptation Using Average Voice Model
for Style Control in HMM-based Speech Synthesis


Target Speaker: female speaker FTY

切符を買うのは自動販売機からである。 (Japanese sentence)
/ki-cl-pu-o ka-u-no-wa ji-do-o-ha-N-ba-i-ki ka-ra-de-a-ru/
(It is from a ticket machine that we buy the ticket.)

                 Speaker-dependent MRHSMM                                  (Conventional)               (Proposed)
Initial Model    target speaker               target speaker               male speaker MMI             Average Voice Model
                 (450 sentences × 4 styles)   (50 sentences × 4 styles)    (450 sentences × 4 styles)   (450 sentences × 9 speakers)
Adaptation Data  no adaptation                no adaptation                50 sentences × 4 styles      50 sentences × 4 styles
Neutral 
Sad
Joyful 
Rough (Impolite)

 


Style Control Using Proposed Technique

Target Speaker: male speaker MMI

今度は河豚の季節に行ってみたい。 (Japanese sentence)
/ko-N-do-wa fu-gu-no kI-se-tsu-ni i-cl-te-mi-ta-i/
(Next time, I'd like to visit there in the season for blowfish.)

Sad 0.5 | Sad 1.0 | Sad 1.5 | Joyful 1.5 | Joyful 1.0 | Joyful 0.5 | Neutral | Rough 0.5 | Rough 1.0 | Rough 1.5

Reference:
Makoto Tachibana, Shinsuke Izawa, Takashi Nose, Takao Kobayashi,
"Speaker and style adaptation using average voice model for style control in HMM-based speech synthesis,"
Proc. 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2008, Las Vegas, USA (2008.04)
