High-fidelity Parallel WaveGAN with Multi-band Harmonic-plus-Noise Model - LY Corporation R&D

Publications

CONFERENCE (INTERNATIONAL) High-fidelity Parallel WaveGAN with Multi-band Harmonic-plus-Noise Model

Min-Jae Hwang (Search Solutions Inc), Ryuichi Yamamoto, Eunwoo Song (NAVER), Jae-Min Kim (NAVER)

The 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH 2021)

August 30, 2021

This paper proposes a multi-band harmonic-plus-noise (HN) Parallel WaveGAN (PWG) vocoder. To generate a high-fidelity speech signal, it is important to well-reflect the harmonic-noise characteristics of the speech waveform in the time-frequency domain. However, it is difficult for the conventional PWG model to accurately match this condition, as its single generator inefficiently represents the complicated nature of harmonic-noise structures. In the proposed method, the HN WaveNet models are employed to overcome this limitation, which enable the separate generation of the harmonic and noise components of speech signals from the pitch-dependent sine wave and Gaussian noise sources, respectively. Then, the energy ratios between harmonic and noise components in multiple frequency bands (i.e., subband harmonicities) are predicted by an additional harmonicity estimator. Weighted by the estimated harmonicities, the gain of harmonic and noise components in each subband is adjusted, and finally mixed together to compose the full-band speech signal. Subjective evaluation results showed that the proposed method significantly improved the perceptual quality of the synthesized speech.

Paper : High-fidelity Parallel WaveGAN with Multi-band Harmonic-plus-Noise Model open into new tab or window (external link)