Publications
CONFERENCE (INTERNATIONAL)
Efficient and Stable Adversarial Learning Using Unpaired Data for Unsupervised Multichannel Speech Separation
Yu Nakagome (Waseda University), Masahito Togami, Tetsuji Ogawa (Waseda University), Tetsunori Kobayashi (Waseda University)
The 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH 2021)
August 30, 2021
This study presents a framework for efficient and stable adversarial learning of unsupervised multichannel source separation models. When paired data, i.e., mixtures and the corresponding clean speech, are not available for training, it is promising to exploit generative adversarial networks (GANs), in which the source separation system is treated as a generator and trained to bring the distribution of the separated (fake) speech closer to that of the clean (real) speech. The separated speech, however, contains many errors, especially when the system is trained in an unsupervised manner, and is therefore easily distinguished from clean speech; a real/fake binary discriminator will thus stop the adversarial learning process unreasonably early. This study aims to balance the convergence of the generator and the discriminator to achieve efficient and stable learning. For that purpose, the autoencoder-based discriminator and the more stable adversarial loss designed in the boundary equilibrium GAN (BEGAN) are introduced. In addition, generator-specific distortions are added to the real examples so that the discriminator cannot exploit those distortions as a real/fake cue and the models are trained to focus only on source separation. Experimental comparisons demonstrated that these learning-stabilization techniques improved the performance of multiple unsupervised source separation systems.
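To illustrate the BEGAN-style objective the abstract refers to, the following is a minimal PyTorch sketch, not the authors' implementation: the autoencoder architecture, feature dimension, and hyperparameters (gamma, lambda_k) are placeholder assumptions. The discriminator is an autoencoder whose reconstruction error scores "realness", and an equilibrium variable k balances the convergence of the generator and discriminator.

```python
import torch
import torch.nn as nn

class AEDiscriminator(nn.Module):
    """BEGAN-style autoencoder discriminator: realness is measured by
    reconstruction error rather than a real/fake probability.
    Architecture and feature size are illustrative placeholders."""

    def __init__(self, feat_dim=257, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, feat_dim)

    def recon_loss(self, x):
        # Mean L1 autoencoding error |x - D(x)| over the batch
        return (x - self.decoder(self.encoder(x))).abs().mean()


def began_losses(disc, real, separated, k, gamma=0.5, lambda_k=1e-3):
    """One step of the BEGAN equilibrium losses.

    real:      clean-speech features (per the paper, generator-specific
               distortions may be added to these before this call)
    separated: output of the source separation system (the "generator")
    k:         equilibrium variable, updated by proportional control
    """
    l_real = disc.recon_loss(real)
    l_fake = disc.recon_loss(separated.detach())  # no gradient into the separator
    loss_d = l_real - k * l_fake                  # discriminator objective
    loss_g = disc.recon_loss(separated)           # separator (generator) objective

    # Maintain the equilibrium gamma * L(real) = L(fake); clamp k to [0, 1]
    balance = (gamma * l_real - l_fake).item()
    k = min(max(k + lambda_k * balance, 0.0), 1.0)
    m_global = l_real.item() + abs(balance)       # BEGAN convergence measure
    return loss_d, loss_g, k, m_global
```

In this scheme, k grows only while the separated speech is still easy for the discriminator to tell apart, so the discriminator never races ahead of the generator; the scalar m_global can be monitored as a convergence indicator, which is what makes the training comparatively stable.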
Paper: Efficient and Stable Adversarial Learning Using Unpaired Data for Unsupervised Multichannel Speech Separation (external link)