Conference (domestic): Diffusion-Mixing Process for Speech Source Separation

Robin Scheibler, Youna Ji (NAVER), Soo-Whan Chung (NAVER), Jaeuk Byun (NAVER), Soyeon Choe (NAVER), Min-Seok Choi (NAVER)

Acoustical Society of Japan, 2023 Spring Meeting (ASJ 2023 Spring)


Score-based generative modelling (SGM) has come to prominence for the high-quality generation of images [3]. More recently, diffusion-based methods have been proposed for speech synthesis and enhancement. However, the important case of separating sources drawn from the same distribution, e.g., speech, has not been tackled. We propose a framework based on the stochastic differential equation (SDE) formulation of SGM. In particular, we propose a new SDE that models the process taking the separated sources to a Gaussian distribution whose mean is the mixture. We then train a neural network to approximate the score function of this process. Since the corresponding reverse SDE exists, solving it recovers the separated sources from the mixture.
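The forward process described above can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the paper's actual SDE: it uses a simple mean-reverting (Ornstein-Uhlenbeck-style) drift toward the mixture, with hypothetical coefficient names `gamma` and `sigma`; the coefficients and noise schedule in the proposed method may differ. It only demonstrates the qualitative behavior that each source converges to a Gaussian centered on the mixture.

```python
import numpy as np

def forward_mixing_step(x, mixture, gamma, sigma, dt, rng):
    """One Euler-Maruyama step of the toy forward SDE
        dx = gamma * (mixture - x) dt + sigma dW,
    which pulls each separated source x toward the mixture while
    injecting Gaussian noise. (Illustrative drift/diffusion choice;
    the paper's exact SDE coefficients are not specified here.)"""
    noise = rng.standard_normal(x.shape)
    return x + gamma * (mixture - x) * dt + sigma * np.sqrt(dt) * noise

def simulate_forward(sources, gamma=5.0, sigma=0.1, T=1.0, n_steps=200, seed=0):
    """Run the forward process from t=0 (clean sources) to t=T.

    sources : array of shape (n_sources, n_samples)
    Returns the diffused sources and the mixture (sum of sources).
    At t=T each row is approximately Gaussian with mean = mixture.
    Separation would run the *reverse* SDE from this state, replacing
    the analytically unknown score with a trained neural network.
    """
    rng = np.random.default_rng(seed)
    mixture = sources.sum(axis=0, keepdims=True)  # broadcast to each source
    x = sources.copy()
    dt = T / n_steps
    for _ in range(n_steps):
        x = forward_mixing_step(x, mixture, gamma, sigma, dt, rng)
    return x, mixture
```

For example, diffusing two synthetic 1-D "sources" (a sine and a square wave) with the defaults leaves every channel fluctuating tightly around the mixture, which is the terminal distribution the reverse SDE would start from.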