カンファレンス (国際) Diffusion-based Generative Speech Source Separation

Robin Scheibler, Youna Ji (NAVER Cloud), Soo-Whan Chung (NAVER Cloud), Jaeuk Byun (NAVER Cloud), Soyeon Choe (NAVER Cloud), Min-Seok Choi (NAVER Cloud)

2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023)


We propose DiffSep, a new single channel source separation method based on score-matching of a stochastic differential equation (SDE). We craft a tailored continuous time diffusion-mixing process starting from the separated sources and converging to a Gaussian distribution centered on their mixture. This formulation lets us apply the machinery of score-based generative modelling. First, we train a neural network to approximate the score function of the marginal probabilities of the diffusion-mixing process. Then, we use it to solve the reverse time SDE that progressively separates the sources starting from their mixture. We propose a modified training strategy to handle model mismatch and source permutation ambiguity. Experiments on the WSJ0_2mix dataset demonstrate the potential of the method. Furthermore, the method is also suitable for speech enhancement and shows performance competitive with prior work on the VoiceBank-DEMAND dataset.

Paper : Diffusion-based Generative Speech Source Separation新しいタブまたはウィンドウで開く (外部サイト)