Joint Training of Deep Neural Networks for Multi-Channel Dereverberation and Speech Source Separation - LY Corporation R&D


CONFERENCE (INTERNATIONAL) Joint Training of Deep Neural Networks for Multi-Channel Dereverberation and Speech Source Separation

Masahito Togami

2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020)

May 04, 2020

In this paper, we propose joint training of two deep neural networks (DNNs) for dereverberation and speech source separation. The proposed method connects the first DNN (the dereverberation part) and the second DNN (the speech source separation part) in cascade. Rather than training each DNN separately, the method adopts an integrated loss function that evaluates the output signal after both dereverberation and speech source separation. The proposed method estimates the output signal as a probabilistic variable. Recently, in the speech source separation context, we proposed a loss function that evaluates the estimated posterior probability density function (PDF) of the output signal. In this paper, we extend that loss function so that it evaluates not only speech source separation performance but also speech dereverberation performance. Since the output signal of the dereverberation part is converted into the input feature of the second DNN, the gradient of the loss function is back-propagated into the first DNN through the input feature of the second DNN. Experimental results show that the proposed joint training of two DNNs is effective. It is also shown that the posterior-PDF-based loss function is effective in the joint training context.
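The gradient path described in the abstract can be illustrated with a minimal sketch. It is not the paper's model: each DNN is replaced by a single scalar parameter (`w1` for a hypothetical one-tap dereverberation filter, `w2` for a hypothetical masking stage), the signals are short real-valued lists, and a plain mean squared error stands in for the posterior-PDF-based loss, which is not reproduced here. All function names and the toy data are assumptions made for illustration. What the sketch does show is a single loss, evaluated only on the final output, whose gradient reaches the first stage through the input feature of the second stage.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    """Toy cascade: stage 1 (dereverberation) feeds stage 2 (separation)."""
    # Stage 1 (stand-in for the first DNN): subtract a scaled copy of the
    # previous input sample, a crude one-tap reverberation predictor.
    d = [x[t] - w1 * (x[t - 1] if t > 0 else 0.0) for t in range(len(x))]
    # Stage 2 (stand-in for the second DNN): its input feature is computed
    # from the stage 1 output, and the resulting mask is applied to it.
    feat = [v * v for v in d]
    mask = [sigmoid(w2 * f) for f in feat]
    y = [m * v for m, v in zip(mask, d)]
    return d, mask, y


def loss(x, s, w1, w2):
    # Integrated loss: only the final output y is compared with the target s.
    # (Assumption: MSE replaces the paper's posterior-PDF-based loss.)
    _, _, y = forward(x, w1, w2)
    return sum((yt - st) ** 2 for yt, st in zip(y, s)) / len(s)


def joint_grads(x, s, w1, w2):
    """Gradients of the integrated loss w.r.t. both stages (chain rule)."""
    d, mask, y = forward(x, w1, w2)
    n = len(s)
    g1 = g2 = 0.0
    for t in range(n):
        m = mask[t]
        dL_dy = 2.0 * (y[t] - s[t]) / n
        # Stage 2 parameter: y = sigmoid(w2 * d^2) * d
        dy_dw2 = d[t] ** 3 * m * (1.0 - m)
        # Sensitivity of y to the stage 1 output, including the path
        # through the stage 2 input feature d^2.
        dy_dd = m + 2.0 * w2 * d[t] ** 2 * m * (1.0 - m)
        # Stage 1 parameter: d = x[t] - w1 * x[t-1]
        dd_dw1 = -(x[t - 1] if t > 0 else 0.0)
        g2 += dL_dy * dy_dw2
        g1 += dL_dy * dy_dd * dd_dw1
    return g1, g2


def train(x, s, w1, w2, lr=0.05, steps=200):
    # Both stages are updated from the same loss in every step.
    for _ in range(steps):
        g1, g2 = joint_grads(x, s, w1, w2)
        w1, w2 = w1 - lr * g1, w2 - lr * g2
    return w1, w2


def make_toy_data():
    # Hypothetical data: the observation is the target plus a delayed echo.
    s = [1.0, -0.5, 0.8, -1.2, 0.3, 0.9]
    x = [s[t] + 0.6 * (s[t - 1] if t > 0 else 0.0) for t in range(len(s))]
    return x, s


if __name__ == "__main__":
    x, s = make_toy_data()
    w1, w2 = train(x, s, 0.0, 0.5)
    print("loss before:", loss(x, s, 0.0, 0.5))
    print("loss after: ", loss(x, s, w1, w2))
```

In a separately trained cascade, `w1` would be fitted against its own dereverberation target and then frozen; here the term `dy_dd * dd_dw1` is what lets the separation error adjust the dereverberation stage as well.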

Speech Processing

Paper: Joint Training of Deep Neural Networks for Multi-Channel Dereverberation and Speech Source Separation