Publications
Conference (International) Spatial constraint on multi-channel deep clustering
Masahito Togami
2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019)
2019.5.12
In this paper, a multi-channel deep clustering technique that combines two types of spatial information is proposed. The first is an estimated direction-of-arrival (DOA) at each time-frequency point, which is used as an input feature of the proposed neural network. Instead of stacking the embeddings of all microphone pairs, as in conventional multi-channel deep clustering, the proposed method requires only one embedding, so the computational cost of the inference stage can be reduced. The second is the time-frequency activity of each speech source, estimated by multichannel Wiener filtering (MWF). The MWF is inserted between two consecutive bidirectional long short-term memory (BLSTM) layers, and the time-frequency activity of each speech source estimated by the MWF is transformed into an input feature of the next BLSTM layer. The proposed MWF insertion enhances the consistency of the embedding vectors along the time axis. Experimental results show that multi-channel deep clustering with the proposed input feature based on the estimated DOA separates speech sources better than conventional multi-channel deep clustering that stacks the embeddings of all microphone pairs. Furthermore, the proposed MWF insertion is shown to reduce distortion of the output signal and to improve the signal-to-interference ratio.
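To make the MWF step more concrete, the following is a minimal NumPy sketch of a mask-based multichannel Wiener filter. It is an illustration under stated assumptions, not the formulation used in the paper: the function name `mask_based_mwf`, the array layouts, the time-invariant filter per frequency bin, and the diagonal loading term `eps` are all hypothetical choices made here. The masks stand in for the per-source time-frequency activities that a network would supply.

```python
import numpy as np


def mask_based_mwf(X, masks, eps=1e-6):
    """Simplified, time-invariant mask-based multichannel Wiener filter.

    X     : complex STFT of the mixture, shape (F, T, M)
            (frequency bins, frames, microphones).
    masks : nonnegative per-source activities, shape (N, F, T).
    Returns the estimated source images, shape (N, F, T, M).
    """
    F, T, M = X.shape

    # Outer product x x^H at every time-frequency point: (F, T, M, M).
    outer = X[..., :, None] * X[..., None, :].conj()

    # Mask-weighted spatial covariance of each source: (N, F, M, M).
    R = np.einsum("nft,ftij->nfij", masks, outer) / T

    # Diagonal loading (an assumption here) keeps the matrices invertible.
    R = R + eps * np.eye(M)

    # Covariance of the mixture modelled as the sum over sources: (F, M, M).
    R_total = R.sum(axis=0)

    # Wiener filter per source and frequency: W_n(f) = R_n(f) R_total(f)^{-1}.
    W = R @ np.linalg.inv(R_total)[None]

    # Apply the filter: y_n(f, t) = W_n(f) x(f, t).
    return np.einsum("nfij,ftj->nfti", W, X)
```

Because the per-source filters sum to the identity matrix in each frequency bin, the separated source images add back up to the observed mixture, which gives a quick sanity check for any implementation of this kind.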
Paper: Spatial constraint on multi-channel deep clustering (external site)