Publications
CONFERENCE (INTERNATIONAL) Computer-Resource-Aware Deep Speech Separation with a Run-Time-Specified Number of BLSTM Layers
Masahito Togami, Yoshiki Masuyama (Waseda University), Tatsuya Komatsu, Kazuyoshi Yoshii (Kyoto University), Tatsuya Kawahara (Kyoto University)
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2020 (APSIPA ASC 2020)
December 07, 2020
Recently, deep neural networks (DNNs) with multiple bidirectional long short-term memory (BLSTM) layers have been successfully applied to supervised multi-channel speech separation. When applied to industrial products, one shortcoming is that the number of BLSTM layers cannot be changed according to the available computational resources once the DNN is trained. Since the available computational resources vary from device to device, it is preferable that the number of BLSTM layers be adjustable for optimal performance. In this paper, we propose a DNN-based speech separation method in which each BLSTM layer is connected to a signal processing layer. Each signal processing layer can output a separated speech signal, which can also be fed into the subsequent BLSTM layer. The proposed method trains two types of BLSTM layers: the first is used to initialize the speech separation, and the second is used to enhance separation performance. At run time, the number of BLSTM layers can be increased by stacking additional layers of the second type, improving separation quality. Experimental results show that the proposed method is effective.
Paper : Computer-Resource-Aware Deep Speech Separation with a Run-Time-Specified Number of BLSTM Layers (external link)
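The following is a minimal sketch, not the authors' implementation, of the idea described in the abstract: an initialization block and a stackable refinement block, each pairing a BLSTM layer with a simple separation step, where the stacking depth is chosen at run time. The mask-based separation step, the tensor shapes, weight sharing across stacked refinement layers, and all class and parameter names are assumptions made for illustration.

```python
# Hypothetical sketch of a run-time-depth BLSTM separator (not the paper's code).
# Assumes magnitude-spectrogram input of shape (batch, frames, freq_bins),
# two sources, and a mask-based separation step standing in for the paper's
# signal processing layer.
import torch
import torch.nn as nn


class SeparationBlock(nn.Module):
    """One BLSTM layer followed by a simple mask-based separation step."""

    def __init__(self, in_dim, hidden, freq_bins, n_src=2):
        super().__init__()
        self.blstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.mask = nn.Linear(2 * hidden, freq_bins * n_src)
        self.n_src = n_src

    def forward(self, mix, feat):
        h, _ = self.blstm(feat)                               # (B, T, 2*hidden)
        m = torch.sigmoid(self.mask(h))                       # (B, T, F*n_src)
        m = m.view(mix.shape[0], mix.shape[1], self.n_src, -1)
        est = m * mix.unsqueeze(2)                            # (B, T, n_src, F)
        return est, h


class RunTimeDepthSeparator(nn.Module):
    """Initialization block plus a refinement block that can be stacked at run time."""

    def __init__(self, freq_bins=257, hidden=300, n_src=2):
        super().__init__()
        self.init_block = SeparationBlock(freq_bins, hidden, freq_bins, n_src)
        self.refine_block = SeparationBlock(2 * hidden, hidden, freq_bins, n_src)

    def forward(self, mix, n_refine=0):
        est, h = self.init_block(mix, mix)                    # first separation estimate
        for _ in range(n_refine):                             # depth set by the device
            est, h = self.refine_block(mix, h)
        return est


if __name__ == "__main__":
    mix = torch.randn(4, 100, 257)                            # (batch, frames, freq_bins)
    model = RunTimeDepthSeparator()
    fast = model(mix, n_refine=0)                             # low-resource device
    best = model(mix, n_refine=3)                             # more compute available
    print(fast.shape, best.shape)                             # torch.Size([4, 100, 2, 257])
```

In this sketch the refinement block is reused with shared weights at every stacked step; whether the paper shares weights or trains separate second-type layers is not stated in the abstract above.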