CONFERENCE (INTERNATIONAL)

Neural Diarization with Non-Autoregressive Intermediate Attractors

Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa (Waseda University)

2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023)

June 04, 2023

End-to-end neural diarization (EEND) with encoder-decoder-based attractors (EDA) is a promising method for handling the whole speaker diarization problem jointly with a single neural network. While the EEND model produces all frame-level speaker labels simultaneously, it disregards the dependencies between output labels. In this work, we propose a novel EEND model that introduces label dependency between frames. The proposed method generates non-autoregressive intermediate attractors to produce speaker labels at the lower layers and conditions the subsequent layers on these labels. Although the proposed model works in a non-autoregressive manner, the speaker labels are refined by referring to the whole sequence of intermediate labels. Experiments on the two-speaker CALLHOME dataset show that the intermediate labels obtained with the proposed non-autoregressive intermediate attractors boost diarization performance. With a deeper network, the proposed method benefits more from the intermediate labels, resulting in better diarization performance and higher training throughput than EEND-EDA.
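The abstract describes the mechanism only at a high level. As a rough illustration, the following minimal pure-Python sketch traces the inference-time data flow under several assumptions that are not taken from the paper. Each "layer" is a hypothetical stand-in (an element-wise tanh instead of a real encoder block). The non-autoregressive attractors are assumed to come from scaled dot-product attention of a fixed set of query vectors over all frames, so every attractor is computed in parallel rather than one at a time as with EDA's LSTM decoder. The intermediate speaker posteriors condition the next layer by being projected through a hypothetical matrix `proj` and added to the frame embeddings. Speaker activities are computed as the sigmoid of frame-attractor dot products, as in EEND-EDA. All function names are illustrative, and training is omitted.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def speaker_posteriors(frames, attractors):
    """Frame-wise speaker activity: sigmoid of each frame-attractor dot product."""
    return [[sigmoid(sum(e_d * a_d for e_d, a_d in zip(e, a))) for a in attractors]
            for e in frames]


def nonautoregressive_attractors(frames, queries):
    """Hypothetical attractor module: each query attends over ALL frames at once,
    so the S attractors do not depend on each other (no sequential decoding)."""
    dim = len(frames[0])
    attractors = []
    for q in queries:
        scores = [sum(q_d * e_d for q_d, e_d in zip(q, e)) / math.sqrt(dim) for e in frames]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Attractor = attention-weighted average of the frame embeddings.
        attractors.append([sum(w * e[d] for w, e in zip(weights, frames)) for d in range(dim)])
    return attractors


def condition_on_labels(frames, posteriors, proj):
    """Add a linear projection (hypothetical S x D matrix `proj`) of the
    intermediate speaker posteriors back onto each frame embedding."""
    return [[e_d + sum(p[s] * proj[s][d] for s in range(len(p))) for d, e_d in enumerate(e)]
            for e, p in zip(frames, posteriors)]


def layer(frames):
    """Stand-in for an encoder layer (assumption: element-wise tanh only)."""
    return [[math.tanh(x) for x in e] for e in frames]


def run(frames, queries, proj, num_layers=4):
    """Intermediate labels at each lower layer condition the next layer."""
    intermediate = []
    h = frames
    for _ in range(num_layers - 1):
        h = layer(h)
        post = speaker_posteriors(h, nonautoregressive_attractors(h, queries))
        intermediate.append(post)
        h = condition_on_labels(h, post, proj)
    h = layer(h)
    final = speaker_posteriors(h, nonautoregressive_attractors(h, queries))
    return final, intermediate


# Toy usage: T frames, D-dim embeddings, S speakers (random values, no training).
random.seed(0)
T, D, S = 6, 4, 2
frames = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]
queries = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(S)]
proj = [[random.gauss(0.0, 0.1) for _ in range(D)] for _ in range(S)]
final, inter = run(frames, queries, proj)
print(len(final), len(final[0]), len(inter))  # 6 2 3
```

In this sketch the loop over layers is the only sequential part: within a layer, all attractors and all frame labels are produced at once, yet each later layer sees the complete sequence of intermediate labels from the layer below, which is the property the abstract emphasizes.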

Paper: Neural Diarization with Non-Autoregressive Intermediate Attractors