Publications

CONFERENCE (INTERNATIONAL) Inter-Decoder: Using Attention-Decoder losses as Intermediate Regularization for CTC-based Speech Recognition

Tatsuya Komatsu, Yusuke Fujita

The 2022 IEEE Spoken Language Technology Workshop (SLT 2022)

January 09, 2023

We propose Inter-Decoder, a new training method for non-autoregressive automatic speech recognition (NAR-ASR) that injects the advantages of token-wise autoregressive decoders while keeping efficient non-autoregressive inference. NAR-ASR models are often less accurate than autoregressive models such as the Transformer decoder, which predict each token conditioned on previously predicted tokens. Inter-Decoder regularizes training by feeding intermediate encoder outputs into the decoder and computing token-level prediction errors given the previous ground-truth tokens, whereas the widely used Hybrid CTC/Attention model applies the decoder loss only at the final layer. Combining the method with Self-conditioned CTC, which uses intermediate CTC predictions to condition the encoder, improves performance further. Experiments on the LibriSpeech and TED-LIUM2 datasets show that the proposed method achieves a relative WER improvement of up to 6% over conventional NAR-ASR methods.
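As a rough illustration of the training objective described in the abstract, the sketch below mixes a final-layer hybrid CTC/attention loss with the average of decoder losses computed from intermediate encoder layers. The function name, the weights `ctc_weight` and `inter_weight`, and the exact way the terms are combined are assumptions made for illustration only; the abstract does not state the formulation, and the paper may define it differently.

```python
from typing import Sequence


def inter_decoder_loss(
    ctc_loss: float,
    final_decoder_loss: float,
    intermediate_decoder_losses: Sequence[float],
    ctc_weight: float = 0.3,    # hypothetical value, not taken from the paper
    inter_weight: float = 0.5,  # hypothetical value, not taken from the paper
) -> float:
    """Combine final-layer and intermediate-layer losses (illustrative only)."""
    # Hybrid CTC/attention loss computed from the final encoder layer.
    hybrid = ctc_weight * ctc_loss + (1.0 - ctc_weight) * final_decoder_loss

    # Without intermediate layers this reduces to the plain hybrid loss.
    if not intermediate_decoder_losses:
        return hybrid

    # Average of decoder losses obtained by feeding intermediate encoder
    # outputs into the same decoder (teacher-forced on ground-truth tokens).
    inter = sum(intermediate_decoder_losses) / len(intermediate_decoder_losses)

    # Assumed mixing scheme: interpolate between the two terms.
    return (1.0 - inter_weight) * hybrid + inter_weight * inter
```

In this sketch the decoder contributes only to the training loss; at inference time the CTC output alone would be used, which is how the non-autoregressive decoding speed is preserved.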

Paper: Inter-Decoder: Using Attention-Decoder losses as Intermediate Regularization for CTC-based Speech Recognition (external link)