Publications

Conference (International) Better Intermediates Improve CTC Inference

Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee (NAVER), Lukas Lee (NAVER), Shinji Watanabe (Carnegie Mellon University), Yusuke Kida

The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022)

2022.9.18

This paper proposes a method for improving CTC inference with searched intermediates and multi-pass conditioning. We first formulate self-conditioned CTC as a probabilistic model with an intermediate prediction as a latent representation and provide a tractable conditioning framework. We then propose two new conditioning methods based on this formulation: (1) searched intermediate conditioning, which refines intermediate predictions with beam search, and (2) multi-pass conditioning, which uses the predictions of the previous inference pass to condition the next one. These approaches enable better conditioning than the original self-conditioned CTC during inference and improve the final performance. Experiments on the LibriSpeech dataset show relative performance improvements of up to 3%/12% on the test-clean/test-other sets compared to the original self-conditioned CTC.

Paper: Better Intermediates Improve CTC Inference