Publications

CONFERENCE (INTERNATIONAL) Better Intermediates Improve CTC Inference

Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee (NAVER), Lukas Lee (NAVER), Shinji Watanabe (Carnegie Mellon University), Yusuke Kida

The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022)

September 18, 2022

This paper proposes a method for improved CTC inference based on searched intermediates and multi-pass conditioning. We first formulate self-conditioned CTC as a probabilistic model in which the intermediate prediction serves as a latent representation, yielding a tractable conditioning framework. Based on this formulation, we then propose two new conditioning methods: (1) searched intermediate conditioning, which refines intermediate predictions with beam search, and (2) multi-pass conditioning, which uses the predictions of a previous inference pass to condition the next. Both approaches enable better conditioning at inference time than the original self-conditioned CTC and improve final performance. Experiments on the LibriSpeech dataset show relative improvements of up to 3%/12% on the test-clean/test-other sets over the original self-conditioned CTC.
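To give a rough sense of the multi-pass conditioning idea, the following is a minimal toy sketch: the frame-level posteriors from one inference pass are projected back into the feature space and added to an intermediate layer on the next pass, and the final pass is decoded with standard CTC greedy decoding. The layer shapes, the `embed` projection, and the two-pass count are illustrative assumptions for this sketch, not the paper's actual architecture (which uses beam search and a Transformer encoder).

```python
import numpy as np

BLANK = 0  # index of the CTC blank symbol in this toy vocabulary


def ctc_greedy_decode(log_probs):
    """Standard CTC best-path decoding: collapse repeats, drop blanks."""
    best = log_probs.argmax(axis=-1)
    out, prev = [], None
    for t in best:
        if t != prev and t != BLANK:
            out.append(int(t))
        prev = t
    return out


def toy_layer(x, w):
    """Hypothetical encoder block: a linear map with a tanh nonlinearity."""
    return np.tanh(x @ w)


def multi_pass_inference(feats, w1, w2, w_out, embed, n_passes=2):
    """Toy multi-pass conditioning: posteriors from the previous pass are
    projected back to the hidden space and added to an intermediate layer,
    loosely mimicking self-conditioned CTC across inference passes."""
    cond = np.zeros((feats.shape[0], w1.shape[1]))  # no conditioning on pass 1
    for _ in range(n_passes):
        h = toy_layer(feats, w1) + cond           # inject the conditioning signal
        h = toy_layer(h, w2)
        logits = h @ w_out
        log_probs = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
        # condition the next pass on the current frame-level posteriors
        cond = log_probs @ embed
    return ctc_greedy_decode(log_probs)
```

In the actual method, the intermediate predictions would also be refined by beam search before conditioning (searched intermediate conditioning); the toy loop above only shows the feed-back structure of the multi-pass variant.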

Paper: Better Intermediates Improve CTC Inference (external link)