Publications

Conference (International) Target Vocabulary Recognition Based on Multi-Task Learning with Decomposed Teacher Sequences

Aoi Ito (Hosei University), Tatsuya Komatsu, Yusuke Fujita, Yusuke Kida

The 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023)

2023.8.20

This paper proposes a method for target vocabulary recognition based on multi-task learning with decomposed teacher sequences. The proposed method first decomposes teacher sequences into the target vocabulary and the non-target vocabulary sequences. Then, multi-task learning is performed by calculating losses for both the target vocabulary sequence and the non-target vocabulary sequence. By utilizing information from both target and non-target vocabulary, our proposed method provides more stable training and more accurate recognition of target vocabulary than single-task learning using only the target vocabulary. Experiments conducted on the Corpus of Spontaneous Japanese (CSJ) dataset, using numerals and katakana as target vocabulary, demonstrate the effectiveness of our proposed method. The results show a maximum CER improvement rate of 27% for katakana and 34% for numerals in target vocabulary recognition, as well as an 84% reduction in insertion errors in non-target vocabulary utterances.
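The two steps described in the abstract (decomposing a teacher sequence, then combining two losses) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `is_katakana`, `decompose`, and `multitask_loss`, the choice to decompose by simple token filtering, and the interpolation weight are all assumptions made for illustration; the abstract does not specify how the decomposition or the loss combination is actually done.

```python
from typing import Callable, List, Tuple


def is_katakana(token: str) -> bool:
    """Hypothetical target test: every character lies in the Katakana
    Unicode block (U+30A0 to U+30FF)."""
    return len(token) > 0 and all("\u30a0" <= ch <= "\u30ff" for ch in token)


def decompose(
    teacher: List[str], is_target: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Split one teacher sequence into a target-vocabulary sequence and a
    non-target-vocabulary sequence, preserving token order in each.
    Assumption: plain filtering; the paper may decompose differently."""
    target = [tok for tok in teacher if is_target(tok)]
    non_target = [tok for tok in teacher if not is_target(tok)]
    return target, non_target


def multitask_loss(
    target_loss: float, non_target_loss: float, weight: float = 0.5
) -> float:
    """Combine the two per-sequence losses by linear interpolation.
    Assumption: the weight and the interpolation form are illustrative."""
    return weight * target_loss + (1.0 - weight) * non_target_loss


if __name__ == "__main__":
    teacher = ["私", "は", "コーヒー", "を", "飲む"]
    target_seq, non_target_seq = decompose(teacher, is_katakana)
    print(target_seq)      # tokens a target-vocabulary loss would be computed on
    print(non_target_seq)  # tokens a non-target-vocabulary loss would be computed on
    print(multitask_loss(target_loss=2.0, non_target_loss=4.0, weight=0.25))
```

In a real system each loss would come from a sequence-level ASR objective computed against the corresponding decomposed sequence; the floats here only stand in for those values.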

Paper: Target Vocabulary Recognition Based on Multi-Task Learning with Decomposed Teacher Sequences