Publications

CONFERENCE (INTERNATIONAL) Target Vocabulary Recognition Based on Multi-Task Learning with Decomposed Teacher Sequences

Aoi Ito (Hosei University), Tatsuya Komatsu, Yusuke Fujita, Yusuke Kida

The 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023)

August 20, 2023

This paper proposes a method for target vocabulary recognition based on multi-task learning with decomposed teacher sequences. The proposed method first decomposes each teacher sequence into a target vocabulary sequence and a non-target vocabulary sequence. Multi-task learning is then performed by calculating a loss for each of the two sequences. By using information from both target and non-target vocabulary, the proposed method provides more stable training and more accurate recognition of target vocabulary than single-task learning that uses only the target vocabulary. Experiments on the Corpus of Spontaneous Japanese (CSJ) dataset, using numerals and katakana as target vocabulary, demonstrate the effectiveness of the proposed method. The results show character error rate (CER) improvements of up to 27% for katakana and 34% for numerals in target vocabulary recognition, as well as an 84% reduction in insertion errors in non-target vocabulary utterances.
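As a rough illustration of the idea described in the abstract, the following minimal sketch splits a tokenized teacher sequence into a target part (katakana tokens here) and a non-target part, then combines two per-task losses. It is only a sketch under stated assumptions: the abstract does not say how the decomposition or the loss combination is actually implemented, so the helper names `is_katakana`, `decompose`, and `multitask_loss`, the choice to drop tokens rather than replace them with a placeholder, and the interpolation `weight` are all hypothetical.

```python
from typing import Callable, List, Tuple


def is_katakana(token: str) -> bool:
    """Return True if every character lies in the Unicode Katakana block."""
    return bool(token) and all("\u30a0" <= ch <= "\u30ff" for ch in token)


def decompose(
    tokens: List[str], is_target: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Split a teacher sequence into target and non-target sequences.

    Assumption: tokens are simply filtered into two lists, order preserved.
    The paper may use a different scheme (e.g. placeholder symbols).
    """
    target = [t for t in tokens if is_target(t)]
    non_target = [t for t in tokens if not is_target(t)]
    return target, non_target


def multitask_loss(
    loss_target: float, loss_non_target: float, weight: float = 0.5
) -> float:
    """Combine the two task losses by linear interpolation (assumed form)."""
    return weight * loss_target + (1.0 - weight) * loss_non_target


# Hypothetical tokenized utterance: "I drink coffee"
tokens = ["私", "は", "コーヒー", "を", "飲む"]
target_seq, non_target_seq = decompose(tokens, is_katakana)
print(target_seq)
print(non_target_seq)
```

In a real system the two losses would come from the recognizer's output compared against each decomposed sequence; the scalar inputs to `multitask_loss` merely stand in for those values.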

Paper: Target Vocabulary Recognition Based on Multi-Task Learning with Decomposed Teacher Sequences (external link)