Domain Adaptation by Data Distribution Matching via Submodularity for Speech Recognition - LINE Yahoo Research and Development

Publications

Conference (International): Domain Adaptation by Data Distribution Matching via Submodularity for Speech Recognition

Yusuke Shinohara, Shinji Watanabe (Carnegie Mellon University)

The 2023 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2023)

2023.12.16

We study the problem of building a domain-specific speech recognition model given some text from the target domain. One of the most popular approaches to this problem is shallow fusion, which incorporates a domain-specific language model built from the given text. However, shallow fusion significantly increases the model size and inference cost, which makes it harder to deploy in industry. In this paper, we propose domain adaptation by data distribution matching, where a subset is selected from existing multi-domain training data to match the target-domain distribution, and a model is fine-tuned on the subset. A submodular optimization algorithm with a novel extension is employed for the subset selection. Experiments on LibriSpeech, a corpus of audiobooks, where we treat each book as a domain, show that the proposed distribution-matching approach achieves word error rates (WERs) equivalent to those of the conventional shallow-fusion approach, without any increase in model size or inference cost.
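The abstract does not spell out the selection objective or the novel extension, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a unigram word distribution as the stand-in for the "target-domain distribution" and uses a feature-based submodular score, sum over words w of p_target(w) * log(1 + count_w(S)), which is one common choice for data selection. The function names (`unigram_distribution`, `objective`, `greedy_select`) and the toy sentences are hypothetical. The fine-tuning step is not shown.

```python
import math
from collections import Counter


def unigram_distribution(sentences):
    """Relative word frequencies over whitespace-tokenized sentences."""
    counts = Counter(word for s in sentences for word in s.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def objective(target_dist, selected_counts):
    """Assumed score: sum_w p_target(w) * log(1 + count_w(S)).

    A concave function of a non-negative modular count, weighted by
    non-negative p_target(w), is monotone submodular.
    """
    return sum(
        p * math.log1p(selected_counts.get(w, 0))
        for w, p in target_dist.items()
    )


def greedy_select(pool, target_text, budget):
    """Pick up to `budget` indices from `pool` by largest marginal gain."""
    target_dist = unigram_distribution(target_text)
    pool_counts = [Counter(s.split()) for s in pool]
    selected, selected_counts = [], Counter()
    remaining = set(range(len(pool)))
    current = 0.0
    for _ in range(min(budget, len(pool))):
        best_idx, best_gain = None, -1.0
        # Iterate in index order so ties are broken deterministically.
        for i in sorted(remaining):
            score = objective(target_dist, selected_counts + pool_counts[i])
            gain = score - current
            if gain > best_gain:
                best_idx, best_gain = i, gain
        selected.append(best_idx)
        selected_counts += pool_counts[best_idx]
        current += best_gain
        remaining.remove(best_idx)
    return selected


# Toy multi-domain pool (transcripts) and a small target-domain text.
pool = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "a cat chased the mouse",
    "markets and stock indexes rallied",
]
target_text = ["the cat and the mouse"]
print(greedy_select(pool, target_text, budget=2))
```

On this toy input the two cat-related transcripts (indices 0 and 2) are selected. Plain greedy selection is used here because it carries the standard (1 - 1/e) approximation guarantee for monotone submodular functions under a cardinality budget. A real system would need a richer distribution than unigrams and a faster variant such as lazy greedy for large corpora.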

Speech Processing

Paper: Domain Adaptation by Data Distribution Matching via Submodularity for Speech Recognition