Domain Adaptation by Data Distribution Matching via Submodularity for Speech Recognition - LY Corporation R&D

CONFERENCE (INTERNATIONAL)

Yusuke Shinohara, Shinji Watanabe (Carnegie Mellon University)

The 2023 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2023)

December 16, 2023

We study the problem of building a domain-specific speech recognition model given some text from the target domain. One of the most popular approaches to this problem is shallow fusion, which incorporates a domain-specific language model built from the given text. However, shallow fusion significantly increases the model size and inference cost, which makes it harder to deploy in industry. In this paper, we propose domain adaptation by data distribution matching, where a subset is selected from existing multi-domain training data to match the target-domain distribution, and a model is fine-tuned on the subset. A submodular optimization algorithm with a novel extension is employed for the subset selection. Experiments on LibriSpeech, a corpus of audiobooks, where we treat each book as a domain, show that the proposed distribution-matching approach achieves word error rates (WERs) equivalent to those of the conventional shallow-fusion approach, without any increase in model size or inference cost.
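To make the subset-selection idea concrete, here is a minimal sketch of greedy submodular selection for distribution matching. It is not the algorithm from the paper and does not include the paper's novel extension. It assumes a simple feature-based objective, f(S) = sum over words w of p_target(w) * sqrt(count_S(w)), which is monotone submodular because the square root is concave. The function names `unigram_distribution` and `greedy_select`, the unigram features, and the whitespace tokenization are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def unigram_distribution(sentences):
    """Relative word frequencies over whitespace-tokenized sentences."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def greedy_select(pool, target_sentences, budget):
    """Greedily pick up to `budget` pool indices that maximize
    f(S) = sum_w p_target(w) * sqrt(count_S(w)).

    Assumption: this objective stands in for "matching the target
    distribution"; the paper's actual objective may differ.
    """
    target = unigram_distribution(target_sentences)
    pool_counts = [Counter(s.split()) for s in pool]
    covered = Counter()  # word counts accumulated over the selected subset
    selected = []
    remaining = set(range(len(pool)))

    def gain(i):
        # Marginal gain of adding sentence i to the current subset.
        return sum(
            target.get(w, 0.0)
            * (math.sqrt(covered[w] + c) - math.sqrt(covered[w]))
            for w, c in pool_counts[i].items()
        )

    for _ in range(min(budget, len(pool))):
        # Break ties toward the lowest index so the result is deterministic.
        best = max(remaining, key=lambda i: (gain(i), -i))
        if gain(best) <= 0.0:
            break  # no remaining sentence shares a word with the target text
        selected.append(best)
        covered.update(pool_counts[best])
        remaining.remove(best)
    return selected


pool = [
    "the cat sat on the mat",
    "stock prices fell sharply",
    "the dog chased the cat",
    "markets and stock prices rallied",
]
target_text = ["stock prices rose", "stock markets fell"]
chosen = greedy_select(pool, target_text, budget=2)
```

In this toy example the two finance-related sentences are selected, since the other two share no words with the target text. In a setting like the one described above, the selected indices would identify the utterances of the multi-domain training data on which the model is then fine-tuned. The square root gives diminishing returns for words that are already well covered, which pushes the selection toward covering the whole target vocabulary rather than repeating its most frequent words.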

Speech Processing

Paper: Domain Adaptation by Data Distribution Matching via Submodularity for Speech Recognition