Universal Score-based Speech Enhancement with High Content Preservation - LY Corporation R&D


CONFERENCE (INTERNATIONAL) Universal Score-based Speech Enhancement with High Content Preservation

Robin Scheibler, Yusuke Fujita, Yuma Shirahata, Tatsuya Komatsu

The 25th Annual Conference of the International Speech Communication Association (INTERSPEECH 2024)

September 01, 2024

We propose UNIVERSE++, a universal speech enhancement method based on score-based diffusion and adversarial training. Specifically, we improve the existing UNIVERSE model that decouples clean speech feature extraction and diffusion. Our contributions are three-fold. First, we make several modifications to the network architecture, improving training stability and final performance. Second, we introduce an adversarial loss to promote learning high-quality speech features. Third, we propose a low-rank adaptation scheme with a phoneme fidelity loss to improve content preservation in the enhanced speech. In the experiments, we train a universal enhancement model on a large-scale dataset of speech degraded by noise, reverberation, and various distortions. The results on multiple public benchmark datasets demonstrate that UNIVERSE++ compares favorably to both discriminative and generative baselines for a wide range of quality and intelligibility metrics.
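
As an illustration of the low-rank adaptation idea mentioned in the abstract, the following is a minimal sketch of generic low-rank adaptation applied to a single weight matrix. It is not taken from the paper or the open-universe code: the function names (`matmul`, `lora_weight`), the dimensions, and the zero initialization of `b` are assumptions chosen for illustration, and the phoneme fidelity loss is not modeled here. The frozen weight `w` is left untouched, and only the two small factors `a` and `b` would be trained.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, scale=1.0):
    """Return the adapted weight W + scale * (B @ A)."""
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


random.seed(0)
d_out, d_in, rank = 4, 6, 2  # hypothetical sizes, chosen only for illustration

# Frozen pretrained weight (d_out x d_in); never updated during adaptation.
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable low-rank factors: A is rank x d_in, B is d_out x rank.
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]  # zero init: adaptation starts as a no-op

w_adapted = lora_weight(w, a, b)

full_params = d_out * d_in           # parameters updated by full fine-tuning
lora_params = rank * (d_in + d_out)  # parameters updated by the low-rank factors
```

With `b` initialized to zero, the adapted weight equals the original one, so adaptation starts from the pretrained model's behavior. The parameter saving grows with layer size, since the low-rank factors scale with `d_in + d_out` rather than `d_in * d_out`.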

Paper : Universal Score-based Speech Enhancement with High Content Preservation

Software : https://github.com/line/open-universe

PDF : Universal Score-based Speech Enhancement with High Content Preservation