Publications
CONFERENCE (INTERNATIONAL) Universal Score-based Speech Enhancement with High Content Preservation
Robin Scheibler, Yusuke Fujita, Yuma Shirahata, Tatsuya Komatsu
The 25th Annual Conference of the International Speech Communication Association (INTERSPEECH 2024)
September 01, 2024
We propose UNIVERSE++, a universal speech enhancement method based on score-based diffusion and adversarial training. Specifically, we improve the existing UNIVERSE model that decouples clean speech feature extraction and diffusion. Our contributions are three-fold. First, we make several modifications to the network architecture, improving training stability and final performance. Second, we introduce an adversarial loss to promote learning high-quality speech features. Third, we propose a low-rank adaptation scheme with a phoneme fidelity loss to improve content preservation in the enhanced speech. In the experiments, we train a universal enhancement model on a large-scale dataset of speech degraded by noise, reverberation, and various distortions. The results on multiple public benchmark datasets demonstrate that UNIVERSE++ compares favorably to both discriminative and generative baselines for a wide range of qualitative and intelligibility metrics.
Paper: Universal Score-based Speech Enhancement with High Content Preservation
Software: https://github.com/line/open-universe
PDF: Universal Score-based Speech Enhancement with High Content Preservation