News
Eight papers have been accepted to INTERSPEECH 2025
May 23, 2025
Our papers have been accepted to INTERSPEECH 2025.
BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing
Masaya Kawamura, Takuya Hasumi, Yuma Shirahata, Ryuichi Yamamoto
Grapheme-Coherent Phonemic and Prosodic Annotation of Speech by Implicit and Explicit Grapheme Conditioning
Hien Ohnaka, Yuma Shirahata, Byeongseon Park, Ryuichi Yamamoto
SLASH: Self-Supervised Speech Pitch Estimation Leveraging DSP-derived Absolute Pitch
Ryo Terashima, Yuma Shirahata, Masaya Kawamura
DnR-nonverbal: Cinematic Audio Source Separation Dataset Containing Non-Verbal Sounds
Takuya Hasumi, Yusuke Fujita
Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos
Yuchi Ishikawa, Shota Nakada, Hokuto Munakata, Kazuhiro Saito, Tatsuya Komatsu, Yoshimitsu Aoki (Keio University)
Audio-Text Contrastive Learning with Audio-Composed Text Features
Tatsuya Komatsu, Hokuto Munakata, Yuchi Ishikawa
Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments
Reo Yoneyama (Nagoya University), Masaya Kawamura, Ryo Terashima, Ryuichi Yamamoto, Tomoki Toda (Nagoya University)
OpusLM: A Family of Open Unified Speech Language Models
Jinchuan Tian (CMU), William Chen (CMU), Yifan Peng (CMU), Jiatong Shi (CMU), Siddhant Arora (CMU), Shikhar Bharadwaj (CMU), Takashi Maekaku, Yusuke Shinohara, Keita Goto, Xiang Yue (CMU), Chao-Han Huck Yang (NVIDIA), Shinji Watanabe (CMU)