News
Eight papers have been accepted to INTERSPEECH 2025
May 23, 2025
Our papers have been accepted to INTERSPEECH 2025.
BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing
Masaya Kawamura, Takuya Hasumi, Yuma Shirahata, Ryuichi Yamamoto
Grapheme-Coherent Phonemic and Prosodic Annotation of Speech by Implicit and Explicit Grapheme Conditioning
Hien Ohnaka, Yuma Shirahata, Byeongseon Park, Ryuichi Yamamoto
SLASH: Self-Supervised Speech Pitch Estimation Leveraging DSP-derived Absolute Pitch
Ryo Terashima, Yuma Shirahata, Masaya Kawamura
DnR-nonverbal: Cinematic Audio Source Separation Dataset Containing Non-Verbal Sounds
Takuya Hasumi, Yusuke Fujita
Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos
Yuchi Ishikawa, Shota Nakada, Hokuto Munakata, Kazuhiro Saito, Tatsuya Komatsu, Yoshimitsu Aoki (Keio University)
Audio-Text Contrastive Learning with Audio-Composed Text Features
Tatsuya Komatsu, Hokuto Munakata, Yuchi Ishikawa
Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments
Reo Yoneyama (Nagoya University), Masaya Kawamura, Ryo Terashima, Ryuichi Yamamoto, Tomoki Toda (Nagoya University)
OpusLM: A Family of Open Unified Speech Language Models
Jinchuan Tian (CMU), William Chen (CMU), Yifan Peng (CMU), Jiatong Shi (CMU), Siddhant Arora (CMU), Shikhar Bharadwaj (CMU), Takashi Maekaku, Yusuke Shinohara, Keita Goto, Xiang Yue (CMU), Chao-Han Huck Yang (NVIDIA), Shinji Watanabe (CMU)