Speech Processing
-
- CONFERENCE (INTERNATIONAL)
- CC-G2PNP: Streaming Grapheme-to-Phoneme and Prosody with Conformer-CTC for Unsegmented Languages
- Yuma Shirahata, Ryuichi Yamamoto
- 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2026)
- May 07, 2026
-
- CONFERENCE (INTERNATIONAL)
- CASTELLA: Long Audio Dataset with Captions and Temporal Boundaries
- Hokuto Munakata, Takehiro Imamura (Nagoya University), Taichi Nishimura, Tatsuya Komatsu
- 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2026)
- May 06, 2026
-
- CONFERENCE (INTERNATIONAL)
- Wave-Trainer-Fit: Neural Vocoder With Trainable Prior And Fixed-Point Iteration Towards High-Quality Speech Generation From SSL Features
- Hien Ohnaka (Nara Institute of Science and Technology), Yuma Shirahata, Masaya Kawamura
- 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2026)
- May 05, 2026
-
- CONFERENCE (INTERNATIONAL)
- Online Register For Dual-Mode Self-Supervised Speech Models: Mitigating the Lack of Future Context
- Keita Goto, Takashi Maekaku, Jin Sakuma, Jinchuan Tian (Carnegie Mellon University), Yusuke Shinohara, Shinji Watanabe (Carnegie Mellon University)
- 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2026)
- May 04, 2026
-
- CONFERENCE (DOMESTIC)
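- Construction and Analysis of a Speaker-Specific Facial Expression Prediction Model from Speech Using Neural Audio Codec Features
- 朴 浚鎔 (The University of Tokyo), Jinsheng Chen (The University of Tokyo), Hironori Doi, Byeongseon Park, Yuma Shirahata, Kentaro Tachibana, Dong Yang (The University of Tokyo), Yuki Saito (The University of Tokyo), Hiroshi Saruwatari (The University of Tokyo)
- Acoustical Society of Japan 2026 Spring Meeting (ASJ 2026 spring)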
- March 19, 2026
-
- CONFERENCE (DOMESTIC)
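- Evaluating Pre-Trained Speech Models Using Text-Based Large Language Models
- Takashi Maekaku, Keita Goto, Jinchuan Tian (Carnegie Mellon University), Yusuke Shinohara, Shinji Watanabe (Carnegie Mellon University)
- Acoustical Society of Japan 2026 Spring Meeting (ASJ 2026 spring)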
- March 17, 2026
-
- OTHERS (INTERNATIONAL)
- Online Register for Dual-Mode Self-Supervised Speech Models: Mitigating The Lack of Future Context
- Keita Goto, Takashi Maekaku, Jin Sakuma, Jinchuan Tian (Carnegie Mellon University), Yusuke Shinohara, Shinji Watanabe (Carnegie Mellon University)
- arXiv.org (arXiv)
- March 02, 2026
-
- OTHERS (INTERNATIONAL)
- Bagpiper: Solving Open-Ended Audio Tasks via Rich Captions
- Jinchuan Tian (Carnegie Mellon University), Haoran Wang (Carnegie Mellon University), Bo-Hao Su (Carnegie Mellon University), Chien-Yu Huang (Carnegie Mellon University), Qingzheng Wang (Carnegie Mellon University), Jiatong Shi (Carnegie Mellon University), William Chen (Carnegie Mellon University), Xun Gong (Carnegie Mellon University), Siddhant Arora (Carnegie Mellon University), Chin-Jou Li (Carnegie Mellon University), Masao Someki (Carnegie Mellon University), Takashi Maekaku, Keita Goto, Yusuke Shinohara, Jin Sakuma, Chao-Han Huck Yang (NVIDIA Research), Shinji Watanabe (Carnegie Mellon University)
- arXiv.org (arXiv)
- February 06, 2026
-
- WORKSHOP (INTERNATIONAL)
- CAVIARES: Corpus for Audio-Visual Expressive Voice Agent
- Jinsheng Chen (The University of Tokyo), Yuki Saito (The University of Tokyo), Dong Yang (The University of Tokyo), Naoko Tanji (The University of Tokyo), Hironori Doi, Byeongseon Park, Yuma Shirahata, Kentaro Tachibana, Hiroshi Saruwatari (The University of Tokyo)
- 2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2025)
- December 09, 2025
-
- WORKSHOP (INTERNATIONAL)
- Evaluating Self-Supervised Speech Models Via Text-Based LLMs
- Takashi Maekaku, Keita Goto, Jinchuan Tian (Carnegie Mellon University), Yusuke Shinohara, Shinji Watanabe (Carnegie Mellon University)
- 2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2025)
- December 08, 2025
-
- OTHERS (INTERNATIONAL)
- CASTELLA: Long Audio Dataset with Captions and Temporal Boundaries
- Hokuto Munakata, Takehiro Imamura (Nagoya University), Taichi Nishimura, Tatsuya Komatsu
- arXiv.org (arXiv)
- November 19, 2025
-
- OTHERS (INTERNATIONAL)
- Evaluating Self-Supervised Speech Models via Text-Based LLMs
- Takashi Maekaku, Keita Goto, Jinchuan Tian (Carnegie Mellon University), Yusuke Shinohara, Shinji Watanabe (Carnegie Mellon University)
- arXiv.org (arXiv)
- October 07, 2025
-
- CONFERENCE (DOMESTIC)
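- BitTTS: Lightweight Text-to-Speech Using 1.58-bit Quantization and Weight Indexing
- Masaya Kawamura, Takuya Hasumi, Yuma Shirahata, Ryuichi Yamamoto
- Acoustical Society of Japan 2025 Autumn Meeting (ASJ 2025 autumn)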
- September 11, 2025
-
- OTHERS (DOMESTIC)
- A Dataset Containing Non-Verbal Sounds for Cinematic Audio Source Separation
- Takuya Hasumi, Yusuke Fujita
- Acoustical Society of Japan 2025 Autumn Meeting (ASJ 2025 autumn)
- September 10, 2025
-
- CONFERENCE (DOMESTIC)
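- Acquiring Phoneme and Prosody Labels from Speech and Its Applications
- Yuma Shirahata, Byeongseon Park, Ryuichi Yamamoto
- Acoustical Society of Japan 2025 Autumn Meeting (ASJ 2025 autumn)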
- September 10, 2025
-
- CONFERENCE (INTERNATIONAL)
- BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing
- Masaya Kawamura, Takuya Hasumi, Yuma Shirahata, Ryuichi Yamamoto
- The 26th Annual Conference of the International Speech Communication Association (INTERSPEECH 2025)
- August 21, 2025
-
- CONFERENCE (INTERNATIONAL)
- Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments
- Reo Yoneyama (Nagoya University), Masaya Kawamura, Ryo Terashima, Ryuichi Yamamoto (Nagoya University/LY Corporation), Tomoki Toda (Nagoya University)
- The 26th Annual Conference of the International Speech Communication Association (INTERSPEECH 2025)
- August 21, 2025
-
- CONFERENCE (INTERNATIONAL)
- DnR-nonverbal: Cinematic Audio Source Separation Dataset Containing Non-Verbal Sounds
- Takuya Hasumi, Yusuke Fujita
- The 26th Annual Conference of the International Speech Communication Association (INTERSPEECH 2025)
- August 21, 2025
-
- CONFERENCE (INTERNATIONAL)
- Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos
- Yuchi Ishikawa, Shota Nakada, Hokuto Munakata, Kazuhiro Saito, Tatsuya Komatsu, Yoshimitsu Aoki (Keio University)
- The 26th Annual Conference of the International Speech Communication Association (INTERSPEECH 2025)
- August 19, 2025
-
- CONFERENCE (INTERNATIONAL)
- SLASH: Self-Supervised Speech Pitch Estimation Leveraging DSP-derived Absolute Pitch
- Ryo Terashima, Yuma Shirahata, Masaya Kawamura
- The 26th Annual Conference of the International Speech Communication Association (INTERSPEECH 2025)
- August 19, 2025