Publications

CONFERENCE (INTERNATIONAL)
On Sorting and Padding Multiple Targets for Sound Event Localization and Detection with Permutation Invariant and Location-based Training: Robin Scheibler, Tatsuya Komatsu, Yusuke Fujita, Michael Hentschel; Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2022 (APSIPA ASC 2022); November 07, 2022

WORKSHOP (INTERNATIONAL)
Sound Event Localization and Detection with pre-trained Audio Spectrogram Transformer and Multichannel Separation Network: Robin Scheibler, Tatsuya Komatsu, Yusuke Fujita, Michael Hentschel; Detection and Classification of Acoustic Scenes and Events (DCASE 2022); November 03, 2022

CONFERENCE (INTERNATIONAL)
Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR: Takashi Maekaku, Yuya Fujita, Yifan Peng (Carnegie Mellon University), Shinji Watanabe (Carnegie Mellon University); The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022); September 19, 2022

CONFERENCE (INTERNATIONAL)
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation: Xuankai Chang (Carnegie Mellon University), Takashi Maekaku, Yuya Fujita, Shinji Watanabe (Carnegie Mellon University); The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022); September 19, 2022

CONFERENCE (INTERNATIONAL)
A Unified Accent Estimation Method Based on Multi-Task Learning for Japanese Text-to-Speech: Byeongseon Park, Ryuichi Yamamoto, Kentaro Tachibana; The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022); September 18, 2022

CONFERENCE (INTERNATIONAL)
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History: Yuto Nishimura (The University of Tokyo), Yuki Saito (The University of Tokyo), Shinnosuke Takamichi (The University of Tokyo), Kentaro Tachibana, Hiroshi Saruwatari (The University of Tokyo); The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022); September 18, 2022

CONFERENCE (INTERNATIONAL)
Better Intermediates Improve CTC Inference: Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee (NAVER), Lukas Lee (NAVER), Shinji Watanabe (Carnegie Mellon University), Yusuke Kida; The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022); September 18, 2022

CONFERENCE (INTERNATIONAL)
Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation: Ryo Terashima, Ryuichi Yamamoto, Eunwoo Song (NAVER), Yuma Shirahata, Hyun-Wook Yoon (NAVER), Jae-Min Kim (NAVER), Kentaro Tachibana; The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022); September 18, 2022

CONFERENCE (INTERNATIONAL)
DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning: Takaaki Saeki (The University of Tokyo), Kentaro Tachibana, Ryuichi Yamamoto; The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022); September 18, 2022

CONFERENCE (INTERNATIONAL)
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding: Yen-Ju Lu (Academia Sinica), Xuankai Chang (CMU), Chenda Li (SJTU), Wangyou Zhang (SJTU), Samuele Cornell (Università Politecnica delle Marche), Zhaoheng Ni (Meta AI), Yoshiki Masuyama (CMU/TMU), Brian Yan (CMU), Robin Scheibler, Zhong-Qiu Wang (CMU), Yu Tsao (Academia Sinica), Yanmin Qian (SJTU), Shinji Watanabe (CMU); The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022); September 18, 2022

CONFERENCE (INTERNATIONAL)
Independence-based Joint Dereverberation and Separation with Neural Source Model: Kohei Saijo (Waseda University), Robin Scheibler; The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022); September 18, 2022

CONFERENCE (INTERNATIONAL)
InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR: Yu Nakagome, Tatsuya Komatsu, Yusuke Fujita, Shuta Ichimura, Yusuke Kida; The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022); September 18, 2022

CONFERENCE (INTERNATIONAL)
Language Model-Based Emotion Prediction Methods for Emotional Speech Synthesis Systems: Hyun-Wook Yoon (NAVER), Ohsung Kwon (NAVER), Hoyeon Lee (NAVER), Ryuichi Yamamoto, Eunwoo Song (NAVER), Jae-Min Kim (NAVER), Min-Jae Hwang (NAVER); The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022); September 18, 2022

CONFERENCE (INTERNATIONAL)
Minimum Latency Training of Sequence Transducers for Streaming End-to-End Speech Recognition: Yusuke Shinohara, Shinji Watanabe (Carnegie Mellon University); The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022); September 18, 2022

CONFERENCE (INTERNATIONAL)
Spatial Loss for Unsupervised Multi-channel Source Separation: Kohei Saijo (Waseda University), Robin Scheibler; The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022); September 18, 2022

CONFERENCE (INTERNATIONAL)
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent: Yuki Saito (The University of Tokyo), Yuto Nishimura (The University of Tokyo), Shinnosuke Takamichi (The University of Tokyo), Kentaro Tachibana, Hiroshi Saruwatari (The University of Tokyo); The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022); September 18, 2022

CONFERENCE (INTERNATIONAL)
TTS-by-TTS 2: Data-selective Augmentation for Neural Speech Synthesis Using Ranking Support Vector Machine with Variational Autoencoder: Eunwoo Song (NAVER), Ryuichi Yamamoto, Ohsung Kwon (NAVER), Chan-Ho Song, Min-Jae Hwang (NAVER), Suhyeon Oh (NAVER), Hyun-Wook Yoon (NAVER), Jin-Seob Kim (NAVER), Jae-Min Kim (NAVER); The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022); September 18, 2022

CONFERENCE (DOMESTIC)
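A Study of the Effects of Intermediate-Layer Loss and Conditioning in CTC-based Speech Recognition Models: Shuta Ichimura, Yu Nakagome, Yusuke Fujita, Tatsuya Komatsu, Yusuke Kida; Acoustical Society of Japan 2022 Autumn Meeting (ASJ 2022 autumn); September 14, 2022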

CONFERENCE (DOMESTIC)
End-to-end Automatic Speech Recognition with Independent Vector Analysis Frontend: Robin Scheibler, Wangyou Zhang (Shanghai Jiao Tong University), Xuankai Chang (Shanghai Jiao Tong University), Shinji Watanabe (Carnegie Mellon University), Yanmin Qian (Shanghai Jiao Tong University); Acoustical Society of Japan 2022 Autumn Meeting (ASJ 2022 autumn); September 14, 2022

CONFERENCE (DOMESTIC)
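A New CTC Inference Method Using Beam Search on Intermediate-Layer Predictions: Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee (NAVER), Lukas Lee (NAVER), Shinji Watanabe (Carnegie Mellon University), Yusuke Kida; Acoustical Society of Japan 2022 Autumn Meeting (ASJ 2022 autumn); September 14, 2022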