Multimodal
-
- Workshop (International)
- CAVIARES: Corpus for Audio-Visual Expressive Voice Agent
- Jinsheng Chen (The University of Tokyo), Yuki Saito (The University of Tokyo), Dong Yang (The University of Tokyo), Naoko Tanji (The University of Tokyo), Hironori Doi, Byeongseon Park, Yuma Shirahata, Kentaro Tachibana, Hiroshi Saruwatari (The University of Tokyo)
- 2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2025)
- 2025.12.9
-
- Conference (International)
- DnR-nonverbal: Cinematic Audio Source Separation Dataset Containing Non-Verbal Sounds
- Takuya Hasumi, Yusuke Fujita
- The 26th Annual Conference of the International Speech Communication Association (INTERSPEECH 2025)
- 2025.8.21
-
- Conference (International)
- Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos
- Yuchi Ishikawa, Shota Nakada, Hokuto Munakata, Kazuhiro Saito, Tatsuya Komatsu, Yoshimitsu Aoki (Keio University)
- The 26th Annual Conference of the International Speech Communication Association (INTERSPEECH 2025)
- 2025.8.19
-
- Conference (International)
- Leveraging Unlabeled Audio for Audio-Text Contrastive Learning via Audio-Composed Text Features
- Tatsuya Komatsu, Hokuto Munakata, Yuchi Ishikawa
- The 26th Annual Conference of the International Speech Communication Association (INTERSPEECH 2025)
- 2025.8.17
-
- Journal (International)
- A-UVI: GNSS-Assisted EO-based UV Index Estimation Method for Individual-level Precise UV Exposure Assessment
- Yuuki Nishiyama (The University of Tokyo), Subaru Atsumi (The University of Tokyo), Kota Tsubouchi, Kaoru Sezaki (The University of Tokyo)
- The Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT)
- 2025.6.25
-
- Conference (International)
- Language-based Audio Moment Retrieval
- Hokuto Munakata, Taichi Nishimura, Shota Nakada, Tatsuya Komatsu
- 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025)
- 2025.4.11
-
- Conference (International)
- Music Tagging with Classifier Group Chains
- Takuya Hasumi, Tatsuya Komatsu, Yusuke Fujita
- 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025)
- 2025.4.10
-
- Conference (International)
- DETECLAP: Enhancing Audio-Visual Representation Learning with Object Information
- Shota Nakada, Taichi Nishimura, Hokuto Munakata, Masayoshi Kondo, Tatsuya Komatsu
- 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025)
- 2025.4.8
-
- Other (Domestic)
- Music Tagging Considering Inter-Tag Dependencies via Classifier Chains
- Takuya Hasumi, Tatsuya Komatsu, Yusuke Fujita
- Acoustical Society of Japan 2025 Spring Meeting (ASJ 2025 spring)
- 2025.3.19
-
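- Conference (Domestic)
- Construction of a Corpus for Multimodal Empathetic Dialogue Speech Synthesis
- Yuki Saito (The University of Tokyo), Jinsheng Chen (The University of Tokyo), Dong Yang (The University of Tokyo), Naoko Tanji (The University of Tokyo), Hironori Doi, Yuma Shirahata, Byeongseon Park, Kentaro Tachibana, Hiroshi Saruwatari (The University of Tokyo)
- Acoustical Society of Japan 2025 Spring Meeting (ASJ 2025 spring)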
- 2025.3.17
-
- Conference (International)
- ReMoGPT: Part-Level Retrieval-Augmented Motion-Language Models
- Qing Yu, Mikihiro Tanaka, Kent Fujiwara
- The 39th Annual AAAI Conference on Artificial Intelligence (AAAI-25)
- 2025.3.1
-
- Other (International)
- Congestion Forecast for Trains with Railroad-Graph-based Semi-Supervised Learning using Sparse Passenger Reports
- Soto Anno (Tokyo Institute of Technology), Kota Tsubouchi, Masamichi Shimosaka (Tokyo Institute of Technology)
- arXiv.org (arXiv)
- 2024.10.23
-
- Conference (International)
- Chronologically Accurate Retrieval for Temporal Grounding of Motion-Language Models
- Kent Fujiwara, Mikihiro Tanaka, Qing Yu
- The 18th European Conference on Computer Vision 2024 (ECCV 2024)
- 2024.9.29
-
- Conference (International)
- Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches
- Qing Yu, Mikihiro Tanaka, Kent Fujiwara
- The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024 (CVPR 2024)
- 2024.6.19
-
- Conference (International)
- Audio Difference Learning for Audio Captioning
- Tatsuya Komatsu, Yusuke Fujita, Kazuya Takeda (Nagoya University), Tomoki Toda (Nagoya University)
- 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)
- 2024.4.14
-
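- Conference (Domestic)
- Experimental Evaluation of Contrastive Learning between Japanese Text and Music
- Takuya Hasumi, Tatsuya Komatsu, Yusuke Fujita, Kosuke Futamata, Kentaro Tachibana
- Acoustical Society of Japan 2024 Spring Meeting (ASJ 2024 spring)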
- 2024.3.7
-
- Conference (International)
- A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video
- Keito Kudo (Tohoku Univ.), Haruki Nagasawa (Tohoku Univ.), Jun Suzuki (Tohoku Univ.), Nobuyuki Shimizu
- The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)
- 2023.12.10
-
- Workshop (International)
- Constructing Image–Text Pair Dataset from Books
- Yamato Okamoto (Naver Cloud Corp. / Works Mobile Japan Corp.), Haruto Toyonaga (Doshisha Univ.), Yoshihisa Ijiri, Hirokatsu Kataoka
- International Conference on Computer Vision workshop on "Towards the Next Generation of Computer Vision Datasets" (ICCVW Datacomp)
- 2023.10.3
-
- Workshop (International)
- Leveraging Image-Text Similarity and Caption Modification for the DataComp Challenge: Filtering Track and BYOD Track
- Shuhei Yokoo, Peifei Zhu, Yuchi Ishikawa, Mikihiro Tanaka, Masayoshi Kondo, Hirokatsu Kataoka
- ICCV 2023 Workshop on Towards the Next Generation of Computer Vision Datasets: DataComp Track (ICCV 2023)
- 2023.10.3
-
- Conference (International)
- Role-aware Interaction Generation from Textual Description
- Mikihiro Tanaka, Kent Fujiwara
- 2023 International Conference on Computer Vision (ICCV 2023)
- 2023.10.2