Publications

カンファレンス (国際) ReMoGPT: Part-Level Retrieval-Augmented Motion-Language Models

Qing Yu, Mikihiro Tanaka, Kent Fujiwara

The 39th Annual AAAI Conference on Artificial Intelligence (AAAI-25)

2025.4.11

Generation of 3D human motion holds significant importance in the creative industry. While recent notable advances have been made in generating common motions, existing methods struggle to generate diverse and rare motions due to the complexity of motions and limited training data. This work introduces ReMoGPT, a unified motion-language generative model that solves a wide range of motion-related tasks by incorporating a multi-modal retrieval mechanism into the generation process to address the limitations of existing models, namely diversity and generalizability. We propose to focus on body-part-level motion features to enable fine-grained text-motion retrieval and locate suitable references from the database to conduct generation. Then, the motion-language generative model is trained with prompt-based question-and-answer tasks designed for different motion-relevant problems. We incorporate the retrieved samples into the prompt, and then perform instruction tuning of the motion-language model, to learn from task feedback and produce promising results with the help of fine-grained multi-modal retrieval. Extensive experiments validate the efficacy of ReMoGPT, showcasing its superiority over existing state-of-the-art methods. The framework performs well on multiple motion tasks, including motion retrieval, generation, and captioning.

Paper : ReMoGPT: Part-Level Retrieval-Augmented Motion-Language Models新しいタブまたはウィンドウで開く (外部サイト)