カンファレンス (国際) Role-aware Interaction Generation from Textual Description

Mikihiro Tanaka, Kent Fujiwara

2023 International Conference on Computer Vision (ICCV 2023)


This research tackles the problem of generating interaction between two human actors corresponding to textual description. We claim that certain interactions, which we call asymmetric interactions, involve a relationship between an actor and a receiver, whose motions significantly differ depending on the assigned role. However, existing studies of interaction generation attempt to learn the correspondence between a single label and the motions of both actors combined, overlooking differences in individual roles. We consider a novel problem of role-aware interaction generation, where roles can be designated before generation. We translate the text of the asymmetric interactions into active and passive voice to ensure the textual context is consistent with each role. We propose a model that learns to generate motions of the designated role, which together form a mutually consistent interaction. As the model treats individual motions separately, it can be pretrained to derive knowledge from single-person motion data for more accurate interactions. Moreover, we introduce a method inspired by Permutation Invariant Training (PIT) that can automatically learn which of the two actions corresponds to an actor or a receiver without additional annotation. We further present cases where existing evaluation metrics fail to accurately assess the quality of generated interactions, and propose a novel metric, Mutual Consistency, to address such shortcomings. Experimental results demonstrate the efficacy of our method, as well as the necessity of the proposed metric.