Workshop (International) How do people talk about images? A study on open-domain conversations on images.

Yi Pei Chen (U. Tokyo), Nobuyuki Shimizu, Takashi Miyazaki, Hideki Nakayama (U. Tokyo)

2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Student Research Workshop (NAACL 2022 SRW)


Open-domain conversations on images require the model to consider the relation and balance between utterances and images in order to generate proper responses. This paper explores how humans conduct conversations on images by investigating a well-constructed open-domain image conversation dataset, ImageChat. We examine the conversations on images from three perspectives: image relevancy, image information, and utterance style. We find that about 37% of utterances are not on an image-relevant theme, i.e., the utterances could be generated without the image. We also discover that 45% of utterances contain image objects, 23% of utterances have non-object image-related information such as a description of the events in the image, and 32% of utterances do not have any image-relevant information at all. In addition, the speaker's style strongly influences the utterance, with an average relevance score of 1.5 out of 2, although this phenomenon may be due to the nature of an artificially constructed dialog dataset. Based on our analysis, we propose enriching the image information with image captions and object tags, which increases the diversity and image relevancy of generated responses compared with the strong baseline. The result verifies that our analysis provides useful insights and directions and could facilitate future research on open-domain conversation on images.

Paper: How do people talk about images? A study on open-domain conversations on images.