Publications

Conference (International) Exploring Limits of Diffusion-Synthetic Training with Weakly Supervised Semantic Segmentation

Ryota Yoshihashi, Yuya Otsuka, Kenji Doi, Tomohiro Tanaka, Hirokatsu Kataoka

17th Asian Conference on Computer Vision (ACCV 2024)

2024.12.9

The advance of generative models for images has inspired various training techniques for image recognition utilizing synthetic images. In semantic segmentation, one promising approach is extracting pseudo-masks from attention maps in text-to-image diffusion models, which enables training without real images or annotations. However, the pioneering training methods that use diffusion-synthetic images and pseudo-masks, such as DiffuMask, have limitations in terms of mask quality, scalability, and range of applicable domains. To address these limitations, we propose a new framework that views diffusion-synthetic semantic segmentation training as a weakly supervised learning problem, where we utilize potentially inaccurate attentive information within the generative model as supervision. Motivated by this perspective, we first introduce reliability-aware robust training, originally used as a classifier-based WSSS method, with modifications to handle generative attentions. Additionally, we propose techniques to boost the weakly supervised synthetic training: we introduce prompt augmentation by synonym-and-hyponym replacement, a form of data augmentation applied to the prompt text set that scales up and diversifies training images with limited text resources. Finally, LoRA-based adaptation of Stable Diffusion enables transfer to a distant domain, e.g., auto-driving images. Experiments on PASCAL VOC, ImageNet-S, and Cityscapes show that our method effectively closes the gap between real and synthetic training in semantic segmentation.
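As a rough illustration of the prompt augmentation idea mentioned in the abstract, the sketch below replaces a class name in a prompt template with a synonym or hyponym while keeping the original class as the label. It is not the authors' implementation: the lexicon `REPLACEMENTS`, the function `augment_prompts`, and the template string are all hypothetical names chosen for this example, and a real pipeline would presumably draw replacements from a larger lexical resource.

```python
import random

# Hypothetical hand-written lexicon: class name -> synonyms and hyponyms.
# These entries are illustrative assumptions, not taken from the paper.
REPLACEMENTS = {
    "dog": ["puppy", "hound", "labrador retriever", "poodle"],
    "car": ["automobile", "sedan", "taxi", "hatchback"],
}


def augment_prompts(template, class_name, lexicon, n, seed=0):
    """Return n (prompt, label) pairs for one class.

    The word inserted into the prompt is sampled from the class name plus
    its synonyms/hyponyms; the label always stays the original class name,
    so images generated from varied prompts map to the same class.
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    candidates = [class_name] + lexicon.get(class_name, [])
    return [(template.format(rng.choice(candidates)), class_name) for _ in range(n)]


if __name__ == "__main__":
    for prompt, label in augment_prompts("a photo of a {}", "dog", REPLACEMENTS, 5):
        print(label, "<-", prompt)
```

Each generated prompt would then be passed to the text-to-image model, with the label indicating which class the extracted pseudo-mask belongs to.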

Paper : Exploring Limits of Diffusion-Synthetic Training with Weakly Supervised Semantic Segmentation

Software : https://github.com/yahoojapan/attn2mask

PDF : Exploring Limits of Diffusion-Synthetic Training with Weakly Supervised Semantic Segmentation