Attention Overlap Is Responsible for The Entity Missing Problem in Text-to-Image Diffusion Models!
We investigate the root cause of the entity missing problem in text-to-image diffusion models, where certain objects described in prompts fail to appear in the generated images. Through detailed analysis, we demonstrate that overlapping attention maps between entities suppress their independent representation, leading to object omission. To address this, we propose simple attention separation techniques that reduce attention overlap and significantly improve entity inclusion rates across various diffusion models and datasets, without compromising image quality.
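The abstract does not say how attention overlap is measured. As a minimal sketch, assuming each entity token's cross-attention map is available as a non-negative H x W grid, one way to quantify overlap is the histogram intersection of two maps after normalizing each to sum to 1. The metric and the helper names (`attention_overlap`, `overlap_penalty`) are illustrative assumptions, not the authors' method.

```python
from typing import List

# Hypothetical representation: one cross-attention map per entity token, H x W non-negative weights.
Grid = List[List[float]]


def normalize(attn: Grid) -> List[float]:
    """Flatten a map and rescale it to sum to 1, i.e. a distribution over spatial positions."""
    flat = [max(v, 0.0) for row in attn for v in row]
    total = sum(flat)
    if total == 0:
        raise ValueError("attention map has no mass")
    return [v / total for v in flat]


def attention_overlap(a: Grid, b: Grid) -> float:
    """Histogram intersection of two normalized maps: 0 = disjoint regions, 1 = identical maps."""
    p, q = normalize(a), normalize(b)
    if len(p) != len(q):
        raise ValueError("maps must have the same spatial size")
    return sum(min(x, y) for x, y in zip(p, q))


def overlap_penalty(maps: List[Grid]) -> float:
    """Mean pairwise overlap across all entity maps; lower means better-separated entities."""
    pairs = [(i, j) for i in range(len(maps)) for j in range(i + 1, len(maps))]
    if not pairs:
        return 0.0
    return sum(attention_overlap(maps[i], maps[j]) for i, j in pairs) / len(pairs)


if __name__ == "__main__":
    cat = [[1, 1], [0, 0]]              # attends to the top half
    dog_separate = [[0, 0], [1, 1]]     # attends to the bottom half
    dog_overlapping = [[1, 0], [1, 0]]  # attends to the left column, sharing a cell with cat
    print(attention_overlap(cat, dog_separate))     # 0.0
    print(attention_overlap(cat, dog_overlapping))  # 0.5
```

A score near 0 means the entities occupy disjoint regions, while a score near 1 means they compete for the same pixels. A separation technique could, for example, treat a penalty like this as a quantity to reduce during denoising; the techniques actually proposed in the paper may work differently.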
Paper | Code