Prompt-to-Prompt Image Editing with Cross Attention Control: Paper Notes

date

Jun 14, 2023

Last edited time

Jun 14, 2023 05:21 AM

tags

DL

CV

DDPM

origin

https://www.notion.so/lazurite/Prompt-to-Prompt-Image-Editing-with-Cross-Attention-Control-6665ba0a8899439eaa891ac912c1d490?pvs=4

Prompt-to-Prompt Image Editing with Cross Attention Control

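A paper from Google, implemented with Imagen. Motivation: compared with direct text2image generation, text-guided editing requires most of the original image to stay largely unchanged, and current methods need the user to specify a mask to guide generation. This paper finds that cross-attention, for controlling the layout of the image…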

https://zhuanlan.zhihu.com/p/570874172

Prompt-to-Prompt Image Editing with Cross Attention Control

Recent large-scale text-driven synthesis models have attracted much attention thanks to their remarkable capabilities of generating highly diverse images that follow given text prompts. Such...

https://arxiv.org/abs/2208.01626

Hertz 等 - 2022 - Prompt-to-Prompt Image Editing with Cross Attentio.pdf

Motivation

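Compared with direct text2image generation, text-guided editing requires most of the original image to remain largely unchanged. Existing methods need the user to specify a mask to guide generation.

This paper finds that cross-attention is important for controlling the layout of the image.

The existing purely text-guided editing method, Text2LIVE, can only modify an image's texture (appearance). It cannot change complex object structure, for example replacing a bicycle with a car. It also requires training.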

Contribution

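The key idea of this paper is to inject cross-attention maps into the diffusion process and use the relationship between pixels and tokens to control generation.

As the figure shows, the method can change a single token so that most of the scene is preserved and only a small region is replaced. It can also change the whole image globally, or add new information.

Unlike other methods, this one needs no training, fine-tuning, extra data, or optimization. Editing only requires changing the input prompt.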

notion image

Method

notion image

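First run the text-to-image attention step to obtain the attention map for each token. Three operations then follow:

1. Word swap: directly replace the attention maps.
2. Adding words: insert the new attention maps at the corresponding positions.
3. Token emphasis: directly increase the weight of the corresponding map.
(All three assume the image was generated from a prompt in the first place. If only an image is available, how can it be edited directly? With captioning?)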
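
The three operations above can be sketched as plain array manipulations. This is a minimal illustration and not the authors' code: it assumes each attention map is a NumPy array of shape (pixels, tokens), and the function names and the `alignment` list are hypothetical helpers introduced only for this example.

```python
import numpy as np

def edit_replace(base_maps, edited_maps):
    """Word swap: use the source prompt's maps instead of the edited prompt's."""
    return base_maps.copy()

def edit_refine(base_maps, edited_maps, alignment):
    """Added words: copy source maps for tokens that both prompts share.

    alignment[j] is the source-prompt index of edited token j,
    or -1 when token j is new and keeps its own map (assumed encoding).
    """
    out = edited_maps.copy()
    for j, i in enumerate(alignment):
        if i >= 0:
            out[:, j] = base_maps[:, i]
    return out

def edit_reweight(base_maps, token_index, scale):
    """Token emphasis: scale the map of one token, leave the others as they are."""
    out = base_maps.copy()
    out[:, token_index] *= scale
    return out
```

For instance, going from a three-token prompt to a four-token prompt with one inserted word would use something like `alignment = [0, 1, -1, 2]`, so only the inserted token keeps the map computed for the new prompt.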

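The generated image depends not only on the random seed. The interaction between the text embeddings and the pixels also matters. (The diffusion model uses cross-attention to fuse image and text information and predict the noise, so the attention maps can be used to visualize this relationship. An interesting finding: early in denoising, the model already knows where objects are and how they are oriented. Could this be exploited for acceleration? And would dropping guidance in the later steps cause problems?)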

notion image
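
To make the pixel-to-token relationship concrete, here is a rough sketch of how one cross-attention map could be computed with standard scaled dot-product attention. The projection matrices `w_q` and `w_k`, the feature sizes, and the random inputs are all placeholders chosen for illustration; a real model learns the projections and uses many heads and layers.

```python
import numpy as np

def cross_attention_maps(pixel_features, token_embeddings, w_q, w_k):
    """Return a (pixels, tokens) matrix; each row is a softmax over tokens."""
    q = pixel_features @ w_q      # queries come from image features: (pixels, d)
    k = token_embeddings @ w_k    # keys come from text embeddings:   (tokens, d)
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)

# Toy sizes: a 4x4 feature grid (16 pixels) and a 5-token prompt.
rng = np.random.default_rng(0)
maps = cross_attention_maps(
    rng.normal(size=(16, 8)),   # pixel features
    rng.normal(size=(5, 6)),    # token embeddings
    rng.normal(size=(8, 4)),    # hypothetical query projection
    rng.normal(size=(6, 4)),    # hypothetical key projection
)
```

Reshaping a single column, for example `maps[:, 2].reshape(4, 4)`, gives the spatial map for that token, which is the kind of picture shown in the figure.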

Here is an example: keeping only the attention map of "butterfly" makes it possible to generate a variety of different images:

notion image

The algorithm is shown below. The random seed must be kept identical. Edit() stands for whichever editing operation is applied.

notion image
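
The control flow of the algorithm can be sketched as follows. The denoiser here, `toy_denoise_step`, is a made-up stand-in so that the loop runs; it is not a diffusion model, and its update rule is arbitrary. Only the structure is the point: both branches start from the same seeded noise, and the edited branch is re-run with the maps returned by `edit`.

```python
import numpy as np

def toy_denoise_step(z, prompt_emb, t, override_maps=None):
    """Hypothetical stand-in for one denoising step (NOT a real model).

    z: (pixels, d) latent, prompt_emb: (tokens, d).
    Returns the next latent and the (pixels, tokens) maps it computed.
    """
    logits = z @ prompt_emb.T
    logits = logits - logits.max(axis=-1, keepdims=True)
    maps = np.exp(logits)
    maps = maps / maps.sum(axis=-1, keepdims=True)
    used = maps if override_maps is None else override_maps
    z_next = 0.9 * z + 0.1 * (used @ prompt_emb)
    return z_next, maps

def prompt_to_prompt(src_emb, tgt_emb, edit, steps, seed, shape):
    rng = np.random.default_rng(seed)   # one seed shared by both branches
    z = rng.normal(size=shape)          # source branch latent
    z_star = z.copy()                   # edited branch starts from the same noise
    for t in range(steps, 0, -1):       # t counts down, as in sampling
        z_next, m = toy_denoise_step(z, src_emb, t)
        _, m_star = toy_denoise_step(z_star, tgt_emb, t)
        m_hat = edit(m, m_star, t)      # Edit(): choose or modify the maps
        z_star, _ = toy_denoise_step(z_star, tgt_emb, t, override_maps=m_hat)
        z = z_next
    return z, z_star
```

A call such as `prompt_to_prompt(src, tgt, lambda m, m_star, t: m, steps=50, seed=0, shape=(16, 4))` corresponds to a word swap with injection at every step. In a sampler that draws noise inside each step, that noise would also need to come from the shared seed.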

Different degrees of injection (t controls how many steps the injection is applied for)

notion image
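
One way to express this step-dependent injection is to wrap an edit function with a threshold. The sketch below assumes t counts down from the total number of steps to 1, so injection happens in the early, high-noise steps; the name `make_timed_edit` and the threshold parameter `tau` are illustrative choices.

```python
def make_timed_edit(edit, tau):
    """Apply `edit` while t >= tau, then fall back to the edited prompt's own maps."""
    def timed(base_maps, edited_maps, t):
        if t >= tau:
            return edit(base_maps, edited_maps)
        return edited_maps
    return timed

# Word swap that injects the source maps only during steps t >= 30.
timed_swap = make_timed_edit(lambda base, edited: base, tau=30)
```

A lower `tau` injects for more steps and keeps the result closer to the source layout, while a higher `tau` gives the new prompt more freedom.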

Application

Text-only Localized Editing

notion image

Fader Control using Attention Re-weighting

notion image

Real Image Editing

The simplest approach: provide an init image plus a sentence, then edit the image on the basis of that sentence.

notion image

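This fails in some cases: too large a scale causes distortion, while too small a scale leaves little room for editing.

Solution: automatically extract the mask region from the attention maps, then apply blended diffusion.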

notion image
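
As a rough illustration of the mask idea, the sketch below thresholds one token's attention map and uses it to composite the edited result over the original. The min-max normalisation and the 0.5 threshold are assumptions made for this example, and it shows a single compositing step, whereas blended diffusion methods typically apply such a blend during denoising.

```python
import numpy as np

def attention_mask(maps, token_index, height, width, threshold=0.5):
    """Binary (height, width) mask from one token's column of a (pixels, tokens) map."""
    m = maps[:, token_index].reshape(height, width)
    m = (m - m.min()) / (m.max() - m.min() + 1e-8)   # rescale to [0, 1]
    return m >= threshold

def blend(original, edited, mask):
    """Take `edited` inside the mask and `original` elsewhere; images are (H, W, C)."""
    return np.where(mask[..., None], edited, original)
```

With this, only the region the edited word attends to is changed, and the rest of the original image is restored as-is.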

Limitations

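The current inversion requires the user to provide a suitable prompt.

Cross-attention is currently applied at low-resolution layers, so fine details are not generated well.

The current method cannot move objects within the image.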

© Lazurite 2021 - 2025