(ICML 2023) NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion (Paper Reading Notes)
date
Jun 1, 2023
Last edited time
Jun 1, 2023 08:23 AM
status
Published
slug
NerfDiff论文阅读
tags
DL
3D
DDPM
summary
Still can't fully follow Eq. 7
type
Post
Field
Plat
![notion image](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2Fe8bb0d49-dde9-4f2b-a80a-58d5a39b4337%2FUntitled.png?table=block&id=05e7777e-9190-4348-b0f1-1f898dc11903&cache=v2)
Current single-image view synthesis methods suffer from several problems. The most significant one is that the synthesized images are not of high enough quality, especially in occluded and unseen regions. In addition, existing methods struggle to stay consistent with the input image, leading to noticeable discrepancies between the synthesized views and the input.
NerfDiff is the proposed model for single-image view synthesis. It combines the strengths of NeRF and a 3D-aware conditional diffusion model, achieving high-quality, high-fidelity synthesis through a training stage followed by a fine-tuning stage.
In the training stage, a camera-space triplane-based NeRF and a 3D-aware conditional diffusion model (CDM) are jointly trained over a collection of scenes. In the fine-tuning stage, the NeRF representation is initialized from the given input image, i.e. the single-image NeRF obtained in the first stage serves as the initial model, and is then optimized jointly with the 3D-aware CDM. Concretely, the CDM generates images at multiple virtual views, and these are used as pseudo targets for a rendering loss on the corresponding NeRF renderings. At the same time, NeRF-guided distillation is introduced to improve the quality of the CDM-generated multi-view images and to keep them consistent with the input image.
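To make the stage-1 procedure concrete, here is a minimal PyTorch-style sketch of one joint training step. This is not the paper's implementation: `encoder` (image to camera-space triplane), `nerf` (triplane + pose to rendered view), and `cdm` (noisy image, conditioning render, timestep to predicted noise) are hypothetical stand-in modules, and `noise_schedule` is a simplified stand-in for whatever schedule is actually used.

```python
import torch
import torch.nn.functional as F

def noise_schedule(t, T=1000):
    # Simplified variance-preserving schedule (alpha^2 + sigma^2 = 1); a stand-in only.
    alpha = torch.cos(0.5 * torch.pi * t.float() / T).view(-1, 1, 1, 1)
    return alpha, (1 - alpha**2).sqrt()

def stage1_step(encoder, nerf, cdm, opt, I_in, I_tgt, pose_tgt):
    """One joint update of the image-conditioned triplane NeRF and the 3D-aware CDM."""
    # 1) Predict a camera-space triplane from the single input image.
    triplane = encoder(I_in)

    # 2) Volume-render the target view from the triplane.
    I_render = nerf(triplane, pose_tgt)

    # 3) Photometric loss: the rendering should match the ground-truth target view.
    loss_nerf = F.mse_loss(I_render, I_tgt)

    # 4) Diffusion loss: noise the ground-truth target view and train the CDM to
    #    predict the noise, conditioned on the NeRF rendering of that same view
    #    (this conditioning is what makes the CDM "3D-aware").
    t = torch.randint(0, 1000, (I_tgt.shape[0],), device=I_tgt.device)
    alpha_t, sigma_t = noise_schedule(t)
    eps = torch.randn_like(I_tgt)
    I_noisy = alpha_t * I_tgt + sigma_t * eps
    loss_cdm = F.mse_loss(cdm(I_noisy, I_render, t), eps)

    loss = loss_nerf + loss_cdm
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

The sketch only shows how the photometric and diffusion losses are combined in a single optimizer step; the actual network architectures are described in the paper.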
![notion image](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F91f79999-ef6a-4239-85c9-553b3f52148a%2FUntitled.png?table=block&id=9dd4b3be-9024-416c-adb4-615aff79a5e5&cache=v2)
Architecture
![notion image](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F9734a306-04dc-4e93-b904-a2e5bf6d6d6a%2FUntitled.png?table=block&id=dab40761-737f-44d7-ba13-487c5d1ca438&cache=v2)
NeRF-Guided Distillation
NeRF-guided distillation (NGD) is used to improve the quality of the multi-view images generated by the 3D-aware CDM while keeping them consistent with the input image. Concretely, it resolves the uncertainty inherent in single-image view synthesis by alternating between updating the NeRF representation and guiding the multi-view diffusion process.
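As a rough illustration of this alternation (again a sketch, not the paper's Eq. 7): each virtual view is sampled from the CDM while being nudged toward the current NeRF rendering, and the NeRF is then updated to match the sampled views. The NeRF guidance is simplified here to a plain blend of the CDM's denoised estimate with the rendering; `render_fn` and `cdm` are hypothetical callables, and the `noise_schedule` helper from the previous sketch is reused.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def guided_sample(cdm, I_render, timesteps, guidance=0.5):
    """DDIM-style sampling of one virtual view, nudged toward the current NeRF rendering.

    Blending the denoised estimate with the rendering is a crude stand-in for the
    paper's NeRF guidance; it only shows where the guidance enters the loop.
    """
    I_t = torch.randn_like(I_render)
    for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
        a_t, s_t = noise_schedule(t)
        a_p, s_p = noise_schedule(t_prev)
        eps = cdm(I_t, I_render, t)                  # CDM conditioned on the NeRF render
        I0_hat = (I_t - s_t * eps) / a_t             # predicted clean image
        I0_hat = guidance * I_render + (1 - guidance) * I0_hat  # simplified NeRF guidance
        I_t = a_p * I0_hat + s_p * eps               # deterministic DDIM update
    return I0_hat

def ngd_finetune(nerf_params, render_fn, cdm, poses, steps=100, lr=1e-2):
    """Alternate between guided CDM sampling of virtual views and NeRF updates."""
    opt = torch.optim.Adam(nerf_params, lr=lr)
    timesteps = torch.linspace(999, 0, 50).long()
    for _ in range(steps):
        idx = torch.randint(len(poses), (1,)).item()
        I_render = render_fn(poses[idx])             # current rendering at a virtual camera
        I_virtual = guided_sample(cdm, I_render.detach(), timesteps)
        loss = F.mse_loss(render_fn(poses[idx]), I_virtual)  # distill the virtual view into the NeRF
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The paper derives its guidance term formally (Eq. 7); the blend above is only meant to convey the overall control flow of the alternation.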
![notion image](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F9f624615-4f74-44ba-91bd-c4024c9d637a%2FUntitled.png?table=block&id=36326148-ce4e-4812-beb2-ce786a283957&cache=v2)
But I can't quite follow this part yet: ![notion image](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F9b592854-38aa-40f2-972f-5d81d4b9a903%2FUntitled.png?table=block&id=da039d44-0ff1-44f5-bda6-07a9a7b497a9&cache=v2)
![notion image](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2Fc705156c-f008-4072-9ef9-a8530940ebcc%2FUntitled.png?table=block&id=311f47b4-bb1c-4d00-a6cc-8a63e7605032&cache=v2)
![notion image](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F29a4a770-3fb6-4ff5-80e1-571b4ba5aaa9%2FUntitled.png?table=block&id=51d70aca-858b-4ccd-bcd7-d39c42622f90&cache=v2)
- Given an input image $I$ (captured at the input camera pose).
- $I_t^{\pi} = \alpha_t I^{\pi} + \sigma_t \epsilon$, with $\epsilon \sim \mathcal{N}(0, \mathbf{I})$, is the noised target for diffusion step $t$ (written out after this list).
- $\tilde{I}_\theta^{\pi}$ is the volume-rendered image produced by the NeRF at pose $\pi$.
- $p_\pi$ is a prior distribution on the camera poses relative to the input, and $I^{\pi}$, $I_t^{\pi}$ are the corresponding (clean and noised) images at the relative camera $\pi$.
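For reference, here is the noising step from the bullets written out in standard variance-preserving diffusion notation (the conditioning signature of $\epsilon_\phi$ below is my assumption, following the 3D-aware CDM described above; the paper's actual Eq. 7 builds on these quantities but is not reproduced here):

$$
I_t^{\pi} = \alpha_t I^{\pi} + \sigma_t \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \mathbf{I}), \qquad \alpha_t^2 + \sigma_t^2 = 1,
$$

and the clean-image estimate implied by the CDM's noise prediction is

$$
\hat{I}^{\pi} = \frac{I_t^{\pi} - \sigma_t\, \epsilon_\phi\!\left(I_t^{\pi}, \tilde{I}_\theta^{\pi}, t\right)}{\alpha_t}.
$$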
![notion image](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2Ff9061271-ae5a-416e-a0d3-2ba23473c95a%2FUntitled.png?table=block&id=8db16394-8936-4186-80f7-2dc727b9065b&cache=v2)