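The DC-IGN face dataset uses the Deep Convolutional Inverse Graphics Network to model facial features. The goal is a face model that factors out extrinsic conditions such as face pose, lighting, and texture.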
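The Deep Convolutional Inverse Graphics Network (DC-IGN) has an encoder and a decoder. We follow the variational autoencoder architecture (Kingma and Welling) with several variations. The encoder consists of several layers of convolution followed by max-pooling, and the decoder has several layers of unpooling (upsampling using nearest neighbors) followed by convolution.
(a) During training, data x is passed through the encoder to produce the posterior approximation Q(z_i|x), where z_i consists of scene latent variables such as pose, light, texture, or shape. To learn the parameters of DC-IGN, gradients are back-propagated with stochastic gradient descent using the following variational objective function: -log(P(x|z_i)) + KL(Q(z_i|x) || P(z_i)), for every z_i. We can force DC-IGN to learn a disentangled representation by showing it mini-batches with a set of inactive and active transformations (e.g. the face rotating, the light sweeping in some direction, etc.).
(b) During testing, data x can be passed through the encoder to obtain the latents z_i. Images can then be re-rendered with different viewpoints, lighting conditions, shape variations, etc. just by manipulating the appropriate graphics code group (z_i), which is how one would manipulate an off-the-shelf 3D graphics engine.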
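The objective in (a) can be sketched in a few lines of plain Python. This is only an illustration, not the Torch code in this repository: it assumes a Bernoulli decoder over pixels in [0, 1], a diagonal Gaussian posterior Q(z|x) parameterized by a mean and a log-variance, and a standard normal prior P(z). All function names are hypothetical.

```python
import math

def bernoulli_nll(x, x_hat, eps=1e-7):
    """-log P(x|z) for pixels in [0, 1], assuming a Bernoulli decoder."""
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total

def gaussian_kl(mu, log_var):
    """KL(Q(z|x) || P(z)) for a diagonal Gaussian Q and a standard normal P."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

def variational_loss(x, x_hat, mu, log_var):
    """Reconstruction term plus KL term, as in the objective above."""
    return bernoulli_nll(x, x_hat) + gaussian_kl(mu, log_var)

if __name__ == "__main__":
    x = [1.0, 0.0, 1.0]          # toy "image" with three pixels
    x_hat = [0.9, 0.2, 0.7]      # decoder output (hypothetical values)
    mu = [0.1, -0.3]             # posterior mean from the encoder
    log_var = [0.0, -0.5]        # posterior log-variance from the encoder
    print(variational_loss(x, x_hat, mu, log_var))
```

The KL term is zero when the posterior equals the prior (zero mean, unit variance), so it penalizes the encoder only as far as Q(z|x) moves away from P(z).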
A CUDA-capable GPU
The CUDA Toolkit
cuDNN: NVIDIA's deep neural network library
cudnn.torch: Torch bindings to cuDNN.
Facebook has some great instructions for installing these over at https://github.com/facebook/fbcunn/blob/master/INSTALL.md
This paper presents the Deep Convolutional Inverse Graphics Network (DC-IGN), which aims to learn an interpretable representation of images that is disentangled with respect to various transformations such as object out-of-plane rotations, lighting variations, and texture. The DC-IGN model is composed of multiple layers of convolution and de-convolution operators and is trained using the Stochastic Gradient Variational Bayes (SGVB) algorithm. We propose training procedures to encourage neurons in the graphics code layer to have semantic meaning and force each group to distinctly represent a specific transformation (pose, light, texture, shape, etc.). Given a static face image, our model can re-generate the input image with different pose, lighting or even texture and shape variations from the base face. We present qualitative and quantitative results of the model’s efficacy at learning a 3D rendering engine. Moreover, we also utilize the learnt representation for two important visual recognition tasks: (1) an invariant face recognition task and (2) using the representation as a summary statistic for generative modeling.
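One way the training procedure mentioned above could force a latent group to track a single transformation is to clamp every other latent to its mini-batch mean. The sketch below assumes that scheme, which this text does not spell out: each mini-batch varies only one transformation, the latent assigned to it (active_index) passes through unchanged, and the gradient of each clamped latent is replaced by its difference from the batch mean. The function names and the list-of-lists layout are hypothetical.

```python
def batch_means(batch_z):
    """Per-dimension mean of the latent vectors in a mini-batch."""
    n = len(batch_z)
    dim = len(batch_z[0])
    return [sum(z[j] for z in batch_z) / n for j in range(dim)]

def clamp_inactive_latents(batch_z, active_index):
    """Forward pass: keep the active latent, set the others to the batch mean."""
    means = batch_means(batch_z)
    return [
        [z[j] if j == active_index else means[j] for j in range(len(z))]
        for z in batch_z
    ]

def clamped_gradients(batch_z, batch_grad, active_index):
    """Backward pass: inactive latents get (z - mean) as their gradient,
    which pulls them toward a value shared across the mini-batch."""
    means = batch_means(batch_z)
    return [
        [g[j] if j == active_index else z[j] - means[j] for j in range(len(z))]
        for z, g in zip(batch_z, batch_grad)
    ]

if __name__ == "__main__":
    # Two encoded samples with three latents each; latent 0 is "active".
    z = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
    grads = [[0.5, 0.5, 0.5], [0.1, 0.1, 0.1]]
    print(clamp_inactive_latents(z, active_index=0))
    print(clamped_gradients(z, grads, active_index=0))
```

Because the decoder sees identical values for the inactive latents across the mini-batch, it can explain the variation between samples only through the active latent.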
A big shout-out to all the Torch developers. Torch is simply awesome. We thank Thomas Vetter for giving us access to the Basel Face Model. T. Kulkarni was graciously supported by the Leventhal Fellowship. This research was supported by ONR award N000141310333, ARO MURI W911NF-13-1-2012 and CBMM. We would also like to thank y0ast (https://github.com/y0ast) for making the variational autoencoder code available online.
