Researchers at Nvidia say they have created a rendering framework that can produce 3D objects from 2D images with the correct shape, color, texture and lighting, and that could help machine learning models achieve depth perception.
The framework, called DIB-R (a differentiable interpolation-based renderer), was presented this week at the annual Conference on Neural Information Processing Systems in Vancouver, Canada.
The framework, when wrapped around a neural network, learns to predict shape, texture, and light from single images and generate 3D shapes from a photo.
In the paper presented this week, the researchers (from Nvidia, the University of Toronto, Vector Institute, McGill University and Aalto University) noted: “Many machine learning models operate on images, but ignore the fact that images are 2D projections formed by 3D geometry interacting with light, in a process called rendering…
“Enabling machine learning models to understand the image formation process could facilitate disentanglement of geometry from the lighting effects, which is key in achieving invariance and robustness.”
![2D to 3D rendering](https://www.cbronline.com/wp-content/uploads/2019/12/allsup-1.png)
Credit: Nvidia
DIB-R 2D to 3D Rendering
DIB-R uses an encoder-decoder architecture to transform the input 2D image into a feature map, which is then used to predict properties such as the shape, color, texture and lighting of the object in the image.
DIB-R takes a polygon sphere and alters it until it matches the object in the 2D image that it is trying to reproduce in 3D. The researchers trained the model on several image datasets, ranging from a collection of bird photos to images of vehicles.
It could potentially be used by archaeological researchers to create 3D images of objects that have been discovered and imaged during excavations.
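The sphere-altering step described above can be pictured with a minimal sketch. It assumes the alteration is expressed as one offset per vertex of a fixed template mesh, which the article does not state. The function names, the octahedron stand-in for the sphere, and the hand-picked offsets are all hypothetical. In the real system a trained network would supply the changes.

```python
def octahedron_vertices():
    """Six vertices on the unit sphere: a very coarse stand-in for a polygon sphere."""
    return [
        (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0),
        (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
        (0.0, 0.0, 1.0), (0.0, 0.0, -1.0),
    ]


def deform(vertices, offsets):
    """Move each template vertex by its offset; the mesh connectivity is untouched."""
    return [
        tuple(coord + delta for coord, delta in zip(vertex, offset))
        for vertex, offset in zip(vertices, offsets)
    ]


template = octahedron_vertices()

# Hypothetical offsets (an assumption for illustration only):
# stretch the shape along x and flatten it along z.
offsets = [(0.5 * x, 0.0, -0.3 * z) for x, y, z in template]

deformed = deform(template, offsets)
```

Because only the vertex positions change, the same template can be reshaped toward many different objects without rebuilding the mesh.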
![2D to 3D rendering](https://www.cbronline.com/wp-content/uploads/2019/12/model2a-2-1-scaled.png)
Credit: DIB-R Paper
Training the model takes just two days on a single NVIDIA V100 GPU. Once trained, DIB-R can create a 3D object from a 2D image within 100 milliseconds. DIB-R is built on the machine learning framework PyTorch.
The researchers noted: “Key to our approach is to view foreground rasterization as a weighted interpolation of local properties and background rasterization as a distance-based aggregation of global geometry. Our approach allows for accurate optimization over vertex positions, colors, normals, light directions and texture coordinates through a variety of lighting models.”
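The two ideas in that quote can be illustrated in 2D with a short sketch. It is not the DIB-R implementation. It assumes that "weighted interpolation of local properties" can be shown with barycentric weights over a triangle's vertex colors. It also assumes that "distance-based aggregation" can be shown with an exponential falloff on the distance to each triangle. The function names, the falloff form, and the `delta` parameter are choices made here for illustration.

```python
import math


def edge(a, b, p):
    """Twice the signed area of the 2D triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def barycentric_weights(tri, p):
    """Weights (w0, w1, w2) summing to 1 that express p in terms of the vertices."""
    v0, v1, v2 = tri
    area = edge(v0, v1, v2)
    return (edge(v1, v2, p) / area, edge(v2, v0, p) / area, edge(v0, v1, p) / area)


def interpolate(tri, vertex_attrs, p):
    """Foreground pixel: blend per-vertex attributes (here RGB) with the weights."""
    weights = barycentric_weights(tri, p)
    channels = len(vertex_attrs[0])
    return tuple(
        sum(w * attr[c] for w, attr in zip(weights, vertex_attrs))
        for c in range(channels)
    )


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp to the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def soft_coverage(triangles, p, delta=0.1):
    """Background pixel: aggregate a distance-based influence from every triangle.

    Assumed form: each triangle contributes exp(-d / delta), and the
    contributions are combined as 1 - prod(1 - contribution).
    """
    remaining = 1.0
    for tri in triangles:
        d = min(point_segment_distance(p, tri[i], tri[(i + 1) % 3]) for i in range(3))
        remaining *= 1.0 - math.exp(-d / delta)
    return 1.0 - remaining


triangle = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
colors = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))  # red, green, blue

inside_color = interpolate(triangle, colors, (0.25, 0.25))  # pixel inside the triangle
near_alpha = soft_coverage([triangle], (1.05, 0.0))         # pixel just outside
far_alpha = soft_coverage([triangle], (1.5, 0.0))           # pixel farther away
```

Both quantities vary smoothly as the vertices move. That smoothness is what lets an optimizer adjust vertex positions and colors from image-space errors.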