July 2, 2021
By Ian Gormely
A new machine learning model from Vector Institute and Apple researchers can create 3D environments without any reference images.
Generative scene networks (GSN) are built on Neural Radiance Fields (NeRFs), which allow users to easily build 3D models from 2D photos. But NeRFs can’t fill in details they haven’t already “seen.” GSNs expand their scope, modelling entire environments, such as walking from a house into a garage, and filling in new details as the camera moves. Once trained, GSNs can create, or “hallucinate,” as Vector Faculty Member Graham Taylor puts it, entirely new environments when unconstrained. Users can also give the model a partial scene and let it fill in the rest for a more grounded representation of reality.
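The NeRF idea that GSN builds on can be sketched in a few lines: a learned function maps a 3D point (and viewing direction) to a colour and a density, and an image is produced by alpha-compositing those samples along each camera ray. The sketch below is purely illustrative, not the authors’ code; the stand-in `radiance_field` function and all constants are hypothetical, chosen only to show the rendering step.

```python
import numpy as np

def radiance_field(points, directions):
    """Stand-in for a trained MLP: returns (rgb, density) per 3D sample.
    A real NeRF would evaluate a neural network here."""
    rgb = 0.5 + 0.5 * np.sin(points)                    # fake colour in [0, 1]
    density = np.exp(-np.linalg.norm(points, axis=-1))  # fake density field
    return rgb, density

def render_ray(origin, direction, near=0.0, far=4.0, n_samples=64):
    """Classic volume rendering: composite colour samples along one ray."""
    t = np.linspace(near, far, n_samples)
    points = origin + t[:, None] * direction            # (n_samples, 3)
    rgb, sigma = radiance_field(points, direction)
    delta = np.diff(t, append=far)                      # spacing between samples
    alpha = 1.0 - np.exp(-sigma * delta)                # opacity of each sample
    # transmittance: fraction of light that survives to reach each sample
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0)         # one RGB pixel colour

colour = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]))
print(colour.shape)  # (3,)
```

GSN’s contribution, per the paper, is generating such radiance fields for whole scenes rather than reconstructing one scene from photos, so the field itself is sampled from a generative model instead of fit to captured images.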
GSNs accomplish this feat by synthesizing radiance fields of indoor scenes. They were first described in “Unconstrained Scene Generation with Locally Conditioned Radiance Fields,” a new paper co-authored by Canada CIFAR AI Chair and Vector Faculty Member Graham Taylor and led by his student Terrance DeVries.
Taylor, DeVries and their co-authors Nitish Srivastava and Joshua M. Susskind (both of whom, like Taylor, studied under Geoffrey Hinton at the University of Toronto) and Miguel Angel Bautista, DeVries’ mentor at Apple, are excited about applications of the technology.
In particular, Taylor sees the potential for deployment in the construction industry. Through his work with Next AI and the Creative Destruction Lab at the Rotman School of Management, Taylor is mentoring the startup Origami XR. Their iOS app uses the LiDAR scanner included on new Apple products to quickly and reliably create 3D models of individual rooms from construction projects (the software uses NeRF models to clean up the images). Replicating this with current LiDAR technology would require expensive equipment and extensive training, something most construction companies, which are generally small and medium-sized enterprises (SMEs), can’t afford.
Origami’s founder, Erik Peterson, was the person who first introduced Taylor to NeRFs. And while Peterson says that his company currently has no plans to integrate GSNs into their software, Taylor believes that GSNs have the potential to be helpful to construction companies interested in modelling entire buildings.
Taylor and his colleagues hope that GSN will lead to many downstream applications for 3D modelling, similar to the way StyleGAN2, another generative model, spurred applications for 2D images, such as the neural filter tool in Adobe Photoshop. They see video games as a natural fit, especially since part of GSN’s training data came from VizDoom, a research platform built on the video game Doom. “You can create new games on the fly,” says Taylor, pointing to Toronto-based Transitional Forms, which is using AI to develop content for the entertainment industry. They also singled out real estate and design as industries for which their model could be useful.
He also sees their paper as a perfect example of the local AI ecosystem – research, mentorship and, potentially, deployment in industry – in full bloom. “I think this is exactly what we want to see emerge from a Pan-Canadian AI strategy — strengthening of the research ecosystem and with that, home-grown economic opportunities.”