Learning Unsupervised Shape Recovery from Images

This thesis presents an approach to 3D shape reconstruction from images based on a Generative Adversarial Network (GAN) called ShapeTexGAN. The research focuses on generating high-quality 3D models from unordered point clouds or 2D images without requiring detailed camera information, aiming to improve the efficiency and quality of 3D reconstruction when only limited input data is available.


Type of Work: Master Thesis

Main Author: Jan Petrik

Affiliation: ETH Zurich

Supervisors: Radek Danecek, Markus Gross

Date: 20th October 2020

Journal: None

Online: On Demand

The methodology of this thesis centres on the development of ShapeTexGAN, a GAN architecture that incorporates graph and point convolutions to process geometric data efficiently. A distinctive feature of this research is that the network can be trained on unordered point clouds and 2D images while producing registered polygon meshes. It also generates textured meshes, i.e. it can apply a texture to an untextured polygon mesh.
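To illustrate the graph-convolution idea mentioned above, the sketch below shows one simplified message-passing step on mesh vertices: each vertex mixes its own features with the mean of its neighbours' features. This is a minimal, hypothetical layer for intuition only; the function name, weight shapes, and aggregation rule are assumptions, not the actual ShapeTexGAN architecture.

```python
import numpy as np

def graph_conv(vertex_feats, adjacency, weight_self, weight_neigh):
    """One simplified graph-convolution step on mesh vertices.

    Each vertex combines a linear transform of its own features with a
    linear transform of the mean of its neighbours' features, followed
    by a ReLU. (Illustrative only; not the thesis's exact layer.)
    """
    deg = adjacency.sum(axis=1, keepdims=True)                 # neighbour counts
    neigh_mean = (adjacency @ vertex_feats) / np.maximum(deg, 1)
    return np.maximum(vertex_feats @ weight_self + neigh_mean @ weight_neigh, 0)

# Toy example: one triangle (3 mutually connected vertices), 2-D features.
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
w_self = np.eye(2)
w_neigh = 0.5 * np.eye(2)
out = graph_conv(feats, adj, w_self, w_neigh)
print(out.shape)  # (3, 2)
```

Operating directly on vertex adjacency like this is what lets such layers consume irregular geometric data, where ordinary image convolutions do not apply.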
The main findings highlight the effectiveness of ShapeTexGAN in generating accurate, high-quality 3D shapes, as evidenced by both qualitative and quantitative evaluations. The ability to infer texture from coloured point clouds and apply it to meshes using a k-nearest-neighbour algorithm further distinguishes this work and demonstrates the potential of integrating texture and shape reconstruction in a unified framework.
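The k-nearest-neighbour texture transfer can be sketched as follows: each mesh vertex is assigned the average colour of its k nearest points in the coloured point cloud. This is a brute-force illustration under assumed inputs (vertex positions, cloud points, per-point RGB colours); the thesis's actual implementation details are not specified here.

```python
import numpy as np

def transfer_texture(mesh_verts, cloud_pts, cloud_colors, k=3):
    """Assign each mesh vertex the mean colour of its k nearest
    points in a coloured point cloud (brute-force k-NN sketch;
    names and details are illustrative, not the thesis's code)."""
    # Pairwise squared distances between vertices and cloud points: (V, P)
    d2 = ((mesh_verts[:, None, :] - cloud_pts[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :k]    # indices of the k nearest points
    return cloud_colors[idx].mean(axis=1)  # average their colours per vertex

# Toy example: two vertices, four coloured cloud points.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pts = np.array([[0.1, 0.0, 0.0], [0.9, 0.0, 0.0],
                [0.0, 0.1, 0.0], [1.0, 0.1, 0.0]])
cols = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                 [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
vert_cols = transfer_texture(verts, pts, cols, k=2)
```

In practice a spatial index (e.g. a k-d tree) would replace the brute-force distance matrix for large clouds.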


  • Interpolation results in the ShapeTexGAN latent space between two 3D models. The dots highlight specific polygons on the polygon mesh; these polygons remain in correspondence throughout the interpolation. The generated 3D models therefore maintain registration, signifying a stable topological structure throughout the transition.
  • Reconstruction results of ShapeTexGAN in the most challenging scenario, where the input to the network is a single silhouette image of an object (a car, a chair, or a human torso) and the output is a 3D model. It should be emphasised that the reconstruction is performed solely on the basis of this silhouette, without camera parameters or any other additional input.
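The latent-space interpolation shown in the first figure can be sketched as a linear blend between two latent codes; feeding each intermediate code to the generator would yield the in-between shapes. The generator itself is omitted, and the latent dimension and variable names are assumptions for illustration.

```python
import numpy as np

def interpolate_latents(z_a, z_b, steps=5):
    """Linearly interpolate between two latent codes.

    Returns an array of `steps` codes from z_a to z_b inclusive.
    Each intermediate code, decoded by the generator (not shown),
    corresponds to an in-between 3D shape.
    """
    ts = np.linspace(0.0, 1.0, steps)
    return np.stack([(1.0 - t) * z_a + t * z_b for t in ts])

# Toy example with an assumed 128-dimensional latent space.
z_car = np.zeros(128)
z_chair = np.ones(128)
path = interpolate_latents(z_car, z_chair, steps=5)
```

Because the generated meshes stay registered, a given polygon keeps its identity along the whole path, which is what the highlighted dots in the figure demonstrate.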