3D Shape Representations
Depth Map
For each pixel, a depth map gives the distance from the camera to the object in the world at that pixel
- RGB + Depth image = RGB-D Image (2.5D)
- This type of data can be recorded directly by some types of 3D sensors (e.g., Microsoft Kinect)
Predicting Depth Maps
Given an image, predict the depth map
We can use a fully convolutional network, which outputs one depth value per input pixel
Problem: Scale / Depth Ambiguity
A small, close object looks exactly the same as a larger, farther-away object
- Absolute scale / depth are ambiguous from a single image
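The ambiguity follows from perspective projection. The sketch below assumes an ideal pinhole camera (image coordinates u = f·x/z, v = f·y/z) and hypothetical object coordinates. Scaling an object's size and its distance by the same factor leaves its projection unchanged, so a single image cannot tell the two apart.

```python
import numpy as np

def project(points, focal_length=1.0):
    """Pinhole projection of (N, 3) camera-space points to (N, 2) image coordinates."""
    points = np.asarray(points, dtype=float)
    # Perspective divide: u = f * x / z, v = f * y / z
    return focal_length * points[:, :2] / points[:, 2:3]

# A small square 2 units in front of the camera (hypothetical coordinates)
small_near = np.array([[-0.5, -0.5, 2.0], [0.5, -0.5, 2.0],
                       [0.5, 0.5, 2.0], [-0.5, 0.5, 2.0]])
# The same square made 3x larger AND placed 3x farther away
large_far = 3.0 * small_near

print(np.allclose(project(small_near), project(large_far)))  # True
```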
Surface Normals
For each pixel, a surface normal map gives the unit vector normal to the object's surface in the world at that pixel
Predicting Normals
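Similar to predicting depth maps: a fully convolutional network outputs a 3D normal vector per pixel
- The loss compares the angle between the predicted and ground-truth normal vectors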
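One way to compare angles is sketched below, assuming normal maps of shape (H, W, 3). The function name `normal_angle_loss` and the choice of the mean angle are illustrative, not taken from a specific paper. Both inputs are normalized first, so only direction matters.

```python
import numpy as np

def normalize(v, eps=1e-8):
    """Scale vectors along the last axis to unit length (eps avoids division by zero)."""
    return v / np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)

def normal_angle_loss(pred, target):
    """Mean angle (radians) between predicted and ground-truth normals, shape (..., 3)."""
    cos = np.sum(normalize(pred) * normalize(target), axis=-1)  # per-pixel cosine
    return np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))

up = np.tile([0.0, 0.0, 1.0], (2, 2, 1))    # 2x2 normal map pointing along +z
side = np.tile([1.0, 0.0, 0.0], (2, 2, 1))  # 2x2 normal map pointing along +x
print(normal_angle_loss(up, up), normal_angle_loss(up, side))
```

For training, 1 minus the cosine is a common alternative, since the derivative of arccos is unbounded as the cosine approaches ±1.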
Can predict depth map and normals with a single network
Voxel Grid
- Represent a shape with a V x V x V grid of occupancies
- Just like the segmentation masks in Mask R-CNN (instance segmentation), but in 3D
Conceptually simple: just a 3D grid
Need high spatial resolution to capture fine structures
Scaling to high resolutions is nontrivial
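A minimal sketch of building a V x V x V occupancy grid, assuming a hypothetical sphere of radius 0.8 centered in the cube [-1, 1]³. A voxel is marked occupied when its center lies inside the sphere. Doubling V multiplies the number of cells by 8, which is why high resolutions are hard to scale.

```python
import numpy as np

def voxelize_sphere(V, radius=0.8):
    """Return a (V, V, V) boolean occupancy grid for a sphere centered in [-1, 1]^3."""
    # Coordinates of the voxel centers along one axis
    centers = (np.arange(V) + 0.5) / V * 2.0 - 1.0
    x, y, z = np.meshgrid(centers, centers, centers, indexing="ij")
    return x**2 + y**2 + z**2 <= radius**2

for V in (8, 32, 128):
    grid = voxelize_sphere(V)
    print(f"V={V}: {grid.size} cells, {grid.sum()} occupied")
```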
Processing Voxel Inputs: 3D Convolution
- The kernel is a 3D cube that slides through the input volume
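A naive sketch of the sliding 3D kernel, assuming a single input channel, a single output channel, stride 1 and no padding (real layers add channels, bias, stride and padding). Like deep-learning libraries, it computes cross-correlation, i.e. the kernel is not flipped.

```python
import numpy as np

def conv3d(volume, kernel):
    """Naive 'valid' 3D convolution of a (D, H, W) volume with a (kd, kh, kw) kernel."""
    kd, kh, kw = kernel.shape
    D, H, W = volume.shape
    out = np.zeros((D - kd + 1, H - kh + 1, W - kw + 1))
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # Weighted sum of the voxels currently covered by the kernel cube
                out[z, y, x] = np.sum(volume[z:z + kd, y:y + kh, x:x + kw] * kernel)
    return out

vol = np.zeros((5, 5, 5))
vol[2, 2, 2] = 1.0                      # a single occupied voxel in the middle
avg = np.ones((3, 3, 3)) / 27.0         # 3x3x3 averaging kernel
out = conv3d(vol, avg)
print(out.shape)  # (3, 3, 3)
```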
Generating Voxel Shapes: 3D Convolution
Voxel Problems: Memory Usage
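The memory cost is simple arithmetic: a dense grid stores V³ values. Assuming one 32-bit float per voxel, a 1024³ grid alone needs 1024³ × 4 bytes = 4 GiB, before any network feature channels are added.

```python
def voxel_grid_bytes(V, bytes_per_voxel=4):
    """Memory for a dense V x V x V grid, e.g. one float32 value per voxel."""
    return V ** 3 * bytes_per_voxel

for V in (32, 128, 1024):
    print(f"V={V}: {voxel_grid_bytes(V) / 2**30:.4f} GiB")
```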
Scaling Voxels: Octrees
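The idea behind an octree can be sketched as recursive subdivision: a cube that is entirely full or entirely empty becomes one leaf, and only mixed cubes are split into 8 children. This is a simple occupancy octree for illustration, assuming a cubic grid whose side is a power of 2; it is not the network architecture of any particular octree-based method.

```python
import numpy as np

def build_octree(grid):
    """Compress a cubic boolean grid into True/False leaves or lists of 8 children."""
    if grid.all():
        return True            # uniformly occupied block: one leaf
    if not grid.any():
        return False           # uniformly empty block: one leaf
    h = grid.shape[0] // 2     # mixed block: split into 8 octants
    return [build_octree(grid[x:x + h, y:y + h, z:z + h])
            for x in (0, h) for y in (0, h) for z in (0, h)]

def count_leaves(node):
    if isinstance(node, bool):
        return 1
    return sum(count_leaves(child) for child in node)

# Hypothetical test shape: a sphere in a 64^3 grid
V = 64
c = (np.arange(V) + 0.5) / V * 2.0 - 1.0
x, y, z = np.meshgrid(c, c, c, indexing="ij")
sphere = x**2 + y**2 + z**2 <= 0.8**2
tree = build_octree(sphere)
print(count_leaves(tree), "octree leaves vs", sphere.size, "dense voxels")
```

Only the blocks crossing the surface are refined, so storage grows roughly with surface area rather than volume.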
Implicit Functions
Learn a function o: ℝ³ → [0, 1] that classifies arbitrary 3D points as inside / outside the shape (o(x) is the probability that point x is inside)
The surface of the 3D object is the level set {x: o(x) = 1/2}
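To make the level set concrete, the sketch below replaces the learned network with a hypothetical closed-form o(x) for a sphere, o(x) = sigmoid(k · (r − ‖x‖)). The `sharpness` parameter k is an assumption for illustration. Points exactly on the sphere get o(x) = 1/2.

```python
import numpy as np

def occupancy_sphere(points, radius=1.0, sharpness=10.0):
    """Hypothetical stand-in for a learned o(x): probability each point is inside a sphere."""
    dist = np.linalg.norm(np.asarray(points, dtype=float), axis=-1)
    return 1.0 / (1.0 + np.exp(-sharpness * (radius - dist)))

pts = np.array([[0.0, 0.0, 0.0],   # center: inside
                [1.0, 0.0, 0.0],   # on the surface
                [2.0, 0.0, 0.0]])  # outside
print(occupancy_sphere(pts))
```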
Point Cloud
- Represent shape as a set of P points in 3D space
- Requires new architectures, losses, etc.
Can represent fine structures without huge numbers of points
Doesn’t explicitly represent the surface of the shape
Extracting a mesh for rendering or other applications requires post-processing
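As an example of a point-cloud loss (not named in these notes), the Chamfer distance compares two sets of possibly different sizes: each point is matched to its nearest neighbor in the other set. The sketch below uses the variant that sums squared nearest-neighbor distances in both directions; other variants average instead of summing.

```python
import numpy as np

def chamfer_distance(p1, p2):
    """Symmetric Chamfer distance between point sets p1 (N, 3) and p2 (M, 3)."""
    d = np.sum((p1[:, None, :] - p2[None, :, :]) ** 2, axis=-1)  # (N, M) squared distances
    # Nearest neighbor in p2 for each point of p1, and vice versa
    return d.min(axis=1).sum() + d.min(axis=0).sum()

a = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
print(chamfer_distance(a, a[::-1]))  # 0.0: point order does not matter
```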
Processing Point Cloud Inputs: PointNet
Input: P points each with x, y, z positions
- Order of points should not matter
- Process point clouds as sets
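The core trick for order invariance can be sketched as a shared per-point MLP followed by a symmetric max-pool over points. The weights here are hypothetical random values, and the sketch omits parts of the published PointNet such as its learned input and feature transforms.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical random weights for a 2-layer shared MLP (3 -> 16 -> 32)
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 32)), np.zeros(32)

def pointnet_features(points):
    """Global feature for a (P, 3) point cloud: shared MLP per point, then max over points."""
    h = np.maximum(points @ W1 + b1, 0.0)  # same MLP applied to every point independently
    h = np.maximum(h @ W2 + b2, 0.0)       # (P, 32) per-point features
    return h.max(axis=0)                   # symmetric max-pool -> (32,)

cloud = rng.normal(size=(100, 3))
shuffled = cloud[rng.permutation(len(cloud))]
print(np.allclose(pointnet_features(cloud), pointnet_features(shuffled)))  # True
```

Because max is symmetric in its arguments, shuffling the points cannot change the global feature.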
Mesh
Triangle Mesh
Represent a 3D shape as a set of triangles
Vertices: Set of V points in 3D space
Faces: Set of triangles over the vertices
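A minimal sketch of the vertex/face representation, using a hypothetical tetrahedron: `vertices` is a (V, 3) float array and `faces` is an (F, 3) array of vertex indices. In this sketch each face's vertices are ordered counterclockwise as seen from outside, so cross products give outward normals.

```python
import numpy as np

vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
faces = np.array([[0, 2, 1],   # each row indexes three vertices
                  [0, 1, 3],
                  [0, 3, 2],
                  [1, 2, 3]])

def face_normals_and_areas(vertices, faces):
    """Unit normal and area of every triangle."""
    a, b, c = (vertices[faces[:, i]] for i in range(3))
    cross = np.cross(b - a, c - a)          # perpendicular to each triangle
    norm = np.linalg.norm(cross, axis=1)    # twice the triangle area
    return cross / norm[:, None], 0.5 * norm

normals, areas = face_normals_and_areas(vertices, faces)
print(areas.sum())
```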
Benefits
- Standard representation for graphics
- Explicitly represents 3D shapes
- Adaptive
  - Can represent flat surfaces very efficiently
  - Can allocate more faces to areas with fine details
- Can attach data to vertices and interpolate it over the whole surface
  - RGB colors, texture coordinates, normal vectors, etc.
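Interpolating per-vertex data over a triangle is usually done with barycentric coordinates. The sketch below assumes a single hypothetical triangle with one RGB color per vertex and a query point lying in the triangle's plane; it solves for the barycentric weights with least squares.

```python
import numpy as np

def interpolate_on_triangle(tri_vertices, vertex_data, point):
    """Barycentric interpolation of per-vertex data at a point on a triangle."""
    a, b, c = tri_vertices
    # Solve point = a + s*(b - a) + t*(c - a) for (s, t)
    basis = np.stack([b - a, c - a], axis=1)             # (3, 2)
    (s, t), *_ = np.linalg.lstsq(basis, point - a, rcond=None)
    weights = np.array([1.0 - s - t, s, t])              # barycentric coordinates
    return weights @ vertex_data

tri = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
colors = np.eye(3)  # red, green, blue at the three vertices
print(interpolate_on_triangle(tri, colors, tri.mean(axis=0)))  # equal mix at the centroid
```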
Problem
Nontrivial to process with neural networks