3D Shape Representations

Pasted image 20241206173150.png

Depth Map

For each pixel, a depth map gives the distance from the camera to the object in the world that is visible at that pixel

  • RGB + Depth image = RGB-D Image (2.5D)
  • This type of data can be recorded directly by some types of 3D sensors
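A minimal sketch of why a depth map counts as 2.5D: each pixel can be lifted to one 3D point on the visible surface. It assumes a pinhole camera and treats the stored value as depth along the optical axis. The intrinsics `fx`, `fy`, `cx`, `cy` and the tiny depth map are hypothetical values chosen for illustration.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with depth `depth` to a 3D point in camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Hypothetical 2x3 depth map in metres (rows index v, columns index u).
depth_map = [
    [2.0, 2.0, 4.0],
    [2.0, 1.0, 4.0],
]

# Hypothetical intrinsics: focal length of 2 pixels, principal point at (1, 0.5).
fx = fy = 2.0
cx, cy = 1.0, 0.5

# One 3D point per pixel, in row-major order.
points = [
    backproject(u, v, d, fx, fy, cx, cy)
    for v, row in enumerate(depth_map)
    for u, d in enumerate(row)
]
```

Only the surface facing the camera is recovered; anything occluded is missing, which is why this is not a full 3D representation.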

Pasted image 20241206173357.png

Predicting Depth Maps

Given an image, predict the depth map
We can use a fully convolutional network
Pasted image 20241206173552.png

Problem: Scale / Depth Ambiguity

A small, close object looks exactly the same as a larger, farther-away object

  • Absolute scale / depth are ambiguous from a single image
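The ambiguity can be checked with a toy perspective projection. This is a sketch assuming an ideal pinhole camera at the origin looking along +z; the square and the scale factor are made up for illustration.

```python
def project(point, focal=1.0):
    """Perspective projection of a 3D point onto the image plane."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def scale(points, s):
    """Scale an object and its distance from the camera by the same factor."""
    return [(s * x, s * y, s * z) for x, y, z in points]


# A 1 m square at 2 m, and the same square made 3x larger and 3x farther away.
small_near = [(-0.5, -0.5, 2.0), (0.5, -0.5, 2.0), (0.5, 0.5, 2.0), (-0.5, 0.5, 2.0)]
large_far = scale(small_near, 3.0)

image_small = [project(p) for p in small_near]
image_large = [project(p) for p in large_far]
```

Both lists contain the same image coordinates, because the factor cancels in x / z and y / z. A single image therefore cannot tell the two objects apart.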

Surface Normals

For each pixel, a surface normal map gives the unit vector normal to the surface of the object in the world at that pixel
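As an illustration of what a per-pixel normal is, the sketch below estimates one geometrically from neighbouring 3D points using a cross product. This is not the learned prediction discussed in these notes, only a hand-computed stand-in. The grid of points (a tilted plane z = 2 + x) and the convention that normals face the camera (negative z) are assumptions.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def normal_at(points, v, u):
    """Unit normal at pixel (v, u) from its right and lower neighbours,
    flipped if necessary so that it faces the camera (negative z)."""
    p = points[v][u]
    right = sub(points[v][u + 1], p)
    down = sub(points[v + 1][u], p)
    n = normalize(cross(right, down))
    return n if n[2] <= 0 else tuple(-c for c in n)


# Hypothetical 3x3 grid of 3D points sampled from the plane z = 2 + x.
points = [[(float(u), float(v), 2.0 + u) for u in range(3)] for v in range(3)]

n = normal_at(points, 0, 0)
```

Every pixel on this plane gets the same normal, proportional to (1, 0, -1).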

Pasted image 20241206174020.png

Predicting Normals

Similar to depth maps

Can predict depth map and normals with a single network

Voxel Grid

Pasted image 20241206174430.png

Conceptually simple: just a 3D grid

Need high spatial resolution to capture fine structures

Scaling to high resolutions is nontrivial
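A small sketch of a voxel grid as a 3D array of occupancies, and of how resolution limits what it can capture. The shape (a sphere of radius 0.4 inside the unit cube) and the rule "occupied if the voxel centre is inside" are illustrative assumptions.

```python
import math


def voxelize_sphere(resolution, radius=0.4):
    """Occupancy grid over the unit cube: 1 where the voxel centre lies
    inside a sphere centred at (0.5, 0.5, 0.5), else 0."""
    def centre(i):
        return (i + 0.5) / resolution

    grid = []
    for i in range(resolution):
        plane = []
        for j in range(resolution):
            row = []
            for k in range(resolution):
                d2 = (centre(i) - 0.5) ** 2 + (centre(j) - 0.5) ** 2 + (centre(k) - 0.5) ** 2
                row.append(1 if d2 <= radius ** 2 else 0)
            plane.append(row)
        grid.append(plane)
    return grid


def occupied_fraction(grid):
    """Fraction of voxels that are occupied (estimates the shape's volume)."""
    resolution = len(grid)
    filled = sum(value for plane in grid for row in plane for value in row)
    return filled / resolution ** 3


true_volume = 4.0 / 3.0 * math.pi * 0.4 ** 3
for resolution in (4, 8, 16):
    estimate = occupied_fraction(voxelize_sphere(resolution))
    print(resolution, round(estimate, 3), round(true_volume, 3))
```

At resolution 4 the sphere collapses to a 2x2x2 block of voxels, so the estimate is far from the true volume; finer grids follow the curved surface more closely.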

Processing Voxel Inputs: 3D Convolution

Generating Voxel Shapes: 3D Convolution

Pasted image 20241206182159.png

Voxel Problems: Memory Usage

Pasted image 20241206182905.png
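The cubic growth in memory is easy to verify with arithmetic. The sketch assumes a dense grid that stores one 4-byte float per voxel; activations inside a network would add to this.

```python
def voxel_grid_bytes(resolution, bytes_per_voxel=4):
    """Memory for a dense resolution^3 grid, assuming 4-byte floats by default."""
    return resolution ** 3 * bytes_per_voxel


for resolution in (32, 64, 128, 256, 512, 1024):
    gib = voxel_grid_bytes(resolution) / 2 ** 30
    print(f"{resolution}^3 grid: {gib:.4f} GiB")
```

Doubling the resolution multiplies memory by 8, so a single 1024^3 grid already needs 4 GiB under these assumptions.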

Scaling Voxels: Oct-Trees

Pasted image 20241206182940.png

Implicit Functions

Learn a function o(x) to classify arbitrary 3D points x as inside / outside the shape, where o(x) is the predicted probability that x is inside

The surface of the 3D object is the level set {x : o(x) = 1/2}
Pasted image 20241206183140.png
Pasted image 20241206183315.png
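A toy sketch of the level-set idea. In practice the occupancy function would be a trained neural network; here an analytic soft occupancy for a unit sphere stands in for it. The `sharpness` parameter and the bisection search along a ray are illustrative choices, not a method from these notes.

```python
import math


def occupancy(point, centre=(0.0, 0.0, 0.0), radius=1.0, sharpness=10.0):
    """Soft inside/outside score in (0, 1): above 1/2 inside the sphere,
    below 1/2 outside, exactly 1/2 on its surface."""
    distance = math.dist(point, centre)
    return 1.0 / (1.0 + math.exp(sharpness * (distance - radius)))


def find_surface(origin, direction, lo=0.0, hi=4.0, steps=40):
    """Bisect along a ray for the crossing occupancy = 1/2.
    Assumes the ray starts inside the shape and ends outside at distance hi."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        p = tuple(o + mid * d for o, d in zip(origin, direction))
        if occupancy(p) > 0.5:
            lo = mid  # still inside: the surface is farther along the ray
        else:
            hi = mid  # outside or on the surface: the surface is nearer
    return 0.5 * (lo + hi)


t = find_surface((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```

The function can be queried at any point, so the surface can be located as precisely as needed without storing a grid.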

Point Cloud

Pasted image 20241206183519.png

Can represent fine structures without huge numbers of points

Doesn’t explicitly represent the surface of the shape

Extracting a mesh for rendering or other applications requires post-processing

Processing Point Cloud Inputs: PointNet

Input: P points, each with an (x, y, z) position
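A point cloud is an unordered set, so the network's output should not depend on the order of the P points. PointNet handles this by applying the same small network to every point and then max-pooling over points. The sketch below shows only that idea, with a single linear layer plus ReLU and hand-picked hypothetical weights instead of a trained MLP.

```python
def shared_layer(point, weights, biases):
    """Apply the same linear layer + ReLU to a single (x, y, z) point."""
    return [
        max(0.0, sum(w * c for w, c in zip(row, point)) + b)
        for row, b in zip(weights, biases)
    ]


def global_feature(points, weights, biases):
    """Max-pool per-point features over all points into one vector."""
    per_point = [shared_layer(p, weights, biases) for p in points]
    return [max(column) for column in zip(*per_point)]


# Hypothetical fixed weights mapping 3 input coordinates to 4 features.
weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [-1.0, -1.0, -1.0],
]
biases = [0.0, 0.0, 0.0, 0.5]

cloud = [(0.1, 0.2, 0.3), (-0.4, 0.5, 0.0), (0.9, -0.2, 0.1)]
feature = global_feature(cloud, weights, biases)
shuffled = global_feature(list(reversed(cloud)), weights, biases)
```

`feature` and `shuffled` are identical because max is a symmetric function of its inputs.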

Mesh

Triangle Mesh

Pasted image 20241206190502.png
Represent a 3D shape as a set of triangles
Vertices: Set of V points in 3D space
Faces: Set of triangles over the vertices
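A minimal sketch of the vertices-plus-faces layout: faces store indices into the vertex list, so shared vertices are stored once. The square and the area computation are illustrative.

```python
import math

# A unit square in the z = 0 plane, split into two triangles.
vertices = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (1.0, 1.0, 0.0),
    (0.0, 1.0, 0.0),
]
faces = [(0, 1, 2), (0, 2, 3)]  # each face is a triple of vertex indices


def triangle_area(a, b, c):
    """Half the length of the cross product of two edge vectors."""
    ab = [q - p for p, q in zip(a, b)]
    ac = [q - p for p, q in zip(a, c)]
    cross = (
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    )
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def surface_area(vertices, faces):
    """Total area of the mesh: the sum of its triangle areas."""
    return sum(triangle_area(*(vertices[i] for i in face)) for face in faces)


area = surface_area(vertices, faces)
```

Four vertices and two faces describe the whole flat square exactly, regardless of its size.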

Benefits

  • Standard representation for graphics
  • Explicitly represents 3D shapes
  • Adaptive
    • Can represent flat surfaces very efficiently
    • Can allocate more faces to areas with fine details
  • Can attach data to vertices and interpolate it over the whole surface
    • RGB colors, texture coordinates, normal vectors, etc.
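Interpolating per-vertex data over a face is usually done with barycentric coordinates: any point inside a triangle is a weighted average of its three vertices, and the same weights blend the attached data. A sketch with hypothetical vertex colors:

```python
def interpolate(vertex_values, barycentric):
    """Blend per-vertex values with barycentric weights (w0, w1, w2).
    The weights are assumed to be non-negative and to sum to 1."""
    channels = len(vertex_values[0])
    return tuple(
        sum(w * value[k] for w, value in zip(barycentric, vertex_values))
        for k in range(channels)
    )


# Hypothetical RGB colors attached to the three vertices of one face.
colors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

at_vertex = interpolate(colors, (1.0, 0.0, 0.0))          # exactly the first vertex
at_centroid = interpolate(colors, (1 / 3, 1 / 3, 1 / 3))  # equal mix of all three
```

The same function works for texture coordinates or normal vectors, although interpolated normals need to be renormalized to unit length afterwards.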

Problem

Nontrivial to process with neural networks