Assignment 1. Rendering Basics with PyTorch3D

Course: 16-825 Learning For 3D Vision
Author: Karthik Pullalarevu
Andrew ID: kpullala

Introduction

In this assignment, you will learn the basics of rendering with PyTorch3D, explore 3D representations, and practice constructing simple geometry.


Task 1. Practicing with Cameras 📸

1.1. 360-degree Renders

To create a 360-degree render of an object, I followed these steps:

  1. Load the object's vertices and faces.
  2. Define a set of perspective cameras by specifying their rotation ($R$) and translation ($T$) matrices.
  3. Render the mesh from each camera's viewpoint.
  4. Loop through multiple viewpoints to create an animation.
  5. Combine the rendered images into a GIF.

By keeping the camera's elevation fixed at 10.0 and its distance from the object at 2.7, I generated 36 distinct views by varying the azimuthal angle in 10-degree increments.

import imageio
import numpy as np
import pytorch3d.renderer

all_images = []

# Sweep the azimuth in 10-degree steps at fixed elevation and distance
for angle in range(0, 360, 10):
    R, T = pytorch3d.renderer.look_at_view_transform(dist=2.7, elev=10.0, azim=angle)
    rendered_image = render_cow(
        cow_path='/home/karthik/Depth-Anything-V2/lf3/assignment1/data/cow.obj',
        device='cuda', R=R, T=T,
    )
    all_images.append(rendered_image)

my_images = [(img * 255).astype(np.uint8) for img in all_images]
duration = 1000 // 15  # Convert FPS (frames per second) to duration (ms per frame)
imageio.mimsave('cow_rotation.gif', my_images, duration=duration, loop=0)
Figure 1. A 360-degree render of the cow mesh.

1.2. Recreating the Dolly Zoom

The dolly zoom is a classic cinematic effect that changes the camera's field of view (FoV) while simultaneously moving the camera to keep the subject the same size in the frame. To recreate this, I increased the FoV over time while moving the camera closer to the object according to the formula: $distance = \frac{1.8 \times 10^4}{fov^2}$.
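For reference, an exact dolly zoom would keep the subject's projected size constant via the pinhole relation $distance = \frac{w}{2\tan(fov/2)}$; the $\frac{1.8 \times 10^4}{fov^2}$ formula used here is an empirically tuned alternative. A plain-Python sketch of the exact relation (`subject_width` is a hypothetical constant, not from the assignment code):

```python
import math

def dolly_distance(fov_deg: float, subject_width: float = 2.0) -> float:
    """Distance at which an object of `subject_width` exactly spans the frame
    for a camera with the given vertical field of view (in degrees)."""
    half_fov = math.radians(fov_deg) / 2.0
    return subject_width / (2.0 * math.tan(half_fov))

# The apparent width recovered at each distance is constant across FoVs,
# which is exactly the dolly-zoom effect.
for fov in (5.0, 30.0, 120.0):
    d = dolly_distance(fov)
    apparent = 2.0 * d * math.tan(math.radians(fov) / 2.0)  # recovers subject_width
    print(f"fov={fov:5.1f}  distance={d:8.3f}  apparent width={apparent:.3f}")
```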

fovs = torch.linspace(5, 120, num_frames)
images = []
for fov in tqdm(fovs):
    # Pull the camera in as the FoV widens so the cow stays roughly the same size
    distance = (1.8 * 10000) / (fov ** 2)
    T = [[0, 0, distance]]
    cameras = pytorch3d.renderer.FoVPerspectiveCameras(fov=fov, T=T, device=device)
    rend = renderer(mesh, cameras=cameras, lights=lights)  # renderer/mesh/lights from setup
    images.append(rend[0, ..., :3].cpu().numpy())
Figure 2. A recreation of the dolly zoom effect.

Task 2. Practicing with Meshes 🧊

2.1. Constructing a Tetrahedron

A tetrahedron is a polyhedron with 4 vertices and 4 triangular faces. I constructed it using the following vertices and face indices. The camera is set to look at the center of the mesh.

vertices = torch.tensor([[1, 2, 1.5], [2, 0, 2], [-2, 0, 2], [0, 0, 0]], dtype=torch.float32) * 0.25
faces = torch.tensor([[0, 1, 2], [0, 2, 3], [0, 1, 3], [1, 2, 3]], dtype=torch.int64)

# Set the camera to look at the center of the tetrahedron
R, T = pytorch3d.renderer.look_at_view_transform(
    dist=2.7,
    elev=10.0,
    azim=angle,  # varied over the loop to produce a 360-degree render
    at=vertices.mean(0, keepdim=True),
)
Figure 3. Render of the constructed tetrahedron.

2.2. Constructing a Cube

A cube mesh can be constructed from 8 vertices and 12 triangular faces (where each of the 6 square sides is made of two triangles).

vertices = torch.tensor([[1, 1, 1], [3, 1, 1], [3, 3, 1], [1, 3, 1],
                         [1, 1, 3], [3, 1, 3], [3, 3, 3], [1, 3, 3]], dtype=torch.float32) * 0.25
faces = torch.tensor([[0, 1, 2], [0, 2, 3], [4, 5, 6], [4, 6, 7],
                      [0, 1, 5], [0, 5, 4], [2, 3, 7], [2, 7, 6],
                      [1, 2, 6], [1, 6, 5], [0, 3, 7], [0, 7, 4]], dtype=torch.int64)

# Set the camera to look at the center of the cube
R, T = pytorch3d.renderer.look_at_view_transform(
    dist=2.7,
    elev=10.0,
    azim=angle,  # varied over the loop to produce a 360-degree render
    at=vertices.mean(0, keepdim=True),
)
Figure 4. Render of the constructed cube.
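As a sanity check on the face list above (pure Python), a closed triangle mesh must satisfy Euler's formula $V - E + F = 2$; the cube's 12 triangles should induce 18 unique edges (the 12 cube edges plus 6 face diagonals):

```python
faces = [[0, 1, 2], [0, 2, 3], [4, 5, 6], [4, 6, 7],
         [0, 1, 5], [0, 5, 4], [2, 3, 7], [2, 7, 6],
         [1, 2, 6], [1, 6, 5], [0, 3, 7], [0, 7, 4]]

# Collect undirected edges from every triangle
edges = {tuple(sorted((f[i], f[(i + 1) % 3]))) for f in faces for i in range(3)}

V, E, F = 8, len(edges), len(faces)
print(V, E, F, V - E + F)  # Euler characteristic of a closed surface is 2
```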

Task 3. Re-texturing a Mesh 🎨

I re-textured the cow mesh by applying a color gradient based on the z-coordinate of each vertex. I assigned green (0, 1, 0) to vertices with the minimum z-value and blue (0, 0, 1) to vertices with the maximum z-value, with colors smoothly interpolated in between.

# Get the z-coordinates of the vertices
z = vertices[0, :, 2]

# Normalize the z-coordinates to create an alpha value for interpolation
alpha = (z - z.min()) / (z.max() - z.min())
alpha = alpha[:, None]

# Define the two colors for the gradient
color1 = torch.tensor([0., 1., 0.], device=z.device) # Green
color2 = torch.tensor([0., 0., 1.], device=z.device) # Blue

# Interpolate between the two colors
color  = alpha * color2 + (1 - alpha) * color1
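The endpoints of this interpolation can be checked with plain Python (the same convex combination, written as a scalar helper for illustration):

```python
# color = alpha * color2 + (1 - alpha) * color1, applied per channel
def lerp(alpha, c1, c2):
    return tuple(alpha * b + (1 - alpha) * a for a, b in zip(c1, c2))

green, blue = (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

print(lerp(0.0, green, blue))  # minimum z -> green
print(lerp(1.0, green, blue))  # maximum z -> blue
print(lerp(0.5, green, blue))  # halfway -> an even green/blue mix
```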
Figure 5. A cow textured with a color gradient along the Z-axis.

This same technique can be used to visualize gradients along the X and Y axes as well.

Figure 6. Gradient along the X-axis.
Figure 7. Gradient along the Y-axis.

Task 4. Camera Transformations 🔄

This task involves applying relative transformations to the camera to change the object's appearance in the rendered image. PyTorch3D uses a coordinate system where +X points left, +Y points up, and +Z points into the screen (away from the camera).

Figure 8. PyTorch3D Coordinate System.
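In the scenarios below, the relative rotation and translation are composed with the default camera pose before rendering. A NumPy sketch of that composition (the default distance of 3 and this composition order follow the starter-code convention and are assumptions):

```python
import numpy as np

def compose(R_relative, T_relative,
            R0=np.eye(3), T0=np.array([0.0, 0.0, 3.0])):
    """Apply a relative camera transform on top of the default pose
    (assumed convention: R = R_rel @ R0, T = R_rel @ T0 + T_rel)."""
    R = np.asarray(R_relative, dtype=float) @ R0
    T = np.asarray(R_relative, dtype=float) @ T0 + np.asarray(T_relative, dtype=float)
    return R, T

# An identity relative transform leaves the default pose unchanged
R, T = compose(np.eye(3), np.zeros(3))
print(T)  # [0. 0. 3.]

# Scenario 2 below: translating by +3 along Z doubles the camera distance
_, T_back = compose(np.eye(3), np.array([0.0, 0.0, 3.0]))
print(T_back)  # [0. 0. 6.]
```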

Scenario 1: 90-degree rotation

To rotate the cow 90 degrees about the camera's Z-axis, I applied the following relative rotation.

R_relative = [[0, -1, 0], [-1, 0, 0], [0, 0, 1]]
T_relative = [0, 0, 0]
Figure 9. Cow rotated 90 degrees.

Scenario 2: Move the camera backward

To make the cow appear further away, I moved the camera backward by increasing its Z-position.

R_relative = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
T_relative = [0, 0, 3]
Figure 10. Cow viewed from a greater distance.

Scenario 3: Move the camera left and up

To move the cow to the bottom-right of the frame, I moved the camera to the left and up, which corresponds to a relative translation of +0.4 along X and -0.6 along Y.

R_relative = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
T_relative = [0.4, -0.6, 0]
Figure 11. Cow moved to the bottom-right.

Scenario 4: Rotate the camera to look at the side

To view the cow from the side, I rotated the camera 90 degrees about its Y-axis.

R_relative = [[0, 0, -1], [0, 1, 0], [-1, 0, 0]]
T_relative = [0, 0, 0]
Figure 12. Cow rotated to show its side profile.

Task 5. Rendering Generic 3D Representations

5.1. Point Clouds from RGB-D

I constructed 3D point clouds by "unprojecting" pixels from two RGB-D images into 3D space. Each pixel's color and depth value were used to calculate its (X, Y, Z) coordinate.

points, rgb = unproject_depth_image(
    torch.from_numpy(data['rgb1']),
    torch.from_numpy(data['mask1']),
    torch.from_numpy(data['depth1']),
    data['cameras1']
)
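Conceptually, unprojection inverts the pinhole projection: a pixel $(u, v)$ with depth $d$ is lifted to $d \cdot K^{-1}[u, v, 1]^T$. A minimal NumPy sketch with a hypothetical intrinsics matrix `K` (PyTorch3D's actual cameras use a different NDC convention, handled inside `unproject_depth_image`):

```python
import numpy as np

# Hypothetical pinhole intrinsics (fx, fy, cx, cy) -- not the assignment's cameras
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def unproject(u, v, depth, K):
    """Lift a pixel (u, v) with known depth to a 3D camera-space point."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return depth * ray

p = unproject(400.0, 300.0, 2.0, K)
print(p)  # 3D point at depth 2 along the pixel's viewing ray -> [0.32 0.24 2.]

# Projecting back recovers the original pixel
uvw = K @ p
print(uvw[:2] / uvw[2])  # -> [400. 300.]
```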

The results below show the point cloud from the first view, the second view, and a combined view.

Figure 13. Point cloud from the first image.
Figure 14. Point cloud from the second image.
Figure 15. Combined point cloud.

5.2. Parametric Functions

I generated a point cloud of a torus using its parametric equations. The major radius ($R_{tor}$) is the distance from the center of the tube to the center of the torus, and the minor radius ($r_{tor}$) is the radius of the tube.
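Written out, the parametric map sampled in the code below is:

```latex
\begin{aligned}
x(\phi, \theta) &= \big(R_{tor} + r_{tor}\cos\theta\big)\cos\phi \\
y(\phi, \theta) &= \big(R_{tor} + r_{tor}\cos\theta\big)\sin\phi \\
z(\phi, \theta) &= r_{tor}\sin\theta, \qquad \phi, \theta \in [0, 2\pi)
\end{aligned}
```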

# Define angles for sampling
phi = torch.linspace(0, 2 * np.pi, num_samples)
theta = torch.linspace(0, 2 * np.pi, num_samples)
Phi, Theta = torch.meshgrid(phi, theta, indexing="ij")

# Torus parametric equations
R_tor = 1.0
r_tor = 0.5
x = torch.cos(Phi) * (R_tor + r_tor * torch.cos(Theta))
y = torch.sin(Phi) * (R_tor + r_tor * torch.cos(Theta))
z = r_tor * torch.sin(Theta)

points = torch.stack((x.flatten(), y.flatten(), z.flatten()), dim=1)
color = (points - points.min()) / (points.max() - points.min())
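Every sampled point should satisfy the torus's implicit equation $(\sqrt{x^2+y^2} - R_{tor})^2 + z^2 = r_{tor}^2$; a quick NumPy check mirroring the torch code above:

```python
import numpy as np

R_tor, r_tor, num_samples = 1.0, 0.5, 50
phi, theta = np.meshgrid(np.linspace(0, 2 * np.pi, num_samples),
                         np.linspace(0, 2 * np.pi, num_samples), indexing="ij")

# Same parametric equations as the torch version
x = np.cos(phi) * (R_tor + r_tor * np.cos(theta))
y = np.sin(phi) * (R_tor + r_tor * np.cos(theta))
z = r_tor * np.sin(theta)

# Residual of the implicit torus equation; numerically zero for exact samples
residual = (np.sqrt(x**2 + y**2) - R_tor) ** 2 + z**2 - r_tor**2
print(np.abs(residual).max())
```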

Results with Major Radius = 1.0, Minor Radius = 0.5

(The density of the point cloud increases with the number of samples.)

50 Samples
100 Samples
250 Samples
500 Samples
Figure 16. Torus point clouds with varying sample counts.

5.3. Implicit Surfaces

I created a mesh of a torus from an implicit function using the marching cubes algorithm. This method defines the surface as the set of points where a function equals a specific value (the isovalue).

# Create a grid of points (voxels)
min_value = -1.6
max_value = 1.6
X, Y, Z = torch.meshgrid(*([torch.linspace(min_value, max_value, voxel_size)] * 3), indexing="ij")

# Implicit function for a torus: zero on the surface, negative inside the tube
R_tor = 1.0
r_tor = 0.5
voxels = (torch.sqrt(X ** 2 + Y ** 2) - R_tor) ** 2 + Z ** 2 - r_tor ** 2

# Extract the mesh using marching cubes
vertices, faces = mcubes.marching_cubes(mcubes.smooth(voxels), isovalue=0)
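A quick sign check of this implicit function (plain Python): points on the tube surface evaluate to zero, points inside the tube are negative, and points far away are positive, which is the sign structure marching cubes relies on when extracting the isosurface at isovalue 0:

```python
import math

R_tor, r_tor = 1.0, 0.5

def f(x, y, z):
    """Implicit torus: zero on the surface, negative inside the tube."""
    return (math.sqrt(x * x + y * y) - R_tor) ** 2 + z * z - r_tor * r_tor

print(f(R_tor + r_tor, 0.0, 0.0))  # on the outer equator -> 0.0
print(f(R_tor, 0.0, 0.0))          # on the tube's center circle -> negative
print(f(3.0, 0.0, 0.0))            # far outside -> positive
```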

Tradeoffs between point clouds and meshes: