Course: 16-825 Learning For 3D Vision
Author: Karthik Pullalarevu
Andrew ID: kpullala
In this assignment, you will learn the basics of rendering with PyTorch3D, explore 3D representations, and practice constructing simple geometry.
To create a 360-degree render of an object, I kept the camera's elevation fixed at 10.0 and its distance from the object at 2.7, then generated 36 distinct views by sweeping the azimuthal angle in 10-degree increments.
```python
import imageio
import numpy as np
import pytorch3d.renderer

all_images = []
# Sweep the azimuth while keeping elevation and distance fixed
for angle in range(0, 360, 10):
    R, T = pytorch3d.renderer.look_at_view_transform(dist=2.7, elev=10.0, azim=angle)
    rendered_image = render_cow(
        cow_path='/home/karthik/Depth-Anything-V2/lf3/assignment1/data/cow.obj',
        device='cuda', R=R, T=T,
    )
    all_images.append(rendered_image)

my_images = [(img * 255).astype(np.uint8) for img in all_images]
duration = 1000 // 15  # Convert FPS (frames per second) to duration (ms per frame)
imageio.mimsave('cow_rotation.gif', my_images, duration=duration, loop=0)
```
The dolly zoom is a classic cinematic effect that changes the camera's field of view (FoV) while simultaneously moving the camera so the subject stays the same size in the frame. To recreate this, I increased the FoV over time while moving the camera closer to the object according to the formula $distance = \frac{1.8 \times 10^4}{fov^2}$, with $fov$ in degrees. (For reference, the relation that keeps an object of width $w$ exactly the same size is $distance = \frac{w}{2\tan(fov/2)}$; the inverse-square rule above is a simpler heuristic that produces a similar look.)
```python
from tqdm import tqdm

num_frames = 30  # e.g. 30 frames for the GIF
fovs = torch.linspace(5, 120, num_frames)
frames = []
for fov in tqdm(fovs):
    # Pull the camera in as the FoV widens
    distance = (1.8 * 10 ** 4) / (fov ** 2)
    T = [[0, 0, distance]]
    cameras = pytorch3d.renderer.FoVPerspectiveCameras(fov=fov, T=T, device=device)
```
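The loop above only constructs the cameras. A minimal sketch of the remaining render-and-save step, assuming `renderer`, `mesh`, and `lights` were built from the starter helpers (hypothetical names):

```python
    # Still inside the loop: render the cow with the current camera
    rend = renderer(mesh, cameras=cameras, lights=lights)
    frames.append((rend[0, ..., :3].cpu().numpy() * 255).astype(np.uint8))

# After the loop, write the frames out as a GIF at 15 fps
imageio.mimsave('dolly_zoom.gif', frames, duration=1000 // 15, loop=0)
```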
A tetrahedron is a polyhedron with 4 vertices and 4 triangular faces. I constructed it using the following vertices and face indices. The camera is set to look at the center of the mesh.
```python
vertices = torch.tensor([[1, 2, 1.5], [2, 0, 2], [-2, 0, 2], [0, 0, 0]], dtype=torch.float32) * 0.25
faces = torch.tensor([[0, 1, 2], [0, 2, 3], [0, 1, 3], [1, 2, 3]], dtype=torch.int64)

# Set the camera to look at the center of the tetrahedron
R, T = pytorch3d.renderer.look_at_view_transform(
    dist=2.7,
    elev=10.0,
    azim=angle,
    at=vertices.mean(0, keepdim=True),
)
```
A cube mesh can be constructed from 8 vertices and 12 triangular faces (where each of the 6 square sides is made of two triangles).
```python
vertices = torch.tensor([[1, 1, 1], [3, 1, 1], [3, 3, 1], [1, 3, 1],
                         [1, 1, 3], [3, 1, 3], [3, 3, 3], [1, 3, 3]], dtype=torch.float32) * 0.25
faces = torch.tensor([[0, 1, 2], [0, 2, 3], [4, 5, 6], [4, 6, 7],
                      [0, 1, 5], [0, 5, 4], [2, 3, 7], [2, 7, 6],
                      [1, 2, 6], [1, 6, 5], [0, 3, 7], [0, 7, 4]], dtype=torch.int64)

# Set the camera to look at the center of the cube
R, T = pytorch3d.renderer.look_at_view_transform(
    dist=2.7,
    elev=10.0,
    azim=angle,
    at=vertices.mean(0, keepdim=True),
)
```
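For both shapes, the vertex and face tensors become a renderable mesh the same way. A minimal sketch, assuming a single uniform vertex color of my own choosing:

```python
import pytorch3d.renderer
import pytorch3d.structures

# Batch dimensions expected by PyTorch3D: (1, V, 3) and (1, F, 3)
verts = vertices.unsqueeze(0)
face_idx = faces.unsqueeze(0)

# Paint every vertex the same color via a per-vertex texture
color = torch.tensor([0.7, 0.7, 1.0])
textures = pytorch3d.renderer.TexturesVertex(torch.ones_like(verts) * color)

mesh = pytorch3d.structures.Meshes(verts=verts, faces=face_idx, textures=textures).to(device)
```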
I re-textured the cow mesh by applying a color gradient based on the z-coordinate of each vertex. I assigned green (0, 1, 0) to vertices with the minimum z-value and blue (0, 0, 1) to vertices with the maximum z-value, with colors smoothly interpolated in between.
```python
# Get the z-coordinates of the vertices (vertices has shape (1, V, 3))
z = vertices[0, :, 2]
# Normalize the z-coordinates to create an alpha value for interpolation
alpha = (z - z.min()) / (z.max() - z.min())
alpha = alpha[:, None]
# Define the two colors for the gradient
color1 = torch.tensor([0., 1., 0.], device=z.device)  # Green
color2 = torch.tensor([0., 0., 1.], device=z.device)  # Blue
# Interpolate: alpha = 0 (min z) gives green, alpha = 1 (max z) gives blue
color = alpha * color2 + (1 - alpha) * color1
```
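The interpolated colors are then attached to the mesh as a per-vertex texture. A sketch, assuming `vertices` and `faces` are the batched (1, V, 3) / (1, F, 3) cow tensors from the starter's loading code:

```python
# Per-vertex colors become a texture; shape (1, V, 3) matches the vertices
textures = pytorch3d.renderer.TexturesVertex(color.unsqueeze(0))
mesh = pytorch3d.structures.Meshes(verts=vertices, faces=faces, textures=textures)
```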
This same technique can be used to visualize gradients along the X and Y axes as well.

| X-Axis Visualization | Y-Axis Visualization |
|---|---|
| ![]() | ![]() |
This task involves applying transformations to the camera to change the object's appearance in the rendered image. PyTorch3D uses a coordinate system where +X points left, +Y points up, and +Z points into the screen (away from the camera).
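Each case below specifies a relative rotation `R_relative` and translation `T_relative`, which compose with a base camera pose (R_0, T_0). A sketch following the pattern in the starter's camera_transforms.py (exact names may differ):

```python
# Base pose from the starter code: camera 3 units back along +Z
R_0 = torch.eye(3)
T_0 = torch.tensor([0.0, 0.0, 3.0])

R_relative = torch.tensor(R_relative, dtype=torch.float32)
T_relative = torch.tensor(T_relative, dtype=torch.float32)
# Compose the relative transform with the base extrinsics
R = R_relative @ R_0
T = R_relative @ T_0 + T_relative
cameras = pytorch3d.renderer.FoVPerspectiveCameras(
    R=R.unsqueeze(0), T=T.unsqueeze(0), device=device
)
```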
To rotate the cow 90 degrees about the camera's Z-axis, I applied the following relative rotation (a proper rotation matrix, with determinant +1).

```python
R_relative = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
T_relative = [0, 0, 0]
```
To make the cow appear further away, I moved the camera backward by increasing its Z-position.
```python
R_relative = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
T_relative = [0, 0, 3]
```
To move the cow toward the bottom-left of the frame, I translated the camera parallel to the image plane. In PyTorch3D's convention (+X left, +Y up), this T_relative shifts the scene left and down in camera space.

```python
R_relative = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
T_relative = [0.4, -0.6, 0]
```
To view the cow from the side, I rotated the camera 90 degrees about its Y-axis.

```python
R_relative = [[0, 0, 1], [0, 1, 0], [-1, 0, 0]]
T_relative = [0, 0, 0]
```
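As a quick sanity check (my own addition, not part of the assignment), a candidate `R_relative` can be verified to be a proper rotation, i.e., orthonormal with determinant +1:

```python
import torch

R = torch.tensor([[0., 0., 1.], [0., 1., 0.], [-1., 0., 0.]])
# Orthonormal: R @ R.T == I; proper rotation (no reflection): det(R) == +1
assert torch.allclose(R @ R.T, torch.eye(3))
assert torch.isclose(torch.linalg.det(R), torch.tensor(1.))
```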
I constructed 3D point clouds by "unprojecting" pixels from two RGB-D images into 3D space. Each pixel's depth, together with the camera parameters, determines its (X, Y, Z) coordinate, and the pixel's color is carried along as the point's feature.
```python
points, rgb = unproject_depth_image(
    torch.from_numpy(data['rgb1']),
    torch.from_numpy(data['mask1']),
    torch.from_numpy(data['depth1']),
    data['cameras1'],
)
```
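A sketch of how the unprojected outputs become renderable PyTorch3D point clouds, with `points_1`/`rgb_1` and `points_2`/`rgb_2` denoting the returns for the two views (hypothetical names; the `[:, :3]` slice drops the alpha channel in case the helper returns RGBA features):

```python
import pytorch3d.structures

# One point cloud per view; the combined view simply concatenates both sets
point_cloud_1 = pytorch3d.structures.Pointclouds(
    points=[points_1.to(device)], features=[rgb_1[:, :3].to(device)]
)
point_cloud_union = pytorch3d.structures.Pointclouds(
    points=[torch.cat([points_1, points_2]).to(device)],
    features=[torch.cat([rgb_1[:, :3], rgb_2[:, :3]]).to(device)],
)
```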
The results below show the point cloud from the first view, the second view, and a combined view.
| View 1 | View 2 | Combined |
|---|---|---|
| ![]() | ![]() | ![]() |
I generated a point cloud of a torus using its parametric equations. The major radius ($R_{tor}$) is the distance from the center of the tube to the center of the torus, and the minor radius ($r_{tor}$) is the radius of the tube.
```python
# Define angles for sampling
phi = torch.linspace(0, 2 * np.pi, num_samples)
theta = torch.linspace(0, 2 * np.pi, num_samples)
Phi, Theta = torch.meshgrid(phi, theta, indexing="ij")

# Torus parametric equations
R_tor = 1.0  # Major radius: center of torus to center of tube
r_tor = 0.5  # Minor radius: radius of the tube
x = torch.cos(Phi) * (R_tor + r_tor * torch.cos(Theta))
y = torch.sin(Phi) * (R_tor + r_tor * torch.cos(Theta))
z = r_tor * torch.sin(Theta)

points = torch.stack((x.flatten(), y.flatten(), z.flatten()), dim=1)
# Normalize coordinates into [0, 1] to use as RGB colors
color = (points - points.min()) / (points.max() - points.min())
```
(The density of the point cloud increases with the number of samples.)
| 50 Samples | 100 Samples | 250 Samples | 500 Samples |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
I created a mesh of a torus from an implicit function using the marching cubes algorithm. This method defines the surface as the set of points where a function equals a specific value (the isovalue).
```python
# Create a grid of points (voxels)
min_value = -1.6
max_value = 1.6
X, Y, Z = torch.meshgrid([torch.linspace(min_value, max_value, voxel_size)] * 3, indexing="ij")

# Implicit function for a torus: zero on the surface
R_tor = 1.0
r_tor = 0.5
voxels = (torch.sqrt(X ** 2 + Y ** 2) - R_tor) ** 2 + Z ** 2 - r_tor ** 2

# Extract the mesh using marching cubes
vertices, faces = mcubes.marching_cubes(mcubes.smooth(voxels), isovalue=0)
```
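Marching cubes returns vertices in voxel-index coordinates, so they must be rescaled into world space before building a renderable mesh. A sketch following the starter code's pattern (the uniform gray color is my own choice):

```python
vertices = torch.tensor(vertices).float()
faces = torch.tensor(faces.astype(int))
# Map voxel indices back into world coordinates [min_value, max_value]
vertices = (vertices / voxel_size) * (max_value - min_value) + min_value

textures = pytorch3d.renderer.TexturesVertex(torch.full_like(vertices, 0.7).unsqueeze(0))
mesh = pytorch3d.structures.Meshes(
    verts=[vertices], faces=[faces], textures=textures
).to(device)
```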
Tradeoffs between point clouds and meshes: point clouds are easy to construct (sampled directly from parametric equations or unprojected from RGB-D images) and cheap to render, but they carry no connectivity, so the surface looks sparse or full of holes unless many points are sampled. Meshes store explicit surface connectivity, so they render as continuous surfaces and support smooth interpolation of vertex colors from relatively few vertices, but they are harder to construct (here requiring marching cubes over a voxel grid) and their quality is tied to the grid resolution.
(The mesh quality improves with a higher voxel grid resolution.)
| Voxel Size: 8 | Voxel Size: 16 | Voxel Size: 32 | Voxel Size: 64 |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
For this task, I constructed an airplane from several parametric shapes: an elongated ellipsoid for the fuselage and tapered rectangular surfaces for the wings and tail. I colored each component differently and created a dynamic animation by moving the camera to simulate a fly-by, complete with a dolly zoom effect.
```python
import imageio
import numpy as np
import pytorch3d.renderer
import pytorch3d.structures
import torch
from PIL import Image, ImageDraw
from tqdm import tqdm


def render_airplane(image_size=256, num_samples=100, device=None, R=None, T=None):
    """
    Renders a simple airplane using parametric sampling.
    Components: fuselage (ellipsoid), wings (rectangular), tail surfaces.
    """
    if device is None:
        device = get_device()
    points_list = []

    # 1. Fuselage (elongated ellipsoid)
    u = torch.linspace(0, 2 * np.pi, num_samples // 2)
    v = torch.linspace(0, np.pi, num_samples // 4)
    U, V = torch.meshgrid(u, v, indexing="ij")
    fuselage_length = 4.0
    fuselage_width = 0.8
    fuselage_height = 0.6
    x_fus = fuselage_length * torch.cos(V) * torch.cos(U)
    y_fus = fuselage_width * torch.cos(V) * torch.sin(U)
    z_fus = fuselage_height * torch.sin(V)
    fuselage_points = torch.stack((x_fus.flatten(), y_fus.flatten(), z_fus.flatten()), dim=1)
    points_list.append(fuselage_points)

    # 2. Main wings (rectangular surfaces with taper)
    wing_span = 8.0
    wing_chord_root = 1.5
    wing_chord_tip = 0.8
    wing_position_x = 0.5  # Position along fuselage
    wing_u = torch.linspace(-wing_span / 2, wing_span / 2, num_samples // 3)
    wing_v = torch.linspace(0, 1, num_samples // 8)
    Wing_U, Wing_V = torch.meshgrid(wing_u, wing_v, indexing="ij")
    # Chord tapers linearly from root to tip
    chord_at_span = wing_chord_root + (wing_chord_tip - wing_chord_root) * (torch.abs(Wing_U) / (wing_span / 2))
    x_wing = wing_position_x + Wing_V * chord_at_span
    y_wing = Wing_U
    z_wing = torch.zeros_like(Wing_U) + 0.1  # Slight vertical offset
    wing_points = torch.stack((x_wing.flatten(), y_wing.flatten(), z_wing.flatten()), dim=1)
    points_list.append(wing_points)

    # 3. Horizontal tail
    tail_span = 2.0
    tail_chord = 0.8
    tail_position_x = -3.5
    tail_u = torch.linspace(-tail_span / 2, tail_span / 2, num_samples // 6)
    tail_v = torch.linspace(0, 1, num_samples // 12)
    Tail_U, Tail_V = torch.meshgrid(tail_u, tail_v, indexing="ij")
    x_tail = tail_position_x + Tail_V * tail_chord
    y_tail = Tail_U
    z_tail = torch.zeros_like(Tail_U) + 0.8  # Elevated tail
    tail_points = torch.stack((x_tail.flatten(), y_tail.flatten(), z_tail.flatten()), dim=1)
    points_list.append(tail_points)

    # 4. Vertical tail
    vtail_height = 1.5
    vtail_chord = 0.6
    vtail_u = torch.linspace(0, vtail_height, num_samples // 8)
    vtail_v = torch.linspace(0, 1, num_samples // 12)
    VTail_U, VTail_V = torch.meshgrid(vtail_u, vtail_v, indexing="ij")
    x_vtail = tail_position_x + VTail_V * vtail_chord
    y_vtail = torch.zeros_like(VTail_U)
    z_vtail = VTail_U + 0.3
    vtail_points = torch.stack((x_vtail.flatten(), y_vtail.flatten(), z_vtail.flatten()), dim=1)
    points_list.append(vtail_points)

    # Combine all points
    points = torch.cat(points_list, dim=0)

    # Color coding by component (RGB per point)
    num_fuselage = fuselage_points.shape[0]
    num_wing = wing_points.shape[0]
    num_htail = tail_points.shape[0]
    colors = torch.zeros(points.shape[0], 3)
    colors[:num_fuselage] = torch.tensor([0.7, 0.7, 0.9])  # Light blue fuselage
    colors[num_fuselage:num_fuselage + num_wing] = torch.tensor([0.9, 0.7, 0.7])  # Light red wings
    colors[num_fuselage + num_wing:num_fuselage + num_wing + num_htail] = torch.tensor([0.7, 0.9, 0.7])  # Light green tail
    colors[num_fuselage + num_wing + num_htail:] = torch.tensor([0.9, 0.9, 0.7])  # Light yellow vertical tail

    airplane_point_cloud = pytorch3d.structures.Pointclouds(
        points=[points], features=[colors],
    ).to(device)

    # Dolly zoom fly-by: widen the FoV while pulling the camera in
    fovs = torch.linspace(5, 120, 10)
    renders = []
    for fov in tqdm(fovs):
        distance = (8.0 * 10000) / (fov ** 2)
        T = [[0, 0, distance]]
        cameras = pytorch3d.renderer.FoVPerspectiveCameras(fov=fov, T=T, device=device)
        renderer = get_points_renderer(image_size=image_size, device=device,
                                       background_color=(0, 0, 0.7))  # Dark blue, RGB in [0, 1]
        rend = renderer(airplane_point_cloud, cameras=cameras)
        renders.append(rend[0, ..., :3].cpu().numpy())  # (H, W, 3)

    # Annotate each frame with its FoV and save the GIF at 10 fps
    images = []
    for i, r in enumerate(renders):
        image = Image.fromarray((r * 255).astype(np.uint8))
        draw = ImageDraw.Draw(image)
        draw.text((20, 20), f"fov: {fovs[i]:.2f}", fill=(255, 0, 0))
        images.append(np.array(image))
    imageio.mimsave('aeroplane_dolly_zoom.gif', images, duration=1000 // 10, loop=0)
```