Course: 16-825 Learning For 3D Vision
Author: Karthik Pullalarevu
Andrew ID: kpullala
In this assignment, you will learn the basics of rendering with PyTorch3D, explore 3D representations, and practice constructing simple geometry.
To create a 360-degree render of an object, I kept the camera's elevation fixed at 10.0 and its distance from the object at 2.7, then generated 36 distinct views by sweeping the azimuthal angle in 10-degree increments.
```python
import imageio
import numpy as np
import pytorch3d.renderer

all_images = []
# Sweep the azimuth while keeping elevation and distance fixed
for angle in range(0, 360, 10):
    R, T = pytorch3d.renderer.look_at_view_transform(dist=2.7, elev=10.0, azim=angle)
    rendered_image = render_cow(
        cow_path='/home/karthik/Depth-Anything-V2/lf3/assignment1/data/cow.obj',
        device='cuda', R=R, T=T,
    )
    all_images.append(rendered_image)

my_images = [(img * 255).astype(np.uint8) for img in all_images]
duration = 1000 // 15  # Convert FPS (frames per second) to duration (ms per frame)
imageio.mimsave('cow_rotation.gif', my_images, duration=duration, loop=0)
```
The dolly zoom is a classic cinematic effect that changes the camera's field of view (FoV) while simultaneously moving the camera so the subject stays the same size in the frame. To recreate this, I increased the FoV over time while moving the camera closer to the object according to the formula $distance = \frac{1.8 \times 10^4}{fov^2}$, with $fov$ in degrees. (For reference, the relation that keeps an object of width $w$ exactly the same size is $distance = \frac{w}{2\tan(fov/2)}$; the inverse-square rule above is a simpler heuristic that produces a similar look.)
```python
from tqdm import tqdm

num_frames = 30  # e.g. 30 frames for the GIF
fovs = torch.linspace(5, 120, num_frames)
frames = []
for fov in tqdm(fovs):
    # Pull the camera in as the FoV widens
    distance = (1.8 * 10 ** 4) / (fov ** 2)
    T = [[0, 0, distance]]
    cameras = pytorch3d.renderer.FoVPerspectiveCameras(fov=fov, T=T, device=device)
```
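The loop above only constructs the cameras. A minimal sketch of the remaining render-and-save step, assuming `renderer`, `mesh`, and `lights` were built from the starter helpers (hypothetical names):

```python
    # Still inside the loop: render the cow with the current camera
    rend = renderer(mesh, cameras=cameras, lights=lights)
    frames.append((rend[0, ..., :3].cpu().numpy() * 255).astype(np.uint8))

# After the loop, write the frames out as a GIF at 15 fps
imageio.mimsave('dolly_zoom.gif', frames, duration=1000 // 15, loop=0)
```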
A tetrahedron is a polyhedron with 4 vertices and 4 triangular faces. I constructed it using the following vertices and face indices. The camera is set to look at the center of the mesh.
```python
vertices = torch.tensor([[1, 2, 1.5], [2, 0, 2], [-2, 0, 2], [0, 0, 0]], dtype=torch.float32) * 0.25
faces = torch.tensor([[0, 1, 2], [0, 2, 3], [0, 1, 3], [1, 2, 3]], dtype=torch.int64)

# Set the camera to look at the center of the tetrahedron
R, T = pytorch3d.renderer.look_at_view_transform(
    dist=2.7,
    elev=10.0,
    azim=angle,
    at=vertices.mean(0, keepdim=True),
)
```
A cube mesh can be constructed from 8 vertices and 12 triangular faces (where each of the 6 square sides is made of two triangles).
```python
vertices = torch.tensor([[1, 1, 1], [3, 1, 1], [3, 3, 1], [1, 3, 1],
                         [1, 1, 3], [3, 1, 3], [3, 3, 3], [1, 3, 3]], dtype=torch.float32) * 0.25
faces = torch.tensor([[0, 1, 2], [0, 2, 3], [4, 5, 6], [4, 6, 7],
                      [0, 1, 5], [0, 5, 4], [2, 3, 7], [2, 7, 6],
                      [1, 2, 6], [1, 6, 5], [0, 3, 7], [0, 7, 4]], dtype=torch.int64)

# Set the camera to look at the center of the cube
R, T = pytorch3d.renderer.look_at_view_transform(
    dist=2.7,
    elev=10.0,
    azim=angle,
    at=vertices.mean(0, keepdim=True),
)
```
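For both shapes, the vertex and face tensors become a renderable mesh the same way. A minimal sketch, assuming a single uniform vertex color of my own choosing:

```python
import pytorch3d.renderer
import pytorch3d.structures

# Batch dimensions expected by PyTorch3D: (1, V, 3) and (1, F, 3)
verts = vertices.unsqueeze(0)
face_idx = faces.unsqueeze(0)

# Paint every vertex the same color via a per-vertex texture
color = torch.tensor([0.7, 0.7, 1.0])
textures = pytorch3d.renderer.TexturesVertex(torch.ones_like(verts) * color)

mesh = pytorch3d.structures.Meshes(verts=verts, faces=face_idx, textures=textures).to(device)
```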
I re-textured the cow mesh by applying a color gradient based on the z-coordinate of each vertex. I assigned green (0, 1, 0) to vertices with the minimum z-value and blue (0, 0, 1) to vertices with the maximum z-value, with colors smoothly interpolated in between.
```python
# Get the z-coordinates of the vertices (vertices has shape (1, V, 3))
z = vertices[0, :, 2]
# Normalize the z-coordinates to create an alpha value for interpolation
alpha = (z - z.min()) / (z.max() - z.min())
alpha = alpha[:, None]
# Define the two colors for the gradient
color1 = torch.tensor([0., 1., 0.], device=z.device)  # Green
color2 = torch.tensor([0., 0., 1.], device=z.device)  # Blue
# Interpolate: alpha = 0 (min z) gives green, alpha = 1 (max z) gives blue
color = alpha * color2 + (1 - alpha) * color1
```
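The interpolated colors are then attached to the mesh as a per-vertex texture. A sketch, assuming `vertices` and `faces` are the batched (1, V, 3) / (1, F, 3) cow tensors from the starter's loading code:

```python
# Per-vertex colors become a texture; shape (1, V, 3) matches the vertices
textures = pytorch3d.renderer.TexturesVertex(color.unsqueeze(0))
mesh = pytorch3d.structures.Meshes(verts=vertices, faces=faces, textures=textures)
```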
This same technique can be used to visualize gradients along the X and Y axes as well.

| X-Axis Visualization | Y-Axis Visualization |
|---|---|
| ![]() | ![]() |
This task involves applying transformations to the camera to change the object's appearance in the rendered image. PyTorch3D uses a coordinate system where +X points left, +Y points up, and +Z points into the screen (away from the camera).
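Each case below specifies a relative rotation `R_relative` and translation `T_relative`, which compose with a base camera pose (R_0, T_0). A sketch following the pattern in the starter's camera_transforms.py (exact names may differ):

```python
# Base pose from the starter code: camera 3 units back along +Z
R_0 = torch.eye(3)
T_0 = torch.tensor([0.0, 0.0, 3.0])

R_relative = torch.tensor(R_relative, dtype=torch.float32)
T_relative = torch.tensor(T_relative, dtype=torch.float32)
# Compose the relative transform with the base extrinsics
R = R_relative @ R_0
T = R_relative @ T_0 + T_relative
cameras = pytorch3d.renderer.FoVPerspectiveCameras(
    R=R.unsqueeze(0), T=T.unsqueeze(0), device=device
)
```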
To rotate the cow 90 degrees about the camera's Z-axis, I applied the following relative rotation (a proper rotation matrix, with determinant +1).

```python
R_relative = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
T_relative = [0, 0, 0]
```
To make the cow appear further away, I moved the camera backward by increasing its Z-position.
```python
R_relative = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
T_relative = [0, 0, 3]
```
To move the cow toward the bottom-left of the frame, I translated the camera parallel to the image plane. In PyTorch3D's convention (+X left, +Y up), this T_relative shifts the scene left and down in camera space.

```python
R_relative = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
T_relative = [0.4, -0.6, 0]
```
To view the cow from the side, I rotated the camera 90 degrees about its Y-axis.

```python
R_relative = [[0, 0, 1], [0, 1, 0], [-1, 0, 0]]
T_relative = [0, 0, 0]
```
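As a quick sanity check (my own addition, not part of the assignment), a candidate `R_relative` can be verified to be a proper rotation, i.e., orthonormal with determinant +1:

```python
import torch

R = torch.tensor([[0., 0., 1.], [0., 1., 0.], [-1., 0., 0.]])
# Orthonormal: R @ R.T == I; proper rotation (no reflection): det(R) == +1
assert torch.allclose(R @ R.T, torch.eye(3))
assert torch.isclose(torch.linalg.det(R), torch.tensor(1.))
```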
I constructed 3D point clouds by "unprojecting" pixels from two RGB-D images into 3D space. Each pixel's depth, together with the camera parameters, determines its (X, Y, Z) coordinate, and the pixel's color is carried along as the point's feature.
```python
points, rgb = unproject_depth_image(
    torch.from_numpy(data['rgb1']),
    torch.from_numpy(data['mask1']),
    torch.from_numpy(data['depth1']),
    data['cameras1'],
)
```
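A sketch of how the unprojected outputs become renderable PyTorch3D point clouds, with `points_1`/`rgb_1` and `points_2`/`rgb_2` denoting the returns for the two views (hypothetical names; the `[:, :3]` slice drops the alpha channel in case the helper returns RGBA features):

```python
import pytorch3d.structures

# One point cloud per view; the combined view simply concatenates both sets
point_cloud_1 = pytorch3d.structures.Pointclouds(
    points=[points_1.to(device)], features=[rgb_1[:, :3].to(device)]
)
point_cloud_union = pytorch3d.structures.Pointclouds(
    points=[torch.cat([points_1, points_2]).to(device)],
    features=[torch.cat([rgb_1[:, :3], rgb_2[:, :3]]).to(device)],
)
```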
The results below show the point cloud from the first view, the second view, and a combined view.
| View 1 | View 2 | Combined |
|---|---|---|
| ![]() | ![]() | ![]() |
I generated a point cloud of a torus using its parametric equations. The major radius ($R_{tor}$) is the distance from the center of the tube to the center of the torus, and the minor radius ($r_{tor}$) is the radius of the tube.
```python
# Define angles for sampling
phi = torch.linspace(0, 2 * np.pi, num_samples)
theta = torch.linspace(0, 2 * np.pi, num_samples)
Phi, Theta = torch.meshgrid(phi, theta, indexing="ij")

# Torus parametric equations
R_tor = 1.0  # Major radius: center of torus to center of tube
r_tor = 0.5  # Minor radius: radius of the tube
x = torch.cos(Phi) * (R_tor + r_tor * torch.cos(Theta))
y = torch.sin(Phi) * (R_tor + r_tor * torch.cos(Theta))
z = r_tor * torch.sin(Theta)

points = torch.stack((x.flatten(), y.flatten(), z.flatten()), dim=1)
# Normalize coordinates into [0, 1] to use as RGB colors
color = (points - points.min()) / (points.max() - points.min())
```
(The density of the point cloud increases with the number of samples.)
| 50 Samples | 100 Samples | 250 Samples | 500 Samples |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
I created a mesh of a torus from an implicit function using the marching cubes algorithm. This method defines the surface as the set of points where a function equals a specific value (the isovalue).
```python
# Create a grid of points (voxels)
min_value = -1.6
max_value = 1.6
X, Y, Z = torch.meshgrid([torch.linspace(min_value, max_value, voxel_size)] * 3, indexing="ij")

# Implicit function for a torus: zero on the surface
R_tor = 1.0
r_tor = 0.5
voxels = (torch.sqrt(X ** 2 + Y ** 2) - R_tor) ** 2 + Z ** 2 - r_tor ** 2

# Extract the mesh using marching cubes
vertices, faces = mcubes.marching_cubes(mcubes.smooth(voxels), isovalue=0)
```
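Marching cubes returns vertices in voxel-index coordinates, so they must be rescaled into world space before building a renderable mesh. A sketch following the starter code's pattern (the uniform gray color is my own choice):

```python
vertices = torch.tensor(vertices).float()
faces = torch.tensor(faces.astype(int))
# Map voxel indices back into world coordinates [min_value, max_value]
vertices = (vertices / voxel_size) * (max_value - min_value) + min_value

textures = pytorch3d.renderer.TexturesVertex(torch.full_like(vertices, 0.7).unsqueeze(0))
mesh = pytorch3d.structures.Meshes(
    verts=[vertices], faces=[faces], textures=textures
).to(device)
```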
Tradeoffs between point clouds and meshes: point clouds are easy to construct (sampled directly from parametric equations or unprojected from RGB-D images) and cheap to render, but they carry no connectivity, so the surface looks sparse or full of holes unless many points are sampled. Meshes store explicit surface connectivity, so they render as continuous surfaces and support smooth interpolation of vertex colors from relatively few vertices, but they are harder to construct (here requiring marching cubes over a voxel grid) and their quality is tied to the grid resolution.
(The mesh quality improves with a higher voxel grid resolution.)
| Voxel Size: 8 | Voxel Size: 16 | Voxel Size: 32 | Voxel Size: 64 |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
For this task, I constructed an airplane from several parametric shapes: an elongated ellipsoid for the fuselage and tapered rectangular surfaces for the wings and tail. I colored each component differently and created a dynamic animation by moving the camera to simulate a fly-by, complete with a dolly zoom effect.
```python
import imageio
import numpy as np
import pytorch3d.renderer
import pytorch3d.structures
import torch
from PIL import Image, ImageDraw
from tqdm import tqdm


def render_airplane(image_size=256, num_samples=100, device=None, R=None, T=None):
    """
    Renders a simple airplane using parametric sampling.
    Components: fuselage (ellipsoid), wings (rectangular), tail surfaces.
    """
    if device is None:
        device = get_device()
    points_list = []

    # 1. Fuselage (elongated ellipsoid)
    u = torch.linspace(0, 2 * np.pi, num_samples // 2)
    v = torch.linspace(0, np.pi, num_samples // 4)
    U, V = torch.meshgrid(u, v, indexing="ij")
    fuselage_length = 4.0
    fuselage_width = 0.8
    fuselage_height = 0.6
    x_fus = fuselage_length * torch.cos(V) * torch.cos(U)
    y_fus = fuselage_width * torch.cos(V) * torch.sin(U)
    z_fus = fuselage_height * torch.sin(V)
    fuselage_points = torch.stack((x_fus.flatten(), y_fus.flatten(), z_fus.flatten()), dim=1)
    points_list.append(fuselage_points)

    # 2. Main wings (rectangular surfaces with taper)
    wing_span = 8.0
    wing_chord_root = 1.5
    wing_chord_tip = 0.8
    wing_position_x = 0.5  # Position along fuselage
    wing_u = torch.linspace(-wing_span / 2, wing_span / 2, num_samples // 3)
    wing_v = torch.linspace(0, 1, num_samples // 8)
    Wing_U, Wing_V = torch.meshgrid(wing_u, wing_v, indexing="ij")
    # Chord tapers linearly from root to tip
    chord_at_span = wing_chord_root + (wing_chord_tip - wing_chord_root) * (torch.abs(Wing_U) / (wing_span / 2))
    x_wing = wing_position_x + Wing_V * chord_at_span
    y_wing = Wing_U
    z_wing = torch.zeros_like(Wing_U) + 0.1  # Slight vertical offset
    wing_points = torch.stack((x_wing.flatten(), y_wing.flatten(), z_wing.flatten()), dim=1)
    points_list.append(wing_points)

    # 3. Horizontal tail
    tail_span = 2.0
    tail_chord = 0.8
    tail_position_x = -3.5
    tail_u = torch.linspace(-tail_span / 2, tail_span / 2, num_samples // 6)
    tail_v = torch.linspace(0, 1, num_samples // 12)
    Tail_U, Tail_V = torch.meshgrid(tail_u, tail_v, indexing="ij")
    x_tail = tail_position_x + Tail_V * tail_chord
    y_tail = Tail_U
    z_tail = torch.zeros_like(Tail_U) + 0.8  # Elevated tail
    tail_points = torch.stack((x_tail.flatten(), y_tail.flatten(), z_tail.flatten()), dim=1)
    points_list.append(tail_points)

    # 4. Vertical tail
    vtail_height = 1.5
    vtail_chord = 0.6
    vtail_u = torch.linspace(0, vtail_height, num_samples // 8)
    vtail_v = torch.linspace(0, 1, num_samples // 12)
    VTail_U, VTail_V = torch.meshgrid(vtail_u, vtail_v, indexing="ij")
    x_vtail = tail_position_x + VTail_V * vtail_chord
    y_vtail = torch.zeros_like(VTail_U)
    z_vtail = VTail_U + 0.3
    vtail_points = torch.stack((x_vtail.flatten(), y_vtail.flatten(), z_vtail.flatten()), dim=1)
    points_list.append(vtail_points)

    # Combine all points
    points = torch.cat(points_list, dim=0)

    # Color coding by component (RGB per point)
    num_fuselage = fuselage_points.shape[0]
    num_wing = wing_points.shape[0]
    num_htail = tail_points.shape[0]
    colors = torch.zeros(points.shape[0], 3)
    colors[:num_fuselage] = torch.tensor([0.7, 0.7, 0.9])  # Light blue fuselage
    colors[num_fuselage:num_fuselage + num_wing] = torch.tensor([0.9, 0.7, 0.7])  # Light red wings
    colors[num_fuselage + num_wing:num_fuselage + num_wing + num_htail] = torch.tensor([0.7, 0.9, 0.7])  # Light green tail
    colors[num_fuselage + num_wing + num_htail:] = torch.tensor([0.9, 0.9, 0.7])  # Light yellow vertical tail

    airplane_point_cloud = pytorch3d.structures.Pointclouds(
        points=[points], features=[colors],
    ).to(device)

    # Dolly zoom fly-by: widen the FoV while pulling the camera in
    fovs = torch.linspace(5, 120, 10)
    renders = []
    for fov in tqdm(fovs):
        distance = (8.0 * 10000) / (fov ** 2)
        T = [[0, 0, distance]]
        cameras = pytorch3d.renderer.FoVPerspectiveCameras(fov=fov, T=T, device=device)
        renderer = get_points_renderer(image_size=image_size, device=device,
                                       background_color=(0, 0, 0.7))  # Dark blue, RGB in [0, 1]
        rend = renderer(airplane_point_cloud, cameras=cameras)
        renders.append(rend[0, ..., :3].cpu().numpy())  # (H, W, 3)

    # Annotate each frame with its FoV and save the GIF at 10 fps
    images = []
    for i, r in enumerate(renders):
        image = Image.fromarray((r * 255).astype(np.uint8))
        draw = ImageDraw.Draw(image)
        draw.text((20, 20), f"fov: {fovs[i]:.2f}", fill=(255, 0, 0))
        images.append(np.array(image))
    imageio.mimsave('aeroplane_dolly_zoom.gif', images, duration=1000 // 10, loop=0)
```