Rendering Basics with PyTorch3D¶

Part 1 -- Practicing with Cameras¶

1.1 - 360 Degree Renders¶

In the following task, I rendered a 3D cow object from 18 different viewpoints. The elevation and distance of the camera were fixed, while the azimuth ranged from -180° to 180° (changing by 20° between viewpoints). Using the look_at_view_transform function in PyTorch3D, I obtained 18 different (Rotation, Translation) matrix pairs that were used to position the camera. Putting together all the rendered images, I get the gif below:

10 point cow
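A minimal sketch of the render loop (the cow mesh, renderer, and lights from the starter setup are assumed to exist already; the distance and elevation values are illustrative):

import torch
from pytorch3d.renderer import FoVPerspectiveCameras, look_at_view_transform

device = "cuda" if torch.cuda.is_available() else "cpu"
azimuths = torch.arange(-180.0, 180.0, 20.0)   # 18 viewpoints, 20° apart
renders = []
for azim in azimuths:
    # Fixed distance/elevation (illustrative values), varying azimuth
    R, T = look_at_view_transform(dist=3.0, elev=0.0, azim=azim)
    cameras = FoVPerspectiveCameras(R=R, T=T, device=device)
    renders.append(renderer(mesh, cameras=cameras, lights=lights))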

1.2 - Re-creating the Dolly Zoom¶

The idea of a dolly zoom is to change the focal length of the camera while moving the camera so that the subject stays the same size in the frame (i.e. if you're increasing the focal length, effectively zooming in, you also increase the distance of the camera from the subject).

To achieve this, I scaled the FoV (field of view, which determines the focal length) and T (the camera translation, whose z component is the distance to the subject) accordingly:

fovs = torch.linspace(5, 120, num_frames)
ref_focal_length = 1 / torch.tan(torch.deg2rad(fovs[0]) / 2)   # Focal length at the starting (smallest) FoV

renders = []
for i, fov in enumerate(tqdm(fovs)):
    curr_focal_length = 1 / torch.tan(torch.deg2rad(fov) / 2)  # Converting FoV to focal length
    distance = 50 * curr_focal_length / ref_focal_length       # Keeps focal_length / distance constant, preserving subject size
    T = [[0, 0, distance]]                                     # Camera translation along the z-axis

In the above, 50 is the reference camera distance at the starting FoV, chosen empirically based on how the result looked visually.
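Each frame's camera can then be built from the current fov and T inside the loop. A minimal sketch (the device, renderer, mesh, and lights from Part 1.1 are assumed):

from pytorch3d.renderer import FoVPerspectiveCameras

# Continuing the loop body above (device, renderer, mesh, and lights as in Part 1.1)
cameras = FoVPerspectiveCameras(fov=float(fov), T=torch.tensor([[0.0, 0.0, float(distance)]]), device=device)
renders.append(renderer(mesh, cameras=cameras, lights=lights))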

Using the obtained FoV and T, we can create a FoVPerspectiveCamera and use that while rendering. Once again, putting together all the rendered images into a gif gives us the desired (unsettling) dolly zoom effect:

dolly cow

Part 2 -- Practicing with Meshes¶

2.1 - Constructing a Tetrahedron¶

In this section, I built familiarity with meshes by manually defining my own vertices and faces for a tetrahedron mesh.

A tetrahedron has 4 vertices and 4 faces (all the possible 3-vertex groups), which I defined as follows:

vertices = torch.tensor(
    [[0,0,0],
    [x,x,0],
    [0,x,x],
    [x,0,x]], dtype=torch.float32
)
faces = torch.tensor([
    [0,1,2],
    [0,1,3],
    [1,2,3],
    [2,0,3]
], dtype=torch.long)

Lighting, camera, and texture did not need to be changed meaningfully. The tetrahedron can be seen from multiple viewpoints below:

tetra mesh

2.2 - Constructing a Cube¶

This was the same as the previous part (2.1), except I was now defining the vertices and faces for a cube. Interestingly, since we are working with triangle meshes, each square face of the cube had to be defined by 2 triangle faces. This gives us 8 vertices (standard for a cube) and 12 triangle faces (6 square faces × 2).

vertices = torch.tensor(
    [[0,0,0],
    [x,0,0],
    [x,x,0],
    [0,x,0],
    [0,0,x],
    [x,0,x],
    [x,x,x],
    [0,x,x]], dtype=torch.float32
)

faces = torch.tensor([
    [0,3,1],
    [1,3,2],
    [4,5,7],
    [5,6,7],
    [0,1,4],
    [1,5,4],
    [1,2,5],
    [2,6,5],
    [2,3,6],
    [3,7,6],
    [3,0,7],
    [0,4,7]
], dtype=torch.long)

Also, following PyTorch3D convention, I defined each face by listing its 3 vertices in anti-clockwise order when seen from outside the object. The following is the rendered cube:

cube mesh

Part 3 -- Re-texturing a Mesh¶

In this section, I built a better understanding of the texture of a mesh by having the color of a rendered object change smoothly along the z-axis. To do so:

I defined the 2 base colors I wanted (pink and a very dark blue), given by the following tensors:

color1 = torch.tensor([1.0, 0.7, 0.8])
color2 = torch.tensor([0.0, 0.0, 0.1])

Then, I found the smallest and largest z coordinates of the object and calculated the following per-vertex alpha, which is used to interpolate between the two colors:

alpha = (z - z_min) / (z_max - z_min)
color = alpha * color2 + (1 - alpha) * color1
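Putting these together, a minimal sketch of building the per-vertex texture (the mesh's vertices and faces tensors are assumed to be loaded already; names are illustrative):

from pytorch3d.renderer import TexturesVertex
from pytorch3d.structures import Meshes

z = vertices[:, 2]
alpha = ((z - z.min()) / (z.max() - z.min())).unsqueeze(1)   # (V, 1): 0 at z_min, 1 at z_max
colors = alpha * color2 + (1 - alpha) * color1               # (V, 3) per-vertex colors

textures = TexturesVertex(verts_features=colors.unsqueeze(0))
mesh = Meshes(verts=vertices.unsqueeze(0), faces=faces.unsqueeze(0), textures=textures)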

The final rendered effect with this texture is as seen below:

cube mesh

Part 4 -- Camera Transformations¶

In this section, I built a better understanding of camera transformations and how they affect the view of the rendered object.

The camera extrinsics are defined using the R_relative rotation matrix and the T_relative translation vector.

The translation T_relative has shape [3] and defines translations along the X, Y, and Z axes.

The rotation R_relative has shape [3, 3]; for a rotation by $\theta$ about the X, Y, or Z axis, the roll, pitch, and yaw rotation matrices are as follows:

roll pitch yaw

T_relative and R_relative are used to define the camera extrinsics as follows:

R = R_relative @ R_0
T = R_relative @ T_0 + T_relative

Here R_0 and T_0 are the starting camera extrinsics. When R_relative is the identity (no extra rotation) and T_relative is [0, 0, 0] (no extra translation), the camera view is unchanged. Below are some of the different camera transformations and how R_relative and T_relative were defined for each.

** Note: I used Rotation.from_euler to convert a desired $\theta$ rotation into its corresponding rotation matrix.
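A minimal sketch using scipy.spatial.transform, for the -90° rotation about the Z-axis used in the "Camera Rotated 90° left" case below:

import torch
from scipy.spatial.transform import Rotation

R_relative = torch.tensor(
    Rotation.from_euler("z", -90, degrees=True).as_matrix(), dtype=torch.float32
)
T_relative = torch.zeros(3)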

Base cow:

base cow

Camera Rotated 90° left

T_relative is all 0s

R_relative = -90° rotation about the Z-axis

right rotated cow

Camera Moved Away

T_relative = positive value in the Z dimension, 0 for X and Y

R_relative = identity

moved away cow

Camera Moved to Upper Right

T_relative = Positive X, Negative Y, 0 Z

R_relative = identity

** Note that because of the coordinate system, a negative Y corresponds with moving the camera up

left down cow

Camera Rotated to Cow's Right Side

T_relative = Negative X, 0 Y, Positive Z

R_relative = 90° rotation about the Y-axis

** Note that we also have to use a translation because once the camera is rotated, the cow is out of view. Moving the camera back and left brings it back in view

right view cow

Part 5 -- Rendering Generic 3D Representations¶

5.1 - Rendering Point Clouds from RGB-D Images¶

In this section, I rendered point clouds in PyTorch3D using RGB-D images as input. We were given an unproject_depth_image function, which takes in the RGB data, the depth information, and a mask (corresponding to the object of interest -- a plant in this case), and gives back the 3D points and their RGB values.

I then constructed the following 3 point clouds:

  1. A point cloud of the plant given an image from 1 viewpoint
  2. A point cloud of the plant given an image from a 2nd viewpoint
  3. A point cloud of the plant formed by the union of the first 2 point clouds.

Constructing the 3rd point cloud was as simple as concatenating the points and RGB values of the first two and rendering. We see a much fuller view of the plant in this point cloud.

** Note: I did have to define a rotation matrix that flips my rendering camera, because of a coordinate mismatch between the camera used to capture the initial images and the one I was using in PyTorch3D.
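A minimal sketch of the union point cloud (assuming points1/rgb1 and points2/rgb2 are the outputs of unproject_depth_image for the two views; names are illustrative):

import torch
from pytorch3d.structures import Pointclouds

points_union = torch.cat([points1, points2], dim=0)
colors_union = torch.cat([rgb1, rgb2], dim=0)
union_cloud = Pointclouds(points=[points_union], features=[colors_union])

# One possible fix for the coordinate mismatch: a 180° rotation about the z-axis
# for the rendering camera (the exact flip depends on the capture conventions)
R_flip = torch.diag(torch.tensor([-1.0, -1.0, 1.0])).unsqueeze(0)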

Point Cloud 1

pc1

Point Cloud 2

pc2

Point Cloud 3

pc3

5.2 - Parametric Functions¶

In this section, I sampled a parametric function to generate a point cloud for a desired object. The parametric function (given some value of $\Theta$ and $\Phi$) gives a 3D point that we can add to our point-cloud. To render a Torus (i.e. a Donut), I used the following parametric equations:

$$ \begin{aligned} x &= (R + r \sin(\Theta)) \cos(\Phi) \\ y &= (R + r \sin(\Theta)) \sin(\Phi) \\ z &= r \cos(\Theta) \end{aligned} \qquad \text{where } \Theta \in [0, 2\pi], \ \Phi \in [0, 2\pi]. $$

Here:

  • R = Distance from the center of the donut-hole to the center of the tube
  • r = Radius of the tube
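A minimal sketch of the sampling (the values of R, r, and num_samples are illustrative):

import math
import torch

R, r, num_samples = 1.0, 0.5, 200
theta = torch.linspace(0, 2 * math.pi, num_samples)
phi = torch.linspace(0, 2 * math.pi, num_samples)
Theta, Phi = torch.meshgrid(theta, phi, indexing="ij")   # num_samples x num_samples grid

x = (R + r * torch.sin(Theta)) * torch.cos(Phi)
y = (R + r * torch.sin(Theta)) * torch.sin(Phi)
z = r * torch.cos(Theta)
points = torch.stack([x.flatten(), y.flatten(), z.flatten()], dim=1)   # (num_samples^2, 3)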

As we sample more points from this parametric function, we get a better, more filled-in looking point cloud. Below you can see the results for different levels of sampling:

Num_samples = 50, 100 and 200 respectively

(This determines a num_samples × num_samples grid of $(\Theta, \Phi)$ values)

t1 t2 t3

I then did the same for an umbilic torus, which is a variation of a torus with the following parametric equations:

$$ \begin{aligned} x &= \sin(\theta)\,\Big(7 + \cos\!\Big(\tfrac{\theta}{3} - 2\phi\Big) + 2\cos\!\Big(\tfrac{\theta}{3} + \phi\Big)\Big) \\ y &= \cos(\theta)\,\Big(7 + \cos\!\Big(\tfrac{\theta}{3} - 2\phi\Big) + 2\cos\!\Big(\tfrac{\theta}{3} + \phi\Big)\Big) \\ z &= \sin\!\Big(\tfrac{\theta}{3} - 2\phi\Big) + 2\sin\!\Big(\tfrac{\theta}{3} + \phi\Big) \end{aligned} \qquad \text{where } \theta, \phi \in [-\pi, \pi]. $$

Num_samples = 50, 200 and 500 respectively:

t1 t2 t3

5.3 - Implicit Surfaces¶

In this section, we are similarly rendering 3D objects but now using implicit surfaces rather than parametric equations. This requires:

  1. Defining an implicit function F(x,y,z) which evaluates to 0 when (x,y,z) are on the surface of the 3d object. (Negative values mean we're inside the object, positive values mean we're outside).

  2. Discretizing the 3D space into voxels (of some desired resolution)

  3. Evaluating the implicit function at those voxel locations and storing the resultant values

  4. Using marching cubes to extract the mesh which corresponds to the 0-level set

  5. Converting the marching cubes retrieved coordinates into world space (instead of the grid space)

  6. Rendering this mesh.

I rendered the mesh of a Torus for different voxel-resolutions. The implicit function is as follows:

$$ F(x,y,z) = \left(\sqrt{x^2 + y^2} - R\right)^2 + z^2 - r^2 $$
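A minimal sketch of steps 2-5 for this torus (assuming the mcubes package; the values of R, r, the grid extent, and the resolution are illustrative):

import mcubes
import torch

R, r, resolution, extent = 1.0, 0.5, 64, 1.6
xs = torch.linspace(-extent, extent, resolution)
X, Y, Z = torch.meshgrid(xs, xs, xs, indexing="ij")                # discretized 3D grid
values = (torch.sqrt(X ** 2 + Y ** 2) - R) ** 2 + Z ** 2 - r ** 2  # F(x, y, z) at each voxel

vertices, faces = mcubes.marching_cubes(values.numpy(), 0)         # extract the 0-level set
vertices = vertices / (resolution - 1) * 2 * extent - extent       # grid indices -> world coordinates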

We can see how dramatically different the first and last renders are.

Voxel resolutions are: $16^{3}, 64^{3}, 128^{3}$ respectively:

tv1 tv2 tv3

Doing the same for a heart-shaped object, with implicit function:

$$ F(x,y,z) \;=\; \Big(x^2 + \tfrac{9}{4}y^2 + z^2 - 1\Big)^3 \;-\; x^2 z^3 \;-\; \tfrac{9}{80} y^2 z^3 $$

I also did a color interpolation along the z-axis going from pink to red.

Voxel resolution is $128^{3}$:

tv3

Now that I've rendered the Torus using both parametric equations and implicit functions, here are some trade offs of the 2 approaches:

  • Parametric functions are generally faster to render and require much less memory. This is because you're not computing any connectivity between the points; you're just rendering a few hundred or thousand points.

  • Implicit functions require creating a large voxel grid (much more memory usage) and are slower, since you have to evaluate the function at each voxel location, run marching cubes, and then render.

  • However, the meshes (which we get from implicit functions) are generally of higher quality since they have connectivity, so the objects look more solid and true to the original, rather than like a collection of points.

  • Point clouds/parametric equations are also simpler and easier to use, since we can just sample and render without any intermediate steps (i.e. marching cubes for implicit functions), but the meshes obtained from implicit functions may be more useful for downstream use cases like ray tracing, shading, simulations, etc.

Part 6 -- Do Something Fun¶

Combining what I learned across a few different parts of this assignment, I wanted to do a point-cloud interpolation: start with one point cloud (corresponding to one object), move the points, and end with another.

Specifically, I start with a sphere and end with our cow object.

To make this effect even more appealing, I also interpolate the colors, so that the sphere points start off with a plain color and end with colors matching the texture of the cow.

This idea involved the following:

  1. Sampling faces and barycentric coordinates from both the sphere and the cow
  • We do this so we have the same number of points for both objects (since there is a mismatch in their vertices/faces) -- I chose 20,000 points here
  • This step uses the process described in Part 7
  2. Sampling textures from the cow
  • This builds on step 1, since we still sample faces and barycentric coordinates, but it is trickier because to get the correct colors we have to work in the (U, V) coordinate system of the texture image
  • We use the verts_uvs_padded and faces_uvs_padded attributes of the mesh's texture, which give a (U, V) coordinate for every vertex and the UV indices for every face (shapes (1, num_verts, 2) and (1, num_faces, 3) respectively)
  • Using the sampled faces from step 1 and the same barycentric interpolation, we then use torch.nn.functional.grid_sample to look up the texture image and get a color for each point in the point cloud (see the sketch after this list)
  3. For a set number of interpolation steps, get the blended/interpolated point cloud for that step (both the point positions and the colors). This uses the same underlying interpolation idea as the re-texturing in Part 3

  4. Render each point cloud while rotating the camera around, and then create a gif of all rendered images.
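A minimal sketch of the UV color lookup from step 2 (point_uvs and texture_image are illustrative names: the per-point UV coordinates obtained from the sampled faces/barycentric weights, and the (1, H, W, 3) map from the cow's texture, e.g. mesh.textures.maps_padded()):

import torch
import torch.nn.functional as F

grid = point_uvs * 2.0 - 1.0                      # grid_sample expects coordinates in [-1, 1]
grid = grid.view(1, 1, -1, 2)                     # (N, H_out, W_out, 2) = (1, 1, P, 2)
image = texture_image.permute(0, 3, 1, 2)         # channels first: (1, 3, H, W)
sampled = F.grid_sample(image, grid, align_corners=True)   # (1, 3, 1, P)
point_colors = sampled[0, :, 0, :].permute(1, 0)  # (P, 3) RGB per point
# Depending on the texture's UV convention, the V axis may also need to be flipped.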

I chose brown as the starting color so it almost looks like the cow is being formed out of clay and then painted. Note that for the last few frames, the point-cloud transformation has already ended; I just wanted to give it some time so you can see the resulting cow properly.

10 point cow

Part 7 -- Sampling Points on Meshes¶

In this part, we had to obtain a point cloud given the triangle mesh of an object, which requires sampling. The steps involved were:

  1. Calculating the area of each face in the triangle mesh
  2. Sampling a face with probability proportional to the area of the face
  3. Sampling a random barycentric coordinate within the face uniformly:
  • To do step 3, I used the trick we learned in lecture where we first sample 2 numbers (a, b) in the range [0, 1]
  • Then, we obtain 3 coefficients using the following formulas:

$$ \begin{aligned} c_1 &= 1 - \sqrt{a} \\ c_2 &= b \sqrt{a} \\ c_3 &= 1 - c_1 - c_2 \end{aligned} $$

Then, a uniform point inside the triangle can be found using:

$$ p = c_1 v_1 + c_2 v_2 + c_3 v_3 $$

where $v_1, v_2, v_3$ are the vertices of the triangle (i.e. of the face).

  4. Repeat steps 2 and 3 for the number of points you want in your point cloud.
  5. Render all the sampled points.
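A minimal sketch of the sampling procedure (assuming vertices (V, 3) and faces (F, 3) tensors from the cow mesh; names are illustrative):

import torch

num_points = 10000
v1 = vertices[faces[:, 0]]                        # (F, 3) first vertex of every face
v2 = vertices[faces[:, 1]]
v3 = vertices[faces[:, 2]]
areas = 0.5 * torch.linalg.norm(torch.linalg.cross(v2 - v1, v3 - v1), dim=1)

face_idx = torch.multinomial(areas, num_points, replacement=True)   # area-proportional face sampling
a, b = torch.rand(num_points, 1), torch.rand(num_points, 1)
c1 = 1 - torch.sqrt(a)
c2 = b * torch.sqrt(a)
c3 = 1 - c1 - c2
points = c1 * v1[face_idx] + c2 * v2[face_idx] + c3 * v3[face_idx]  # (num_points, 3)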

Below, you can see the obtained point clouds for 10, 100, 1000, and 10,000 points respectively. I rendered them as white points against a black background for better visibility.

10 point cow
10 point cow 10 point cow 10 point cow 10 point cow