16-825 Assignment 2: Single View to 3D

1. Exploring Loss Functions

1.1. Fitting a voxel grid (5 points)

Voxel fitting Voxel ground truth

1.2. Fitting a point cloud (5 points)

Pointcloud fitting Pointcloud ground truth

1.3. Fitting a mesh (5 points)

Mesh fitting Mesh ground truth

2. Reconstructing 3D from Single View

2.1. Image to voxel grid (20 points)

Voxel Training Comparison

import torch.nn as nn


class DecoderBlock(nn.Module):
    """Residual 3D upsampling block that doubles each spatial dimension."""

    def __init__(self, in_channels, out_channels, skip_channels=0):
        super().__init__()
        total_in = in_channels + skip_channels
        # Main path: stride-2 transposed conv (2x upsampling), then a
        # size-preserving 3x3x3 transposed conv.
        self.layers = nn.Sequential(
            nn.ConvTranspose3d(total_in, out_channels, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm3d(out_channels),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_channels),
        )
        self.relu = nn.ReLU(inplace=True)
        # Residual path: must accept the same input as the main path, so it
        # also takes in_channels + skip_channels (identical when skip_channels=0).
        self.skip = nn.ConvTranspose3d(total_in, out_channels, kernel_size=4, stride=2, padding=1)

    def forward(self, x):
        skip = self.skip(x)
        x = self.layers(x)
        # Add the upsampled residual before the final activation.
        return self.relu(x + skip)

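        if args.type == "vox":
            # Input: b x 512
            # Output: b x 32 x 32 x 32
            # The projection output has 512*8*8*8 features, i.e. a
            # b x 512 x 8 x 8 x 8 volume once reshaped for the decoder.
            self.projection = nn.Linear(512, 512*8*8*8)
            self.decoder = nn.Sequential(
                DecoderBlock(512, 256), # 8x8x8 -> 16x16x16
                DecoderBlock(256, 128), # 16x16x16 -> 32x32x32
                nn.Conv3d(128, 1, kernel_size=3, padding=1),
                nn.Sigmoid()
            )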
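
The spatial sizes in the voxel decoder can be checked by hand with the transposed-convolution size formula. The sketch below is plain Python and assumes dilation 1 and no output padding (the PyTorch defaults). The helper names `conv_transpose_out` and `decoder_block_out` are hypothetical, and the starting size of 8 is inferred from the `512*8*8*8` projection width rather than stated in the report.

```python
def conv_transpose_out(size, kernel_size, stride=1, padding=0):
    """Output length of a transposed convolution along one axis.

    Assumes dilation=1 and output_padding=0.
    """
    return (size - 1) * stride - 2 * padding + kernel_size


def decoder_block_out(size):
    """Spatial size after the main path of one DecoderBlock."""
    # kernel_size=4, stride=2, padding=1 doubles the size.
    size = conv_transpose_out(size, kernel_size=4, stride=2, padding=1)
    # kernel_size=3, stride=1, padding=1 keeps the size unchanged.
    size = conv_transpose_out(size, kernel_size=3, stride=1, padding=1)
    return size


sizes = [8]  # assumed resolution of the reshaped projection output
for _ in range(2):  # two DecoderBlocks in the voxel decoder
    sizes.append(decoder_block_out(sizes[-1]))
print(sizes)  # [8, 16, 32]
```

Two blocks therefore take an 8x8x8 volume to the 32x32x32 grid named in the output comment; the final `Conv3d` with `kernel_size=3, padding=1` leaves the size unchanged.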

Voxel Training Results

2.2. Image to point cloud (20 points)

Input Image

        self.decoder = nn.Sequential(
            nn.Linear(512, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, self.n_point * 3),
        )
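
The last linear layer emits a flat vector of `n_point * 3` values per image, which presumably gets regrouped into one (x, y, z) triple per point in the forward pass (not shown in the report). A minimal plain-Python sketch of that regrouping follows; `unflatten_points` is a hypothetical helper, and in PyTorch the same step would typically be a single `view(-1, n_point, 3)` on the output tensor.

```python
def unflatten_points(flat, n_point):
    """Group a flat sequence of n_point * 3 numbers into (x, y, z) triples."""
    if len(flat) != n_point * 3:
        raise ValueError(f"expected {n_point * 3} values, got {len(flat)}")
    return [tuple(flat[3 * i: 3 * i + 3]) for i in range(n_point)]


# Toy decoder output for n_point = 2.
points = unflatten_points([0.0, 0.1, 0.2, 1.0, 1.1, 1.2], n_point=2)
print(points)  # [(0.0, 0.1, 0.2), (1.0, 1.1, 1.2)]
```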

PointCloud Training Results

2.3. Image to mesh (20 points)

Input Image

        self.decoder = nn.Sequential(
            nn.Linear(512, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, self.n_mesh_verts * 3),
        )

Mesh Training Results

2.4. Quantitative comparisons (10 points)

Voxel Evaluation Results

Interpretation:

Pointcloud Evaluation Results

Interpretation:

Mesh Evaluation Results

Interpretation:

2.5. Analyse effects of hyperparameter variations (10 points)

Parameter Studied: Voxel extraction threshold for marching cubes and cubify operations

Motivation:
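The threshold sets the isosurface value used when converting predicted voxel occupancy grids into explicit 3D meshes. It is a critical hyperparameter because it decides which voxels count as occupied, and therefore how much of the predicted volume survives in the extracted mesh.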

Experimental Setup:

Results:

Threshold 0.2 Threshold 0.3 Threshold 0.5

Analysis:
I found this hyperparameter the most interesting to tune because it reveals where the model is most confident in its predictions. Confidence is highest for voxels near the center of the predicted shape, while voxels near the boundary are 'fuzzier', so they are the first to drop out as the threshold rises.
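
To illustrate the effect, here is a minimal plain-Python sketch on a synthetic occupancy grid, not the model's actual predictions. It assumes occupancy probability falls off linearly with distance from the grid center, which only mimics the "confident center, fuzzy boundary" pattern described above; `synthetic_grid` and `occupied_count` are hypothetical helpers.

```python
def synthetic_grid(n=9):
    """Toy n x n x n occupancy grid: 1.0 at the center, 0.0 at the corners."""
    c = (n - 1) / 2
    max_dist = (3 * c * c) ** 0.5  # distance from center to a corner
    return [
        [
            [1.0 - ((x - c) ** 2 + (y - c) ** 2 + (z - c) ** 2) ** 0.5 / max_dist
             for z in range(n)]
            for y in range(n)
        ]
        for x in range(n)
    ]


def occupied_count(grid, threshold):
    """Count voxels whose occupancy probability exceeds the threshold."""
    return sum(p > threshold for plane in grid for row in plane for p in row)


grid = synthetic_grid()
# Same thresholds as in the experiment above.
counts = {t: occupied_count(grid, t) for t in (0.2, 0.3, 0.5)}
print(counts)
```

On this toy grid the occupied set shrinks toward the center as the threshold increases, because only the high-probability core stays above the cutoff.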

2.6. Interpret your model (15 points)

Method: L1 Distance Heatmap Overlay

I rendered the predicted 3D shape from the input camera viewpoint and computed pixel-wise L1 distances against the original image. The heatmap shows where the reconstruction fails to match the 2D observation: red regions indicate geometric errors such as incorrect depth, missing structures, or hallucinated geometry. This method measures consistency with the input view by connecting the 3D prediction back to the 2D input domain.
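
The per-pixel comparison can be sketched as follows. This is a plain-Python illustration, not the code used for the report: it assumes both images are same-sized RGB arrays with values in [0, 1] and sums the absolute differences over the three channels (averaging over channels would be an equally plausible choice). `l1_heatmap` is a hypothetical helper name.

```python
def l1_heatmap(rendered, target):
    """Per-pixel L1 distance between two RGB images of the same size.

    Images are nested sequences shaped [height][width][3].
    Returns a [height][width] map; larger values mean larger disagreement.
    """
    if len(rendered) != len(target) or any(
        len(row_r) != len(row_t) for row_r, row_t in zip(rendered, target)
    ):
        raise ValueError("images must have the same height and width")
    return [
        [sum(abs(a - b) for a, b in zip(px_r, px_t))
         for px_r, px_t in zip(row_r, row_t)]
        for row_r, row_t in zip(rendered, target)
    ]


# Toy 1x2 example: the first pixel matches, the second differs in the red channel.
rendered = [[(0.5, 0.5, 0.5), (1.0, 0.0, 0.0)]]
target = [[(0.5, 0.5, 0.5), (0.0, 0.0, 0.0)]]
print(l1_heatmap(rendered, target))  # [[0.0, 1.0]]
```

The resulting map can then be passed through a colormap and blended over the input image to produce the overlay.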

Interpretation

3. Exploring Other Architectures / Datasets (Choose at least one)

3.3. Extended dataset for training (10 points - Optional)

Training
Extended Dataset Training Results

Results
Extended Dataset Visual Results