Learning for 3D Vision: HW5

1. Classification Model

Test accuracy: 96.75%
| Example ID | Ground Truth Class | Predicted Class | Rendered Example | Interpretation (only for misclassified examples) |
|---|---|---|---|---|
| 0 | Chair | Chair | | - |
| 1 | Chair | Chair | | - |
| 2 | Chair | Chair | | - |
| 406 | Chair | Lamp | | I think the model confused this example because the chair seems to be folded. This means that its points do not have a high depth range, as would be expected for a chair. |
| 617 | Vase | Vase | | - |
| 618 | Vase | Vase | | - |
| 619 | Vase | Lamp | | I think the model misclassified this example because the vase had flowers in it. Geometrically, the flowers resemble a lamp: a long thin tube ending in a cylindrical shape at the top. |
| 620 | Vase | Vase | | - |
| 719 | Lamp | Lamp | | - |
| 720 | Lamp | Lamp | | - |
| 721 | Lamp | Vase | | I think the model misclassified this example because of the cylindrical shape at the top, which makes the object resemble a vase. Moreover, because the model uses a max pool, it might have ignored the empty space in the bottom section (see the sketch after this table). |
| 722 | Lamp | Lamp | | - |
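
The interpretation for example 721 refers to the max-pool aggregation in the classification network. Below is a minimal sketch of a PointNet-style classifier, assuming a PyTorch implementation; the class name and layer sizes are illustrative rather than the exact homework architecture. Because only the maximum activation per feature channel survives the pooling step, points in sparsely covered or empty regions contribute little to the global feature.

```python
import torch
import torch.nn as nn

class PointNetClassifier(nn.Module):
    """Minimal PointNet-style classifier: shared per-point MLP + global max pool."""

    def __init__(self, num_classes: int = 3):  # e.g. chair / vase / lamp
        super().__init__()
        # Shared MLP applied independently to every point: (B, N, 3) -> (B, N, 1024)
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, 1024), nn.ReLU(),
        )
        # Classification head on the pooled global feature
        self.head = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:  # points: (B, N, 3)
        per_point = self.point_mlp(points)                    # (B, N, 1024)
        # Max pool over the point dimension: only the strongest activation per
        # channel survives, so regions that never "win" the max (e.g. empty
        # space, sparsely covered parts) barely influence the prediction.
        global_feat, _ = per_point.max(dim=1)                 # (B, 1024)
        return self.head(global_feat)                         # (B, num_classes)
```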

2. Segmentation Model

Test accuracy: 85.66%
| Example ID | Ground Truth Segmentation | Predicted Segmentation | Accuracy | Interpretation (only for low-accuracy examples) |
|---|---|---|---|---|
| 0 | | | 92.91% | - |
| 1 | | | 96.23% | - |
| 5 | | | 90.81% | - |
| 6 | | | 94.73% | - |
| 26 | | | 49.61% | I think the model had low accuracy on this example because the ground truth segmentation distinguishes between parts that are geometrically very close. For example, the yellow side of the sofa is separated from the red seat cushion. The model does not make this distinction because these parts have similar 3D positions; their ground truth segmentation is based more on a semantic understanding of the object. |
| 142 | | | 49.04% | Similarly, the ground truth segmentation of this object is also based more on the role each part of the chair plays than on geometric position, which is the only thing the PointNet model has access to (see the sketch after this table). Here, the back of the chair (in cyan) is separated from the armrests (in yellow), even though they are very close in 3D space. Meanwhile, the model mixes together part of the seat and the back of the chair, and part of the armrests and the back of the seat, because these parts are close in 3D space. |
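
The point made for example 142, that the model only sees each point's 3D position plus a pooled global summary of the shape, can be illustrated with a minimal PointNet-style segmentation head. This is a sketch assuming a PyTorch implementation; the layer sizes, names, and number of part labels are illustrative, not the homework code.

```python
import torch
import torch.nn as nn

class PointNetSegmenter(nn.Module):
    """Minimal PointNet-style segmentation head: per-point features + global max-pooled feature."""

    def __init__(self, num_seg_classes: int = 6):  # e.g. 6 chair part labels
        super().__init__()
        self.local_mlp = nn.Sequential(            # per-point features from xyz only
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
        )
        self.global_mlp = nn.Sequential(
            nn.Linear(128, 1024), nn.ReLU(),
        )
        self.seg_head = nn.Sequential(             # predicts one label per point
            nn.Linear(128 + 1024, 256), nn.ReLU(),
            nn.Linear(256, num_seg_classes),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:      # points: (B, N, 3)
        local = self.local_mlp(points)                             # (B, N, 128)
        global_feat, _ = self.global_mlp(local).max(dim=1)         # (B, 1024)
        global_feat = global_feat.unsqueeze(1).expand(-1, points.shape[1], -1)
        # Each point is labeled from its own xyz-derived feature plus one shared
        # global descriptor; there is no semantic prior, so parts that occupy
        # similar 3D positions tend to receive the same label.
        return self.seg_head(torch.cat([local, global_feat], dim=-1))  # (B, N, num_seg_classes)
```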

3. Robustness Analysis

Experiment 1: Random rotations

In this experiment, we apply random rotations to the input point clouds at test time and evaluate how this affects the performance of the classification and segmentation models. The rotations were generated by sampling an Euler angle for each axis from a uniform distribution between 0 and 2π.
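
As a reference for this setup, here is a minimal sketch of how such a random rotation could be applied to a test point cloud, assuming clouds stored as (N, 3) PyTorch tensors. The function names and the composition order of the three axis rotations are assumptions for illustration, not the exact evaluation code.

```python
import math
import random
import torch

def random_rotation_matrix() -> torch.Tensor:
    """Rotation composed from Euler angles drawn uniformly from [0, 2*pi) per axis."""
    ax, ay, az = [random.uniform(0.0, 2.0 * math.pi) for _ in range(3)]

    def rx(a):  # rotation about the x axis
        c, s = math.cos(a), math.sin(a)
        return torch.tensor([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

    def ry(a):  # rotation about the y axis
        c, s = math.cos(a), math.sin(a)
        return torch.tensor([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

    def rz(a):  # rotation about the z axis
        c, s = math.cos(a), math.sin(a)
        return torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

    return rz(az) @ ry(ay) @ rx(ax)  # (3, 3)


def rotate_point_cloud(points: torch.Tensor) -> torch.Tensor:
    """Apply one random rotation to a point cloud of shape (N, 3)."""
    R = random_rotation_matrix().to(points.dtype)
    return points @ R.T
```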

Classification

Baseline test accuracy: 96.75%
Test accuracy under random rotations: 33.47%
| Example ID | Ground Truth Class | Predicted Class (without rotation) | Rendered Example (without rotation) | Predicted Class (with rotation) | Rendered Example (with rotation) | Interpretation |
|---|---|---|---|---|---|---|
| 0 | Chair | Chair | | Lamp | | In this example, the model possibly mistook the rotated cube formed by the legs of the chair and the beams between the legs for a spherical shape, similar to that of lamps. In the original example, many of the points in the legs and beams share similar 3D coordinates, which might have made it easier for the model to focus on the vertices of this cube. |
| 617 | Vase | Vase | | Vase | | I think this example was successfully classified despite the rotation because the rotation was mild, and even if the object were confused with a spherical shape, many vases have spherical shapes, so this mistake would not affect the final classification. |
| 719 | Lamp | Lamp | | Vase | | In this example, the model possibly mistook the rotated lamp for a spherical shape, similar to that of some vases. In the original example, many of the points in the long stick that comes out of the bottom share similar 3D coordinates, which might have made it easier for the model to focus on a few of these points and interpret them as a thin cylindrical shape. Meanwhile, in the rotated version, it might have focused on points at the tip of this shape and, together with the points on the wall-mounted piece, interpreted them as a spherical shape. |

Segmentation

Baseline test accuracy: 85.66%
Test accuracy under random rotations: 40.78%
| Example ID | Ground Truth Segmentation | Predicted Segmentation (without rotation) | Accuracy (without rotation) | Predicted Segmentation (with rotation) | Accuracy (with rotation) | Interpretation |
|---|---|---|---|---|---|---|
| 0 | | | 92.91% | | 20.30% | In the rotated example, the points on the back of the chair have very different 3D coordinates, which is why the model misclassified them as belonging to different segments of the chair. It seems like the model segments based on the difference between the height and depth of the points. |
| 1 | | | 96.23% | | 44.28% | In the rotated example, the legs of the chair now have very different 3D coordinates. Just like before, this causes the model to confuse them for different segments. Conversely, the armrests now have 3D coordinates more similar to other parts of the back, so the model no longer assigns them different segment labels. |
| 2 | | | 81.09% | | 38.40% | This example also shows how the model clusters together points that share at least one dimension. In the original object, the legs of the chair had very different x and z values but similar y values, so they are assigned the same label. Meanwhile, in the rotated example, the legs no longer share similar values in any dimension and are assigned several different labels. |

Experiment 2: Number of points

In this experiment, we subsample the input point clouds at test time. We use 100, 1,000, and 5,000 points and evaluate how this affects the performance of the classification and segmentation models. The models were trained using 10,000 points, which is referred to as the baseline in the tables below.
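
Below is a minimal sketch of this subsampling step, assuming point clouds stored as (N, 3) tensors. The function name and the use of uniform random sampling (rather than, say, farthest-point sampling) are assumptions about the setup, not a statement of the homework's exact procedure.

```python
import torch

def subsample_point_cloud(points: torch.Tensor, num_points: int) -> torch.Tensor:
    """Randomly keep `num_points` points from a (N, 3) point cloud at test time."""
    idx = torch.randperm(points.shape[0])[:num_points]
    return points[idx]

# Evaluate the same test cloud at several densities, e.g.:
# for n in (5_000, 1_000, 100):
#     sparse_cloud = subsample_point_cloud(full_cloud, n)
```

For the segmentation model, the per-point ground-truth labels would be indexed with the same `idx` so that predictions and labels stay aligned.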

Classification

| Example ID | Ground Truth Class | Rendered Example (10,000 points) | Predicted Class, Baseline (10,000 points) | Rendered Example (5,000 points) | Predicted Class (5,000 points) | Rendered Example (1,000 points) | Predicted Class (1,000 points) | Rendered Example (100 points) | Predicted Class (100 points) | Interpretation |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Chair | | Chair | | Chair | | Chair | | Chair | The model is very robust to the number of points: even with only 100 points, it correctly classifies the chair. |
| 617 | Vase | | Vase | | Vase | | Vase | | Vase | The model is very robust to the number of points: even with only 100 points, it correctly classifies the vase. |
| 746 | Lamp | | Lamp | | Lamp | | Lamp | | Vase | This is one of the few examples where reducing the number of points had an effect on the predicted class. When using only 100 points, the model confuses the lamp for a vase. This possibly happens because, with so few points, the shape starts to resemble a sphere, similar to that of some vases. |

Segmentation

| Example ID | Ground Truth Segmentation | Predicted Segmentation, Baseline (10,000 points) | Predicted Segmentation (5,000 points) | Predicted Segmentation (1,000 points) | Predicted Segmentation (100 points) | Interpretation |
|---|---|---|---|---|---|---|
| 0 | | | | | | The model is very robust to the number of points. Even with only 100 points, it still identifies the segments correctly. |
| 5 | | | | | | The model is very robust to the number of points. Even with only 100 points, it still identifies the segments correctly. |
| 6 | | | | | | The model is very robust to the number of points. Even with only 100 points, it still identifies the segments correctly. |