Cartoon Interpolation Final Write Up
Vasu Agrawal (vasua), Yongyi Zhao (yongyiz)
We created three different modes for cartoon interpolation; spline, cage, and skeleton. The spline interpolation forces an image to loosely follow the profile of a control spline. The cage interpolation stretches and skews images within the arbitrary n-dimensional concave and convex polygonal control cage. Finally, the skeleton interpolation superimposes a skeleton on top of the cartoon, allowing joints to be transformed in a manner similar to Scotty3D. Each of the three transformations happens in near real time. For more information, see the implementation section below, or watch this video.
Figure 1 & 2: Results from the spline interpolation on a cartoon alligator.
Figure 3 & 4: Results from cage interpolation on a cartoon cow.
Figure 5 & 6: Results from skeleton transformation on a gingerbread man.
A significant focus of the development effort was put towards tweaking the interpolations so that they would be able to be rendered in real time. This is particularly challenging in Python, which is effectively unable to even loop over the entire image buffer in the interframe time while performing the most trivial operations. Since this project was as much about becoming better at high performance Python as it was about the graphics algorithms for us, we wanted the interpolation to happen as fast as possible. In order to facilitate this performance, NumPy and PIL were the tools of choice. Using NumPy matrices and vectors rather than raw Python arrays means that the entire computation happens in C rather than in Python code, which adds more than an order of magnitude of speedup. In the spline interpolation, for example, great care was taken with the implementation of the interpolation algorithm so that everything could be expressed as a matrix multiplication or a native elementwise function of the matrix. To coerce this behavior, significant reshaping of arrays was required at numerous steps, but the results are worth it. Similarly, PIL provides an Image.transform( ) function which very quickly applies an affine transform to an image, based on the values (a, b, c, d, e, f). The new positions (x’, y’) are calculated from the equations (x, y) = (ax’ + by’ + c, dx’ + ey’ + f). While PIL provided the fast per-pixel application of the transforms, all transforms were calculated by us.
At a high level, the spline is implemented by finding an inverse transform for every point (x’, y’), going from the transformed image to the original image. This inverse transform is based on the transformation of the nearest spline knots to the point, considering their transform between bind pose and transformed pose. Knots that are further away contribute exponentially less to the weighting. Each of the transforms is average with this weighting, and the final transformation is used to convert an (x’, y’) point in the transformed image to an (x, y) in the original, untransformed image. However, using simply the knots would result in ignoring the spline, so the transformation of multiple points along the spline is considered as well. As the number of points along the spline to consider increases, the spline interpolation becomes smoother, but there is a rapid point of diminishing returns, which is why the 16 interpolated points along the spline were chosen for the final submission.
The cage is implemented using a modified version of barycentric transformation. The user first creates the cage by selected points, completing the cage upon re-clicking the first point in the polygon. Each time that the user selects a vertex of the polygon and drags the point, the program performs a triangulation, such that every triangle shares the selected point. As the user drags the triangle, the new and old positions of the triangles are known to the program. Barycentric coordinates are calculated by taking the areas of the triangles. Thus all pixels within the polygon space are modified by the barycentric coordinates.
The skeleton was implemented using a skinning method, with thresholding, very similar to the approach used in assignment 4. The user creates a skeleton by clicking on points in the canvas, which generate bones connecting selected points. Each bone (joint) contains the transform into its parent space. This transform consists of a translation matrix and the rotation matrix (which the user adjusts by clicking on the left and right keys). The child can find its world space transformation by composing all parent transforms. The transform that is performed on the image when a joint is rotated, is calculated in the following implementation: the inverse of the local to world space is applied (left multiplied), then the rotation of the joint itself, and finally the original local to world space matrix is applied. We note that the inverse of this transform is passed to the Image.transform() function since the function maps the old pixel position to the new pixel position.
In the process of this project, we experimented with a number of ways of performing each of the different interpolations, as well as the tuning of various parameters and algorithms. Below we’ve documented some of our more interesting results.
The final implementation of the weighting function uses an exponential decay on the distance to knots. However, an earlier version used 1 minus inverse distance, which produces interesting stratifications along the spline. Simply by changing the weighting function to be the smooth exponential, results were significantly improved, and the transforms became much less stratified. Additionally, the magnitude of the changes became less dramatic, as shown by the relative sharpness of the visualizations below compared to the exponential version. Further experimenting revealed that the cutoff distance (i.e. the maximum distance from which a knot can affect a point) also plays a large factor in the transform, with larger cutoff distances causing much stronger transformations.
Figure 7 & 8: Minimal stratifications with a large cutoff distance, using the original inverse weighting technique.
Figure 9 & 10: Strong stratification with small cutoff distance, using the original inverse weighting technique.
Another implementation we experimented with was the triangulation method for the cage editor. Style 1 uses a zig-zag pattern. In style 2, the triangulation is recalculated each time a new vertex is selected, such that the triangles always share the selected vertex. Style 1 fares better in efficiency, since the triangulation is only performed once and the user will never be simultaneously modifying all triangles for larger polygon (calculating the per-pixel barycentric transform is quite expensive, and becomes more expensive when it must be done for each modified triangle). Style 1 also fares better for polygons that are initially concave (since style 2 simply connects all vertices to a single point, it is possible that you will be forming triangles which should not be present). However, the calculated results of style 2 are better because the transform is performed for all triangles. For example, when point B is dragged in style 1, modifications will only be made to triangle ABC. All other pixels remain unchanged. Whereas, modifying all triangles simultaneously creates much smoother results.
Figure 11: Style 1 triangulation, bind mode without cage transform.
Figure 12: Style 1 triangulation, with cage transform. Note that only small subsection of polygon is modified.
Figure 13: Style 2 triangulation, bind mode without cage transform, we attempted to make cages as similar as possible in bind mode.
Figure 14: Style 2 triangulation, with cage transform. Note that all of subsection in polygon has been warped by vertex.
In order to run the program, you will need to first install a few dependencies. As mentioned, the program is written from scratch in Python rather than C++, so the first requirement is the installation of Python3, which can be found here, or installed with apt install python3 on modern Ubuntu (16.04+) based systems. You’ll need to have a version of NumPy installed. We used 1.11.0. NumPy can be installed via pip, with pip3 install numpy. For image loading, the program uses PIL, which can be installed with pip3 install Image. We used version 1.1.7. Finally, the program uses PyGame as an event handler for mouse and keyboard events, as well as to draw images to the screen. We used PyGame version 1.9.2rc1. You can install PyGame for your system by getting a binary package from here, or simply apt install python3-pygame if applicable.
Once all of the dependencies are installed, the program can be run as follows:
$ ./wrapper.py -h
usage: wrapper.py [-h] [-f FILENAME] [-m MODE]
Demonstration of spline, cage, and skeleton warping on 2D images.
Final project for 15-462
Vasu Agrawal (vasua)
Yongyi Zhao (yongyiz)
See writeup for more information.
-h, --help show this help message and exit
-f FILENAME, --filename FILENAME
Name of the file to open
-m MODE, --mode MODE Mode to operate in.
Spline = 0
Cage = 1
Skeleton = 2
For example, use the following to start the skeleton interpolation mode on “gingerbread.png”, as shown in the skeleton interpolation results above.
$ ./wrapper.py -f gingerbread.png -m 2
When the spline interpolation program is started (with -m 0, per above), the user is allowed to make a control spline in bind mode. The current mode can be seen on the top left of the screen. After a bind spline has been made, the user can press the B key to switch into transform mode (and back). The points bind mode can be moved around in transform mode, but no new points will be added. Once the user is satisfied with the transformed spline, he can press and hold I to view the interpolated image. The image will stay on the screen as long as the button is being pressed. Once the I key is released, the transformed spline can be moved again, or the bind mode spline points can be moved. Since the transform is calculated based on the difference between the bind mode and the transform mode, both may be desireable.
There is also a visualization mode available by pressing and holding V. The visualization shows the magnitude of the transform that is to be applied at a given point in the bind pose; the brighter the shade of blue, the more the original image will be transformed at that point. See below for an example of the visualized transform and the actual transform.
Figure 15 & 16: Showing the visualization mode and its transformation using the final exponential weighting.
The user generates vertices for the cage by clicking in the canvas. Once a canvas has been completed (ie the user re-clicks the first point of the polygon), they can then begin to distort the mesh. Simply dragging the points of the vertex allows the user to distort (stretch and skew) the cartoon image.
When the user first enters the canvas, they are in bind mode. The program will remain in bind mode as long as the user does not click on the left and right keys. While in bind mode: the user defines skeleton joints also by clicking on points in the canvas. The user can also increase and decrease the thickness of the bones with their up and down arrow keys. The user can also select which joint to branch new joints off of (the selected joint is highlighted yellow). All of these actions must be performed in bind mode. Please note that once the user exits bind mode (and enters transform mode) they can no longer adjust the thickness of the joints, nor create new joints. Also, while in bind mode, the user cannot rotate. Once the user clicks the left and right arrow keys, the selected joint will be rotated (along with the image itself) and the mode will be set to transform.