Important:
The papers below are provided as representative examples of the work in each area. It is very important that you check the home page of the author and of the associated lab, which very often contains a number of additional resources (videos, related papers, presentations, example code, etc.). Many of the papers are rather hard to read (or outright mysterious!) out of context, and it is a good idea to use these additional resources.
In addition, tutorials and additional references can be retrieved from the usual sources:
o CVOnline
o Vision Home page
o IEEE Xplore (from 128…. machines only)
o Books listed in class
Important: For various reasons, many of the pdf links listed below can be accessed only from a CMU machine (i.e., with IP 128.2...) or remotely through VPN.
Aaron Hertzmann, Steven M. Seitz. Example-Based Photometric Stereo: Shape Reconstruction with General, Varying BRDFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1254-1264, August 2005. pdf
Aaron Hertzmann, Steven M. Seitz. Shape and Materials by Example: A Photometric Stereo Approach. Proc. IEEE CVPR 2003. Madison, WI. June 2003. Vol. 1. pp. 533-540. pdf
2 Shading Models and Recognition
Two classical papers on shading models and their use in recognition. The key result is that the set of all images of a convex Lambertian object under all possible illumination conditions forms a convex cone (the "illumination cone"), which in practice lies close to a low-dimensional linear subspace. This property is used in recognition applications.
What is the Set of Images of an Object Under All Possible Lighting Conditions?
P. Belhumeur, D. Kriegman.
International Journal of Computer Vision, 28(3), 1998. pdf
Illumination Cones for Recognition Under Variable Lighting: Faces.
A. Georghiades, D. Kriegman, P. Belhumeur.
IEEE Conf. on Computer Vision and Pattern Recognition, 1998. pdf
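The linear-subspace part of this result is easy to check numerically. The sketch below assumes a Lambertian surface with randomly generated normals and albedos (hypothetical data) lit by distant sources: with shadows ignored, every image equals B @ s for one fixed matrix B of albedo-scaled normals, so any stack of images has rank 3. Clipping negative values to model attached shadows breaks that linearity, which is why the papers reason about a cone rather than a subspace.

```python
import numpy as np

def numerical_rank(matrix, rel_tol=1e-9):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(matrix, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

rng = np.random.default_rng(0)
n_pixels, n_lights = 500, 40

# Hypothetical object: unit surface normals scaled by per-pixel albedo (n_pixels x 3).
normals = rng.normal(size=(n_pixels, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
albedo = rng.uniform(0.2, 1.0, size=(n_pixels, 1))
B = albedo * normals

# Each column is one distant light source: direction times intensity.
lights = rng.normal(size=(3, n_lights))

images = B @ lights                  # one Lambertian image per column, shadows ignored
shadowed = np.maximum(images, 0.0)   # attached shadows clip negative values

rank = numerical_rank(images)
shadowed_rank = numerical_rank(shadowed)
print(rank)           # 3
print(shadowed_rank)  # larger than 3
```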
3 Color Constancy
The first paper is a classic that combines several classical approaches to color constancy into a single, straightforward framework. It closely follows the derivation sketched at the end of the class notes on color. The second paper further develops one approach based on a probabilistic model.
Color by correlation: a simple, unifying framework for color constancy
Finlayson, G.D.; Hordley, S.D.; Hubel, P.M.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 23, Issue 11, Nov. 2001, pages 1209-1221. pdf
Color constancy using KL-divergence
Rosenberg, C.; Hebert, M.; Thrun, S.
Proc. Eighth IEEE International Conference on Computer Vision (ICCV 2001), Volume 1, 7-14 July 2001, pages 239-246. pdf
FILTERING, FEATURE EXTRACTION, SCALE-SPACE
4 Texture Classification
Classification of textures using affine-invariant detectors (i.e., a generalization of the scale-invariant detectors discussed in class). The idea is that these affine-invariant features are more robust to geometric and photometric variations.
A sparse texture representation using local affine regions
Lazebnik, S.; Schmid, C.; Ponce, J.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 27, Issue 8, Aug. 2005, pages 1265-1278. pdf
Affine-invariant local descriptors and neighborhood statistics for texture recognition
Lazebnik, S.; Schmid, C.; Ponce, J.
Proc. Ninth IEEE International Conference on Computer Vision (ICCV 2003), Volume 1, 2003, pages 649-655. pdf
Varma, M. and Zisserman, A.
A statistical approach to texture classification from single images
International Journal of Computer Vision: Special Issue on Texture Analysis and Synthesis, to appear in 2005. pdf
Varma, M. and Zisserman, A.
Classifying Images of Materials: Achieving Viewpoint and Illumination Independence
Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark (2002). pdf
6 Scale-Invariant Representations
Another very popular way of extracting scale-invariant regions and features. This one is based on first-order derivatives (unlike the Laplacian-based technique described in class). The paper includes applications to tracking and recognition.
Scale, Saliency and Image Description.
Timor Kadir and Michael Brady.
International Journal of Computer Vision, 45(2):83-105, November 2001. pdf
7 Scale-Invariant Representations
Two other (related) approaches to extracting geometrically invariant regions, with applications to wide-baseline stereo correspondence. These approaches are based primarily on the local distribution of intensity in the image (instead of on first or second derivatives).

CAMERA GEOMETRY AND CAMERA CALIBRATION
8 Geometry from a single camera
A nice application of the fundamental concepts in camera geometry. The paper shows how to recover quantitative geometric information from a single image (e.g., for forensic applications). The first paper is a complete journal version. The second one is an earlier paper on related ideas. The third paper estimates the ground layer and vehicle ego-motion using a planar motion constraint; it is a very nice application of camera geometry transformations.
Criminisi, A., Reid, I. and Zisserman, A.
Single View Metrology
International Journal of Computer Vision (2000) PDF
Criminisi, A., Reid, I. and Zisserman, A.
A Plane Measuring Device
Image and Vision Computing (1999) PDF
Qifa Ke; Kanade, T.
Transforming camera geometry to a virtual downward-looking camera: robust ego-motion estimation and ground-layer detection
Computer Vision and Pattern Recognition (2003) PDF
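Single-view metrology relies on vanishing points and vanishing lines. In homogeneous coordinates, the line through two image points is their cross product, and two lines meet at the cross product of the lines; the vanishing point of a family of parallel scene lines is therefore the intersection of their images. A minimal sketch with made-up pixel coordinates:

```python
import numpy as np

def line_through(p, q):
    """Homogeneous image line through two points given as (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersect(l1, l2):
    """Intersection of two homogeneous lines, returned as inhomogeneous (x, y)."""
    x = np.cross(l1, l2)
    if abs(x[2]) < 1e-12:
        raise ValueError("lines are parallel in the image (vanishing point at infinity)")
    return x[:2] / x[2]

# Images of two parallel scene edges (e.g. the two rails of a track) converge.
edge_a = line_through((0.0, 0.0), (4.0, 2.0))
edge_b = line_through((0.0, 4.0), (4.0, 4.0))
vp = intersect(edge_a, edge_b)
print(vp)  # [8. 4.]
```

With two such vanishing points for two different directions on a ground plane, `line_through` applied to them gives that plane's vanishing line (the horizon).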
9 Camera Calibration
You have seen basic single-camera calibration in class. This paper uses a sequence of views of planar structures to enforce the multiview constraints that exist between the collineations (homographies) relating the images.
Malis, E.; Cipolla, R.
Camera self-calibration from unknown planar structures enforcing the multiview constraints between collineations
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 24, Issue 9, Sept. 2002, pages 1268-1272. PDF
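The collineations in this paper are 3x3 planar homographies. The paper's self-calibration method is not reproduced here; as a hedged illustration of the underlying building block, the sketch below estimates a single homography from four or more point correspondences with the standard direct linear transform (DLT). It assumes noise-free, well-scaled points and skips the coordinate normalization a real implementation would use; `H_true` is a made-up matrix for the check.

```python
import numpy as np

def estimate_homography(src, dst):
    """DLT estimate of H (3x3) with dst ~ H @ src, from N >= 4 point pairs.

    src, dst: arrays of shape (N, 2). No coordinate normalization is applied,
    so this is only suitable for well-conditioned, noise-free data.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    # h is the right singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map (N, 2) points through H and return inhomogeneous coordinates."""
    pts_h = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return pts_h[:, :2] / pts_h[:, 2:3]

H_true = np.array([[1.2, 0.1, 3.0],
                   [-0.05, 0.9, 1.0],
                   [0.001, 0.002, 1.0]])
src = np.array([[0, 0], [10, 0], [10, 10], [0, 10], [3, 7]], dtype=float)
dst = apply_homography(H_true, src)
H_est = estimate_homography(src, dst)
print(np.allclose(H_est, H_true))  # True
```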
RECONSTRUCTION FROM MULTIPLE IMAGES
Tresadern, P.; Reid, I.
Articulated structure from motion by factorization
CVPR (2005) PDF
Yan, J.; Pollefeys, M.
A factorization-based approach to articulated motion recovery
CVPR (2005) PDF
14 Stereo
These papers describe a recent approach to stereo based on graph cuts and energy minimization. This class of algorithms is currently among the best-performing algorithms for stereo. Boykov, Veksler and Zabih give a broader treatment of the underlying energy minimization techniques, and Kolmogorov and Zabih apply graph cuts to computing correspondence with explicit handling of occlusions. Note: some familiarity with graph algorithms and the basics of MRFs is helpful for this subject. The third paper describes a cooperative stereo algorithm that preserves the uniqueness and continuity of depth values; occlusions are also explicitly modeled.
Fast approximate energy minimization via graph cuts
Boykov, Y.; Veksler, O.; Zabih, R.
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Volume 23, Issue 11, Nov. 2001, pages 1222-1239. pdf
Computing visual correspondence with occlusions using graph cuts.
Kolmogorov, V.; Zabih, R.
Proc. Eighth IEEE International Conference on Computer Vision (ICCV 2001), Volume 2, 2001. pdf
A cooperative algorithm for stereo matching and occlusion detection
Zitnick, C.L.; Kanade, T.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 22, Issue 7, July 2000, pages 675-684. pdf
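The energies in these papers have a data term per pixel and a smoothness term per pair of neighboring pixels. With two labels and a Potts-style smoothness term (a penalty w whenever neighbors disagree), the exact minimum is a minimum s-t cut in a small graph; Boykov, Veksler and Zabih handle many labels (e.g., disparities) by repeatedly solving binary problems of this kind. The sketch below illustrates only that binary case, using a plain Edmonds-Karp max-flow rather than a fast vision-specific solver, on a hypothetical 4-pixel scanline with made-up costs.

```python
from collections import deque

def binary_graph_cut(unary, edges):
    """Exact minimizer of E(x) = sum_p D_p(x_p) + sum_(p,q) w_pq [x_p != x_q], x_p in {0, 1}.

    unary: list of (D_p(0), D_p(1)) pairs, all >= 0.
    edges: list of (p, q, w_pq) with w_pq >= 0.
    Returns (labels, min_energy). Nodes left on the source side get label 0.
    """
    n = len(unary)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]  # residual capacities
    for p, (d0, d1) in enumerate(unary):
        cap[s][p] += d1  # cut when p ends up on the sink side (label 1)
        cap[p][t] += d0  # cut when p stays on the source side (label 0)
    for p, q, w in edges:
        cap[p][q] += w   # cut exactly when p and q receive different labels
        cap[q][p] += w
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path (Edmonds-Karp).
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            break  # no augmenting path: `parent` marks the source side of a min cut
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck
    labels = [0 if parent[p] != -1 else 1 for p in range(n)]
    return labels, flow

# Hypothetical scanline: label 0 = far (background) disparity, label 1 = near.
unary = [(0.0, 5.0), (1.0, 4.0), (4.0, 1.0), (5.0, 0.0)]  # (cost of 0, cost of 1)
edges = [(0, 1, 2.0), (1, 2, 2.0), (2, 3, 2.0)]            # Potts penalty between neighbors
labels, min_energy = binary_graph_cut(unary, edges)
print(labels, min_energy)  # [0, 0, 1, 1] 4.0
```

The value of the maximum flow equals the cost of the minimum cut, which by construction equals the minimum energy.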
15 3-D Reconstruction from Multiple Cameras
Another nice approach for recovering 3-D structure from N fixed cameras. The first paper describes a simple and general reconstruction algorithm that, unlike conventional stereo, does not require any search for correspondences. The second paper develops a more general theory inspired by this class of approaches.
Photorealistic Scene Reconstruction by Voxel Coloring
S. M. Seitz and C. R. Dyer. International Journal of Computer Vision, 35(2), 1999, pp. 151-173. pdf
A Theory of Shape by Space Carving
K. N. Kutulakos and S. M. Seitz.
International Journal of Computer Vision, Marr Prize Special Issue, 38(3), 2000. pdf
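Both papers rest on a photo-consistency test: a voxel is kept only if the cameras that see it agree on its color. A minimal sketch of such a test, assuming the visibility reasoning has already produced one RGB sample per unoccluded camera, and using the spread of those samples against a threshold as the consistency measure (a simple choice made here; the papers discuss their own criteria). The function name, threshold and colors are hypothetical.

```python
import numpy as np

def photo_consistent(samples, threshold):
    """Hypothetical photo-consistency test for a single voxel.

    samples:   (K, 3) RGB values of the voxel's projection in the K cameras that
               see it (visibility/occlusion handling is assumed done by the caller).
    threshold: maximum allowed spread, in the same units as the colors.
    Returns (keep, color): whether to keep the voxel, and the mean color to give it.
    """
    samples = np.asarray(samples, dtype=float)
    spread = np.sqrt(np.mean(np.var(samples, axis=0)))  # RMS of per-channel std devs
    return bool(spread < threshold), samples.mean(axis=0)

# A surface voxel seen consistently by three cameras, and an empty-space voxel
# whose projections land on unrelated scene points.
keep_surface, color = photo_consistent([[100, 50, 20], [102, 49, 21], [99, 51, 19]], threshold=5.0)
keep_empty, _ = photo_consistent([[100, 50, 20], [10, 200, 90]], threshold=5.0)
print(keep_surface, keep_empty)  # True False
```

In voxel coloring, this test is applied while visiting voxels in an order that guarantees occluders are processed before the voxels they hide.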
16 Structure from Motion
A classical paper on 3-D reconstruction from sequences of images. Most interestingly, it includes an application of the self-calibration (metric reconstruction) approaches described in class, which enable reconstruction from uncalibrated cameras. The first paper describes an entire system. The second paper addresses key-frame selection and projective drift in extended uncalibrated video sequences.
M. Pollefeys, L. Van Gool, M. Vergauwen, F. Verbiest, K. Cornelis, J. Tops, R. Koch. Visual modeling with a hand-held camera. International Journal of Computer Vision 59(3), 207-232, 2004. [pdf]
J. Repko, M. Pollefeys, 3D Models from Extended Uncalibrated Video Sequences: Addressing Key-frame Selection and Projective Drift, Proc. 3DIM'05. pdf
17 Structure from Motion
Classical papers on how to build practical systems for reconstructing 3-D models from sequences of images. The second paper focuses on the generation of virtual images from image sequences.
3D Model Acquisition from Extended Image Sequences.
Beardsley, P.A., Torr, P.H.S. and Zisserman, A.
In Proc. 4th European Conference on Computer Vision, LNCS 1065, Cambridge, pages 683-695, 1996. pdf
Automatic 3D Model Acquisition and Generation of New Images from Video Sequences.
Fitzgibbon, A.W. and Zisserman, A.
In Proceedings of European Signal Processing Conference (EUSIPCO '98), Rhodes, Greece, pages 1261-1269, 1998. pdf
19 Bundle Adjustment
The first paper is a detailed explanation of bundle adjustment, very highly recommended reading if you are interested in nonlinear minimization. The last two papers are practical implementations of bundle adjustment for 3-D reconstruction from sequences of images.
Bundle Adjustment - A Modern Synthesis
Triggs, B., McLauchlan, P., Hartley, R., and Fitzgibbon, A. pdf
Efficient bundle adjustment with virtual key frames: a hierarchical approach to multi-frame structure from motion
Heung-Yeung Shum; Qifa Ke; Zhengyou Zhang
IEEE Proc. Conference on Computer Vision and Pattern Recognition, 1999. pdf
Model-Based Bundle Adjustment with Application to Face Modeling
Ying Shan, Zicheng Liu, Zhengyou Zhang
Proc. IEEE International Conference on Computer Vision, 2001. pdf
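Bundle adjustment minimizes the total reprojection error over all cameras and 3-D points, typically with a damped Gauss-Newton (Levenberg-Marquardt) iteration. The sketch below shows only the smallest special case: refining a single 3-D point with the cameras held fixed, using a numerical Jacobian. Full bundle adjustment also optimizes the camera parameters and exploits the sparsity of the problem, as discussed in the Triggs et al. paper. The camera matrices and the noise-free observations are made up for illustration.

```python
import numpy as np

def project(P, X):
    """Project 3-D point X (3,) with 3x4 camera matrix P to image coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def reprojection_residuals(X, cameras, observations):
    """Stacked (predicted - observed) image coordinates over all cameras."""
    return np.concatenate([project(P, X) - obs for P, obs in zip(cameras, observations)])

def refine_point(X0, cameras, observations, iters=50, lam=1e-3, eps=1e-6):
    """Levenberg-Marquardt on the reprojection error of one point (cameras fixed)."""
    X = np.asarray(X0, dtype=float)
    r = reprojection_residuals(X, cameras, observations)
    for _ in range(iters):
        # Forward-difference Jacobian of the residuals (2K x 3).
        J = np.column_stack([
            (reprojection_residuals(X + eps * e, cameras, observations) - r) / eps
            for e in np.eye(3)
        ])
        step = np.linalg.solve(J.T @ J + lam * np.eye(3), -J.T @ r)
        r_new = reprojection_residuals(X + step, cameras, observations)
        if r_new @ r_new < r @ r:
            X, r, lam = X + step, r_new, lam * 0.1  # accept: behave more like Gauss-Newton
        else:
            lam *= 10.0                             # reject: behave more like gradient descent
    return X

def camera(t):
    """Hypothetical calibrated camera [I | t] (identity intrinsics and rotation)."""
    return np.hstack([np.eye(3), np.reshape(t, (3, 1))])

cameras = [camera([0.0, 0.0, 0.0]), camera([-1.0, 0.0, 0.0]), camera([0.0, -1.0, 0.0])]
X_true = np.array([0.5, 0.2, 4.0])
observations = [project(P, X_true) for P in cameras]
X_est = refine_point([0.3, 0.4, 3.0], cameras, observations)
print(X_est)  # close to [0.5, 0.2, 4.0]
```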
MOTION ANALYSIS AND SEGMENTATION
21 Motion Segmentation
A classical approach to motion segmentation using dominant motion. The second paper is a shorter (conference) version of the first reference. The material elaborates on the motion models discussed in class and on probabilistic models that give a maximum-likelihood interpretation of motion estimation. The third paper is a nice application: it uses occlusions to do motion segmentation.
Compact representations of videos through dominant and multiple motion estimation
Sawhney, H.S.; Ayer, S.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 18, Issue 8, Aug. 1996, pages 814-830. pdf
Model-based 2D&3D dominant motion estimation for mosaicing and video representation
Sawhney, H.S.; Ayer, S.; Gorkani, M.
Proc. Fifth International Conference on Computer Vision (ICCV 1995), 20-23 June 1995, pages 583-590. pdf
Panoramic Image Mosaics.
H.Y. Shum, R. Szeliski.
Microsoft Research Tech Report 1997. pdf
Robust Video Mosaicing through Topology Inference and Local to Global Alignment. Sawhney, Hsu, Kumar, ECCV 1998. pdf
23 Motion Segmentation
The first paper is a complete development of the concept of "layers" for segmenting scenes based on motion (e.g., "foreground"/"background" separation). Familiarity with Bayesian classification is recommended.
An integrated Bayesian approach to layer extraction from image sequences.
Torr, P.H.S.; Szeliski, R.; Anandan, P.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 23, Number 3, 2001. pdf
Bayesian Estimation of Layers from Multiple Images.
Y. Wexler, A. Fitzgibbon and A. Zisserman.
Proceedings of the 7th European Conference on Computer Vision, 2002. pdf
24 Motion Layers
Another approach to motion layer segmentation (results were shown in class; the paper describes the details). The second paper describes an extension that deals with outliers. The third paper discusses initialization and rank detection issues.
Qifa Ke and Takeo Kanade,
"A Subspace Approach to Layer Extraction",
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2001), Volume I, pages 255-262, Hawaii, Dec. 2001. pdf
Qifa Ke and Takeo Kanade,
"A Robust Subspace Approach to Layer Extraction",
IEEE Workshop on Motion and Video Computing (Motion 2002), pages 37-43, Orlando, Florida, Dec. 2002. pdf
Qifa Ke and Takeo Kanade,
"Robust Subspace Clustering by Combined Use of kNND Metric and SVD Algorithm",
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2004), Washington D.C., June 2004 pdf
25 Motion Tracking and Motion Analysis
Two papers that describe a classical example of a system for motion analysis from image sequences. The goal of the system is to understand human activities in video. It includes motion segmentation through background subtraction, tracking, and recognition.
W4: real-time surveillance of people and their activities
Haritaoglu, I.; Harwood, D.; Davis, L.S.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 22, Number 8, 2000. pdf
Robust real-time periodic motion detection, analysis, and applications
Cutler, R.; Davis, L.S.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 22, Number 8, 2000. pdf
26 Motion Segmentation and Background Subtraction
More elaborate (and more recent) ways to do background subtraction.
Ahmed Elgammal, David Harwood, Larry Davis. Non-parametric Model for Background Subtraction. pdf
Anurag Mittal, Nikos Paragios. Motion-Based Background Subtraction using Adaptive Kernel Density Estimation. CVPR 2004. pdf
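The non-parametric approach keeps, for every pixel, a window of recent background samples and estimates the probability of the current value with a kernel density estimate; pixels with low probability are labeled foreground. A minimal grayscale sketch with a Gaussian kernel and a single fixed bandwidth (the papers estimate the bandwidth from the data, and the second one adapts it); the array sizes, bandwidth and threshold are illustrative assumptions.

```python
import numpy as np

def kde_background_probability(history, frame, sigma):
    """Per-pixel Gaussian kernel density estimate of `frame` under the background samples.

    history: (N, H, W) array with the N most recent background values of each pixel.
    frame:   (H, W) current grayscale frame.
    sigma:   kernel bandwidth, fixed here for simplicity.
    """
    diff = frame[None, :, :] - history
    kernel = np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return kernel.mean(axis=0)

def foreground_mask(history, frame, sigma=5.0, threshold=1e-3):
    """Pixels whose value is unlikely under the background model are foreground."""
    return kde_background_probability(history, frame, sigma) < threshold

rng = np.random.default_rng(1)
history = 100.0 + rng.normal(0.0, 2.0, size=(20, 4, 4))  # noisy static background
frame = np.full((4, 4), 100.0)
frame[1, 2] = 200.0                                      # an object covers one pixel
mask = foreground_mask(history, frame)
print(mask.astype(int))  # only pixel (1, 2) is flagged
```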
27 Event detection/Activity recognition
Two ways of detecting events (sitting up, waving, etc.) from videos.
Y. Ke, R. Sukthankar, and M. Hebert. Event Detection in Crowded Videos. IEEE International Conference on Computer Vision, October, 2007. pdf
I. Laptev, P. Pérez. Retrieving actions in movies. In Proc. Int. Conf. Comp. Vis. (ICCV'07), Rio de Janeiro, Brazil, October 2007. pdf
TRACKING
28 Template/Feature Tracking
The first paper is a complete analysis of motion recovery approaches based on the "Lucas-Kanade" model, i.e., parameterizing the motion (u,v) by a low-dimensional model and solving by least squares over a window. This was shown in class for constant, affine, and planar motions. The paper further analyzes the properties of this class of approaches. The second paper focuses on one detail (asked about in class): when to update the template when tracking over a long time.
Lucas-Kanade 20 Years On: A Unifying Framework
S. Baker and I. Matthews
International Journal of Computer Vision, Vol. 56, No. 3, March 2004, pp. 221-255. pdf
The Template Update Problem
I. Matthews, T. Ishikawa, and S. Baker
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 6, June 2004, pp. 810-815. pdf
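For the simplest (constant translation) motion model, the least-squares problem above reduces to a 2x2 linear system built from the image gradients Ix, Iy and the temporal difference It summed over the window. A minimal sketch of one linearized step on a synthetic, smoothly varying image; real trackers iterate this step with image warping (and usually a coarse-to-fine pyramid), which is not shown. The test pattern and the shift are made up.

```python
import numpy as np

def lucas_kanade_translation(I1, I2):
    """One linearized Lucas-Kanade step for a constant translation over the whole window.

    Minimizes sum_window (Ix*u + Iy*v + It)^2, whose normal equations are
    [[sum Ix^2, sum IxIy], [sum IxIy, sum Iy^2]] [u, v]^T = -[sum IxIt, sum IyIt]^T.
    """
    I1 = np.asarray(I1, dtype=float)
    I2 = np.asarray(I2, dtype=float)
    Iy, Ix = np.gradient(I1)  # derivatives along rows (y) and columns (x)
    It = I2 - I1
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    u, v = np.linalg.solve(A, b)
    return u, v

def pattern(x, y):
    """Smooth synthetic texture, so that the linearization is accurate."""
    return np.sin(0.3 * x) + np.cos(0.2 * y)

ys, xs = np.mgrid[0:64, 0:64].astype(float)
I1 = pattern(xs, ys)
I2 = pattern(xs - 0.5, ys + 0.3)  # image content moves by (u, v) = (0.5, -0.3)
u, v = lucas_kanade_translation(I1, I2)
print(round(u, 2), round(v, 2))
```

This should print values close to 0.5 and -0.3; the small residual error comes from the first-order linearization and the finite-difference gradients.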
29 Template Tracking
Extension of the classical template tracking approach to varying illumination and non-planar shapes. The second paper extends tracking to the simultaneous use of multiple trackers.
Efficient region tracking with parametric models of geometry and illumination
Hager, G.D.; Belhumeur, P.N.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 20, Issue 10, Oct. 1998. pdf
Probabilistic data association methods for tracking complex visual objects
Rasmussen, C.; Hager, G.D.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 23, Number 6, 2001. pdf
30 Mean-Shift Tracking
These papers use a popular approach to tracking, "mean shift" (also used for segmentation). The advantage of this technique is that it does not require the motion of the target to be restricted to a class of motions (e.g., affine), and it can deal with deformable targets. Some familiarity with kernel density estimation would help.
Kernel-based object tracking
Comaniciu, D.; Ramesh, V.; Meer, P.
IEEE Trans. Pattern Anal. Machine Intell., Vol. 25, No. 5, 2003. pdf
Real-time tracking of non-rigid objects using mean shift.
D. Comaniciu, V. Ramesh, P. Meer.
Proc. IEEE Computer Vision and Pattern Recognition Conference, 2000. pdf
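At its core, mean shift repeatedly moves a window to the kernel-weighted mean of the samples around it, which climbs to a mode of the underlying density. A minimal sketch on 2-D points with a Gaussian kernel; in the tracker, the samples are pixel locations weighted by how well the candidate region's color histogram matches the target's, which is not shown here. The data, start point and bandwidth are made up.

```python
import numpy as np

def mean_shift_mode(points, start, bandwidth, max_iter=100, tol=1e-6):
    """Climb from `start` to a mode of a Gaussian-kernel density estimate of `points`."""
    points = np.asarray(points, dtype=float)
    y = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        d2 = np.sum((points - y) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / bandwidth ** 2)               # kernel weight of each sample
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()  # weighted mean = next position
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

rng = np.random.default_rng(0)
points = np.vstack([rng.normal([5.0, 5.0], 0.5, size=(200, 2)),
                    rng.normal([-5.0, -5.0], 0.5, size=(200, 2))])
mode = mean_shift_mode(points, start=[3.5, 4.0], bandwidth=1.0)
print(mode)  # close to [5, 5]
```

Note that no parametric motion model appears anywhere: the window simply follows the density, which is what lets the tracker cope with deformable targets.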
SEGMENTATION
31 Searching Through the Space of Segmentations
In these papers, segmentation is presented as a search through the space of segmentations. The first paper uses a two-stage approach: images are first oversegmented into superpixels, and then a linear classifier is trained to group superpixels together. In the DDMCMC paper, Markov chain Monte Carlo is used to search through the space of segmentations. This paper is rather difficult to understand and can be considered a project in its own right.
Learning a Classification Model for Segmentation.
Xiaofeng Ren and Jitendra Malik,
In ICCV '03, volume 1, pages 10-17, Nice, 2003. pdf
Image Segmentation by Data-Driven Markov Chain Monte Carlo,
Z.W. Tu and S.C. Zhu,
IEEE Trans on Pattern Analysis and Machine Intelligence, vol.24, no.5, pp. 657-673, May, 2002 pdf
32 Image Segmentation and Image Retrieval
Image segmentation using EM techniques and its application to content-based image retrieval. The second paper is an earlier (and easier to read) version. A basic understanding of expectation-maximization algorithms is useful.
Blobworld: image segmentation using expectation-maximization and its application to image querying
Carson, C.; Belongie, S.; Greenspan, H.; Malik, J.
IEEE Trans. Pattern Anal. Machine Intell., Vol. 24, No. 8, 2002. pdf
Color- and texture-based image segmentation using EM and its application to content-based image retrieval
Belongie, S.; Carson, C.; Greenspan, H.; Malik, J.
Sixth International Conference on Computer Vision, 1998. pdf
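Blobworld groups pixels by fitting a Gaussian mixture to per-pixel color and texture features with EM. The sketch below reduces the same EM loop to a 1-D mixture (think of a single intensity feature per pixel) on synthetic data from two made-up regions; the single feature, the quantile initialization and the small variance floor are assumptions of this sketch.

```python
import numpy as np

def em_gmm_1d(x, k, n_iter=100):
    """Fit a k-component 1-D Gaussian mixture to samples x with EM.

    Returns (means, variances, weights, labels); labels are hard assignments
    taken from the final E-step responsibilities.
    """
    x = np.asarray(x, dtype=float)
    means = np.quantile(x, (np.arange(k) + 0.5) / k)  # spread-out deterministic start
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i), computed in log space.
        log_p = (np.log(weights)
                 - 0.5 * np.log(2.0 * np.pi * variances)
                 - 0.5 * (x[:, None] - means) ** 2 / variances)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means and variances.
        nk = r.sum(axis=0)
        weights = nk / len(x)
        means = (r * x[:, None]).sum(axis=0) / nk
        variances = (r * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6  # variance floor
    return means, variances, weights, r.argmax(axis=1)

# Synthetic "image": pixel intensities drawn from two regions.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.2, 0.05, 300), rng.normal(0.8, 0.05, 300)])
means, variances, weights, labels = em_gmm_1d(x, k=2)
print(np.sort(means))  # close to [0.2, 0.8]
```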
33 Segmentation for Recognition
Another view of the segmentation problem, with application to extracting human shapes from images.
RECOGNITION
37 Photometric Invariants
Matching images using local features that are invariant to rotation, translation, and scale. An important recent approach, based on extensions of the Harris detector. The second paper is included for historical context, since it is quite old.
Indexing based on scale invariant interest points.
Mikolajczyk, K.; Schmid, C.
Proc. Eighth IEEE International Conference on Computer Vision (ICCV 2001), Volume 1, 2001. pdf
Local grayvalue invariants for image retrieval
Schmid, C.; Mohr, R.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 19, Issue 5, May 1997. pdf
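The interest points in this section build on the Harris detector, which scores each pixel from the local second-moment matrix M of image gradients as R = det(M) - k * trace(M)^2: R is large where both eigenvalues of M are large (a corner), negative along edges and near zero in flat regions. A minimal sketch with a square box window and the commonly used k = 0.04 (window shape, size and k are assumptions here; the scale-invariant version in the first paper additionally searches over scale).

```python
import numpy as np

def box_sum(a, r):
    """Sum of `a` over the (2r+1) x (2r+1) window around each pixel (zero padding)."""
    padded = np.pad(a, r)
    out = np.zeros_like(a)
    h, w = a.shape
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out

def harris_response(img, k=0.04, r=2):
    """Harris corner measure R = det(M) - k * trace(M)^2 at every pixel."""
    Iy, Ix = np.gradient(np.asarray(img, dtype=float))
    Sxx = box_sum(Ix * Ix, r)  # entries of the second-moment matrix M,
    Syy = box_sum(Iy * Iy, r)  # accumulated over the window
    Sxy = box_sum(Ix * Iy, r)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2

# A bright square: R peaks at its corners, is negative along its edges, zero inside.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
R = harris_response(img)
y, x = np.unravel_index(np.argmax(R), R.shape)
print(y, x)  # a pixel next to one of the square's corners
```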
38 Shape Matching
Another way to match shapes using “invariant” shape descriptors based on local distributions of edges. The second paper discusses connections between recognition, grouping, and segmentation.
Shape matching and object recognition using shape contexts
Belongie, S.; Malik, J.; Puzicha, J.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 24, Number 4, 2002. pdf
Visual grouping and object recognition
Malik, J.
Proc. 11th International Conference on Image Analysis and Processing, 2001. pdf
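A shape context describes a point by a coarse log-polar histogram of where the other edge points lie relative to it, and two points are compared with a chi-square distance between their histograms. A minimal sketch, assuming 5 log-spaced radial bins and 12 angular bins, with distances normalized by the mean pairwise distance (parameter choices made for illustration). The normalization makes the descriptor invariant to translation and scale, but not to rotation; the point set is random, made-up data.

```python
import numpy as np

def shape_contexts(points, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """One log-polar histogram per point, flattened to length n_r * n_theta."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[None, :, :] - pts[:, None, :]           # diff[i, j] = p_j - p_i
    dist = np.hypot(diff[..., 0], diff[..., 1])
    off_diag = ~np.eye(n, dtype=bool)
    dist = dist / dist[off_diag].mean()                # normalize by mean pairwise distance
    angle = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)
    r_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), n_r + 1)
    hists = np.zeros((n, n_r, n_theta))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r_bin = np.searchsorted(r_edges, dist[i, j]) - 1
            if 0 <= r_bin < n_r:                       # points outside the rings are ignored
                t_bin = int(angle[i, j] * n_theta / (2 * np.pi)) % n_theta
                hists[i, r_bin, t_bin] += 1
    return hists.reshape(n, -1)

def chi2_cost(g, h):
    """Chi-square distance 0.5 * sum (g - h)^2 / (g + h) over non-empty bins."""
    g = np.asarray(g, dtype=float)
    h = np.asarray(h, dtype=float)
    s = g + h
    nz = s > 0
    return 0.5 * np.sum((g[nz] - h[nz]) ** 2 / s[nz])

rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 10.0, size=(20, 2))
moved = 2.5 * pts + np.array([3.0, -7.0])  # scaled and translated copy of the shape
h1, h2 = shape_contexts(pts), shape_contexts(moved)
print(chi2_cost(h1[0], h2[0]))  # 0.0: the descriptor is unchanged
```

In the full method, these costs fill a point-to-point cost matrix that is solved as a bipartite matching before estimating an aligning transformation.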
39 Shape Matching
Two other different, but related, papers on shape matching using quadratic programming techniques.
A. Berg, T. Berg, J. Malik, Shape Matching and Object Recognition using Low Distortion Correspondences, CVPR 2005. pdf
Alternative approach: M. Leordeanu and M. Hebert, A Spectral Technique for Correspondence Problems using Pairwise Constraints, ICCV 2005.
Application to recognition and weakly-supervised learning: M. Leordeanu, M. Hebert, and R. Sukthankar. Beyond Local Appearance: Category Recognition from Pairwise Interactions of Simple Features. Proc. CVPR, June 2007. pdf
40 Matching Local Invariants
An important approach to object recognition based on invariant features. The first paper covers the basics of detecting invariant features (as scale-space extrema of a difference-of-Gaussian function), representing them by histograms of local gradient orientations, and using them for recognition with a nearest-neighbor technique. These are the SIFT features used in related work on localization. The second paper is an earlier version of similar ideas.
Distinctive image features from scale-invariant keypoints
David G. Lowe
preprint, to appear in International Journal of Computer Vision, 2003. pdf
Object recognition from local scale-invariant features
David G. Lowe,
International Conference on Computer Vision, 1999. pdf
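The nearest-neighbor matching step becomes more reliable when the distance to the closest descriptor is compared with the distance to the second closest, rejecting ambiguous matches; the journal paper proposes this test with a ratio of 0.8. A minimal brute-force sketch with made-up 4-D "descriptors" (real SIFT descriptors have 128 dimensions, and large databases call for approximate nearest-neighbor search).

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Match each row of desc_a to its nearest row of desc_b (Euclidean distance),
    keeping a match only if it is clearly better than the second-nearest candidate."""
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    matches = []
    for i in range(len(desc_a)):
        best, second = order[i, 0], order[i, 1]
        if d[i, best] < ratio * d[i, second]:
            matches.append((i, int(best)))
    return matches

database = 10.0 * np.eye(4)                 # four made-up reference descriptors
queries = np.array([[0.0, 0.0, 10.1, 0.0],  # clearly database[2]
                    [9.9, 0.0, 0.0, 0.1],   # clearly database[0]
                    [0.0, 5.0, 0.0, 5.0]])  # equally close to database[1] and database[3]
print(ratio_test_matches(queries, database))  # [(0, 2), (1, 0)]
```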
41 Pictorial Structures
Another classic approach based on matching image parts and representing their relations, with applications to recognizing and tracking human shapes in images. The second paper generalizes some aspects of the initial formulation to make it applicable to broader recognition problems. Warning: only for people already familiar with graphical models, belief propagation, and related topics (e.g., from the machine learning class).
P. Felzenszwalb and D. Huttenlocher. Pictorial Structures for Object Recognition. International Journal of Computer Vision, Vol. 61, No. 1, January 2005. pdf
Spatial Priors for Part-Based Recognition using Statistical Models.
P. Felzenszwalb, D. Crandall, and D. Huttenlocher
IEEE Conference on Computer Vision and Pattern Recognition, 2005. pdf
42 Constellation models
An approach based on recognizing image parts and their relations, learned from training data. It uses scale-invariant features and other concepts from earlier in the class. The second paper is an older paper in which some of the key ideas were introduced. Warning: for those who have taken the machine learning class or equivalent.
R. Fergus, P. Perona, and A. Zisserman. Object Class Recognition by Unsupervised Scale-Invariant Learning. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2003. pdf
M. Weber, M. Welling and P. Perona. Unsupervised Learning of Models for Recognition. Proc. 6th European Conference on Computer Vision (ECCV), Dublin, Ireland, June 2000. pdf
A nice application of the concepts of invariant region extraction, SIFT descriptors, clustering, etc. to the problem of extracting object descriptions from video in an unsupervised manner.
Sivic, J. and Zisserman, A.
Video Data Mining Using Configurations of Viewpoint Invariant Regions
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC (2004) PDF
Sivic, J. and Zisserman, A.
Video Google: A Text Retrieval Approach to Object Matching in Videos
Proceedings of the International Conference on Computer Vision (2003) PDF
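Video Google treats quantized descriptors as "visual words" and ranks frames as in text retrieval: each word is weighted by its term frequency times its inverse document frequency, and weighted vectors are compared by their normalized scalar product. A minimal sketch of that weighting and ranking on made-up frames given as lists of visual-word ids; vocabulary construction, stop lists and spatial consistency checks are omitted.

```python
import math
from collections import Counter

def inverse_document_frequency(frames):
    """idf_i = log(N / n_i), where n_i is the number of frames containing visual word i."""
    n_frames = len(frames)
    df = Counter(word for frame in frames for word in set(frame))
    return {word: math.log(n_frames / n_i) for word, n_i in df.items()}

def tfidf(frame, idf):
    """Sparse tf-idf vector: t_i = (n_id / n_d) * idf_i for the words of one frame."""
    counts = Counter(frame)
    n_d = len(frame)
    return {word: (n_id / n_d) * idf.get(word, 0.0) for word, n_id in counts.items()}

def cosine(u, v):
    """Normalized scalar product of two sparse vectors (0 if either is all zeros)."""
    dot = sum(x * v.get(word, 0.0) for word, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

def rank_frames(query, frames):
    """Indices of `frames`, sorted from most to least similar to `query`."""
    idf = inverse_document_frequency(frames)
    q = tfidf(query, idf)
    scores = [cosine(q, tfidf(frame, idf)) for frame in frames]
    return sorted(range(len(frames)), key=lambda i: -scores[i])

# Made-up frames, each a list of visual-word ids.
frames = [[1, 2, 2, 3], [4, 5, 6], [1, 2, 7, 8], [9, 9, 4]]
query = [2, 3, 1, 2]
print(rank_frames(query, frames))  # [0, 2, 1, 3]
```

A word that occurs in every frame gets idf = log(1) = 0 and so contributes nothing, which is the same effect a stop list has on very frequent words.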
45 Face Detection
One of the best-performing face detectors, based on local statistics of wavelet coefficients. Warning: a good understanding of image processing (wavelets) and machine learning (Bayes classifiers, boosting) is required for this paper.
Object Detection Using the Statistics of Parts
H. Schneiderman and T. Kanade
International Journal of Computer Vision, 2002. pdf
These papers are about object discovery (unsupervised learning) in images. Borrowing the idea of latent topic modeling from the statistical text processing literature, they model object categories as latent topics: each image is modeled as a mixture of topics, and each topic as a distribution over visual words. These techniques use pLSA and/or LDA to discover objects in images automatically, and they require a good amount of machine learning knowledge. The second paper uses multiple segmentations to discover segments that correspond to objects.
Discovering Objects and their Location in Images
Josef Sivic, Bryan Russell, Alexei A. Efros, Andrew Zisserman, Bill Freeman.
In ICCV 2005 pdf
Using Multiple Segmentations to Discover Objects and their Extent in Image Collections
Bryan Russell, Alexei A. Efros, Josef Sivic, Bill Freeman, Andrew Zisserman.
In CVPR 2006 pdf