Pictures of objects belonging to 101 categories, with about 40 to 800 images
per category; most categories have about 50 images. Collected in September
2003 by Fei-Fei Li, Marco Andreetto, and Marc'Aurelio Ranzato. Each image is
roughly 300 x 200 pixels.
We have carefully clicked outlines of each object in these pictures; these are included in 'Annotations.tar'. There is also a Matlab script to view the annotations, 'show_annotations.m'.
If you are using the Caltech 101 dataset to test your recognition algorithm,
you should try to make your results comparable to the results of others.
We suggest training and testing on a fixed number of pictures and repeating
the experiment with different random selections of pictures in order to obtain
error bars. Popular numbers of training images: 1, 3, 5, 10, 15, 20, 30. Popular
numbers of testing images: 20, 30. See also the discussion below.
When you report your results please keep track of which images you used and which were misclassified. We will soon publish a more detailed experimental protocol that allows you to report those details. See the Discussion section for more details.
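The split-and-repeat protocol above can be sketched in a few lines. This is a minimal illustration, not part of the dataset distribution: the function name `make_split` and the dict-of-image-lists representation are assumptions made for the example.

```python
import random

def make_split(images_per_category, n_train, n_test, seed):
    """Randomly split each category's images into train/test sets.

    images_per_category: dict mapping category name -> list of image ids.
    Repeating with different seeds gives the error bars suggested above.
    """
    rng = random.Random(seed)
    train, test = {}, {}
    for cat, imgs in images_per_category.items():
        shuffled = imgs[:]          # copy so the input is left untouched
        rng.shuffle(shuffled)
        train[cat] = shuffled[:n_train]
        test[cat] = shuffled[n_train:n_train + n_test]
    return train, test

# Example: 15 training and 20 testing images per category, 10 repetitions
# with seeds 0..9, then report the mean and standard deviation of the
# per-repetition error rate.
```

Categories with fewer than n_train + n_test images simply contribute all their remaining images to the test set under this scheme; see the normalization discussion below for why per-category averaging matters in that case.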
Collection of pictures: 101_ObjectCategories.tar.gz (131Mbytes)
Papers reporting experiments on
Caltech 101 images:
1. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. L. Fei-Fei, R. Fergus, and P. Perona. CVPR 2004, Workshop on Generative-Model Based Vision, 2004.
2. Shape Matching and Object Recognition using Low Distortion Correspondence. Alexander C. Berg, Tamara L. Berg, Jitendra Malik. CVPR 2005.
3. The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features. K. Grauman and T. Darrell. International Conference on Computer Vision (ICCV), 2005.
4. Object Recognition with Features Inspired by Visual Cortex. T. Serre, L. Wolf and T. Poggio. CVPR 2005, IEEE Computer Society Press, San Diego, June 2005.
5. SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition. Hao Zhang, Alex Berg, Michael Maire, Jitendra Malik. CVPR 2006.
6. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. CVPR 2006.
7. Empirical study of multi-scale filter banks for object categorization. M.J. Marín-Jiménez and N. Pérez de la Blanca. Tech Report, December 2005.
8. Multiclass Object Recognition with Sparse, Localized Features. Jim Mutch and David G. Lowe. CVPR 2006, pp. 11-18, IEEE Computer Society Press, New York, June 2006.
9. Using Dependent Regions for Object Categorization in a Generative Framework. G. Wang, Y. Zhang, and L. Fei-Fei. CVPR 2006.
How to Reference this Dataset
We would appreciate it if you cite
our work when using the dataset:
1. Images only:
L. Fei-Fei, R. Fergus and P. Perona. Learning generative visual models
from few training examples: an incremental Bayesian approach tested on
101 object categories. IEEE. CVPR 2004, Workshop on Generative-Model
Based Vision. 2004
2. Images and annotations:
L. Fei-Fei, R. Fergus and P. Perona. One-Shot learning of object
categories. IEEE Trans. Pattern Analysis and Machine Intelligence. In press.
Most images have
little or no clutter. The objects tend to be centered in each image. Most
objects are presented in a stereotypical pose.
Antonio Torralba averaged the images of each category, producing a composite image per category: how many categories can you recognize from their average?
If you wish to demonstrate that your algorithm is translation-invariant and robust to clutter, you will probably need to carefully design the procedure for training and testing. One possibility is creating panels composed of 2x2 or 3x3 pictures. Only one of the pictures in the panel belongs to a given class; the other pictures are taken from random classes, or perhaps from an independent collection of pictures.
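The panel idea above can be sketched as follows. This is only an illustration of the sampling logic (the actual pasting of pixels is omitted); the function name `make_panel` and its arguments are assumptions made for the example.

```python
import random

def make_panel(target_image, distractor_pool, grid=3, seed=None):
    """Build a grid x grid panel where exactly one cell holds an image of
    the target class and the rest are random distractors.

    Returns the cells as a flat list in row-major order, plus the index
    of the target cell (useful as ground truth for localization).
    """
    rng = random.Random(seed)
    cells = rng.sample(distractor_pool, grid * grid - 1)
    pos = rng.randrange(grid * grid)   # random position for the target
    cells.insert(pos, target_image)
    return cells, pos
```

Drawing the distractors from an independent picture collection, as the text suggests, avoids accidentally rewarding an algorithm that merely recognizes other Caltech 101 categories.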
17 March 2005
On reporting error rates
It has been pointed
out to us that the categories that have more pictures are somewhat easier
(e.g. Airplanes (800+), Motorcycles (800+), Faces (400+)), while other categories
have under 40 images and are more difficult. People reporting aggregate
testing results should therefore be careful to normalize performance across categories:
a) either you test on all the available images, but then average error rates across categories
b) or you test on a fixed number (e.g. 20) images per category and then report the overall error rate
If you report the overall error rate on all tested images, with different numbers of images per category, your results will tend to be too optimistic.
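The difference between the two reporting options can be made concrete with a short sketch. The function name and the dict representation of results are assumptions made for this example.

```python
def summarize_accuracy(results):
    """results: dict mapping category -> (n_correct, n_tested).

    Returns (mean per-category accuracy, overall accuracy). With
    unbalanced test sets the overall figure is dominated by the large,
    easier categories, which is why it tends to look too optimistic.
    """
    per_cat = [c / n for c, n in results.values()]
    mean_acc = sum(per_cat) / len(per_cat)
    total_correct = sum(c for c, _ in results.values())
    total_tested = sum(n for _, n in results.values())
    return mean_acc, total_correct / total_tested

# E.g. an easy category tested on 100 images at 90% and a hard one
# tested on 10 images at 50%: the per-category mean is 70%, while the
# overall accuracy is 95/110, about 86% -- noticeably more optimistic.
```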
6 October 2005
Reported performance on the Caltech 101 by various authors. There are several interesting things to note about this plot: (1) apparent performance increases when all testing examples are used and performance is not normalized across categories (the red curve is higher than the blue curve); (2) performance increases as more training examples are used. For these reasons authors should be careful when reporting results, in particular specifying the exact training / testing paradigm used, and only comparing comparable setups.
A more detailed explanation of the results can be found in the paper: A.D. Holub, M. Welling, P. Perona. Exploiting Unlabelled Data for Hybrid Object Classification. NIPS 2005 Workshop on Inter-Class Transfer.
Latest results (March 2006) on the Caltech 101 from a variety of groups. (published results only).
If you would like to include your algorithm's performance please email us at firstname.lastname@example.org or email@example.com with a citation and your results. Thanks!
We are also interested in the time it takes to run your algorithm, both during the training stage and during the classification stage.
Plot courtesy of Hao Zhang.
Updated by Holub, April 2006.
Maintained by firstname.lastname@example.org and email@example.com