Caltech 101





Pictures of objects belonging to 101 categories. About 40 to 800 images per category. Most categories have about 50 images. Collected in September 2003 by Fei-Fei Li, Marco Andreetto, and Marc'Aurelio Ranzato. The size of each image is roughly 300 x 200 pixels.
We have carefully clicked outlines of each object in these pictures; these are included in 'Annotations.tar'. There is also a MATLAB script to view the annotations, 'show_annotation.m'.

How to use the dataset

If you are using the Caltech 101 dataset to test your recognition algorithm, you should try to make your results comparable to the results of others. We suggest training and testing on a fixed number of pictures and repeating the experiment with different random selections of pictures in order to obtain error bars. Popular numbers of training images: 1, 3, 5, 10, 15, 20, 30. Popular numbers of testing images: 20, 30. See also the discussion below.
When you report your results, please keep track of which images you used and which were misclassified. We will soon publish a more detailed experimental protocol that allows you to report those details. See the Discussion section for more details.
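
The suggested protocol (a fixed number of training and testing images per category, repeated over random selections) could be sketched as follows. This is a minimal illustration and not the official protocol: the function names, the `images_by_category` mapping (category name to list of image file names), and the `evaluate` callback are all assumptions made for the example.

```python
import random
import statistics


def split_per_category(images_by_category, n_train, n_test, rng):
    """Draw n_train training and up to n_test testing images per category."""
    train, test = {}, {}
    for category, images in images_by_category.items():
        if len(images) <= n_train:
            raise ValueError(f"{category!r} has too few images for n_train={n_train}")
        shuffled = list(images)
        rng.shuffle(shuffled)
        train[category] = shuffled[:n_train]
        # Small categories may have fewer than n_test images left; take what remains.
        test[category] = shuffled[n_train:n_train + n_test]
    return train, test


def repeated_trials(images_by_category, evaluate, n_train=15, n_test=20,
                    n_trials=10, seed=0):
    """Repeat the experiment over random splits; return (mean, std, splits)."""
    rng = random.Random(seed)
    scores, splits = [], []
    for _ in range(n_trials):
        train, test = split_per_category(images_by_category, n_train, n_test, rng)
        # evaluate is a hypothetical callback: it trains on `train`,
        # tests on `test`, and returns a single score such as an error rate.
        scores.append(evaluate(train, test))
        splits.append((train, test))  # record exactly which images were used
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return statistics.mean(scores), std, splits
```

The standard deviation over trials gives the error bar, and the returned `splits` record which images were used in each trial, so they can be reported alongside the results.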


Collection of pictures: 101_ObjectCategories.tar.gz (131Mbytes)

Outlines of the objects in the pictures: [1] Annotations.tar [2] show_annotation.m


Papers reporting experiments on Caltech 101 images:

1. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. L. Fei-Fei, R. Fergus, and P. Perona. CVPR 2004, Workshop on Generative-Model Based Vision. 2004

2. Shape Matching and Object Recognition using Low Distortion Correspondence. Alexander C. Berg, Tamara L. Berg, Jitendra Malik. CVPR 2005

3. The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features. K. Grauman and T. Darrell. International Conference on Computer Vision (ICCV), 2005.

4. Combining Generative Models and Fisher Kernels for Object Class Recognition. Holub, AD. Welling, M. Perona, P. International Conference on Computer Vision (ICCV), 2005.

5. Object Recognition with Features Inspired by Visual Cortex. T. Serre, L. Wolf and T. Poggio. Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), IEEE Computer Society Press, San Diego, June 2005.

6. SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition. Hao Zhang, Alex Berg, Michael Maire, Jitendra Malik. CVPR, 2006.

7. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. CVPR, 2006 (accepted).

8. Empirical study of multi-scale filter banks for object categorization, M.J. Marín-Jiménez, and N. Pérez de la Blanca. December 2005. Tech Report.

9. Multiclass Object Recognition with Sparse, Localized Features, Jim Mutch and David G. Lowe. pp. 11-18, CVPR 2006, IEEE Computer Society Press, New York, June 2006.

10. Using Dependent Regions for Object Categorization in a Generative Framework, G. Wang, Y. Zhang, and L. Fei-Fei. IEEE Comp. Vis. Patt. Recog. 2006

If you would like to add a paper, email the maintainers.

How to Reference this Dataset

We would appreciate it if you cite our works when using the dataset:
1. Images only:

L. Fei-Fei, R. Fergus and P. Perona. Learning generative visual models
from few training examples: an incremental Bayesian approach tested on
101 object categories. IEEE CVPR 2004, Workshop on Generative-Model
Based Vision. 2004

2. Images and annotations:
L. Fei-Fei, R. Fergus and P. Perona. One-Shot learning of object
categories. IEEE Trans. Pattern Analysis and Machine Intelligence. In
press.


Most images have little or no clutter. The objects tend to be centered in each image. Most objects are presented in a stereotypical pose.

Antonio Torralba averaged the images of each category, producing a composite image. How many categories can you recognize from their average?

If you wish to demonstrate that your algorithm is translation-invariant and robust to clutter, you will probably need to carefully design the procedure for training and testing. One possibility is creating panels composed of 2x2 or 3x3 pictures. Only one of the pictures in the panel belongs to a given class. The other pictures are taken from random classes, or perhaps from an independent collection of pictures.
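
One way to lay out such panels is sketched below. It only chooses which image goes in which cell; pasting the pixels together is left to whatever image library you use. The function name `make_panel` and the `images_by_category` mapping (category name to list of image file names) are assumptions made for this illustration, and distractors are drawn from the other categories rather than from an independent collection.

```python
import random


def make_panel(images_by_category, target_category, grid=2, rng=None):
    """Return (panel, target_cell) for a grid x grid panel of image names.

    Exactly one cell holds an image of target_category; every other cell
    holds an image drawn from a randomly chosen different category.
    """
    rng = rng or random.Random()
    others = [c for c in images_by_category if c != target_category]
    if not others:
        raise ValueError("need at least one distractor category")
    n_cells = grid * grid
    target_index = rng.randrange(n_cells)  # random position tests translation
    cells = []
    for i in range(n_cells):
        if i == target_index:
            category = target_category
        else:
            category = rng.choice(others)
        cells.append((category, rng.choice(images_by_category[category])))
    # Reshape the flat list of cells into rows.
    panel = [cells[r * grid:(r + 1) * grid] for r in range(grid)]
    return panel, divmod(target_index, grid)
```

Each cell is a `(category, image_name)` pair, and `target_cell` is the `(row, column)` position of the single target image, which can serve as ground truth when scoring localization.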

17 March 2005

On reporting error rates

It has been pointed out to us that the categories with more pictures are somewhat easier (e.g. Airplanes (800+), Motorcycles (800+), Faces (400+)), while other categories have under 40 images and are more difficult. People reporting aggregate testing results should therefore be careful to normalize performance across categories:
a) either test on all the available images, but then average error rates across categories,
b) or test on a fixed number (e.g. 20) of images per category and then report the overall error rate.
If you report the overall error rate on all tested images, with different numbers of images per category, your results will tend to be too optimistic, because the large, easier categories dominate the total.
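
The difference between the overall error rate and the error rate averaged across categories can be illustrated with a short sketch. The function name `error_rates` and the toy labels in the usage note are assumptions made for the example, not part of the dataset.

```python
from collections import defaultdict


def error_rates(true_labels, predicted_labels):
    """Return (overall_error, mean_per_category_error)."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    tested = defaultdict(int)  # images tested per true category
    wrong = defaultdict(int)   # misclassified images per true category
    for truth, guess in zip(true_labels, predicted_labels):
        tested[truth] += 1
        if guess != truth:
            wrong[truth] += 1
    # Overall error: every image counts equally, so large categories dominate.
    overall = sum(wrong.values()) / len(true_labels)
    # Normalized error: every category counts equally.
    per_category = [wrong[c] / tested[c] for c in tested]
    return overall, sum(per_category) / len(per_category)
```

For instance, with 8 test images from a large category all classified correctly and 2 from a small category both misclassified, the overall error is 2/10 = 0.2, while the per-category average is (0 + 1)/2 = 0.5.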
6 October 2005

Reported performance on the Caltech101 by various authors. There are several interesting things to note about this plot: (1) performance increases when all testing examples are used (the red curve is higher than the blue curve) and the performance is not normalized over all categories; (2) performance increases as more training examples are used. For these reasons authors should be careful when reporting results, in particular specifying the exact training / testing paradigm used, and only comparing comparable setups.

A more detailed explanation of the results can be found in the paper: Holub, AD. Welling, M. Perona, P. Exploiting Unlabelled Data for Hybrid Object Classification. NIPS 2005 Workshop in Inter-Class Transfer.

Latest results (March 2006) on the Caltech 101 from a variety of groups (published results only).

If you would like to include your algorithm's performance, please email us with a citation and your results. Thanks!

We are also interested in the time it takes to run your algorithm, during both the training and the classification stages.

Plot courtesy of Hao Zhang.


Update by holub, April 2006.

Last updated: April 5, 2006
