Visipedia, short for "Visual Encyclopedia," is an augmented version of Wikipedia in which pictures are first-class citizens alongside text. Its goals include creating hyperlinked, interactive images embedded in Wikipedia articles, scalable representations of visual knowledge, large-scale machine vision datasets, and visual search capabilities. To achieve these goals, Visipedia advocates interaction and collaboration between machine vision systems and human users and experts. Visipedia is a joint project between the Caltech Vision Group and Serge Belongie's Vision Group at UCSD.

- Welinder P., Branson S., Belongie S., Perona P. The Multidimensional Wisdom of Crowds. Neural Information Processing Systems (NIPS). 2010.
- Branson S., Wah C., Babenko B., Welinder P., Perona P., Belongie S. Visual Recognition with Humans in the Loop. European Conference on Computer Vision (ECCV), Heraklion, Crete, Greece. Sept. 2010.
- Welinder P., Perona P. Online crowdsourcing: rating annotators and obtaining cost-effective labels. Workshop on Advancing Computer Vision with Humans in the Loop at CVPR. 2010.
- Perona P. Visions of Visipedia. Proceedings of the IEEE, Vol. 98, Issue 8, pp. 1526-1534. August 2010. ISSN: 0018-9219.

M. Andreetto, L. Zelnik-Manor, and P. Perona
Which one comes first: segmentation or recognition? We propose a unified framework for carrying out the two simultaneously and without supervision. The framework combines a flexible probabilistic model, for representing the shape and appearance of each segment, with the popular "bag of visual words" model for recognition. When applied to a collection of images, our framework can simultaneously discover the segments of each image, and the correspondence between such segments, without supervision. Such recurring segments may be thought of as the 'parts' of corresponding objects that appear multiple times in the image collection. Thus, the model may be used for learning new categories, detecting/classifying objects, and segmenting images, without using expensive human annotation.

M. Andreetto, L. Zelnik-Manor, and P. Perona, "Non-parametric Probabilistic Image Segmentation", ICCV07.
M. Andreetto, L. Zelnik-Manor, and P. Perona, "Unsupervised Learning of Categorical Segments in Image Collections", POCV Workshop in CVPR08.

A. Angelova, L. Matthies, D. Helmick, P. Perona
Slip measures the lack of progress of a wheeled ground robot while driving. Large amounts of slippage, which can occur on certain surfaces such as sandy slopes, negatively affect rover mobility. Obtaining information about slip before entering a particular terrain is therefore useful for planning and for avoiding terrains with large slip. We consider predicting slip from a distance using visual information as input. The proposed method is based on learning from experience and consists of terrain type recognition and nonlinear regression modeling. After learning, slip prediction is done remotely using only the visual information as input.
References: A. Angelova, L. Matthies, D. Helmick, P. Perona, 'Slip Prediction Using Visual Information', Robotics: Science and Systems (RSS), 2006
A. Angelova, L. Matthies, D. Helmick, G. Sibley, P. Perona, 'Learning to Predict Slip for Ground Robots', International Conference on Robotics and Automation, 2006
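
As a rough illustration of the quantity being predicted and of the two-stage structure described above, the sketch below assumes a normalized definition of slip (the fraction of commanded motion the robot fails to achieve) and a per-terrain regressor that takes a single slope input. Both the definition and every name here (`slip_ratio`, `predict_slip`, `per_terrain_model`, `slope_deg`) are assumptions made for illustration and are not taken from the cited papers.

```python
from typing import Callable, Dict


def slip_ratio(commanded_velocity: float, actual_velocity: float) -> float:
    """Fraction of commanded forward motion lost to slippage.

    Assumed definition (not from the source): 0.0 means no slip,
    1.0 means the wheels turn but the robot makes no progress.
    """
    if commanded_velocity <= 0.0:
        raise ValueError("commanded_velocity must be positive")
    return (commanded_velocity - actual_velocity) / commanded_velocity


def predict_slip(
    terrain_type: str,
    slope_deg: float,
    per_terrain_model: Dict[str, Callable[[float], float]],
) -> float:
    """Two-stage prediction: select the regressor learned for the
    recognized terrain type, then evaluate it on the terrain slope.

    `per_terrain_model` is a hypothetical mapping from terrain label to
    a learned (possibly nonlinear) function of slope.
    """
    if terrain_type not in per_terrain_model:
        raise KeyError(f"no slip model learned for terrain {terrain_type!r}")
    return per_terrain_model[terrain_type](slope_deg)
```

In such a design the terrain classifier and the regressors can be trained separately, with the classifier's output deciding which regressor is consulted.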


L. Zelnik-Manor and P. Perona
Grouping is the task of separating a dataset into meaningful subsets. Of the many approaches suggested for this task, we focus on graph-theoretic approaches, as these are known to perform well on a large variety of data structures. The algorithms we suggest automatically find the number of groups and the appropriate scale of analysis, and can handle multi-scale data and irregular background clutter.
References: L. Zelnik-Manor and P. Perona, "Self-Tuning Spectral Clustering", NIPS 2004.
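
One way to choose the scale of analysis automatically is to give every point its own scale, taken from the distance to its k-th nearest neighbor, and to build the graph affinities from those local scales. The sketch below illustrates that idea only; the function name, the default `k`, and the handling of duplicate points are assumptions, and the later steps of spectral clustering (eigen-decomposition and selecting the number of groups) are not shown.

```python
import math
from typing import List, Sequence


def local_scaling_affinity(
    points: Sequence[Sequence[float]], k: int = 7
) -> List[List[float]]:
    """Affinity matrix with a per-point scale.

    A[i][j] = exp(-d(i, j)^2 / (sigma_i * sigma_j)), where sigma_i is the
    distance from point i to its k-th nearest neighbor. The diagonal is
    left at zero.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    dist = [[math.dist(p, q) for q in points] for p in points]
    kk = min(k, n - 1)  # clamp k for small datasets
    # sorted(row)[0] is the point's zero distance to itself, so index kk
    # is the distance to the kk-th nearest other point.
    sigma = [sorted(row)[kk] for row in dist]

    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            scale = sigma[i] * sigma[j]
            if scale > 0.0:
                affinity[i][j] = math.exp(-dist[i][j] ** 2 / scale)
            else:
                # Assumed convention for degenerate scales: coincident
                # points are fully similar, anything else is not.
                affinity[i][j] = 1.0 if dist[i][j] == 0.0 else 0.0
    return affinity
```

Because each scale adapts to the local density, a tight cluster and a loose cluster in the same dataset can both produce strong within-cluster affinities, which a single global scale cannot guarantee.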

S. Agarwal, J. Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, and S. Belongie
Often one needs a clustering tool that can operate on affinity relations beyond dyadic (pairwise) ones. For example, consider grouping data points into clusters, where each cluster can be well approximated by a line. Since every pair of points trivially defines a line, there is no useful measure of similarity between pairs of points for this problem. However, it is possible to define similarity measures over triplets of points that indicate how collinear they are. We offer grouping algorithms that operate on such higher-order relations.
References: Sameer Agarwal, Jongwoo Lim, Lihi Zelnik-Manor, Pietro Perona, David Kriegman and Serge Belongie, "Beyond Pairwise Clustering", CVPR'05.
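
To make the triplet example concrete, the sketch below scores three 2D points by the area of the triangle they span: zero area means the points are exactly collinear. This particular measure, the Gaussian fall-off, and the names `triangle_area`, `collinearity_similarity`, and `sigma` are illustrative assumptions, not the similarity used in the cited paper.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def triangle_area(p: Point, q: Point, r: Point) -> float:
    """Area of the triangle pqr, computed from the 2D cross product."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])
    return 0.5 * abs(cross)


def collinearity_similarity(
    p: Point, q: Point, r: Point, sigma: float = 1.0
) -> float:
    """Triplet similarity in (0, 1]: 1.0 for exactly collinear points,
    decaying toward 0 as the triangle they span grows.

    `sigma` is an assumed tolerance with units of area.
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    return math.exp(-((triangle_area(p, q, r) / sigma) ** 2))
```

No pairwise function can play this role, since any two points fit a line perfectly; the third point is what makes the measure informative.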


L. Zelnik-Manor, G. Peters, and P. Perona
Pictures taken by a rotating camera cover the viewing sphere surrounding the center of rotation. Once a set of images has been registered and blended on the sphere, all that remains in order to obtain a flat panorama is to project the spherical image onto a picture plane. This step is unfortunately not obvious: the surface of a sphere cannot be flattened onto a page without some form of distortion. We show that multiple projections may coexist successfully in the same mosaic: these projections are chosen locally and depend on what is present in the pictures. We show that such multi-view projections can produce more compelling results than the global projections commonly used.
References: L. Zelnik-Manor, G. Peters and P. Perona, "Squaring the Circle in Panoramas", ICCV 2005
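
The distortion trade-off can be seen by comparing two standard global projections of a viewing direction, given by its longitude `theta` and latitude `phi` in radians. This is a minimal sketch with assumed function names; it does not show how the multi-view method chooses projections locally.

```python
import math
from typing import Tuple


def perspective_projection(theta: float, phi: float) -> Tuple[float, float]:
    """Project a viewing direction onto the plane z = 1.

    Straight lines stay straight, but the image stretches without bound
    as theta approaches 90 degrees from the optical axis.
    """
    if abs(theta) >= math.pi / 2 or abs(phi) >= math.pi / 2:
        raise ValueError("direction must lie in front of the picture plane")
    return math.tan(theta), math.tan(phi) / math.cos(theta)


def cylindrical_projection(theta: float, phi: float) -> Tuple[float, float]:
    """Project a viewing direction onto a unit cylinder, then unroll it.

    Horizontal extent grows only linearly with theta, but straight lines
    that are not vertical become curved.
    """
    if abs(phi) >= math.pi / 2:
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    return theta, math.tan(phi)
```

At `theta` of 60 degrees on the horizon, the perspective x-coordinate is `tan(60°)`, roughly 1.73, while the cylindrical one is about 1.05, which is why wide fields of view look stretched under a single perspective projection.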

C. Fanti, L. Zelnik-Manor, P. Perona
If we could endow computers with the ability to observe and understand the motion of the human body, we could build new machines that autonomously interact with humans in the surrounding environment. Our research focuses on probabilistic models that combine the statistics of motion and appearance of body parts to infer a high-level description of the body's evolution in time. The formalism of graphical models, together with belief propagation and sampling techniques, provides the tools for a principled way of modeling uncertainty and for efficient computation.

S. Savarese, M. Andreetto, H. Rushmeier, F. Bernardini, P. Perona
Cast shadows are an informative cue to the shape of objects. They are particularly useful for discovering an object's concavities. We propose a new method for recovering shape from cast shadows, which we call shadow carving. Given a conservative estimate of the volume occupied by an object, this method computes a new estimate that is consistent with the observed pattern of shadows. This new estimate is provably conservative, i.e. shadow carving never removes any portion of the real object. We propose a reconstruction system that recovers shape from silhouettes and shadow carving. The silhouettes are used to reconstruct the initial conservative estimate of the object's shape, and shadow carving is used to carve out the concavities. We have simulated our reconstruction system with a commercial rendering package to explore the design parameters and assess the accuracy of the reconstruction. We have also implemented our reconstruction scheme in a table-top system and present the results of scanning several objects.
References: S. Savarese, M. Andreetto, H. Rushmeier, F. Bernardini and P. Perona, "3D Reconstruction by Shadow Carving: Theory and Practical Evaluation", accepted for publication in the International Journal of Computer Vision, 2006

A. Angelova, Y. Abu-Mostafa, P. Perona
Could a training example be detrimental to learning? Contrary to the common belief that more training data yield better generalization, we show that the quality of the examples also matters and that the learning algorithm might be better off when some training examples are discarded. The question is which examples need to be eliminated, so as to improve generalization performance. We propose a general approach, called 'data pruning', to automatically identify and eliminate examples that are troublesome for learning with a given model. We apply it to a challenging face dataset, achieving significant improvements in performance, especially for very noisy data.
References: A. Angelova, Y. Abu-Mostafa, P. Perona, 'Pruning Training Sets for Learning of Object Categories', Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2005.
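
The general idea of discarding troublesome examples can be illustrated with a much simpler stand-in than the approach in the cited paper: flag any training example whose label disagrees with the majority label of its k nearest neighbors, and keep the rest. The sketch below implements only this neighbor-disagreement heuristic; the function name and parameters are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def prune_by_neighbor_disagreement(
    features: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    k: int = 3,
) -> List[int]:
    """Return indices of the examples to keep.

    An example is dropped when the majority label among its k nearest
    neighbors (excluding itself) differs from its own label. This is a
    stand-in heuristic, not the data pruning method of the paper.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    if k < 1 or len(features) <= k:
        raise ValueError("need k >= 1 and more than k examples")

    keep = []
    for i, (x, y) in enumerate(zip(features, labels)):
        # Distances from example i to every other example.
        neighbors = [
            (math.dist(x, features[j]), labels[j])
            for j in range(len(features))
            if j != i
        ]
        neighbors.sort(key=lambda pair: pair[0])
        votes = Counter(label for _, label in neighbors[:k])
        majority_label = votes.most_common(1)[0][0]
        if majority_label == y:
            keep.append(i)
    return keep
```

A model retrained on the kept indices sees fewer mislabeled or ambiguous examples, at the cost of possibly discarding some hard but correct ones near class boundaries.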


P. Moreels and P. Perona
We explore the performance of a number of popular feature detectors and descriptors in matching 3D object features across viewpoints and lighting conditions. The evaluation relies on the epipolar geometry between triplets of views of the same object. Each detector/descriptor combination is tested on a large database of objects viewed from 144 viewpoints under three different lighting conditions.
References:
1. P. Moreels and P. Perona, 'Evaluation of Features Detectors and Descriptors based on 3D objects', submitted to International Journal of Computer Vision, 2006
2. P. Moreels and P. Perona, 'Evaluation of Features Detectors and Descriptors based on 3D objects', International Conference on Computer Vision, 1, pp. 800-807, 2005

P. Moreels, M. Maire, P. Perona, D. Geman
We present a probabilistic model for object recognition. Objects and scenes are represented by features, and our goal is to establish feature correspondences between a database of models and new test scenes. Given the huge size of the search space, our model makes the simplifying assumption of a common frame shared by the features extracted from the same object. The papers listed below explore various search strategies for building hypotheses that explain test scenes: hypothesis generation based on the A* principle (ECCV), a search based on entropy minimization (NIPS), and a coarse-to-fine search (tech report).

References:
1. P. Moreels and P. Perona, Probabilistic Coarse-To-Fine Object Recognition, tech report, 2005.
2. P. Moreels and P. Perona, Common-Frame Model for Object Recognition. Advances in Neural Information Processing Systems (NIPS) 2004.
3. P. Moreels, M. Maire, P. Perona. Recognition by Probabilistic Hypothesis Construction. European Conference on Computer Vision (ECCV) 2004.

S. Savarese, M. Chen and P. Perona, "Local Shape from Mirror Reflections", International Journal of Computer Vision 64(1), 31–67, 2005.
S. Savarese, L. Fei-Fei and P. Perona, "What do Reflections Tell us About the Shape of a Mirror?", in Applied Perception in Graphics and Visualization [sponsored by ACM SIGGRAPH], Los Angeles, August 7-8, 2004.





