Steve's Object Detection Toolbox Documentation

Introduction

This toolbox contains a collection of routines for multiclass object detection, deformable part models, pose mixture models, localized attribute and classification models, online structured learning, probabilistic user models, and interactive annotation tools for labeling parts and object classes. This toolbox was written by Steve Branson and is implemented in C++. The documentation and usability will hopefully be improved soon.

Download

Source code download:

Compilation

The only prerequisite is OpenCV (see https://help.ubuntu.com/community/OpenCV for installation in Ubuntu).

Overview

Features, Object Recognition, and Object Detection

The following features are included. They can be used in conjunction with object recognition, sliding window detection, or deformable part models (e.g., localized versions of features are supported):

Deformable Part Models

Deformable part models in which each part has a sliding window appearance model. This is similar to the model of Felzenszwalb et al., but with an emphasis on semantically defined parts. It includes the following features:
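The core scoring idea behind such models can be sketched as follows: each candidate placement of a part is scored as its appearance response minus a spatial deformation cost relative to an anchor location. This is a simplified 1-D illustration with hypothetical names, not the toolbox's actual API.

```cpp
#include <cfloat>
#include <vector>

// Hypothetical sketch (not the toolbox's actual API): score a part over a
// 1-D row of candidate locations, where each placement is scored as its
// appearance response minus a quadratic deformation cost with respect to
// an anchor location:  score(x) = appearance(x) - wx * (x - anchor)^2
int BestPartLocation(const std::vector<double> &appearance, int anchor,
                     double wx, double *bestScore) {
  int best = -1;
  double b = -DBL_MAX;
  for (int x = 0; x < (int)appearance.size(); x++) {
    double s = appearance[x] - wx * (x - anchor) * (x - anchor);
    if (s > b) { b = s; best = x; }
  }
  if (bestScore) *bestScore = b;
  return best;
}
```

In a real deformable part model this search runs in 2-D over all parts, and generalized distance transforms make the maximization over deformations efficient.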

Multiclass Object Detection Using Shared Parts and Attributes

Sliding window part and attribute detectors, which can be shared among multiple classes (e.g., for subordinate classification)

[Image: attributes.png]

See import_birds200.cpp and train_multiclass_detector.cpp for an example.

Online Structured SVM Learning

Fast structured SVM learning that supports very large datasets. Examples can be added in an online fashion via a network interface, which also allows examples to be classified or labeled interactively using the current model while the learner trains. Implementations of the following learning algorithms are included:

Interactive Computer Vision

The ability to learn probability distributions that model noise in the way people answer attribute questions or label the locations of parts. These distributions can be combined with computer vision routines to create interactive interfaces.
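As a hypothetical illustration of how such user models combine with vision (function names are illustrative, not the toolbox's API), the sketch below fuses a class distribution predicted by computer vision with the likelihood of an observed user answer via Bayes' rule.

```cpp
#include <cmath>
#include <vector>

// Hypothetical illustration (not the toolbox's API): fuse a class
// distribution predicted by computer vision with the likelihood of an
// observed user answer via Bayes' rule,
//   p(c | answer) proportional to pVision(c) * p(answer | c)
std::vector<double>
FuseUserAnswer(const std::vector<double> &pVision,
               const std::vector<double> &pAnswerGivenClass) {
  std::vector<double> post(pVision.size());
  double z = 0;
  for (size_t c = 0; c < post.size(); c++)
    z += post[c] = pVision[c] * pAnswerGivenClass[c];
  for (size_t c = 0; c < post.size(); c++)
    post[c] /= z;  // assumes z > 0, i.e., the answer is not impossible
  return post;
}
```

Modeling answer noise (rather than treating user responses as ground truth) is what keeps the posterior well behaved when users make mistakes.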

Interactive Part Labeling

An interactive GUI for labeling part models that displays, in real time, the maximum likelihood locations of all parts (e.g., the result of pictorial structure inference) as the user drags one or more parts. See online_interactive_parts_server.cpp for an example.

Visual 20 Questions Game

An interactive interface for classifying objects that are difficult for both humans and computers (e.g., allowing non-experts to classify bird species). The system uses a combination of computer vision (deformable part model localization and multiclass classification) and user responses (answers to yes/no or multiple choice questions, or clicks on the locations of parts) to predict a probability distribution over object classes. It optimizes an expected information gain criterion to intelligently select which question to ask next, progressively homing in on the true class. See 20q_server.cpp for an example.
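The expected information gain criterion can be sketched as follows. This is a simplified, hypothetical illustration (not the toolbox's API): for each candidate question, the answer probabilities under the current class posterior and the resulting posteriors per answer are assumed precomputed.

```cpp
#include <cmath>
#include <vector>

// Entropy of a discrete distribution, in bits.
double Entropy(const std::vector<double> &p) {
  double h = 0;
  for (double pi : p)
    if (pi > 0) h -= pi * std::log2(pi);
  return h;
}

// Hypothetical sketch (not the toolbox's API) of expected-information-gain
// question selection. pAnswer[q][a] is the probability of answer a to
// question q under the current class posterior; post[q][a] is the class
// posterior after observing that answer.
int BestQuestion(
    const std::vector<double> &prior,
    const std::vector<std::vector<double>> &pAnswer,
    const std::vector<std::vector<std::vector<double>>> &post) {
  double h0 = Entropy(prior);
  int best = -1;
  double bestGain = -1;
  for (size_t q = 0; q < pAnswer.size(); q++) {
    double expH = 0;  // expected posterior entropy after asking question q
    for (size_t a = 0; a < pAnswer[q].size(); a++)
      expH += pAnswer[q][a] * Entropy(post[q][a]);
    double gain = h0 - expH;  // expected information gain
    if (gain > bestGain) { bestGain = gain; best = (int)q; }
  }
  return best;
}
```

A question whose answer would cleanly split the remaining classes achieves the maximal gain, while a question whose answer leaves the posterior unchanged has zero gain and is never selected.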

Citation

This toolbox includes implementations of the methods described in the three papers listed below. Please consider citing them if you use this toolbox:

Branson S., Beijbom O., Belongie S., "Efficient Large-Scale Structured Learning", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, Oregon, June 2013. pdf
@inproceedings { branson_efficient13,
        title = {Efficient Large-Scale Structured Learning},
        booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
        year = {2013},
        address = {Portland, Oregon},
        author = {Steve Branson and Oscar Beijbom and Serge Belongie}
}
Branson S., Perona P., Belongie S., "Strong Supervision From Weak Annotation: Interactive Training of Deformable Part Models", IEEE International Conference on Computer Vision (ICCV), Barcelona, 2011. pdf
@inproceedings { branson_online_interactive11,
        title = {Strong Supervision From Weak Annotation: Interactive Training of Deformable Part Models},
        booktitle = {IEEE International Conference on Computer Vision (ICCV)},
        year = {2011},
        address = {Barcelona, Spain},
        author = {Steve Branson and Pietro Perona and Serge Belongie}
}


Wah C., Branson S., Perona P., Belongie S., "Multiclass Recognition and Part Localization with Humans in the Loop", IEEE International Conference on Computer Vision (ICCV), Barcelona, 2011. pdf

@inproceedings { wah_multiclass11,
        title = {Multiclass Recognition and Part Localization with Humans in the Loop},
        booktitle = {IEEE International Conference on Computer Vision (ICCV)},
        year = {2011},
        address = {Barcelona, Spain},
        author = {Catherine Wah and Steve Branson and Pietro Perona and Serge Belongie}
}

Getting Started

The best way to get started is to browse the example code (click the Examples tab). These examples can be used without modification for many tasks (training part detectors, training multiclass classifiers, evaluating test sets, interactively labeling parts, playing the Visual 20 Questions Game); however, you may want to edit them to explore more advanced usage scenarios. You can obtain an example training/testing set by downloading http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz and unzipping it in the examples/ directory.