Crowdclustering: Using crowdsourcing to discover categories in collections of images (and other human-interpretable patterns.)

Ph.D. Thesis: Can we build automatic categorization systems that learn and add categories over time with minimal human supervision?


Clustering via unsupervised learning of probabilistic discriminative classifiers (kernelized logistic regression).

Near optimal selection of informative examples from data streams. These data examples may be used in nonparametric clustering or regression problems.


Incremental/online algorithm for mixture model clustering with the Dirichlet process mixture model. Automatically adjusts the number of clusters as evidence arrives. The method is based on variational approximate inference.

Extends the techniques of the above paper to the topic model, an hierarchical extension of the mixture model suited to modeling documents and images.