The goal of computer vision is the automatic labeling of images containing multiple objects as well as noise and clutter. Recent work has focused on two main tasks. The first is classification among object classes in segmented images containing only one object; the second is detection of a particular object class in a large image. Both tasks have been addressed primarily with machine learning techniques involving variations on non-parametric regression. It is not clear, however, how these methods can be extended to recognize multiple object classes in images containing a number of objects in a wide range of configurations. We present an alternative approach that starts from simple statistical models for individual objects, which can be composed into models for object configurations. Decisions are then entirely likelihood-based, and no decision boundaries need to be pre-learned. The model formulation also leads to a well-defined coarse-to-fine strategy for efficient computation of the optimal scene annotation. The idea will be illustrated on three problems: reading license plates in rear images of cars, reading handwritten zip codes, and face detection.
Group for Research in Decision Analysis