Introduction

Consider a standard portrait of a person – either painted or photographed. Can one estimate the distance between the camera (or the eye of the painter) and the face of the sitter? Can one do so accurately even when the camera and the sitter are unknown?

We propose the first automated method for estimating the camera-subject distance from a single frontal picture of an unknown sitter. Camera calibration is not necessary, nor is the reconstruction of a 3D representation of the shape of the subject's head. Our method is divided into two steps: firstly we automatically estimate the location and shape of the subject's face in an image, characterized by 55 costum keypoints positioned on eyes, eyebrows, nose, mouth, head and harline contour. Secondly we train a regressor to estimate the absolute distance from the measurement of changes in the position of these landmarks due to the effect of perspective in images taken at different distances (sometimes informally called "Perspective Distortion").

We collected and annotated a dataset of frontal portraits of 53 individuals spanning a number of attributes such as sex, age, ethnicity and hairstyle, each photographed from seven distances - 2, 3, 4, 6, 8, 12 and 16 ft. The proposed method exploits the high correlation between "perspective distortion" and absolute distance and outperforms humans in the tasks of 1) purely estimating the absolute distance and 2) reordering portraits of faces taken at different distances. We observed the phenomenon that different physiognomies will bias systematically the estimate of distance, i.e. some people look closer than others. We explored the importance of individual landmarks in the execution of both task.

Back to top

Contributions

Back to top

Results

Back to top

CMDP Dataset

We collected a novel dataset, the Caltech Multi-Distance Portraits (CMDP). This collection is made of high quality frontal portraits of 53 individuals against a blue background imaged from seven distances spanning the typical range of distances between photographer and subject: 2, 3, 4, 6, 8, 12, 16 ft (60, 90, 120, 180, 240, 360, 480 cm). For distances exceeding 5m, perspective projection approaches a parallel projection (the depth of a face is about 10cm), therefore no samples beyond 480 cm were needed. Participants were selected among both genders, different ages and a variety of ethnicities, physiognomies, hair and facial hair styles, to make the dataset as heterogeneous and representative as possible.

Pictures were shot with a Canon Rebel Xti DSLR camera mounting a 28-300mm L-series Canon zoom lens. Participants standing in front of a blue background were instructed to remain still and maintain a neutral expression. The photographer used a monopod to support the camera-lens assembly. The monopod was adjusted so that the height of the center of the lens would correspond to the bridge of the nose, between the eyes. Markings on the ground indicated seven distances. After taking each picture, the photographer moved the foot of the monopod to the next marking, adjusted the zoom to fill the frame of the picture with the face, and took the next picture. This procedure resulted in seven pictures (one per distance) being taken within 15-20 seconds. Images were then cropped and resampled to a common format. The lens was calibrated at different zoom settings to verify the amount of barrel distortion, which was found to be very small at all settings, and thus left uncorrected. Lens calibration was then discarded and not used further in our experiments.

All images in the dataset were manually annotated by three human annotators with 55 facial landmarks distributed over and along the face and head contour. To check consistency of annotations, randomly selected images from different subjects were doubly annotated. Annotators resulted to be very consistent, showing an average disagreement between them smaller than 3% of the interocular distance, and not varying much across distances. The location of the custom keypoints is very different from the positions typically used in literature, more focused towards the center and bottom of the face, as for example Multi-pie format. We purposely wanted to have landmarks around the head contour and all around the face, to sample a larger area of the face.

Back to top

Download

The following buttons will download the CMDP dataset fully annotated and the Matlab code for visualizing the results of the landmark estimation algorithm (RCPR) on the CMDP dataset and reproducing the paper's results on both the classification and regression tasks. Full details are described in the README file.

Notes:

Back to top

Cite

Back to top

Contact

© 2014, Xavier P. Burgos-Artizzu, Matteo Ruggero Ronchi and Pietro Perona

Back to top

Flag Counter