Estimating the pose of a person from a single monocular frame is a challenging task due to many confounding factors such as perspective projection, the variability of lighting and clothing, self-occlusion, occlusion by objects, and the simultaneous presence of multiple interacting people. Nevertheless, the performance of human pose estimation algorithms has recently improved dramatically, thanks to the development of suitable deep architectures and the availability of well-annotated image datasets, such as MPII Human Pose and COCO.

There is broad consensus that performance is saturated on simpler single-person datasets (LSP, CO_LSP), and researchers' focus is shifting towards less constrained and more challenging datasets, where images may contain multiple instances of people, and a variable number of body parts (or keypoints) are visible.

However, evaluation is challenging: more complex datasets make it harder to benchmark algorithms due to the many sources of error that may affect performance, and existing metrics, such as Average Precision (AP) or mean Percentage of Correct Parts (mPCP), hide the underlying causes of error and are not sufficient for truly understanding the behaviour of algorithms.


We study the errors occurring in multi-instance pose estimation and how they are affected by physical characteristics of the portrayed people. We build upon currently adopted evaluation metrics and provide tools for a fine-grained description of performance, which makes it possible to quantify the impact of different types of error at a single glance. The fine-grained Precision-Recall curves are obtained by fixing an Object Keypoint Similarity (OKS) threshold and evaluating the performance of an algorithm after progressively correcting its detections.
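As a rough illustration of what fixing an OKS threshold involves, the sketch below computes OKS for a single predicted pose against a ground-truth pose, following the COCO keypoint evaluation definition, and then checks it against a threshold. The function names (`oks`, `is_match`), the argument layout, and the example threshold of 0.75 are assumptions made for this example; this is not the released analysis code.

```python
import math


def oks(pred, gt, visibility, area, sigmas):
    """Object Keypoint Similarity between one predicted and one ground-truth pose.

    pred, gt:    lists of (x, y) keypoint coordinates, in the same order.
    visibility:  per-keypoint flags; only keypoints with a flag > 0 are scored.
    area:        ground-truth instance area in pixels^2 (the squared object scale).
    sigmas:      per-keypoint falloff constants (hypothetical values in the usage below).
    """
    total, labeled = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute to the score
        d2 = (px - gx) ** 2 + (py - gy) ** 2  # squared pixel distance
        k = 2.0 * sigma                       # COCO uses k_i = 2 * sigma_i
        total += math.exp(-d2 / (2.0 * area * k ** 2))
        labeled += 1
    # With no labeled keypoints the similarity is undefined; return 0.0 here.
    return total / labeled if labeled else 0.0


def is_match(oks_value, threshold=0.75):
    """A detection counts as a match when its OKS reaches the chosen threshold."""
    return oks_value >= threshold


# Usage: the second keypoint is unlabeled, so only the first one is scored.
gt_pose = [(10.0, 10.0), (20.0, 20.0)]
pred_pose = [(12.0, 12.0), (90.0, 90.0)]
score = oks(pred_pose, gt_pose, visibility=[2, 0], area=100.0, sigmas=[0.1, 0.1])
print(score, is_match(score))
```

Under this definition, a Precision-Recall curve at a given OKS threshold follows from ranking detections by confidence and labeling each one as a match or a miss with a test like `is_match`.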



Our goal is to propose a principled method for analyzing pose algorithms' performance. Specifically, our main contributions are:



If you find our paper, or the released data or code, useful in your work, please cite:

author = {Matteo Ruggero Ronchi and Pietro Perona},
title = {Benchmarking and Error Diagnosis in Multi-instance Pose Estimation},
booktitle = {IEEE International Conference on Computer Vision, {ICCV} 2017, Venice, Italy, October 22-29, 2017},
pages = {369--378},
year = {2017},
crossref = {DBLP:conf/iccv/2017},
url = {},
doi = {10.1109/ICCV.2017.48},
timestamp = {Thu, 11 Jan 2018 13:21:37 +0100},
biburl = {},
bibsource = {dblp computer science bibliography,}



© 2017, Matteo Ruggero Ronchi and Pietro Perona
