Understanding human-object interactions is fundamental in First Person Vision (FPV). Tracking algorithms that follow the objects manipulated by the camera wearer can provide useful information for effectively modeling such interactions. Despite a few previous attempts to exploit trackers in FPV applications, a systematic analysis of the performance of state-of-the-art trackers in this domain is still missing. At the same time, the visual tracking solutions available in the computer vision literature have significantly improved in recent years across a large variety of target objects and tracking scenarios. To fill this gap, in this paper we present TREK-100, the first benchmark dataset for visual object tracking in FPV. The dataset comprises 100 video sequences densely annotated with 60K bounding boxes, 17 sequence attributes, 13 action verb attributes, and 29 target object attributes. Along with the dataset, we present an extensive analysis of the performance of 30 of the best and most recent visual trackers. Our results show that object tracking in FPV is challenging, suggesting that more research effort should be devoted to this problem.


Qualitative examples of some of the video sequences contained in TREK-100.
100 Videos
60054 Frames
13 Action Verbs
29 Target Objects


Performance of state-of-the-art trackers on the proposed TREK-100 benchmark. The curves in solid colors report the performance of the 30 benchmarked trackers on TREK-100, whereas the curves overlaid in semi-transparent colors outline the performance obtained by the same trackers on the standard OTB-100 dataset. In brackets, next to each tracker's name, we report the success score, precision score, and normalized precision score achieved on TREK-100 (in black) and on OTB-100 (in gray).
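The success and precision scores mentioned in the caption follow the standard one-pass-evaluation protocol popularized by OTB-style benchmarks: success is the area under the curve of the fraction of frames whose predicted/ground-truth bounding-box overlap (IoU) exceeds a threshold, and precision is the fraction of frames whose box-center error falls below a pixel threshold (commonly 20 px). The following is a minimal sketch of these generic metrics under those assumptions; function names and the box format `[x, y, w, h]` are illustrative, not the benchmark's actual evaluation code.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def center_error(box_a, box_b):
    """Euclidean distance between box centers, in pixels."""
    cax, cay = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    cbx, cby = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    return ((cax - cbx) ** 2 + (cay - cby) ** 2) ** 0.5

def success_score(preds, gts):
    """Area under the success plot: mean fraction of frames whose
    IoU exceeds each threshold in {0, 0.05, ..., 1}."""
    ious = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / 20 for i in range(21)]
    return sum(
        sum(v > t for v in ious) / len(ious) for t in thresholds
    ) / len(thresholds)

def precision_score(preds, gts, thresh=20.0):
    """Fraction of frames with center error at most `thresh` pixels."""
    errs = [center_error(p, g) for p, g in zip(preds, gts)]
    return sum(e <= thresh for e in errs) / len(errs)
```

Normalized precision (not sketched here) additionally rescales the center error by the ground-truth box size, making the score comparable across targets of different scales.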
Qualitative results of the top-performing state-of-the-art trackers on TREK-100.



    title={Is First Person Vision Challenging for Object Tracking? The TREK-100 Benchmark Dataset}, 
    author={Matteo Dunnhofer and Antonino Furnari and Giovanni Maria Farinella and Christian Micheloni},


Research at the University of Udine has been supported by the ACHIEVE-ITN H2020 project. Research at the University of Catania has been supported by MIUR AIM - Attrazione e Mobilità Internazionale Linea 1 - AIM1893589 - CUP: E64118002540007.

Copyright © Machine Learning and Perception Lab - University of Udine - 2020

The website template was inspired by the TAO Dataset.