We are excited to announce that our team's solution won the Visual Object Tracking VOT2021 Long-Term tracking challenge. This is the most important competition of its kind in the computer vision community, and it attracts the most cutting-edge solutions in the field.

The VOT challenge is organised every year by a committee of prominent computer vision researchers. This year, it was held in conjunction with the International Conference on Computer Vision (ICCV) 2021, one of the two major conferences in the field.

Visual object tracking is one of the fundamental problems in computer vision. It requires the development of algorithms capable of keeping track of the presence and position of a target object in a video. The task is similar to what humans do when they follow objects with their eyes. The long-term setting introduces particularly challenging scenarios because it demands that algorithms track the target for long periods of time and across major variations in its appearance, such as changes in scale, color, or shape, and even across its disappearance from the field of view.

The team was formed by Matteo Dunnhofer, PhD student in industrial and information engineering, Kristian Simonato, master thesis intern, and Christian Micheloni, professor in machine learning and computer vision and director of the Machine Learning and Perception Lab.

The proposed solution exploits an advanced machine learning algorithm to supervise the execution of two subordinate visual tracking algorithms. The supervising algorithm learns online an abstract representation of the target object, which it uses to compare the candidate target appearances proposed by the two sub-algorithms. Through this comparison, it can determine which of the two algorithms is performing better, use that one to localise the target, and, where necessary, correct the failing one.
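To illustrate the idea, here is a minimal sketch of such a supervision loop. All names are hypothetical and the similarity measure is a toy stand-in: the actual winning solution uses a learned deep representation of the target, not the simple running template shown here.

```python
# Hypothetical sketch of a supervisor arbitrating between two trackers.
# The "abstract representation" is approximated here by a running average
# template; the real system learns this representation with deep learning.

class TemplateSupervisor:
    """Keeps an online template of the target's appearance and scores
    candidate appearances against it."""

    def __init__(self, first_appearance, update_rate=0.1):
        self.template = list(first_appearance)  # online target representation
        self.update_rate = update_rate

    def score(self, candidate):
        # Toy similarity: negative mean absolute difference to the template
        # (higher score means the candidate looks more like the target).
        diffs = [abs(t - c) for t, c in zip(self.template, candidate)]
        return -sum(diffs) / len(diffs)

    def update(self, appearance):
        # Online update: blend the winning appearance into the template.
        self.template = [
            (1 - self.update_rate) * t + self.update_rate * a
            for t, a in zip(self.template, appearance)
        ]


def supervise(supervisor, candidate_a, candidate_b):
    """Compare the two sub-trackers' candidates, keep the better one,
    and report which tracker should be corrected (re-initialised)."""
    score_a = supervisor.score(candidate_a)
    score_b = supervisor.score(candidate_b)
    if score_a >= score_b:
        winner, failing = candidate_a, "B"
    else:
        winner, failing = candidate_b, "A"
    supervisor.update(winner)
    return winner, failing


# Example: candidate A closely matches the template, candidate B does not,
# so tracker B is flagged for correction.
sup = TemplateSupervisor([0.5, 0.5, 0.5])
winner, failing = supervise(sup, [0.5, 0.4, 0.6], [0.9, 0.9, 0.1])
```

Running the supervisor once per frame gives the behaviour described above: the better tracker localises the target, and the other can be re-initialised from the winning estimate.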

Modern visual tracking algorithms like the one designed by our team can track arbitrary objects such as people, vehicles, animals, and even forks. Besides video surveillance and robotics, these algorithms are suited to all industrial processes that require keeping track of how entities change over time, and the deep learning techniques used in the winning solution can be easily generalised to different types of data and algorithms.

Congratulations to Matteo, Kristian, and Christian!

The following video shows some qualitative examples of the winning algorithm's performance (please watch it on YouTube).