Publications
Explore our research publications: papers, articles, and conference proceedings from AImageLab.
Video action detection by learning graph-based spatio-temporal interactions
Authors: Tomei, Matteo; Baraldi, Lorenzo; Calderara, Simone; Bronzin, Simone; Cucchiara, Rita
Published in: COMPUTER VISION AND IMAGE UNDERSTANDING
Action Detection is a complex task that aims to detect and classify human actions in video clips. Typically, it has … (Read full abstract)
Action Detection is a complex task that aims to detect and classify human actions in video clips. Typically, it has been addressed by processing fine-grained features extracted from a video classification backbone. Recently, thanks to the robustness of object and people detectors, a deeper focus has been added on relationship modelling. Following this line, we propose a graph-based framework to learn high-level interactions between people and objects, in both space and time. In our formulation, spatio-temporal relationships are learned through self-attention on a multi-layer graph structure which can connect entities from consecutive clips, thus considering long-range spatial and temporal dependencies. The proposed module is backbone independent by design and does not require end-to-end training. Extensive experiments are conducted on the AVA dataset, where our model demonstrates state-of-the-art results and consistent improvements over baselines built with different backbones. Code is publicly available at https://github.com/aimagelab/STAGE_action_detection.
Visual Tracking: An Experimental Survey
Authors: A. W. M., Smeulder; D. M., Chu; Cucchiara, Rita; Calderara, Simone; A., Dehghan; M., Shah
Published in: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
There is a large variety of trackers, which have been proposed in the literature during the last two decades with … (Read full abstract)
There is a large variety of trackers, which have been proposed in the literature during the last two decades with some mixed success. Object tracking in realistic scenarios is difficult problem, therefore it remains a most active area of research in Computer Vision. A good tracker should perform well in a large number of videos involving illumination changes, occlusion, clutter, camera motion, low contrast, specularities and at least six more aspects. However, the performance of proposed trackers have been evaluated typically on less than ten videos, or on the special purpose datasets. In this paper, we aim to evaluate trackers systematically and experimentally on 315 video fragments covering above aspects. We selected a set of nineteen trackers to include a wide variety of algorithms often cited in literature, supplemented with trackers appearing in 2010 and 2011 for which the code was publicly available. We demonstrate that trackers can be evaluated objectively by survival curves, Kaplan Meier statistics, and Grubs testing. We find that in the evaluation practice the F-score is as effective as the object tracking accuracy (OTA) score. The analysis under a large variety of circumstances provides objective insight into the strengths and weaknesses of trackers.