Publications - AImageLab

Learning to Divide and Conquer for Online Multi-Target Tracking

Authors: Solera, Francesco; Calderara, Simone; Cucchiara, Rita

Published in: PROCEEDINGS IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION

Online Multiple Target Tracking (MTT) is often addressed within the tracking-by-detection paradigm. Detections are first extracted independently in each frame, and object trajectories are then built by maximizing specifically designed coherence functions. Nevertheless, ambiguities arise in the presence of occlusions or detection errors. In this paper we claim that the ambiguities in tracking could be solved by a selective use of the features, by working with more reliable features if possible and exploiting a deeper representation of the target only if necessary. To this end, we propose an online divide and conquer tracker for static camera scenes, which partitions the assignment problem into local subproblems and solves them by selectively choosing and combining the best features. The complete framework is cast as a structural learning task that unifies these phases and learns tracker parameters from examples. Experiments on two different datasets highlight a significant improvement in tracking performance (MOTA +10%) over the state of the art.

2015 Conference proceedings paper

Learning to identify leaders in crowd

Authors: Solera, Francesco; Calderara, Simone; Cucchiara, Rita

Published in: IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS

Leader identification is a crucial task in social analysis, crowd management and emergency planning. In this paper, we investigate a computational model for the identification of leaders in crowded scenes. We deal with the lack of a formal definition of leadership by learning, in a supervised fashion, a metric space based exclusively on people's spatiotemporal information. Based on Tarde's work on crowd psychology, individuals are modeled as nodes of a directed graph and leaders inherit their relevance from the references of other members. We note this is analogous to the way websites are ranked by the PageRank algorithm. During experiments, we observed different feature weights depending on the specific type of crowd, highlighting the impossibility of providing a unique interpretation of leadership. To our knowledge, this is the first attempt to study leader identification as a metric learning problem.

2015 Conference proceedings paper

Towards the evaluation of reproducible robustness in tracking-by-detection

Authors: Solera, Francesco; Calderara, Simone; Cucchiara, Rita

Conventional experiments on MTT are built upon the belief that feeding the same fixed detections to different trackers is sufficient to obtain a fair comparison. In this work we argue that the true behavior of a tracker is exposed when it is evaluated by varying the input detections rather than by fixing them. We propose a systematic and reproducible protocol and a MATLAB toolbox for generating synthetic data starting from ground truth detections, a proper set of metrics to understand and compare trackers' peculiarities, and the respective visualization solutions.

2015 Conference proceedings paper

Understanding social relationships in egocentric vision

Authors: Alletto, Stefano; Serra, Giuseppe; Calderara, Simone; Cucchiara, Rita

Published in: PATTERN RECOGNITION

Understanding mutual interaction between people is a key component for recognizing their social behavior, but it strongly relies on a personal point of view, which makes it difficult to model a priori. We propose adopting the unique first-person perspective of head-mounted cameras (ego-vision) to promptly detect people's interactions in different social contexts. The proposal relies on a complete and reliable system that extracts people's head pose by combining landmarks and shape descriptors in a temporally smoothed HMM framework. Finally, interactions are detected through supervised clustering on mutual head orientation and people distances, exploiting a structural learning framework that specifically adjusts the clustering measure according to the particular scenario. Our solution provides the flexibility to capture interactions regardless of the number of individuals involved and their level of acquaintance, in contexts with a variable degree of social involvement. The proposed system shows competitive performance on both publicly available ego-vision datasets and ad hoc benchmarks built from real-life situations.

2015 Journal article

A complete system for garment segmentation and color classification

Authors: Manfredi, Marco; Grana, Costantino; Calderara, Simone; Cucchiara, Rita

Published in: MACHINE VISION AND APPLICATIONS

In this paper, we propose a general approach for automatic segmentation, color-based retrieval and classification of garments in fashion store databases, exploiting shape and color information. The garment segmentation is automatically initialized by learning geometric constraints and shape cues, then it is performed by modeling both skin and accessory colors with Gaussian Mixture Models. For color similarity retrieval and classification, to adapt the color description to the users’ perception and the company marketing directives, a color histogram with an optimized binning strategy, learned on the given color classes, is introduced and combined with HOG features for garment classification. Experiments validating the proposed strategy, and a free-to-use dataset publicly available for scientific purposes, are finally detailed.

2014 Journal article

Detection of static groups and crowds gathered in open spaces by texture classification

Authors: Manfredi, Marco; Vezzani, Roberto; Calderara, Simone; Cucchiara, Rita

Published in: PATTERN RECOGNITION LETTERS

A surveillance system specifically developed to manage crowded scenes is described in this paper. In particular, we focus on static crowds, composed of groups of people who gather and stay in the same place for a while. The detection and spatial localization of static crowd situations is performed by means of a One Class Support Vector Machine, working on texture features extracted at patch level. Spatial regions containing crowds are identified and filtered using motion information to prevent noise and false alarms due to moving flows of people. By means of one class classification and inner texture descriptors, we are able to obtain, from a single training set, a sufficiently general crowd model that can be used for all the scenarios that share a similar viewpoint. Tests on public datasets and real setups validate the proposed system.

2014 Journal article

Head Pose Estimation in First-Person Camera Views

Authors: Alletto, Stefano; Serra, Giuseppe; Calderara, Simone; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

In this paper we present a new method for real-time head pose estimation in ego-vision scenarios, which is a key step in the understanding of social interactions. In order to robustly detect heads under changing aspect ratio, scale and orientation, we use and extend the Hough-Based Tracker, which allows each subject in the scene to be followed simultaneously. In an ego-vision scenario where a group interacts in a discussion, each subject's head orientation will be more likely to remain focused for a while on the person who has the floor. In order to encode this behavior, we include a stateful Hidden Markov Model technique that enforces temporal coherence on the poses predicted from a video sequence. We extensively test our approach on several indoor and outdoor ego-vision videos with high illumination variations, showing its validity and outperforming other recent state-of-the-art approaches.

2014 Conference proceedings paper

Kernelized Structural Classification for 3D Dogs Body Parts Detection

Authors: Pistocchi, Simone; Calderara, Simone; Barnard, S.; Ferri, N.; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

Although pattern recognition methods for human behavioral analysis have flourished in the last decade, animal behavioral analysis has been almost neglected. The few existing approaches are mostly focused on preserving the economic value of livestock, while attention to the welfare of companion animals, like dogs, is now emerging as a social need. In this work, following the analogy with human behavior recognition, we propose a system for recognizing body parts of dogs kept in pens. We adopt both 2D and 3D features in order to obtain a rich description of the dog model. Images are acquired using the Microsoft Kinect to capture depth maps of the dog. On the depth maps, a Structural Support Vector Machine (SSVM) is employed to identify the body parts using both 3D features and 2D images. The proposal relies on a kernelized discriminative structural classifier specifically tailored for dogs, independently of size and breed. The classification is performed in an online fashion using the LaRank optimization technique to obtain real-time performance. Promising results have emerged during the experimental evaluation carried out at a dog shelter, managed by IZSAM, in Teramo, Italy.

2014 Conference proceedings paper

Pattern recognition and crowd analysis

Authors: Bandini, S.; Calderara, S.; Cucchiara, R.

Published in: PATTERN RECOGNITION LETTERS

2014 Journal article

Visual Tracking: An Experimental Survey

Authors: Smeulders, A. W. M.; Chu, D. M.; Cucchiara, Rita; Calderara, Simone; Dehghan, A.; Shah, M.

Published in: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

There is a large variety of trackers, which have been proposed in the literature during the last two decades with some mixed success. Object tracking in realistic scenarios is a difficult problem, therefore it remains a most active area of research in Computer Vision. A good tracker should perform well in a large number of videos involving illumination changes, occlusion, clutter, camera motion, low contrast, specularities and at least six more aspects. However, the performance of proposed trackers has typically been evaluated on fewer than ten videos, or on special-purpose datasets. In this paper, we aim to evaluate trackers systematically and experimentally on 315 video fragments covering the above aspects. We selected a set of nineteen trackers to include a wide variety of algorithms often cited in the literature, supplemented with trackers appearing in 2010 and 2011 for which the code was publicly available. We demonstrate that trackers can be evaluated objectively by survival curves, Kaplan-Meier statistics, and Grubbs testing. We find that in the evaluation practice the F-score is as effective as the object tracking accuracy (OTA) score. The analysis under a large variety of circumstances provides objective insight into the strengths and weaknesses of trackers.

2014 Journal article
