Publications - AImageLab

Generative Adversarial Models for People Attribute Recognition in Surveillance

Authors: Fabbri, Matteo; Calderara, Simone; Cucchiara, Rita

In this paper we propose a deep architecture for detecting people attributes (e.g. gender, race, clothing ...) in surveillance contexts. Our proposal explicitly deals with the poor resolution and occlusion issues that often occur in surveillance footage by enhancing the images by means of Deep Convolutional Generative Adversarial Networks (DCGAN). Experiments show that by combining our Generative Reconstruction and Deep Attribute Classification Network we can effectively extract attributes even when resolution is poor and in the presence of strong occlusions covering up to 80% of the whole person figure.

2017 Conference proceedings paper

IRIS

Learning to Map Vehicles into Bird's Eye View

Authors: Palazzi, Andrea; Borghi, Guido; Abati, Davide; Calderara, Simone; Cucchiara, Rita

Awareness of the road scene is an essential component for both autonomous vehicles and Advanced Driver Assistance Systems and is gaining importance both for academia and car companies. This paper presents a way to learn a semantic-aware transformation which maps detections from a dashboard camera view onto a broader bird's eye occupancy map of the scene. To this end, a huge synthetic dataset featuring 1M pairs of frames, taken from both car dashboard and bird's eye view, has been collected and automatically annotated. A deep network is then trained to warp detections from the first to the second view. We demonstrate the effectiveness of our model against several baselines and observe that it is able to generalize to real-world data despite having been trained solely on synthetic data.

2017 Conference proceedings paper

DOI IRIS

Learning Where to Attend Like a Human Driver

Authors: Palazzi, Andrea; Solera, Francesco; Calderara, Simone; Alletto, Stefano; Cucchiara, Rita

Despite the advent of autonomous cars, it's likely - at least in the near future - that human attention will still maintain a central role as a guarantee in terms of legal responsibility during the driving task. In this paper we study the dynamics of the driver's gaze and use it as a proxy to understand related attentional mechanisms. First, we build our analysis upon two questions: where is the driver looking, and at what? Second, we model the driver's gaze by training a coarse-to-fine convolutional network on short sequences extracted from the DR(eye)VE dataset. Experimental comparison against different baselines reveals that the driver's gaze can indeed be learnt to some extent, despite i) being highly subjective and ii) having only one driver's gaze available for each sequence due to the irreproducibility of the scene. Finally, we advocate for a new assisted driving paradigm which suggests to the driver, with no intervention, where she should focus her attention.

2017 Conference proceedings paper

IRIS

Signal Processing and Machine Learning for Diplegia Classification

Authors: Bergamini, Luca; Calderara, Simone; Bicocchi, Nicola; Ferrari, Alberto; Vitetta, Giorgio

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Diplegia is one of the most common forms of a broad family of motion disorders named cerebral palsy (CP) affecting the voluntary muscular system. In recent years, various classification criteria have been proposed for CP, to assist in diagnosis, clinical decision-making and communication. In this manuscript, we divide the spastic forms of CP into 4 other categories according to a previous classification criterion and propose a machine learning approach for automatically classifying patients. Training and validation of our approach are based on data about 200 patients acquired using 19 markers and high frequency VICON cameras in an Italian hospital. Our approach makes use of the latest deep learning techniques. More specifically, it involves a multi-layer perceptron network (MLP), combined with Fourier analysis. An encouraging classification performance is obtained for two of the four classes.

2017 Conference proceedings paper

DOI IRIS

Tracking social groups within and across cameras

Authors: Solera, Francesco; Calderara, Simone; Ristani, Ergys; Tomasi, Carlo; Cucchiara, Rita

Published in: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY

We propose a method for tracking groups from single and multiple cameras with disjoint fields of view. Our formulation follows the tracking-by-detection paradigm where groups are the atomic entities and are linked over time to form long and consistent trajectories. To this end, we formulate the problem as a supervised clustering problem where a Structural SVM classifier learns a similarity measure appropriate for group entities. Multi-camera group tracking is handled inside the framework by adopting an orthogonal feature encoding that allows the classifier to learn inter- and intra-camera feature weights differently. Experiments were carried out on a novel annotated group tracking data set, the DukeMTMC-Groups data set. Since this is the first data set on the problem it comes with the proposal of a suitable evaluation measure. Results of adopting learning for the task are encouraging, scoring a +15% improvement in F1 measure over a non-learning based clustering baseline. To our knowledge this is the first proposal of this kind dealing with multi-camera group tracking.

2017 Journal article

DOI IRIS

Quick, accurate, smart: 3D computer vision technology helps assessing confined animals' behaviour

Authors: Barnard, Shanis; Calderara, Simone; Pistocchi, Simone; Cucchiara, Rita; Podaliri Vulpiani, Michele; Messori, Stefano; Ferri, Nicola

Published in: PLOS ONE

Mankind directly controls the environment and lifestyles of several domestic species for purposes ranging from production and research to conservation and companionship. These environments and lifestyles may not offer these animals the best quality of life. Behaviour is a direct reflection of how the animal is coping with its environment. Behavioural indicators are thus among the preferred parameters to assess welfare. However, behavioural recording (usually from video) can be very time consuming and the accuracy and reliability of the output rely on the experience and background of the observers. The outburst of new video technology and computer image processing gives the basis for promising solutions. In this pilot study, we present a new prototype software able to automatically infer the behaviour of dogs housed in kennels from 3D visual data and through structured machine learning frameworks. Depth information acquired through 3D features, body part detection and training are the key elements that allow the machine to recognise postures, trajectories inside the kennel and patterns of movement that can be later labelled at convenience. The main innovation of the software is its ability to automatically cluster frequently observed temporal patterns of movement without any pre-set ethogram. Conversely, when common patterns are defined through training, a deviation from normal behaviour in time or between individuals could be assessed. The software accuracy in correctly detecting the dogs' behaviour was checked through a validation process. An automatic behaviour recognition system, independent from human subjectivity, could add scientific knowledge on animals' quality of life in confinement as well as saving time and resources. This 3D framework was designed to be invariant to the dog's shape and size and could be extended to farm, laboratory and zoo quadrupeds in artificial housing. 
The computer vision technique applied to this software is innovative in non-human animal behaviour science. Further improvements and validation are needed, and future applications and limitations are discussed.

2016 Journal article

DOI IRIS

Socially Constrained Structural Learning for Groups Detection in Crowd

Authors: Solera, Francesco; Calderara, Simone; Cucchiara, Rita

Published in: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Modern crowd theories agree that collective behavior is the result of the underlying interactions among small groups of individuals. In this work, we propose a novel algorithm for detecting social groups in crowds by means of a Correlation Clustering procedure on people trajectories. The affinity between crowd members is learned through an online formulation of the Structural SVM framework and a set of specifically designed features characterizing both their physical and social identity, inspired by Proxemic theory, Granger causality, DTW and heat maps. To adhere to sociological observations, we introduce a loss function (G-MITRE) able to deal with the complexity of evaluating group detection performance. We show that our algorithm achieves state-of-the-art results when relying on both ground truth trajectories and tracklets previously extracted by available detector/tracker systems.

2016 Journal article

DOI IRIS

Spotting prejudice with nonverbal behaviours

Authors: Palazzi, Andrea; Calderara, Simone; Bicocchi, Nicola; Vezzali, Loris; Di Bernardo, Gian Antonio; Zambonelli, Franco; Cucchiara, Rita

Although prejudice cannot be directly observed, nonverbal behaviours provide profound hints on people's inclinations. In this paper, we use recent sensing technologies and machine learning techniques to automatically infer the results of psychological questionnaires frequently used to assess implicit prejudice. In particular, we recorded 32 students discussing with both white and black collaborators. Then, we identified a set of features allowing automatic extraction and measured their degree of correlation with psychological scores. Results confirmed that automated analysis of nonverbal behaviour is actually possible, thus paving the way for innovative clinical tools and eventually more secure societies.

2016 Conference proceedings paper

DOI IRIS

Transductive People Tracking in Unconstrained Surveillance

Authors: Coppi, Dalia; Calderara, Simone; Cucchiara, Rita

Published in: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY

Long-term tracking of people in unconstrained scenarios is still an open problem due to the absence of constant elements in the problem setting. The camera, when active, may move and both the background and the target appearance may change abruptly, leading to the inadequacy of most standard tracking techniques. We propose to exploit a learning approach that considers the tracking task as a semi-supervised learning (SSL) problem. Given a few target samples, the aim is to search for the target occurrences in the video stream, re-interpreting the problem as label propagation on a similarity graph. We propose a solution based on graph transduction that works iteratively frame by frame. Additionally, in order to avoid drifting, we introduce an update strategy based on an evolutionary clustering technique that chooses the visual templates that better describe target appearance, evolving the model during the processing of the video. Since we model people appearance by means of covariance matrices on color and gradient information, our framework is directly related to structure learning on Riemannian manifolds. Tests on publicly available datasets and comparisons with state-of-the-art techniques allow us to conclude that our solution exhibits interesting performance in terms of tracking precision and recall in most of the considered scenarios.

2016 Journal article

DOI IRIS

Active query process for digital video surveillance forensic applications

Authors: Coppi, Dalia; Calderara, Simone; Cucchiara, Rita

Published in: SIGNAL, IMAGE AND VIDEO PROCESSING

Multimedia forensics is an emerging discipline regarding the analysis and exploitation of digital data as support for investigation to extract probative elements. Among them, visual data about people and people's activities, extracted from videos in an efficient way, are becoming day by day more appealing for forensics, due to the availability of large amounts of video-surveillance footage. Thus, many research studies and prototypes investigate the analysis of soft biometrics data, such as people appearance and people trajectories. In this work, we propose new solutions for querying and retrieving visual data in an interactive and active fashion for soft biometrics in forensics. The innovative proposal joins the capability of transductive learning for semi-supervised search by similarity and a typical multimedia methodology based on user-guided relevance feedback to allow an active interaction with the visual data of people, appearance and trajectory in large surveillance areas. The proposed approaches are very general and can be exploited independently of the surveillance setting and the type of video analytic tools.

2015 Journal article

DOI IRIS
