Publications - AImageLab

Performance measures and a data set for multi-target, multi-camera tracking

Authors: Ristani, E.; Solera, F.; Zou, R.; Cucchiara, R.; Tomasi, C.

Published in: LECTURE NOTES IN ARTIFICIAL INTELLIGENCE

To help accelerate progress in multi-target, multi-camera tracking systems, we present (i) a new pair of precision-recall measures of performance that treats errors of all types uniformly and emphasizes correct identification over sources of error; (ii) the largest fully-annotated and calibrated data set to date with more than 2 million frames of 1080p, 60 fps video taken by 8 cameras observing more than 2,700 identities over 85 min; and (iii) a reference software system as a comparison baseline. We show that (i) our measures properly account for bottom-line identity match performance in the multi-camera setting; (ii) our data set poses realistic challenges to current trackers; and (iii) the performance of our system is comparable to the state of the art.

2016 Conference proceedings paper

Quick, accurate, smart: 3D computer vision technology helps assessing confined animals' behaviour

Authors: Barnard, Shanis; Calderara, Simone; Pistocchi, Simone; Cucchiara, Rita; Podaliri Vulpiani, Michele; Messori, Stefano; Ferri, Nicola

Published in: PLOS ONE

Mankind directly controls the environment and lifestyles of several domestic species for purposes ranging from production and research to conservation and companionship. These environments and lifestyles may not offer these animals the best quality of life. Behaviour is a direct reflection of how the animal is coping with its environment. Behavioural indicators are thus among the preferred parameters to assess welfare. However, behavioural recording (usually from video) can be very time consuming and the accuracy and reliability of the output rely on the experience and background of the observers. The outburst of new video technology and computer image processing gives the basis for promising solutions. In this pilot study, we present a new prototype software able to automatically infer the behaviour of dogs housed in kennels from 3D visual data and through structured machine learning frameworks. Depth information acquired through 3D features, body part detection and training are the key elements that allow the machine to recognise postures, trajectories inside the kennel and patterns of movement that can be later labelled at convenience. The main innovation of the software is its ability to automatically cluster frequently observed temporal patterns of movement without any pre-set ethogram. Conversely, when common patterns are defined through training, a deviation from normal behaviour in time or between individuals could be assessed. The software accuracy in correctly detecting the dogs' behaviour was checked through a validation process. An automatic behaviour recognition system, independent from human subjectivity, could add scientific knowledge on animals' quality of life in confinement as well as saving time and resources. This 3D framework was designed to be invariant to the dog's shape and size and could be extended to farm, laboratory and zoo quadrupeds in artificial housing. 
The computer vision technique applied to this software is innovative in non-human animal behaviour science. Further improvements and validation are needed, and future applications and limitations are discussed.

2016 Journal article

Scene-driven Retrieval in Edited Videos using Aesthetic and Semantic Deep Features

Authors: Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita

This paper presents a novel retrieval pipeline for video collections, which aims to retrieve the most significant parts of an edited video for a given query, and represent them with thumbnails which are at the same time semantically meaningful and aesthetically remarkable. Videos are first segmented into coherent and story-telling scenes, then a retrieval algorithm based on deep learning is proposed to retrieve the most significant scenes for a textual query. A ranking strategy based on deep features is finally used to tackle the problem of visualizing the best thumbnail. Qualitative and quantitative experiments are conducted on a collection of edited videos to demonstrate the effectiveness of our approach.

2016 Conference proceedings paper

Shot, scene and keyframe ordering for interactive video re-use

Authors: Baraldi, L.; Grana, C.; Borghi, G.; Vezzani, R.; Cucchiara, R.

This paper presents a complete system for shot and scene detection in broadcast videos, as well as a method to select the best representative key-frames, which could be used in new interactive interfaces for accessing large collections of edited videos. The final goal is to enable an improved access to video footage and the re-use of video content with the direct management of user-selected video-clips.

2016 Conference proceedings paper

Socially Constrained Structural Learning for Groups Detection in Crowd

Authors: Solera, Francesco; Calderara, Simone; Cucchiara, Rita

Published in: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Modern crowd theories agree that collective behavior is the result of the underlying interactions among small groups of individuals. In this work, we propose a novel algorithm for detecting social groups in crowds by means of a Correlation Clustering procedure on people trajectories. The affinity between crowd members is learned through an online formulation of the Structural SVM framework and a set of specifically designed features characterizing both their physical and social identity, inspired by Proxemic theory, Granger causality, DTW and Heat-maps. To adhere to sociological observations, we introduce a loss function (G-MITRE) able to deal with the complexity of evaluating group detection performances. We show our algorithm achieves state-of-the-art results when relying on both ground truth trajectories and tracklets previously extracted by available detector/tracker systems.

2016 Journal article

Spotting prejudice with nonverbal behaviours

Authors: Palazzi, Andrea; Calderara, Simone; Bicocchi, Nicola; Vezzali, Loris; Di Bernardo, Gian Antonio; Zambonelli, Franco; Cucchiara, Rita

Although prejudice cannot be directly observed, nonverbal behaviours provide profound hints on people's inclinations. In this paper, we use recent sensing technologies and machine learning techniques to automatically infer the results of psychological questionnaires frequently used to assess implicit prejudice. In particular, we recorded 32 students discussing with both white and black collaborators. Then, we identified a set of features allowing automatic extraction and measured their degree of correlation with psychological scores. Results confirmed that automated analysis of nonverbal behaviour is actually possible, thus paving the way for innovative clinical tools and eventually more secure societies.

2016 Conference proceedings paper

Transductive People Tracking in Unconstrained Surveillance

Authors: Coppi, Dalia; Calderara, Simone; Cucchiara, Rita

Published in: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY

Long term tracking of people in unconstrained scenarios is still an open problem due to the absence of constant elements in the problem setting. The camera, when active, may move and both the background and the target appearance may change abruptly, leading to the inadequacy of most standard tracking techniques. We propose to exploit a learning approach that considers the tracking task as a semi-supervised learning (SSL) problem. Given few target samples, the aim is to search the target occurrences in the video stream, re-interpreting the problem as label propagation on a similarity graph. We propose a solution based on graph transduction that works iteratively frame by frame. Additionally, in order to avoid drifting, we introduce an update strategy based on an evolutionary clustering technique that chooses the visual templates that better describe target appearance, evolving the model during the processing of the video. Since we model people appearance by means of covariance matrices on color and gradient information, our framework is directly related to structure learning on Riemannian manifolds. Tests on publicly available datasets and comparisons with state-of-the-art techniques allow us to conclude that our solution exhibits interesting performance in terms of tracking precision and recall in most of the considered scenarios.

2016 Journal article

A Deep Siamese Network for Scene Detection in Broadcast Videos

Authors: Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita

We present a model that automatically divides broadcast videos into coherent scenes by learning a distance measure between shots. Experiments are performed to demonstrate the effectiveness of our approach by comparing our algorithm against recent proposals for automatic scene segmentation. We also propose an improved performance measure that aims to reduce the gap between numerical evaluation and expected results, and propose and release a new benchmark dataset.

2015 Conference proceedings paper

A General-Purpose Sensing Floor Architecture for Human-Environment Interaction

Authors: Vezzani, Roberto; Lombardi, Martino; Pieracci, Augusto; Santinelli, Paolo; Cucchiara, Rita

Published in: ACM TRANSACTIONS ON INTERACTIVE INTELLIGENT SYSTEMS

Smart environments are now designed as natural interfaces to capture and understand human behavior without a need for explicit human-computer interaction. In this paper, we present a general-purpose architecture that acquires and understands human behaviors through a sensing floor. The pressure field generated by moving people is captured and analyzed. Specific actions and events are then detected by a low-level processing engine and sent to high-level interfaces providing different functions. The proposed architecture and sensors are modular, general-purpose, cheap, and suitable for both small- and large-area coverage. Some sample entertainment and virtual reality applications that we developed to test the platform are presented.

2015 Journal article

Active query process for digital video surveillance forensic applications

Authors: Coppi, Dalia; Calderara, Simone; Cucchiara, Rita

Published in: SIGNAL, IMAGE AND VIDEO PROCESSING

Multimedia forensics is an emerging discipline regarding the analysis and exploitation of digital data as support for investigation to extract probative elements. Among them, visual data about people and people's activities, extracted from videos in an efficient way, are becoming day by day more appealing for forensics, due to the availability of large video-surveillance footage. Thus, many research studies and prototypes investigate the analysis of soft biometrics data, such as people's appearance and people's trajectories. In this work, we propose new solutions for querying and retrieving visual data in an interactive and active fashion for soft biometrics in forensics. The innovative proposal joins the capability of transductive learning for semi-supervised search by similarity and a typical multimedia methodology based on user-guided relevance feedback to allow an active interaction with the visual data of people, appearance and trajectory in large surveillance areas. The approaches proposed are very general and can be exploited independently of the surveillance setting and the type of video analytic tools.

2015 Journal article
