Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Embedded Recurrent Network for Head Pose Estimation in Car

Authors: Borghi, Guido; Gasparini, Riccardo; Vezzani, Roberto; Cucchiara, Rita

Accurate and fast estimation of the driver's head pose is a rich source of information, in particular in the automotive context. Head pose is a key element for driver behavior investigation, pose analysis, and attention monitoring, and is also a useful component for improving the efficacy of Human-Car Interaction systems. In this paper, a Recurrent Neural Network is exploited to tackle the problem of driver head pose estimation, working directly and only on depth images to be more reliable in the presence of varying or insufficient illumination. Experimental results, obtained on two public datasets, namely Biwi Kinect Head Pose and ICT-3DHP Database, prove the efficacy of the proposed method, which outperforms state-of-the-art works. In addition, the entire system is implemented and tested on two embedded boards with real-time performance.

2017 Conference Proceedings Paper

Fast and Accurate Facial Landmark Localization in Depth Images for In-car Applications

Authors: Frigieri, Elia; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Correct and reliable localization of facial landmarks enables several applications in many fields, ranging from Human Computer Interaction to video surveillance. For instance, it can provide valuable input for monitoring the driver's physical state and attention level in the automotive context. In this paper, we tackle the problem of facial landmark localization through a deep approach. The developed system runs in real time and, in particular, is more reliable than state-of-the-art competitors, especially in the presence of light changes and poor illumination, thanks to the use of depth images as input. We also collected and shared a new realistic dataset recorded inside a car, called MotorMark, to train and test the system. In addition, we exploited the public Eurecom Kinect Face Dataset for the evaluation phase, achieving promising results in terms of both accuracy and computational speed.

2017 Conference Proceedings Paper

FOIL it! Find One mismatch between Image and Language caption

Authors: Shekhar, Ravi; Pezzelle, Sandro; Klimovich, Yauhen; Herbelot, Aurelie; Nabi, Moin; Sangineto, Enver; Bernardi, Raffaella

In this paper, we aim to understand whether current language and vision (LaVi) models truly grasp the interaction between the two modalities. To this end, we propose an extension of the MS-COCO dataset, FOIL-COCO, which associates images with both correct and ‘foil’ captions, that is, descriptions of the image that are highly similar to the original ones, but contain one single mistake (‘foil word’). We show that current LaVi models fall into the traps of this data and perform badly on three tasks: a) caption classification (correct vs. foil); b) foil word detection; c) foil word correction. Humans, in contrast, have near-perfect performance on those tasks. We demonstrate that merely utilising language cues is not enough to model FOIL-COCO and that it challenges the state-of-the-art by requiring a fine-grained understanding of the relation between text and image.

2017 Conference Proceedings Paper

From Depth Data to Head Pose Estimation: a Siamese approach

Authors: Venturelli, Marco; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

The correct estimation of the head pose is a problem of great importance for many applications. For instance, it is an enabling technology in the automotive field for driver attention monitoring. In this paper, we tackle the pose estimation problem through a deep learning network working in a regression manner. Traditional methods usually rely on visual facial features, such as facial landmarks or nose tip position. In contrast, we exploit a Convolutional Neural Network (CNN) to perform head pose estimation directly from depth data. We exploit a Siamese architecture and propose a novel loss function to improve the learning of the regression network layer. The system has been tested on two public datasets, Biwi Kinect Head Pose and ICT-3DHP Database. The reported results demonstrate the improvement in accuracy with respect to current state-of-the-art approaches and the real-time capabilities of the overall framework.

2017 Conference Proceedings Paper

From Groups to Leaders and Back. Exploring Mutual Predictability Between Social Groups and Their Leaders

Authors: Solera, Francesco; Calderara, Simone; Cucchiara, Rita

Recently, social theories and empirical observations identified small groups and leaders as the basic elements which shape a crowd. This leads to an intermediate level of abstraction that is placed between the crowd as a flow of people, and the crowd as a collection of individuals. Consequently, automatic analysis of crowds in computer vision is also experiencing a shift in focus from individuals to groups and from small groups to their leaders. In this chapter, we present state-of-the-art solutions to the groups and leaders detection problem, which are able to account for physical factors as well as for sociological evidence observed over short time windows. The presented algorithms are framed as structured learning problems over the set of individual trajectories. However, the way trajectories are exploited to predict the structure of the crowd is not fixed but rather learned from recorded and annotated data, enabling the method to adapt these concepts to different scenarios, densities, cultures, and other unobservable complexities. Additionally, we investigate the relation between leaders and their groups and propose the first attempt to exploit leadership as prior knowledge for group detection.

2017 Book Chapter/Essay

FuGePrior: A novel gene fusion prioritization algorithm based on accurate fusion structure analysis in cancer RNA-seq samples

Authors: Paciello, Giulia; Ficarra, Elisa

Published in: BMC BIOINFORMATICS

2017 Journal Article

Generative Adversarial Models for People Attribute Recognition in Surveillance

Authors: Fabbri, Matteo; Calderara, Simone; Cucchiara, Rita

In this paper we propose a deep architecture for detecting people attributes (e.g. gender, race, clothing ...) in surveillance contexts. Our proposal explicitly deals with the poor resolution and occlusion issues that often occur in surveillance footage by enhancing the images by means of Deep Convolutional Generative Adversarial Networks (DCGAN). Experiments show that by combining our Generative Reconstruction and Deep Attribute Classification Networks we can effectively extract attributes even when resolution is poor and in the presence of strong occlusions covering up to 80% of the whole person figure.

2017 Conference Proceedings Paper

Guest Editorial Special Issue on Wearable and Ego-Vision Systems for Augmented Experience

Authors: Serra, G.; Cucchiara, R.; Kitani, K. M.; Civera, J.

Published in: IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS

2017 Journal Article

Hierarchical Boundary-Aware Neural Encoder for Video Captioning

Authors: Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita

Published in: PROCEEDINGS - IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION

The use of Recurrent Neural Networks for video captioning has recently gained a lot of attention, since they can be used both to encode the input video and to generate the corresponding description. In this paper, we present a recurrent video encoding scheme which can discover and leverage the hierarchical structure of the video. Unlike the classical encoder-decoder approach, in which a video is encoded continuously by a recurrent layer, we propose a novel LSTM cell which can identify discontinuity points between frames or segments and modify the temporal connections of the encoding layer accordingly. We evaluate our approach on three large-scale datasets: the Montreal Video Annotation dataset, the MPII Movie Description dataset and the Microsoft Video Description Corpus. Experiments show that our approach can discover appropriate hierarchical representations of input videos and improve the state-of-the-art results on movie description datasets.

2017 Conference Proceedings Paper

Historical Handwritten Text Images Word Spotting through Sliding Window HOG Features

Authors: Bolelli, Federico; Borghi, Guido; Grana, Costantino

Published in: LECTURE NOTES IN COMPUTER SCIENCE

In this paper we present an innovative technique to semi-automatically index handwritten word images. The proposed method is based on HOG descriptors and exploits the Dynamic Time Warping technique to compare feature vectors computed from single handwritten words. Our strategy is applied to a new challenging dataset extracted from Italian civil registries of the 19th century. Experimental results, compared with some previously developed word spotting strategies, confirm that our method outperforms competitors.

2017 Conference Proceedings Paper
