Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Driver Face Verification with Depth Maps

Authors: Borghi, Guido; Pini, Stefano; Vezzani, Roberto; Cucchiara, Rita

Published in: SENSORS

Face verification is the task of checking whether two provided images contain the face of the same person. In this work, we propose a fully convolutional Siamese architecture to tackle this task, achieving state-of-the-art results on three publicly released datasets, namely Pandora, the High-Resolution Range-based Face Database (HRRFaceD), and CurtinFaces. The proposed method takes depth maps as input, since depth cameras have proven more reliable across different illumination conditions. The system is thus able to work even in the total or partial absence of external light sources, a key feature for automotive applications. From the algorithmic point of view, we propose a fully convolutional architecture with a limited number of parameters, capable of dealing with the small amount of depth data available for training, and able to run in real time even on a CPU and on embedded boards. The experimental results show accuracy high enough for real-world applications with in-board cameras. Finally, exploiting the faces occluded by various head garments and the extreme head poses available in the Pandora dataset, we also successfully test the proposed system under strong visual occlusions. The excellent results obtained confirm the efficacy of the proposed method.
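The verification scheme the abstract describes, a shared embedding branch applied to both depth maps followed by a distance threshold, can be illustrated with a minimal, framework-free sketch. The `embed` function below is a toy stand-in for the paper's fully convolutional branch, not its actual network, and the threshold is an invented value.

```python
import math

def embed(depth_map):
    """Toy stand-in for the shared fully convolutional branch: maps a
    depth map (list of pixel rows) to a small feature vector."""
    flat = [v for row in depth_map for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    return [mean, math.sqrt(var)]

def verify(depth_a, depth_b, threshold=0.5):
    """Siamese verification: the SAME embedding is applied to both
    inputs, then a distance threshold decides 'same person or not'."""
    fa, fb = embed(depth_a), embed(depth_b)
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
    return dist < threshold

# two similar depth crops vs. a clearly different one (toy values)
same_a = [[10, 11], [12, 13]]
same_b = [[10, 11], [12, 14]]
other  = [[40, 42], [44, 46]]
match = verify(same_a, same_b)
no_match = verify(same_a, other)
```

The key Siamese property is that `embed` is shared: both inputs pass through identical weights, so the learned distance is symmetric.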

2019 Journal article

Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters

Authors: Landi, Federico; Baraldi, Lorenzo; Corsini, Massimiliano; Cucchiara, Rita

In Vision-and-Language Navigation (VLN), an embodied agent needs to reach a target destination with only the guidance of a natural language instruction. To explore the environment and progress towards the target location, the agent must perform a series of low-level actions, such as rotating, before stepping ahead. In this paper, we propose to exploit dynamic convolutional filters to encode the visual information and the lingual description in an efficient way. Unlike some previous works that abstract from the agent's perspective and use high-level navigation spaces, we design a policy which decodes the information provided by dynamic convolution into a series of low-level, agent-friendly actions. Results show that our model exploiting dynamic filters performs better than other architectures with traditional convolution, setting the new state of the art for embodied VLN in the low-level action space. Additionally, we categorize recent work on VLN according to its architectural choices and distinguish two main groups, which we call low-level action and high-level action models. To the best of our knowledge, we are the first to propose this analysis and categorization for VLN.
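The core idea of dynamic convolutional filters, generating the convolution kernel from the language embedding instead of learning it as a fixed weight, can be sketched in plain Python. The linear projection, weight values, and tiny dimensions below are illustrative assumptions, not the paper's architecture.

```python
def dynamic_filter(instruction_embedding, kernel_size=3):
    """Generate a convolutional kernel from a language embedding.
    Here a fixed (kernel_size x embed_dim) linear map stands in for
    the learned filter-generating network."""
    weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
    return [sum(w * e for w, e in zip(row, instruction_embedding))
            for row in weights]

def conv1d(signal, kernel):
    """Valid 1-D convolution of a visual feature sequence with the
    instruction-conditioned kernel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

instruction = [1.0, 2.0]               # toy language embedding
kernel = dynamic_filter(instruction)   # the kernel depends on the instruction
response = conv1d([0.0, 1.0, 0.0, 1.0], kernel)
```

A different instruction embedding produces a different kernel, so the same visual features yield a different response: that is the mechanism the paper exploits to condition perception on language.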

2019 Conference proceedings paper

End-to-end 6-DoF Object Pose Estimation through Differentiable Rasterization

Authors: Palazzi, Andrea; Bergamini, Luca; Calderara, Simone; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Here we introduce an approximated differentiable renderer to refine a 6-DoF pose prediction using only 2D alignment information. To this end, a two-branched convolutional encoder network is employed to jointly estimate the object class and its 6-DoF pose in the scene. We then propose a new formulation of an approximated differentiable renderer to re-project the 3D object onto the image according to its predicted pose; in this way, the alignment error between the observed and the re-projected object silhouette can be measured. Since the renderer is differentiable, it is possible to back-propagate through it to correct the estimated pose at test time in an online learning fashion. Finally, we show how to leverage the classification branch to profitably re-project a representative model of the predicted class (i.e., a medoid) instead. Each object in the scene is processed independently, and novel viewpoints that preserve both the objects' arrangement and their mutual poses can be rendered. The differentiable renderer code is available at: https://github.com/ndrplz/tensorflow-mesh-renderer.
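The test-time refinement loop the abstract describes, back-propagating a 2D alignment error through a differentiable renderer to correct the pose, can be reduced to a one-parameter sketch. The toy `render_silhouette_x` projection, its `FOCAL` constant, and the learning rate are assumptions for illustration only.

```python
FOCAL = 2.0  # assumed projection constant of the toy renderer

def render_silhouette_x(pose):
    """Toy differentiable 'renderer': projects a 1-D pose parameter to
    the horizontal position of the object's silhouette in the image."""
    return FOCAL * pose

def refine_pose(pose, observed_x, lr=0.1, steps=50):
    """Online test-time refinement: the squared 2D alignment error is
    back-propagated through the renderer (chain rule) to correct the
    initially predicted pose by gradient descent."""
    for _ in range(steps):
        error = render_silhouette_x(pose) - observed_x
        grad = 2.0 * error * FOCAL   # d/d(pose) of error ** 2
        pose -= lr * grad
    return pose

# a coarse initial pose estimate, refined against the observed silhouette
refined = refine_pose(pose=0.0, observed_x=4.0)
```

Because the renderer is differentiable, the same loop works with a full mesh renderer and a silhouette-overlap loss; only the gradient computation becomes automatic differentiation instead of a hand-derived chain rule.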

2019 Conference proceedings paper

Experimental Prediction Intervals for Monitoring Wind Turbines: an Ensemble Approach

Authors: Cascianelli, Silvia; Astolfi, Davide; Costante, Gabriele; Castellani, Francesco; Fravolini, Mario Luca

2019 Conference proceedings paper

Exploiting Gene Expression Profiles for the Automated Prediction of Connectivity between Brain Regions

Authors: Roberti, Ilaria; Lovino, Marta; Di Cataldo, Santa; Ficarra, Elisa; Urgese, Gianvito

Published in: INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES

The brain comprises a complex system of neurons interconnected by an intricate network of anatomical links. While recent studies demonstrated the correlation between anatomical connectivity patterns and gene expression of neurons, using transcriptomic information to automatically predict such patterns is still an open challenge. In this work, we present a completely data-driven approach relying on machine learning (i.e., neural networks) to learn the anatomical connection directly from a training set of gene expression data. To do so, we combined gene expression and connectivity data from the Allen Mouse Brain Atlas to generate thousands of gene expression profile pairs from different brain regions. To each pair, we assigned a label describing the physical connection between the corresponding brain regions. Then, we exploited these data to train neural networks designed to predict brain area connectivity. We assessed our solution on two prediction problems (with three and two connectivity class categories) involving cortical and cerebellar regions. As our results demonstrate, we distinguish between connected and unconnected regions with 85% prediction accuracy and a good balance of precision and recall. In future work, we plan to extend the analysis to more complex brain structures and to consider RNA-Seq data as an additional input to our model.
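The pairing scheme described above, concatenating the expression profiles of two regions and attaching a connectivity label, can be sketched as a small data-assembly step. The region names, profile values, and label strings below are invented for illustration; the real pipeline draws both from the Allen Mouse Brain Atlas.

```python
def build_pairs(profiles, connectivity):
    """Assemble (concatenated expression profiles, connection label)
    training examples. `profiles` maps region name -> expression
    vector; `connectivity` maps (region_a, region_b) -> label."""
    pairs = []
    for (a, b), label in connectivity.items():
        pairs.append((profiles[a] + profiles[b], label))
    return pairs

# toy regions with 2-gene expression profiles (real profiles are long)
profiles = {"cortex_1": [0.2, 0.9],
            "cortex_2": [0.8, 0.1],
            "cereb_1":  [0.5, 0.5]}
connectivity = {("cortex_1", "cortex_2"): "connected",
                ("cortex_1", "cereb_1"): "unconnected"}
dataset = build_pairs(profiles, connectivity)
```

Each resulting example is a single flat input vector plus a class label, which is exactly the shape a feed-forward neural network classifier expects.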

2019 Journal article

Face Verification from Depth using Privileged Information

Authors: Borghi, Guido; Pini, Stefano; Grazioli, Filippo; Vezzani, Roberto; Cucchiara, Rita

In this paper, a deep Siamese architecture for depth-based face verification is presented. The proposed approach efficiently verifies whether two face images belong to the same person while handling a great variety of head poses and occlusions. The architecture, named JanusNet, consists of a combination of a depth, an RGB, and a hybrid Siamese network. During the training phase, the hybrid network learns to extract complementary mid-level convolutional features which mimic the features of the RGB network, while simultaneously leveraging the light invariance of depth images. At test time, the model, relying only on depth data, achieves state-of-the-art results and real-time performance, despite the lack of deep-learning-oriented depth-based datasets.
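The privileged-information idea in the abstract, pushing the hybrid depth branch to mimic the RGB branch's mid-level features on top of the usual verification loss, can be written in spirit as a single combined objective. The MSE mimic term and the unit weighting below are assumptions for illustration, not the paper's exact formulation.

```python
def mse(u, v):
    """Mean squared error between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

def combined_loss(hybrid_feats, rgb_feats, verification_loss, mimic_weight=1.0):
    """JanusNet-style training objective: the hybrid (depth) branch is
    penalised for diverging from the RGB branch's mid-level features,
    in addition to the verification loss. At test time the RGB branch
    is dropped, so the privileged signal costs nothing at inference."""
    return verification_loss + mimic_weight * mse(hybrid_feats, rgb_feats)

loss = combined_loss([0.5, 0.5], [0.7, 0.3], verification_loss=0.2)
```

The design choice worth noting: the RGB network acts only as a teacher during training ("privileged information"), which is why the deployed model can run on depth alone.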

2019 Conference proceedings paper

Gait-Based Diplegia Classification Using LSMT Networks

Authors: Ferrari, Alberto; Bergamini, Luca; Guerzoni, Giorgio; Calderara, Simone; Bicocchi, Nicola; Vitetta, Giorgio; Borghi, Corrado; Neviani, Rita; Ferrari, Adriano

Published in: JOURNAL OF HEALTHCARE ENGINEERING

Diplegia is a specific subcategory of the wide spectrum of motion disorders gathered under the name of cerebral palsy. Recent works have proposed using gait analysis for diplegia classification, paving the way for automated analysis. A clinically established gait-based classification system divides diplegic patients into 4 main forms, each associated with a peculiar walking pattern. In this work, we apply two different deep learning techniques, namely multilayer perceptrons and recurrent neural networks, to automatically classify children into the 4 clinical forms. For the analysis, we used a dataset comprising gait data of 174 patients collected by means of an optoelectronic system. The measurements describing walking patterns were processed to extract 27 angular parameters and then used to train both kinds of neural networks. Classification results are comparable with those provided by experts in 3 out of 4 forms.
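The sequence-classification setup can be illustrated with a minimal recurrent classifier over a gait sequence: a hidden state is updated at each timestep of angular parameters, then a softmax maps it to the 4 clinical forms. The cell below is a toy stand-in for the paper's LSTM, with fixed invented weights rather than learned ones.

```python
import math

def classify_gait(sequence, n_classes=4):
    """Minimal recurrent classifier (toy stand-in for an LSTM): each
    timestep is a vector of angular gait parameters, a small hidden
    state summarises the sequence, and a softmax over the final state
    yields probabilities for the 4 clinical forms."""
    h = [0.0, 0.0]
    for x in sequence:
        m = sum(x) / len(x)                  # pool this step's angles
        h = [math.tanh(0.5 * h[0] + m),      # recurrent state update
             math.tanh(0.5 * h[1] - m)]
    scores = [h[0], -h[0], h[1], -h[1]][:n_classes]
    exps = [math.exp(s) for s in scores]     # softmax over the forms
    total = sum(exps)
    return [e / total for e in exps]

# two timesteps of (toy) angular parameters
probs = classify_gait([[0.3, 0.1], [0.4, 0.2]])
predicted_form = probs.index(max(probs))
```

A real LSTM adds input, forget, and output gates to this update so the state can retain information over the full gait cycle, but the overall classify-the-final-state pattern is the same.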

2019 Journal article

Give Ear to My Face: Modelling Multimodal Attention to Social Interactions

Authors: Boccignone, Giuseppe; Cuculo, Vittorio; D’Amelio, Alessandro; Grossi, Giuliano; Lanzarotti, Raffaella

Published in: LECTURE NOTES IN COMPUTER SCIENCE

We address the deployment of perceptual attention to social interactions as displayed in conversational clips, when relying on multimodal information (audio and video). A probabilistic modelling framework is proposed that goes beyond the classic saliency paradigm while integrating multiple information cues. Attentional allocation is determined not just by stimulus-driven selection but, importantly, by social value as modulating the selection history of relevant multimodal items. Thus, the construction of attentional priority is the result of a sampling procedure conditioned on the potential value dynamics of socially relevant objects emerging moment to moment within the scene. Preliminary experiments on a publicly available dataset are presented.
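The sampling procedure the abstract outlines, a priority built from both stimulus-driven saliency and social value from which the next attended item is drawn, can be sketched as follows. The multiplicative combination and the toy item list are illustrative assumptions, far simpler than the paper's probabilistic model.

```python
import random

def attentional_priority(saliency, social_value):
    """Build a normalised priority map from stimulus-driven saliency
    and social value (a simple per-item product here; the paper's
    value dynamics are far richer)."""
    raw = [s * v for s, v in zip(saliency, social_value)]
    total = sum(raw)
    return [p / total for p in raw]

def sample_attended_item(priorities, rng):
    """Draw the next attended item in proportion to its priority
    (inverse-CDF sampling over the discrete distribution)."""
    r, cum = rng.random(), 0.0
    for idx, p in enumerate(priorities):
        cum += p
        if r < cum:
            return idx
    return len(priorities) - 1

# toy items: [speaker's face, voice source, background object]
prio = attentional_priority([0.5, 0.4, 0.9], [0.9, 0.8, 0.1])
attended = sample_attended_item(prio, random.Random(0))
```

Note how the visually salient background object (saliency 0.9) ends up with low priority once social value modulates it: that is the departure from the classic saliency paradigm the abstract emphasises.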

2019 Conference proceedings paper

Going Deeper into Colorectal Cancer Histopathology

Authors: Ponzio, Francesco; Macii, Enrico; Ficarra, Elisa; Di Cataldo, Santa

Published in: COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE

The early diagnosis of colorectal cancer (CRC) traditionally relies on the microscopic examination of histological slides by experienced pathologists, which is very time-consuming and raises many concerns about the reliability of the results. In this paper we propose using Convolutional Neural Networks (CNNs), a class of deep networks successfully used in many pattern recognition contexts, to automatically distinguish cancerous tissues from either healthy tissue or benign lesions. For this purpose, we designed and compared different CNN-based classification frameworks, involving either training CNNs from scratch on three classes of colorectal images, or transfer learning from a different classification problem. While a CNN trained from scratch obtained very good (about 90%) classification accuracy in our tests, the same CNN model pre-trained on the ImageNet dataset obtained even better accuracy (around 96%) on the same testing samples, while requiring far fewer computational resources.
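The transfer-learning setup the abstract compares against can be sketched as: a frozen pretrained backbone supplies features, and only a small classifier head is trained on the colorectal images. Everything below (the stand-in feature extractor, the perceptron head, the binary toy labels) is illustrative, not the paper's actual pipeline.

```python
def pretrained_features(image):
    """Stand-in for a frozen, ImageNet-pretrained CNN backbone: maps an
    image (flat list of pixel intensities) to a tiny feature vector.
    'Frozen' means it is never updated during fine-tuning."""
    mean = sum(image) / len(image)
    return [mean, max(image) - min(image)]

def train_linear_head(samples, labels, lr=0.1, epochs=200):
    """Transfer-learning step: only a linear classifier head on top of
    the frozen features is trained (perceptron rule, binary toy task).
    The backbone's weights never change, which is why this needs far
    less computation than training a CNN from scratch."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for img, y in zip(samples, labels):   # y in {-1, +1}
            f = pretrained_features(img)
            score = w[0] * f[0] + w[1] * f[1] + b
            if y * score <= 0:                # misclassified: update head only
                w = [wi + lr * y * fi for wi, fi in zip(w, f)]
                b += lr * y
    return w, b

# toy 'healthy' (-1) vs 'cancerous' (+1) images
images = [[0.1, 0.2, 0.1], [0.9, 0.8, 0.7]]
w, b = train_linear_head(images, labels=[-1, +1])
```

In a real framework the same idea is a pretrained network with its final layer replaced and all earlier layers' gradients disabled.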

2019 Book chapter/Essay

Hand Gestures for the Human-Car Interaction: the Briareo dataset

Authors: Manganaro, Fabio; Pini, Stefano; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Natural User Interfaces can be an effective way to reduce driver inattention during driving. To this end, in this paper we propose a new dataset, called Briareo, specifically collected for the hand gesture recognition task in the automotive context. The dataset is acquired from an innovative point of view, exploiting different kinds of cameras, i.e., RGB, infrared stereo, and depth, which provide various types of images and 3D hand joints. Moreover, the dataset contains a significant number of hand gesture samples, performed by several subjects, allowing the use of deep learning-based approaches. Finally, a framework for hand gesture segmentation and classification is presented, together with a method to assess the quality of the proposed dataset.

2019 Conference proceedings paper

Page 45 of 106 • Total publications: 1059