Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Give Ear to My Face: Modelling Multimodal Attention to Social Interactions

Authors: Boccignone, Giuseppe; Cuculo, Vittorio; D’Amelio, Alessandro; Grossi, Giuliano; Lanzarotti, Raffaella

Published in: LECTURE NOTES IN COMPUTER SCIENCE

We address the deployment of perceptual attention to social interactions as displayed in conversational clips, when relying on multimodal information (audio and video). A probabilistic modelling framework is proposed that goes beyond the classic saliency paradigm while integrating multiple information cues. Attentional allocation is determined not just by stimulus-driven selection but, importantly, by social value as modulating the selection history of relevant multimodal items. Thus, the construction of attentional priority is the result of a sampling procedure conditioned on the potential value dynamics of socially relevant objects emerging moment to moment within the scene. Preliminary experiments on a publicly available dataset are presented.

2019 Conference proceedings paper

Going Deeper into Colorectal Cancer Histopathology

Authors: Ponzio, Francesco; Macii, Enrico; Ficarra, Elisa; Di Cataldo, Santa

Published in: COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE

The early diagnosis of colorectal cancer (CRC) traditionally relies on the microscopic examination of histological slides by experienced pathologists, which is very time-consuming and raises many issues about the reliability of the results. In this paper we propose using Convolutional Neural Networks (CNNs), a class of deep networks that are successfully used in many contexts of pattern recognition, to automatically distinguish cancerous tissues from either healthy tissues or benign lesions. For this purpose, we designed and compared different CNN-based classification frameworks, involving either training CNNs from scratch on three classes of colorectal images, or transfer learning from a different classification problem. While a CNN trained from scratch obtained very good (about 90%) classification accuracy in our tests, the same CNN model pre-trained on the ImageNet dataset obtained even better accuracy (around 96%) on the same testing samples, while requiring far fewer computational resources.

2019 Book chapter/Essay

Hand Gestures for the Human-Car Interaction: the Briareo dataset

Authors: Manganaro, Fabio; Pini, Stefano; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Natural User Interfaces can be an effective way to reduce driver inattention while driving. To this end, in this paper we propose a new dataset, called Briareo, specifically collected for the hand gesture recognition task in the automotive context. The dataset is acquired from an innovative point of view, exploiting different kinds of cameras, i.e. RGB, infrared stereo, and depth, that provide various types of images and 3D hand joints. Moreover, the dataset contains a significant number of hand gesture samples, performed by several subjects, allowing the use of deep learning-based approaches. Finally, a framework for hand gesture segmentation and classification is presented, exploiting a method introduced to assess the quality of the proposed dataset.

2019 Conference proceedings paper

How does Connected Components Labeling with Decision Trees perform on GPUs?

Authors: Allegretti, Stefano; Bolelli, Federico; Cancilla, Michele; Pollastri, Federico; Canalini, Laura; Grana, Costantino

Published in: LECTURE NOTES IN COMPUTER SCIENCE

In this paper the problem of Connected Components Labeling (CCL) in binary images using Graphics Processing Units (GPUs) is tackled from a different perspective. In the last decade, many novel algorithms have been released, specifically designed for GPUs. Because CCL literature concerning sequential algorithms is very rich, and includes many efficient solutions, designers of parallel algorithms were often inspired by techniques that had already proved successful in a sequential environment, such as the Union-Find paradigm for solving equivalences between provisional labels. However, the use of decision trees to minimize memory accesses, which is one of the main features of the best performing sequential algorithms, was never taken into account when designing parallel CCL solutions. In fact, branches in the code tend to cause thread divergence, which usually leads to inefficiency. Nevertheless, this consideration does not necessarily apply to every possible scenario. Are we sure that the advantages of decision trees do not compensate for the cost of thread divergence? In order to answer this question, we chose three well-known sequential CCL algorithms, which employ decision trees as the cornerstone of their strategy, and we built a data-parallel version of each of them. Experimental tests on real-case datasets show that, in most cases, these solutions outperform state-of-the-art algorithms, thus demonstrating the effectiveness of decision trees also in a parallel environment.

2019 Conference proceedings paper

Image-to-Image Translation to Unfold the Reality of Artworks: an Empirical Analysis

Authors: Tomei, Matteo; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

State-of-the-art Computer Vision pipelines perform poorly on artworks and data coming from the artistic domain, thus limiting the applicability of current architectures to the automatic understanding of cultural heritage. This is mainly due to the difference in texture and low-level feature distribution between artistic and real images, on which state-of-the-art approaches are usually trained. To enhance the applicability of pre-trained architectures on artistic data, we have recently proposed an unpaired domain translation approach which can translate artworks to photo-realistic visualizations. Our approach leverages semantically-aware memory banks of real patches, which are used to drive the generation of the translated image while improving its realism. In this paper, we provide additional analyses and experimental results which demonstrate the effectiveness of our approach. In particular, we evaluate the quality of generated results in the case of the translation of landscapes, portraits and paintings coming from four different styles using automatic distance metrics. Also, we analyze the response of pre-trained architectures for classification, detection and segmentation both in terms of feature distribution and entropy of prediction, and show that our approach effectively reduces the domain shift of paintings. As an additional contribution, we also provide a qualitative analysis of the reduction of the domain shift for detection, segmentation and image captioning.

2019 Conference proceedings paper

Improving the Performance of Thinning Algorithms with Directed Rooted Acyclic Graphs

Authors: Bolelli, Federico; Grana, Costantino

Published in: LECTURE NOTES IN COMPUTER SCIENCE

In this paper we propose a strategy to optimize the performance of thinning algorithms. This solution is obtained by combining three proven strategies for binary image neighborhood exploration, namely modeling the problem with an optimal decision tree, reusing pixels from the previous step of the algorithm, and reducing the code footprint by means of Directed Rooted Acyclic Graphs. A complete and open-source benchmarking suite is also provided. Experimental results confirm that the proposed algorithms clearly outperform classical implementations.

2019 Conference proceedings paper

Latent Space Autoregression for Novelty Detection

Authors: Abati, Davide; Porrello, Angelo; Calderara, Simone; Cucchiara, Rita

Novelty detection is commonly referred to as the discrimination of observations that do not conform to a learned model of regularity. Despite its importance in different application settings, designing a novelty detector is utterly complex due to the unpredictable nature of novelties and their inaccessibility during the training procedure, factors which expose the unsupervised nature of the problem. In our proposal, we design a general framework where we equip a deep autoencoder with a parametric density estimator that learns the probability distribution underlying its latent representations through an autoregressive procedure. We show that a maximum likelihood objective, optimized in conjunction with the reconstruction of normal samples, effectively acts as a regularizer for the task at hand, by minimizing the differential entropy of the distribution spanned by latent vectors. In addition to providing a very general formulation, extensive experiments on publicly available datasets show that our model delivers on-par or superior performance compared to state-of-the-art methods in one-class and video anomaly detection settings. Unlike prior works, our proposal does not make any assumption about the nature of the novelties, making our work readily applicable to diverse contexts.

2019 Conference proceedings paper

M-VAD Names: a Dataset for Video Captioning with Naming

Authors: Pini, Stefano; Cornia, Marcella; Bolelli, Federico; Baraldi, Lorenzo; Cucchiara, Rita

Published in: MULTIMEDIA TOOLS AND APPLICATIONS

Current movie captioning architectures are not capable of mentioning characters by their proper names, replacing them with a generic "someone" tag. The lack of movie description datasets with characters' visual annotations surely plays a relevant role in this shortcoming. Recently, we proposed to extend the M-VAD dataset by introducing such information. In this paper, we present an improved version of the dataset, namely M-VAD Names, and its semi-automatic annotation procedure. The resulting dataset contains 63k visual tracks and 34k textual mentions, all associated with character identities. To showcase the features of the dataset and quantify the complexity of the naming task, we investigate multimodal architectures to replace the "someone" tags with proper character names in existing video captions. The evaluation is further extended by testing this application on videos outside of the M-VAD Names dataset.

2019 Journal article

Manual Annotations on Depth Maps for Human Pose Estimation

Authors: D'Eusanio, Andrea; Pini, Stefano; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Few works tackle human pose estimation on depth maps. Moreover, these methods usually rely on automatically annotated datasets, and these annotations are often imprecise and unreliable, which limits the accuracy achievable when such data are used as ground truth. For this reason, in this paper we propose an annotation refinement tool for human poses, expressed as body joints, and a novel set of fine joint annotations for the Watch-n-Patch dataset, which has been collected with the proposed tool. Furthermore, we present a fully-convolutional architecture that performs body pose estimation directly on depth maps. The extensive evaluation shows that the proposed architecture outperforms the competitors in different training scenarios and is able to run in real time.

2019 Conference proceedings paper

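Method for Evaluating the Health State of an Anatomical Element, Related Evaluation Device and Related Evaluation System (original title: METODO DI VALUTAZIONE DI UNO STATO DI SALUTE DI UN ELEMENTO ANATOMICO, RELATIVO DISPOSITIVO DI VALUTAZIONE E RELATIVO SISTEMA DI VALUTAZIONE)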

Authors: Giuseppe, Marrucchella; Bergamini, Luca; Porrello, Angelo; Del Negro, Ercole; Capobianco Dondona, Andrea; Di Tondo, Francesco; Calderara, Simone

A system capable of detecting lesions on half-carcasses at the slaughterhouse, using deep learning techniques to identify the type of lesions present.

2019 Patent
