Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Self-Supervised Optical Flow Estimation by Projective Bootstrap

Authors: Alletto, Stefano; Abati, Davide; Calderara, Simone; Cucchiara, Rita; Rigazio, Luca

Published in: IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS

Dense optical flow estimation is complex and time-consuming, with state-of-the-art methods relying either on large synthetic data sets or on pipelines requiring up to a few minutes per frame pair. In this paper, we address the problem of optical flow estimation in the automotive scenario in a self-supervised manner. We argue that optical flow can be cast as a geometrical warping between two successive video frames and devise a deep architecture to estimate such a transformation in two stages. First, a dense pixel-level flow is computed with a projective bootstrap on rigid surfaces. We show how such a global transformation can be approximated with a homography and extend spatial transformer layers so that they can be employed to compute the flow field implied by the transformation. Subsequently, we refine the prediction by feeding a second, deeper network that accounts for moving objects. A final reconstruction loss compares the warping of frame Xₜ with the subsequent frame Xₜ₊₁ and guides both estimates. The model has the speed advantages of end-to-end deep architectures while achieving competitive performance, both outperforming recent unsupervised methods and showing good generalization capabilities on new automotive data sets.

2019 Journal article

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions

Authors: Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION

Current captioning approaches can describe images using black-box architectures whose behavior is hardly controllable and explainable from the exterior. As an image can be described in infinite ways depending on the goal and the context at hand, a higher degree of controllability is needed to apply captioning algorithms in complex scenarios. In this paper, we introduce a novel framework for image captioning which can generate diverse descriptions by allowing both grounding and controllability. Given a control signal in the form of a sequence or set of image regions, we generate the corresponding caption through a recurrent architecture which predicts textual chunks explicitly grounded on regions, following the constraints of the given control. Experiments are conducted on Flickr30k Entities and on COCO Entities, an extended version of COCO in which we add grounding annotations collected in a semi-automatic manner. Results demonstrate that our method achieves state-of-the-art performance on controllable image captioning in terms of caption quality and diversity. Code and annotations are publicly available at: https://github.com/aimagelab/show-control-and-tell.

2019 Paper in conference proceedings

SHREC 2019 Track: Online Gesture Recognition

Authors: Caputo, F. M.; Burato, S.; Pavan, G.; Voillemin, T.; Wannous, H.; Vandeborre, J. P.; Maghoumi, M.; Taranta, E. M.; Razmjoo, A.; Laviola Jr., J. J.; Manganaro, Fabio; Pini, S.; Borghi, G.; Vezzani, R.; Cucchiara, R.; Nguyen, H.; Tran, M. T.; Giachetti, A.

This paper presents the results of the Eurographics 2019 SHape Retrieval Contest track on online gesture recognition. The goal of this contest was to test state-of-the-art methods for the online detection of command gestures from tracked hand movements, on a basic benchmark where simple gestures are performed interleaved with other actions. Unlike previous contests and benchmarks on trajectory-based gesture recognition, we proposed an online gesture recognition task: instead of providing pre-segmented gestures, we asked the participants to find gestures within recorded trajectories. The results submitted by the participants show that online detection and recognition of sets of very simple gestures from 3D trajectories captured with a cheap sensor can be performed effectively. The best methods proposed could therefore be directly exploited to design effective gesture-based interfaces for use in different contexts, from Virtual and Mixed Reality applications to the remote control of home devices.

2019 Paper in conference proceedings

Single-cell DNA Sequencing Data: a Pipeline for Multi-Sample Analysis

Authors: Montemurro, Marilisa; Grassi, Elena; Urgese, Gianvito; Parisi, Emanuele; Gabriele Pizzino, Carmelo; Bertotti, Andrea; Ficarra, Elisa

Single-cell DNA (sc-DNA) sequencing is proving to be a valuable instrument for investigating intra- and inter-tumor heterogeneity and inferring its evolutionary dynamics, thanks to the high-resolution data it produces. For this reason, the demand for analytical tools to manage this kind of data is increasing. Here we propose a pipeline capable of performing multi-sample copy-number variation (CNV) analysis on large-scale single-cell DNA sequencing data and investigating spatial and temporal tumor heterogeneity.

2019 Paper in conference proceedings

Single-cell DNA Sequencing Data: a Pipeline for Multi-Sample Analysis

Authors: Montemurro, Marilisa; Grassi, Elena; Urgese, Gianvito; Gabriele Pizzino, Carmelo; Bertotti, Andrea; Ficarra, Elisa

In order to help cancer researchers in understanding tumor heterogeneity and its evolutionary dynamics, we propose a software pipeline to explore intra-tumor heterogeneity by means of scDNA sequencing data.

2019 Abstract in conference proceedings

Skin Lesion Segmentation Ensemble with Diverse Training Strategies

Authors: Canalini, Laura; Pollastri, Federico; Bolelli, Federico; Cancilla, Michele; Allegretti, Stefano; Grana, Costantino

Published in: LECTURE NOTES IN COMPUTER SCIENCE

This paper presents a novel strategy to perform skin lesion segmentation from dermoscopic images. We design an effective segmentation pipeline, and explore several pre-training methods to initialize the feature extractor, highlighting how different procedures lead the Convolutional Neural Network (CNN) to focus on different features. An encoder-decoder segmentation CNN is employed to take advantage of each pre-trained feature extractor. Experimental results reveal how multiple initialization strategies can be exploited, by means of an ensemble method, to obtain state-of-the-art skin lesion segmentation accuracy.

2019 Paper in conference proceedings

Social traits from stochastic paths in the core affect space

Authors: Boccignone, Giuseppe; Cuculo, Vittorio; D'Amelio, Alessandro; Lanzarotti, Raffaella

We discuss a preliminary investigation on the feasibility of inferring traits of social participation from the observable behaviour of individuals involved in dyadic interactions. Trait inference relies on a stochastic model of the dynamics occurring in the individual core affect state-space. Results obtained on a publicly available interaction dataset are presented and examined.

2019 Paper in conference proceedings

Spotting Insects from Satellites: Modeling the Presence of Culicoides Imicola Through Deep CNNs

Authors: Vincenzi, Stefano; Porrello, Angelo; Buzzega, Pietro; Conte, Annamaria; Ippoliti, Carla; Candeloro, Luca; Di Lorenzo, Alessio; Capobianco Dondona, Andrea; Calderara, Simone

Vector-Borne Diseases (VBDs) pose a severe threat to public health, accounting for a considerable share of human illnesses. Recently, several surveillance plans have been put in place to limit the spread of such diseases, typically involving on-field measurements. A systematic and effective plan of this kind is still missing, due to the high costs and effort required to implement it. Ideally, any attempt in this field should consider the vector-host-pathogen triangle, which is strictly linked to environmental and climatic conditions. In this paper, we exploit satellite imagery from the Sentinel-2 mission, as we believe it encodes the environmental factors responsible for the vector's spread. Our analysis, conducted in a data-driven fashion, couples spectral images with ground-truth information on the abundance of Culicoides imicola. We frame our task as a binary classification problem, relying on Convolutional Neural Networks (CNNs) to learn useful representations from multi-band images. Additionally, we provide a multi-instance variant aimed at extracting temporal patterns from a short sequence of spectral images. Experiments show promising results, providing the foundations for novel supportive tools that could indicate where surveillance and prevention measures should be prioritized.

2019 Paper in conference proceedings

Towards Cycle-Consistent Models for Text and Image Retrieval

Authors: Cornia, Marcella; Baraldi, Lorenzo; Rezazadegan Tavakoli, Hamed; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Cross-modal retrieval has recently become a research hot spot, thanks to the development of deeply-learnable architectures. Such architectures generally learn a joint multi-modal embedding space in which text and images can be projected and compared. Here we investigate a different approach, and reformulate the problem of cross-modal retrieval as that of learning a translation between the textual and visual domains. In particular, we propose an end-to-end trainable model which can translate text into image features and vice versa, and regularizes this mapping with a cycle-consistency criterion. Preliminary experimental evaluations show promising results with respect to ordinary visual-semantic models.

2019 Paper in conference proceedings

Training adversarial discriminators for cross-channel abnormal event detection in crowds

Authors: Ravanbakhsh, M.; Sangineto, E.; Nabi, M.; Sebe, N.

Abnormal crowd behaviour detection attracts large interest due to its importance in video surveillance scenarios. However, the ambiguity and the lack of sufficient abnormal ground-truth data make end-to-end training of large deep networks hard in this domain. In this paper we propose to use Generative Adversarial Nets (GANs), which are trained to generate only the normal distribution of the data. During adversarial GAN training, a discriminator (D) is used as a supervisor for the generator network (G) and vice versa. At testing time we use D to solve our discriminative task (abnormality detection), where D has been trained without the need for manually annotated abnormal data. Moreover, in order to prevent G from learning a trivial identity function, we use a cross-channel approach, forcing G to transform raw-pixel data into motion information and vice versa. The quantitative results on standard benchmarks show that our method outperforms previous state-of-the-art methods in both the frame-level and the pixel-level evaluation.

2019 Paper in conference proceedings

Page 50 of 109 • Total publications: 1084