Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Learn to See by Events: Color Frame Synthesis from Event and RGB Cameras

Authors: Pini, Stefano; Borghi, Guido; Vezzani, Roberto

Event cameras are biologically-inspired sensors that gather the temporal evolution of the scene. They capture pixel-wise brightness variations and output a corresponding stream of asynchronous events. Despite having multiple advantages with respect to traditional cameras, their use is partially prevented by the limited applicability of traditional data processing and vision algorithms. To this end, we present a framework which exploits the output stream of event cameras to synthesize RGB frames, relying on an initial or a periodic set of color key-frames and the sequence of intermediate events. Unlike existing work, we propose a deep learning-based frame synthesis method, consisting of an adversarial architecture combined with a recurrent module. Qualitative results and quantitative per-pixel, perceptual, and semantic evaluation on four public datasets confirm the quality of the synthesized images.
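To make the input/output relation concrete, here is a toy, non-learned baseline: starting from a color key-frame, each event nudges the brightness of one pixel up or down. The paper replaces this naive accumulation with an adversarial/recurrent model; the function name and the contrast step `threshold` are illustrative assumptions, not the authors' implementation.

```python
# Toy event-integration baseline: reconstruct intensity frames from a
# key-frame plus a stream of brightness events (row, col, polarity).
# The learned method in the paper supersedes this simple accumulation.

def integrate_events(key_frame, events, threshold=0.1):
    """key_frame: 2D list of intensities in [0, 1].
    events: iterable of (row, col, polarity) with polarity in {+1, -1}.
    Returns the reconstructed frame after applying all events."""
    frame = [row[:] for row in key_frame]  # copy, so the key-frame is kept
    for r, c, polarity in events:
        # each event signals a brightness change of +/- threshold;
        # the result is clamped to the valid intensity range [0, 1]
        frame[r][c] = min(1.0, max(0.0, frame[r][c] + polarity * threshold))
    return frame
```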

2020 Relazione in Atti di Convegno

Mercury: a vision-based framework for Driver Monitoring

Authors: Borghi, Guido; Pini, Stefano; Vezzani, Roberto; Cucchiara, Rita

In this paper, we propose a complete framework, namely Mercury, that combines Computer Vision and Deep Learning algorithms to continuously monitor the driver during the driving activity. The proposed solution complies with the requirements imposed by the challenging automotive context. First, light invariance, in order to have a system able to work regardless of the time of day and the weather conditions: therefore, infrared-based images, i.e. depth maps (in which each pixel corresponds to the distance between the sensor and that point in the scene), have been exploited in conjunction with traditional intensity images. Second, the non-invasiveness of the system is required, since the driver's movements must not be impeded during the driving activity: in this context, the use of cameras and vision-based algorithms is one of the best solutions. Finally, real-time performance is needed, since a monitoring system must react immediately as soon as a situation of potential danger is detected.

2020 Relazione in Atti di Convegno

Meshed-Memory Transformer for Image Captioning

Authors: Cornia, Marcella; Stefanini, Matteo; Baraldi, Lorenzo; Cucchiara, Rita

Published in: PROCEEDINGS IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION

Transformer-based architectures represent the state of the art in sequence modeling tasks like machine translation and language understanding. Their applicability to multi-modal contexts like image captioning, however, is still largely under-explored. With the aim of filling this gap, we present M² - a Meshed Transformer with Memory for Image Captioning. The architecture improves both the image encoding and the language generation steps: it learns a multi-level representation of the relationships between image regions integrating learned a priori knowledge, and uses a mesh-like connectivity at the decoding stage to exploit low- and high-level features. Experimentally, we investigate the performance of the M² Transformer and different fully-attentive models in comparison with recurrent ones. When tested on COCO, our proposal achieves a new state of the art in single-model and ensemble configurations on the "Karpathy" test split and on the online test server. We also assess its performance when describing objects unseen in the training set. Trained models and code for reproducing the experiments are publicly available at: https://github.com/aimagelab/meshed-memory-transformer.
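The "mesh-like connectivity" can be sketched as follows: instead of attending only to the last encoder layer, the decoder combines cross-attention results from every encoder layer through per-layer gates. Attention itself is stubbed out here and the gates are plain floats; all names are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Sketch of meshed connectivity: gate-weighted fusion of the decoder's
# cross-attention outputs, one per encoder layer.

def meshed_combine(per_layer_outputs, gates):
    """per_layer_outputs: list of N vectors, one per encoder layer
    (each the result of cross-attending to that layer's features).
    gates: N non-negative weights, one per layer.
    Returns the gate-weighted sum, normalized by the total gate mass."""
    total = sum(gates)
    combined = [0.0] * len(per_layer_outputs[0])
    for vec, gate in zip(per_layer_outputs, gates):
        for i, value in enumerate(vec):
            combined[i] += (gate / total) * value
    return combined
```

The point of the mesh is that low-level encoder layers (textures, small regions) and high-level ones (object relationships) both contribute to every decoding step, with the gates deciding how much.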

2020 Relazione in Atti di Convegno

Multi-omics Classification on Kidney Samples Exploiting Uncertainty-Aware Models

Authors: Lovino, Marta; Bontempo, Gianpaolo; Cirrincione, Giansalvo; Ficarra, Elisa

Due to the huge amount of available omic data, classifying samples according to various omics is a complex process. One of the most common approaches consists of creating a classifier for each omic and subsequently making a consensus among the classifiers, assigning to each sample the class most voted among the outputs on the individual omics. However, this approach does not consider the confidence of each prediction, ignoring that the biological information coming from a certain omic may be more reliable than that of others. Therefore, a method is proposed here, consisting of a tree-based multi-layer perceptron (MLP), which estimates the class-membership probabilities for classification. In this way, it is not only possible to give relevance to all the omics, but also to label as Unknown those samples for which the classifier is uncertain in its prediction. The method was applied to a dataset composed of 909 kidney cancer samples for which these three omics were available: gene expression (mRNA), microRNA expression (miRNA), and methylation profiles (meth) data. The method is also valid for other tissues and other omics (e.g. proteomics, copy number alterations data, single nucleotide polymorphism data). The accuracy and weighted average F1-score of the model are both higher than 95%. This tool can therefore be particularly useful in clinical practice, allowing physicians to focus on the most interesting and challenging samples.
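The uncertainty-aware decision described above can be sketched as follows: each per-omic classifier emits class-membership probabilities, the probabilities are fused, and a sample whose top fused probability falls below a confidence threshold is labeled "Unknown" instead of being forced into a class. Averaging as the fusion rule, the threshold value, and all names are illustrative assumptions, not the paper's exact model.

```python
# Sketch of uncertainty-aware multi-omics consensus: fuse per-omic
# class probabilities, then abstain ("Unknown") when confidence is low.

def fuse_and_decide(per_omic_probs, classes, threshold=0.6):
    """per_omic_probs: list of probability vectors (one per omic),
    each aligned with `classes`. Returns a class label or 'Unknown'."""
    n_omics = len(per_omic_probs)
    fused = [sum(p[i] for p in per_omic_probs) / n_omics
             for i in range(len(classes))]
    best = max(range(len(classes)), key=lambda i: fused[i])
    # abstain instead of forcing an uncertain sample into a class
    return classes[best] if fused[best] >= threshold else "Unknown"
```

Unlike plain majority voting, this keeps each omic's confidence in play: a single very reliable omic can dominate, and genuinely ambiguous samples are flagged rather than misclassified.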

2020 Relazione in Atti di Convegno

Multimodal Hand Gesture Classification for the Human-Car Interaction

Authors: D’Eusanio, Andrea; Simoni, Alessandro; Pini, Stefano; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Published in: INFORMATICS

2020 Articolo su rivista

On Gaze Deployment to Audio-Visual Cues of Social Interactions

Authors: Boccignone, G.; Cuculo, V.; D'Amelio, A.; Grossi, G.; Lanzarotti, R.

Published in: IEEE ACCESS

Attention supports our urge to forage on social cues. Under certain circumstances, we spend the majority of time scrutinising people, notably their eyes and faces, and spotting persons that are talking. To account for such behaviour, this article develops a computational model for the deployment of gaze within a multimodal landscape, namely a conversational scene. Gaze dynamics is derived in a principled way by reformulating attention deployment as a stochastic foraging problem. Model simulation experiments on a publicly available dataset of eye-tracked subjects are presented. Results show that the simulated scan paths exhibit similar trends of eye movements of human observers watching and listening to conversational clips in a free-viewing condition.

2020 Articolo su rivista

Online Continual Learning under Extreme Memory Constraints

Authors: Fini, Enrico; Lathuilière, Stéphane; Sangineto, Enver; Nabi, Moin; Ricci, Elisa

Published in: LECTURE NOTES IN COMPUTER SCIENCE

2020 Relazione in Atti di Convegno

Optimized Block-Based Algorithms to Label Connected Components on GPUs

Authors: Allegretti, Stefano; Bolelli, Federico; Grana, Costantino

Published in: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

Connected Components Labeling (CCL) is a crucial step of several image processing and computer vision pipelines. Many efficient sequential strategies exist, among which one of the most effective is the use of a block-based mask to drastically cut the number of memory accesses. In the last decade, aided by the fast development of Graphics Processing Units (GPUs), many data-parallel CCL algorithms have been proposed alongside sequential ones. Applications that run entirely on GPU can benefit from parallel CCL implementations that avoid expensive memory transfers between host and device. In this paper, two new eight-connectivity CCL algorithms are proposed, namely Block-based Union Find (BUF) and Block-based Komura Equivalence (BKE). These algorithms optimize existing GPU solutions by introducing a block-based approach. Extensions for three-dimensional datasets are also discussed. In order to produce a fair comparison with previously proposed alternatives, YACCLAB, a public CCL benchmarking framework, has been extended and made suitable for evaluating GPU algorithms as well. Moreover, three-dimensional datasets have been added to its collection. Experimental results on real cases and synthetically generated datasets demonstrate the superiority of the new proposals over the state of the art, in both 2D and 3D scenarios.
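To illustrate the problem the BUF/BKE kernels optimize, here is a minimal sequential Union-Find CCL for 8-connectivity on a binary image. This is a didactic CPU sketch of the classic two-pass scheme, not the paper's block-based GPU algorithm; all names are illustrative.

```python
# Two-pass Union-Find CCL, 8-connectivity, on a 0/1 image.

def label_components(img):
    """img: 2D list of 0/1. Returns a 2D list of labels (0 = background)."""
    h, w = len(img), len(img[0])
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # first pass: provisional labels + equivalences with the already
    # visited 8-neighbors (west, north-west, north, north-east)
    labels = [[0] * w for _ in range(h)]
    next_label = 1
    for r in range(h):
        for c in range(w):
            if not img[r][c]:
                continue
            neighbors = [labels[rr][cc]
                         for rr, cc in ((r, c - 1), (r - 1, c - 1),
                                        (r - 1, c), (r - 1, c + 1))
                         if 0 <= rr < h and 0 <= cc < w and labels[rr][cc]]
            if neighbors:
                labels[r][c] = min(neighbors)
                for n in neighbors:
                    union(labels[r][c], n)
            else:
                labels[r][c] = next_label
                parent[next_label] = next_label
                next_label += 1

    # second pass: flatten equivalence classes to their root label
    for r in range(h):
        for c in range(w):
            if labels[r][c]:
                labels[r][c] = find(labels[r][c])
    return labels
```

The block-based idea in the paper exploits the fact that, with 8-connectivity, all foreground pixels in a 2x2 block necessarily share one label, so equivalences can be resolved per block instead of per pixel, cutting memory traffic.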

2020 Articolo su rivista

Ottimizzazione di Algoritmi per l’Elaborazione di Immagini Binarie

Authors: Bolelli, Federico

The procedure that makes an algorithm more efficient in terms of memory requirements or execution time is called optimization, and it represents a crucial step in image and video processing. It is rare for the optimization process to produce an algorithm that is optimal in an absolute sense; more often, a trade-off between time and memory requirements must be reached. In any case, there are many scenarios in which the total execution time required to complete a task is the most restrictive constraint. Binary image processing algorithms, for instance, represent a fundamental operation in most state-of-the-art image and video analysis systems, even when these are based on deep learning techniques. An efficient implementation is therefore essential, especially when these systems must be deployed in time-constrained scenarios, where compromising the quality of the result, or relying on more powerful hardware, is not a viable option. This thesis introduces and explores several approaches for optimizing binary image processing algorithms that can be modeled with decision tables. Several problems can be defined in this way: connected components labeling, thinning, chain coding, and morphological operators are some of them. In general, all algorithms in which the output value of each image pixel is obtained from the value of the pixel itself and of some of its neighbors can be defined using decision tables. Focusing on connected components labeling, state-of-the-art approaches are analyzed for both sequential CPU-based environments and parallel CPU- and GPU-based ones, with attention to how performance can be measured fairly.
New approaches are then introduced to further improve performance in terms of total execution time, showing how these techniques can be generalized to improve any algorithm that can be modeled with decision tables. Finally, a framework is presented that makes it possible to automatically apply many of the previously described and analyzed optimization strategies to a given algorithm. The framework, called GRAPHGEN, takes as input a definition of the problem in terms of conditions to check and actions to perform, and produces as output C/C++ code that includes all the required optimizations. Compared to existing approaches, the algorithms generated with GRAPHGEN perform significantly better, on both real and synthetic datasets.
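The decision-table modeling at the heart of the thesis can be illustrated with a toy interpreter: each rule maps a tuple of boolean conditions (values of the pixel and some neighbors) to an action (the output value). GRAPHGEN compiles such tables into optimized C/C++ decision trees/DAGs; the sketch below is just the naive semantics, with an invented erosion-like rule set, and every name is an assumption.

```python
# Naive decision-table interpreter for pixel-wise binary image operators.
# Conditions here are (center, west, north); real tables use richer masks.

def apply_decision_table(img, table, default=0):
    """img: 2D list of 0/1. table: dict mapping (center, west, north)
    condition tuples to output values; out-of-image neighbors read as 0."""
    h, w = len(img), len(img[0])
    out = [[default] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            center = img[r][c]
            west = img[r][c - 1] if c > 0 else 0
            north = img[r - 1][c] if r > 0 else 0
            out[r][c] = table.get((center, west, north), default)
    return out
```

An optimizer like GRAPHGEN would turn the table lookup into a minimal tree of condition checks, so that equivalent rules share tests and each pixel triggers as few memory reads as possible.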

2020 Tesi di dottorato

Predicting the oncogenic potential of gene fusions using convolutional neural networks

Authors: Lovino, Marta; Urgese, Gianvito; Macii, Enrico; Di Cataldo, Santa; Ficarra, Elisa

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Predicting the oncogenic potential of a gene fusion transcript is an important and challenging task in the study of cancer development. To date, the available approaches mostly rely on protein domain analysis to provide a probability score explaining the oncogenic potential of a gene fusion. In this paper, a Convolutional Neural Network model is proposed to discriminate gene fusions into oncogenic or non-oncogenic, exploiting only the protein sequence without protein domain information. Our proposed model obtained an accuracy close to 90% on a dataset of fused sequences.
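The core ingredient of such a model can be sketched without any deep learning framework: amino acids are one-hot encoded and a filter slides along the sequence (1D convolution), followed by global max pooling. The alphabet, filter shape, and function names below are illustrative assumptions; the paper's actual network stacks learned filters with dense layers on top.

```python
# Sketch of 1D convolution over a one-hot encoded protein sequence,
# the building block of a sequence-only CNN classifier.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # standard 20-letter alphabet

def one_hot(seq):
    """Encode a protein string as a list of 20-channel one-hot vectors."""
    idx = {a: i for i, a in enumerate(AMINO_ACIDS)}
    return [[1.0 if idx[a] == i else 0.0 for i in range(len(AMINO_ACIDS))]
            for a in seq]

def conv1d_max(encoded, kernel):
    """Slide a (width x 20) kernel along the sequence and return the
    maximum activation (global max pooling, as fed to a dense layer)."""
    width = len(kernel)
    best = float("-inf")
    for start in range(len(encoded) - width + 1):
        acc = sum(kernel[j][c] * encoded[start + j][c]
                  for j in range(width)
                  for c in range(len(AMINO_ACIDS)))
        best = max(best, acc)
    return best
```

In a trained network each kernel learns to fire on a short sequence motif, so the pooled activations summarize which motifs appear anywhere in the fused protein, regardless of position.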

2020 Relazione in Atti di Convegno

Page 44 of 109 • Total publications: 1084