Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

DEEPrior: a deep learning tool for the prioritization of gene fusions

Authors: Lovino, Marta; Ciaburri, Maria Serena; Urgese, Gianvito; Di Cataldo, Santa; Ficarra, Elisa

Published in: BIOINFORMATICS

Summary: In the last decade, increasing attention has been paid to the study of gene fusions. However, the problem of determining whether a gene fusion is a cancer driver or just a passenger mutation is still an open issue. Here we present DEEPrior, an inherently flexible deep learning tool with two modes (Inference and Retraining). Inference mode predicts the probability of a gene fusion being involved in an oncogenic process, by directly exploiting the amino acid sequence of the fused protein. Retraining mode allows users to obtain a custom prediction model including new data provided by the user. Availability and implementation: Both DEEPrior and the protein fusions dataset are freely available on GitHub at https://github.com/bioinformatics-polito/DEEPrior. The tool was designed to operate in Python 3.7, with minimal additional libraries. Supplementary information: Supplementary data are available at Bioinformatics online.

2020 Journal article

Dual In-painting Model for Unsupervised Gaze Correction and Animation in the Wild

Authors: Zhang, Jichao; Chen, Jingjing; Tang, Hao; Wang, Wei; Yan, Yan; Sangineto, Enver; Sebe, Nicu

We address the problem of unsupervised gaze correction in the wild, presenting a solution that works without precise annotations of the gaze angle and the head pose. We created a new dataset called CelebAGaze consisting of two domains X, Y, where the eyes are either staring at the camera or somewhere else. Our method consists of three novel modules: the Gaze Correction module (GCM), the Gaze Animation module (GAM), and the Pretrained Autoencoder module (PAM). Specifically, GCM and GAM separately train a dual in-painting network using data from the domain X for gaze correction and data from the domain Y for gaze animation. Additionally, a Synthesis-As-Training method is proposed when training GAM to encourage the features encoded from the eye region to be correlated with the angle information, resulting in gaze animation achieved by interpolation in the latent space. To further preserve the identity information (e.g., eye shape, iris color), we propose the PAM with an Autoencoder, which is based on Self-Supervised mirror learning, where the bottleneck features are angle-invariant and which works as an extra input to the dual in-painting models. Extensive experiments validate the effectiveness of the proposed method for gaze correction and gaze animation in the wild and demonstrate the superiority of our approach in producing more compelling results than state-of-the-art baselines. Our code, the pretrained models, and supplementary results are available at https://github.com/zhangqianhui/GazeAnimation.

2020 Conference paper

Effective evaluation of clustering algorithms on single-cell CNA data

Authors: Montemurro, Marilisa; Urgese, Gianvito; Grassi, Elena; Pizzino, Carmelo Gabriele; Bertotti, Andrea; Ficarra, Elisa

Clustering methods are increasingly applied to single-cell DNA sequencing (scDNAseq) data to infer the subclonal structure of cancer. However, the complexity of these data exacerbates some data-science issues and affects clustering results. Additionally, determining whether such inferences are accurate and clusters recapitulate the real cell phylogeny is not trivial, mainly because ground truth information is not available for most experimental settings. Here, by exploiting simulated sequencing data representing known phylogenies of cancer cells, we propose a formal and systematic assessment of well-known clustering methods to study their performance and identify the approach providing the most accurate reconstruction of phylogenetic relationships.
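When clusterings are assessed against a known ground truth, as in the simulated phylogenies described above, agreement is commonly quantified with external indices such as the Adjusted Rand Index or Normalized Mutual Information. A minimal sketch of such an assessment (the specific metrics are our illustrative choice, not necessarily the ones used in the paper):

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Simulated ground-truth subclone labels for eight cells (known phylogeny)
true_subclones = [0, 0, 0, 1, 1, 2, 2, 2]

# Labels assigned by some clustering algorithm on the same cells
predicted = [1, 1, 1, 0, 0, 2, 2, 2]

# External indices compare the two partitions regardless of label names,
# so a perfect clustering scores 1.0 even if the labels are permuted
ari = adjusted_rand_score(true_subclones, predicted)
nmi = normalized_mutual_info_score(true_subclones, predicted)
print(ari, nmi)  # → 1.0 1.0 (identical partitions up to relabeling)
```

Because both indices are invariant to label permutation, they isolate the quality of the partition itself, which is exactly what a ground-truth-based benchmark of clustering methods needs.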

2020 Conference paper

Evaluation of the Classification Accuracy of the Kidney Biopsy Direct Immunofluorescence through Convolutional Neural Networks

Authors: Ligabue, Giulia; Pollastri, Federico; Fontana, Francesco; Leonelli, Marco; Furci, Luciana; Giovanella, Silvia; Alfano, Gaetano; Cappelli, Gianni; Testa, Francesca; Bolelli, Federico; Grana, Costantino; Magistroni, Riccardo

Published in: CLINICAL JOURNAL OF THE AMERICAN SOCIETY OF NEPHROLOGY

Background and objectives: Immunohistopathology is an essential technique in the diagnostic workflow of a kidney biopsy. Deep learning is an effective tool in the elaboration of medical imaging. We wanted to evaluate the role of a convolutional neural network as a support tool for kidney immunofluorescence reporting. Design, setting, participants, & measurements: High-magnification (×400) immunofluorescence images of kidney biopsies performed from the year 2001 to 2018 were collected. The report, adopted at the Division of Nephrology of the AOU Policlinico di Modena, describes the specimen in terms of “appearance,” “distribution,” “location,” and “intensity” of the glomerular deposits identified with fluorescent antibodies against IgG, IgA, IgM, C1q and C3 complement fractions, fibrinogen, and κ- and λ-light chains. The report was used as ground truth for the training of the convolutional neural networks. Results: In total, 12,259 immunofluorescence images of 2542 subjects undergoing kidney biopsy were collected. The test set analysis showed accuracy values between 0.79 (“irregular capillary wall” feature) and 0.94 (“fine granular” feature). The agreement test of the results obtained by the convolutional neural networks with respect to the ground truth showed similar values to three pathologists of our center. Convolutional neural networks were 117 times faster than human evaluators in analyzing 180 test images. A web platform, where it is possible to upload digitized images of immunofluorescence specimens, is available to evaluate the potential of our approach. Conclusions: The data showed that the accuracy of convolutional neural networks is comparable with that of pathologists experienced in the field.

2020 Journal article

Explaining Digital Humanities by Aligning Images and Textual Descriptions

Authors: Cornia, Marcella; Stefanini, Matteo; Baraldi, Lorenzo; Corsini, Massimiliano; Cucchiara, Rita

Published in: PATTERN RECOGNITION LETTERS

Replicating the human ability to connect Vision and Language has recently been gaining a lot of attention in the Computer Vision and the Natural Language Processing communities. This research effort has resulted in algorithms that can retrieve images from textual descriptions and vice versa, when realistic images and sentences with simple semantics are employed and when paired training data is provided. In this paper, we go beyond these limitations and tackle the design of visual-semantic algorithms in the domain of the Digital Humanities. This setting not only presents more complex visual and semantic structures but also features a significant lack of training data, which makes the use of fully-supervised approaches infeasible. With this aim, we propose a joint visual-semantic embedding that can automatically align illustrations and textual elements without paired supervision. This is achieved by transferring the knowledge learned on ordinary visual-semantic datasets to the artistic domain. Experiments, performed on two datasets specifically designed for this domain, validate the proposed strategies and quantify the domain shift between natural images and artworks.

2020 Journal article

Exploiting "uncertain" deep networks for data cleaning in digital pathology

Authors: Ponzio, Francesco; Deodato, Giacomo; Macii, Enrico; Di Cataldo, Santa; Ficarra, Elisa

Published in: PROCEEDINGS INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING

2020 Conference paper

Face-from-Depth for Head Pose Estimation on Depth Images

Authors: Borghi, Guido; Fabbri, Matteo; Vezzani, Roberto; Calderara, Simone; Cucchiara, Rita

Published in: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Depth cameras allow setting up reliable solutions for people monitoring and behavior understanding, especially when unstable or poor illumination conditions render common RGB sensors unusable. Therefore, we propose a complete framework for the estimation of the head and shoulder pose based on depth images only. A head detection and localization module is also included, in order to develop a complete end-to-end system. The core element of the framework is a Convolutional Neural Network, called POSEidon+, that receives as input three types of images and provides the 3D angles of the pose as output. Moreover, a Face-from-Depth component based on a Deterministic Conditional GAN model is able to hallucinate a face from the corresponding depth image. We empirically demonstrate that this positively impacts the system performance. We test the proposed framework on two public datasets, namely Biwi Kinect Head Pose and ICT-3DHP, and on Pandora, a new challenging dataset mainly inspired by the automotive setup. Experimental results show that our method overcomes several recent state-of-the-art works based on both intensity and depth input data, running in real-time at more than 30 frames per second.

2020 Journal article

Gender recognition in the wild with small sample size: A dictionary learning approach

Authors: D'Amelio, A.; Cuculo, V.; Bursic, S.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

In this work we address the problem of gender recognition from facial images acquired in the wild. This problem is particularly difficult due to the presence of variations in pose, ethnicity, age, and image quality. Moreover, we consider the special case in which only a small sample size is available for the training phase. We rely on a feature representation obtained from the well-known VGG-Face Deep Convolutional Neural Network (DCNN) and exploit the effectiveness of a sparse-driven sub-dictionary learning strategy, which has proven able to represent both local and global characteristics of the training and probe faces. Results on the publicly available LFW dataset are provided in order to demonstrate the effectiveness of the proposed method.
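The core idea of dictionary-based classification of deep features can be sketched as follows. This is an illustrative toy version, not the paper's actual pipeline: it uses synthetic vectors in place of VGG-Face descriptors and plain orthogonal matching pursuit in place of the sub-dictionary learning strategy; classification by smallest per-class reconstruction residual is the standard sparse-representation approach.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)
dim, n_per_class = 64, 10

# Synthetic stand-ins for deep face descriptors of two classes,
# separated by an offset so the toy problem is well-posed
class_a = rng.normal(0.0, 1.0, (dim, n_per_class)) + 2.0
class_b = rng.normal(0.0, 1.0, (dim, n_per_class)) - 2.0

# Dictionary: each column (atom) is one training descriptor, unit-normalized
D = np.hstack([class_a, class_b])
D = D / np.linalg.norm(D, axis=0)

# A noisy probe drawn from class A
probe = class_a[:, 0] + 0.05 * rng.normal(size=dim)

# Sparse code: probe ≈ D @ x with at most 5 non-zero coefficients
x = orthogonal_mp(D, probe, n_nonzero_coefs=5).ravel()

# Classify by which class's atoms give the smaller reconstruction residual
res = []
for sl in (slice(0, n_per_class), slice(n_per_class, None)):
    xc = np.zeros_like(x)
    xc[sl] = x[sl]
    res.append(np.linalg.norm(probe - D @ xc))
print("class A" if res[0] < res[1] else "class B")
```

Since the probe is essentially a noisy copy of a class-A training descriptor, its sparse code concentrates on class-A atoms and the class-A residual is the smaller one.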

2020 Conference paper

How to look next? A data-driven approach for scanpath prediction

Authors: Boccignone, G.; Cuculo, V.; D'Amelio, A.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

By and large, current visual attention models mostly rely, when considering static stimuli, on the following procedure. Given an image, a saliency map is computed, which, in turn, might serve the purpose of predicting a sequence of gaze shifts, namely a scanpath instantiating the dynamics of visual attention deployment. The temporal pattern of attention unfolding is thus confined to the scanpath generation stage, whilst salience is conceived as a static map, at best conflating a number of factors (bottom-up information, top-down, spatial biases, etc.). In this note we propose a novel sequential scheme that consists of three processing stages relying on a center-bias model, a context/layout model, and an object-based model, respectively. Each stage contributes, at different times, to the sequential sampling of the final scanpath. We compare the method against classic scanpath generation that exploits a state-of-the-art static saliency model. Results show that accounting for the structure of the temporal unfolding leads to gaze dynamics close to human gaze behaviour.

2020 Conference paper

Learning to describe salient objects in images with vision and language

Authors: Cornia, Marcella

Replicating the human ability to connect vision and language has recently received much attention in computer vision and artificial intelligence, resulting in new models and architectures capable of automatically describing images through textual sentences. This task, called image captioning, requires not only recognizing the salient objects in an image and understanding their interactions, but also being able to express them in natural language. This thesis presents state-of-the-art solutions to these problems, addressing all the aspects involved in the generation of textual descriptions. Indeed, when humans describe a scene, they look at an object before naming it in the sentence. This happens thanks to selective mechanisms that attract the human gaze to the salient and relevant parts of the scene. Motivated by the importance of automatically estimating the focus of human attention on images, the first part of this dissertation introduces two different saliency prediction models based on neural networks. In the first model, a combination of visual features extracted at different levels of a convolutional neural network is used to estimate the saliency of an image. In the second model, a recurrent architecture is combined with neural attention mechanisms that focus on the most salient regions of the image so as to iteratively refine the predicted saliency map. Although saliency prediction identifies the most relevant regions of an image, it had never been incorporated into an automatic captioning architecture.

This thesis therefore also shows how to incorporate saliency prediction to improve the quality of image descriptions, and introduces a model that considers both the salient regions and the context of the image when generating the textual description. Inspired by the recent spread of fully-attentive models, the use of the Transformer model in the context of image captioning is also investigated, and a new architecture is proposed that completely abandons the recurrent networks previously used in this setting. Classic captioning approaches provide no control over which image regions are described and how much importance is given to each of them. This lack of controllability limits the applicability of captioning algorithms to complex scenarios in which some form of control over the generation process is required. To address these problems, a model is presented that can generate diverse natural-language descriptions based on a control signal given in the form of a set of image regions to be described. Along a different line, the possibility of naming movie characters by their proper names is also explored, which again requires a degree of controllability over the captioning model. The last part of the thesis presents solutions for cross-modal retrieval, another task combining vision and language that consists of finding the images corresponding to a textual query and vice versa. Finally, the application of these retrieval techniques to cultural heritage and the digital humanities is shown, obtaining promising results with both supervised and unsupervised models.

2020 Doctoral thesis

Page 43 of 109 • Total publications: 1084