Publications - AImageLab

DEEPrior: a deep learning tool for the prioritization of gene fusions

Authors: Lovino, Marta; Ciaburri, Maria Serena; Urgese, Gianvito; Di Cataldo, Santa; Ficarra, Elisa

Published in: BIOINFORMATICS

Summary: In the last decade, increasing attention has been paid to the study of gene fusions. However, determining whether a gene fusion is a cancer driver or just a passenger mutation is still an open issue. Here we present DEEPrior, an inherently flexible deep learning tool with two modes (Inference and Retraining). Inference mode predicts the probability of a gene fusion being involved in an oncogenic process by directly exploiting the amino acid sequence of the fused protein. Retraining mode lets the user obtain a custom prediction model that includes new user-provided data. Availability and implementation: Both DEEPrior and the protein fusions dataset are freely available on GitHub (https://github.com/bioinformatics-polito/DEEPrior). The tool was designed to operate in Python 3.7, with minimal additional libraries. Supplementary information: Supplementary data are available at Bioinformatics online.

2020 Journal article

Multi-omics Classification on Kidney Samples Exploiting Uncertainty-Aware Models

Authors: Lovino, Marta; Bontempo, Gianpaolo; Cirrincione, Giansalvo; Ficarra, Elisa

Due to the huge amount of available omic data, classifying samples according to various omics is a complex process. One of the most common approaches consists of creating a classifier for each omic and then building a consensus among the classifiers, assigning to each sample the class most voted across the individual omics. However, this approach does not consider the confidence of each prediction, ignoring that the biological information coming from a certain omic may be more reliable than that from others. Therefore, a method is proposed here based on a tree-based multi-layer perceptron (MLP), which estimates the class-membership probabilities for classification. In this way, it is possible not only to give relevance to all the omics, but also to label as Unknown those samples for which the classifier is uncertain in its prediction. The method was applied to a dataset of 909 kidney cancer samples for which three omics were available: gene expression (mRNA), microRNA expression (miRNA), and methylation profiles (meth). The method is also valid for other tissues and other omics (e.g. proteomics, copy number alterations data, single nucleotide polymorphism data). The accuracy and weighted average F1-score of the model are both higher than 95%. This tool can therefore be particularly useful in clinical practice, allowing physicians to focus on the most interesting and challenging samples.
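
As a rough illustration of the contrast drawn in this abstract, the sketch below compares plain majority voting across per-omic classifiers with a confidence-aware rule that averages class-membership probabilities and returns Unknown when the top probability is too low. The function names, the averaging rule, the 0.6 threshold, the class names, and the toy probabilities are all assumptions made for illustration; the paper itself uses a tree-based MLP, which is not reproduced here.

```python
from collections import Counter

def majority_vote(per_omic_probs):
    """Each omic votes for its most probable class; the most voted class wins."""
    votes = [max(probs, key=probs.get) for probs in per_omic_probs.values()]
    return Counter(votes).most_common(1)[0][0]

def confidence_aware_label(per_omic_probs, threshold=0.6):
    """Average class probabilities over omics; return 'Unknown' below threshold.

    The averaging rule and the threshold are illustrative assumptions.
    """
    classes = list(next(iter(per_omic_probs.values())))
    n_omics = len(per_omic_probs)
    mean = {c: sum(p[c] for p in per_omic_probs.values()) / n_omics
            for c in classes}
    best = max(mean, key=mean.get)
    return best if mean[best] >= threshold else "Unknown"

# Hypothetical per-omic outputs for one sample: two weak votes for tumor_A,
# one very confident vote for tumor_B.
sample = {
    "mRNA":  {"tumor_A": 0.55, "tumor_B": 0.45},
    "miRNA": {"tumor_A": 0.52, "tumor_B": 0.48},
    "meth":  {"tumor_A": 0.10, "tumor_B": 0.90},
}

# Hypothetical sample on which no omic is confident.
ambiguous = {
    "mRNA":  {"tumor_A": 0.55, "tumor_B": 0.45},
    "miRNA": {"tumor_A": 0.45, "tumor_B": 0.55},
    "meth":  {"tumor_A": 0.50, "tumor_B": 0.50},
}

print(majority_vote(sample))              # hard votes only
print(confidence_aware_label(sample))     # probabilities taken into account
print(confidence_aware_label(ambiguous))  # falls back to 'Unknown'
```

On the first toy sample the two rules disagree: majority voting follows the two weak votes, while the averaged probabilities follow the single confident omic. The second sample shows how an uncertain prediction can be set aside for manual review.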

2020 Conference proceedings paper

Predicting the oncogenic potential of gene fusions using convolutional neural networks

Authors: Lovino, Marta; Urgese, Gianvito; Macii, Enrico; Di Cataldo, Santa; Ficarra, Elisa

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Predicting the oncogenic potential of a gene fusion transcript is an important and challenging task in the study of cancer development. To date, the available approaches mostly rely on protein domain analysis to provide a probability score explaining the oncogenic potential of a gene fusion. In this paper, a Convolutional Neural Network model is proposed to classify gene fusions as oncogenic or non-oncogenic, exploiting only the protein sequence without protein domain information. The proposed model obtained an accuracy close to 90% on a dataset of fused sequences.
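
To make "exploiting only the protein sequence" more concrete, the sketch below shows one common way to turn an amino acid sequence into a fixed-size numeric matrix that a convolutional network could consume. The abstract does not state which encoding the authors used, so the one-hot scheme, the fixed length, the zero-padding, and the names `one_hot_encode`, `AMINO_ACIDS`, and `INDEX` are assumptions for illustration only.

```python
# The 20 standard amino acids, in alphabetical order of their one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(sequence, length):
    """Return a length x 20 matrix; the sequence is truncated or zero-padded.

    Symbols outside the 20 standard amino acids (e.g. 'X') are left as
    all-zero rows. This is an illustrative choice, not the paper's method.
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(length)]
    for pos, aa in enumerate(sequence[:length].upper()):
        col = INDEX.get(aa)
        if col is not None:
            matrix[pos][col] = 1
    return matrix

# Hypothetical 5-residue fragment, padded to 8 positions.
encoded = one_hot_encode("MKTAY", length=8)
print(len(encoded), len(encoded[0]))  # rows x columns of the encoded matrix
```

A fixed length is used here because convolutional models generally expect inputs of uniform shape; real fused proteins would need a much larger value than in this toy call.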

2020 Conference proceedings paper

Unsupervised Multi-Omic Data Fusion: the Neural Graph Learning Network

Authors: Barbiero, Pietro; Lovino, Marta; Siviero, Mattia; Ciravegna, Gabriele; Randazzo, Vincenzo; Ficarra, Elisa; Cirrincione, Giansalvo

Published in: LECTURE NOTES IN COMPUTER SCIENCE - 16th International Conference on Intelligent Computing, ICIC2020

In recent years, due to the high availability of omic data, data-driven biology has greatly expanded. However, the analysis of different data sources is still an open challenge. A few multi-omics approaches have been proposed in the literature, but none of them takes into consideration the intrinsic topology of each omic. In this work, an unsupervised learning method based on a deep neural network is proposed. For each omic, a separate network is trained, and the outputs are fused into a single graph; to this end, an innovative loss function has been designed to better represent the data cluster manifolds. The graph adjacency matrix is exploited to determine similarities among samples. With this approach, omics having a different number of features are merged into a single representation. Quantitative and qualitative analyses show that the proposed method achieves results comparable to the state of the art. The method has great intrinsic flexibility, as it can be customized according to the complexity of the task, and it has considerable room for future improvement compared to more fine-tuned methods, opening the way for future research.

2020 Conference proceedings paper

A Deep Learning Approach to the Screening of Oncogenic Gene Fusions in Humans

Authors: Lovino, Marta; Urgese, Gianvito; Macii, Enrico; Di Cataldo, Santa; Ficarra, Elisa

Published in: INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES

Gene fusions have a very important role in the study of cancer development. In this regard, predicting the probability that protein fusion transcripts develop into cancer is a very challenging and not yet fully explored research problem. To date, all the available approaches in the literature try to explain the oncogenic potential of gene fusions based on protein domain analysis, which is cancer-specific and not easy to adapt to new information. In our work, we choose the raw protein sequences as the input baseline and propose the use of deep learning, and more specifically Convolutional Neural Networks, to infer the oncogenicity probability score of gene fusion transcripts and to group them into a number of categories (e.g., oncogenic/not oncogenic). This is an inherently flexible methodology that, unlike previous approaches, can be re-trained with very little effort on newly available data (for example, from a different cancer). Based on experimental results on a large dataset of pre-annotated gene fusions, our method is able to predict the oncogenic potential of gene fusion transcripts with an accuracy of about 72%, which increases to 86% if we consider only the instances that are classified with a high confidence level.

2019 Journal article

Exploiting Gene Expression Profiles for the Automated Prediction of Connectivity between Brain Regions

Authors: Roberti, Ilaria; Lovino, Marta; Di Cataldo, Santa; Ficarra, Elisa; Urgese, Gianvito

Published in: INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES

The brain comprises a complex system of neurons interconnected by an intricate network of anatomical links. While recent studies demonstrated the correlation between anatomical connectivity patterns and gene expression of neurons, using transcriptomic information to automatically predict such patterns is still an open challenge. In this work, we present a completely data-driven approach relying on machine learning (i.e., neural networks) to learn the anatomical connection directly from a training set of gene expression data. To do so, we combined gene expression and connectivity data from the Allen Mouse Brain Atlas to generate thousands of gene expression profile pairs from different brain regions. To each pair, we assigned a label describing the physical connection between the corresponding brain regions. Then, we exploited these data to train neural networks, designed to predict brain area connectivity. We assessed our solution on two prediction problems (with three and two connectivity class categories) involving cortical and cerebellum regions. As demonstrated by our results, we distinguish between connected and unconnected regions with 85% prediction accuracy and good balance of precision and recall. In our future work we may extend the analysis to more complex brain structures and consider RNA-Seq data as additional input to our model.
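
The pairing step described in this abstract can be sketched as follows: every pair of brain regions yields one training example made of the two expression profiles plus a connectivity label. The abstract does not specify how profiles are combined or how labels are encoded, so the ordered pairs, the concatenation of profiles, the 0/1 labels, the default of 0 for missing pairs, the region names, and the function `build_pairs` are all hypothetical choices for illustration.

```python
from itertools import permutations

def build_pairs(expression, connectivity):
    """Build (source, target, features, label) examples for ordered region pairs.

    expression:   region name -> list of gene expression values
    connectivity: (source, target) -> label; missing pairs default to 0
                  (treated as unconnected, an assumption of this sketch)
    """
    dataset = []
    for src, dst in permutations(expression, 2):
        features = expression[src] + expression[dst]  # concatenated profiles
        label = connectivity.get((src, dst), 0)
        dataset.append((src, dst, features, label))
    return dataset

# Toy data with made-up region names and two genes per profile.
expression = {
    "region_a": [0.1, 0.9],
    "region_b": [0.4, 0.2],
    "region_c": [0.7, 0.3],
}
connectivity = {("region_a", "region_b"): 1, ("region_c", "region_a"): 1}

pairs = build_pairs(expression, connectivity)
print(len(pairs))  # number of ordered region pairs
```

With n regions this produces n * (n - 1) ordered pairs, which is how a modest number of regions can yield thousands of training examples.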

2019 Journal article
