Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Predicting gene and protein expression levels from DNA and protein sequences with Perceiver

Authors: Stefanini, Matteo; Lovino, Marta; Cucchiara, Rita; Ficarra, Elisa

Published in: COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE

Background and objective: The functions of an organism and its biological processes result from the expression of genes and proteins. Therefore, quantifying and predicting mRNA and protein levels is a crucial aspect of scientific research. Concerning the prediction of mRNA levels, the available approaches use the sequence upstream and downstream of the Transcription Start Site (TSS) as input to neural networks. State-of-the-art models (e.g., Xpresso and Basenji) predict mRNA levels using Convolutional (CNN) or Long Short-Term Memory (LSTM) networks. However, CNN predictions depend on the convolutional kernel size, and LSTMs struggle to capture long-range dependencies in the sequence. Concerning the prediction of protein levels, to the best of our knowledge, there is no model that predicts protein levels from gene or protein sequences.

Methods: Here, we apply a new model type (called Perceiver) to mRNA and protein level prediction, exploiting a Transformer-based architecture with an attention module that attends to long-range interactions in the sequences. In addition, the Perceiver model overcomes the quadratic complexity of standard Transformer architectures. This work's contributions are: (1) the DNAPerceiver model, which predicts mRNA levels from the sequence upstream and downstream of the TSS; (2) the ProteinPerceiver model, which predicts protein levels from the protein sequence; (3) the Protein&DNAPerceiver model, which predicts protein levels from the TSS and protein sequences.

Results: The models are evaluated on cell lines, mice, glioblastoma, and lung cancer tissues. The results show the effectiveness of Perceiver-type models in predicting mRNA and protein levels.

Conclusions: This paper presents a Perceiver architecture for mRNA and protein level prediction. In the future, incorporating regulatory and epigenetic information into the model could improve mRNA and protein level predictions. The source code is freely available at https://github.com/MatteoStefanini/DNAPerceiver.
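A back-of-the-envelope sketch of why the Perceiver sidesteps the quadratic attention cost mentioned above: a small, fixed set of latent vectors cross-attends to the long input sequence, so the score matrix grows linearly rather than quadratically with sequence length. The sequence length and latent count below are hypothetical, chosen only to illustrate the scaling.

```python
# Illustrative cost comparison: standard Transformer self-attention
# (quadratic in sequence length) vs Perceiver-style cross-attention
# (linear in sequence length via a small fixed latent array).

def self_attention_ops(seq_len: int) -> int:
    """Pairwise attention scores: every token attends to every token."""
    return seq_len * seq_len

def perceiver_cross_attention_ops(seq_len: int, n_latents: int) -> int:
    """Each latent attends to every input token: n_latents x seq_len scores."""
    return n_latents * seq_len

# A hypothetical promoter window of 10,500 bp around the TSS,
# compressed into 256 latents.
seq_len, n_latents = 10_500, 256
print(self_attention_ops(seq_len))                         # 110_250_000
print(perceiver_cross_attention_ops(seq_len, n_latents))   # 2_688_000
```

With these (invented) numbers, cross-attention needs roughly 40x fewer score computations, which is the property that makes very long DNA windows tractable.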

2023 Journal article

Revelio: A Modular and Effective Framework for Reproducible Training and Evaluation of Morphing Attack Detectors

Authors: Borghi, Guido; Di Domenico, Nicolò; Franco, Annalisa; Ferrara, Matteo; Maltoni, Davide

Published in: IEEE ACCESS

A morphing attack, i.e., the evasion of a face verification system through a facial morphing operation between a criminal and an accomplice, has recently emerged as a serious security threat. Despite the importance of this kind of attack, the development and comparison of Morphing Attack Detection (MAD) methods is still a challenging task, especially with deep learning approaches. Specifically, the lack of public datasets, the absence of common training and validation protocols, and the limited release of public source code hamper the reproducibility and objective comparison of new MAD systems. These limitations are mainly due to privacy concerns, which restrict data transfer and storage, and to the recent introduction of the MAD task. Therefore, in this paper, we propose and publicly release Revelio, a modular framework for the reproducible development and evaluation of MAD systems. We include an overview of the modules and describe the plugin system, which provides the possibility of extending native components with new functionalities. An extensive cross-dataset experimental evaluation is conducted to validate the framework and the performance of trained models on several publicly released datasets, and to analyze in depth the main challenges of the MAD task based on single input images. We also propose a new metric, namely WAED, that summarizes in a single value the error-based metrics commonly used in the MAD task, computed over different datasets, thus facilitating the comparative evaluation of different approaches. Finally, by exploiting Revelio, a new state-of-the-art MAD model (on the SOTAMD single-image benchmark) is proposed and released.
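The abstract states only that WAED collapses per-dataset error metrics into a single value; the exact definition is in the paper. A plausible minimal sketch of that kind of aggregation is a weighted average of per-dataset error rates, with weights reflecting, e.g., dataset size. All names, rates, and weights below are invented for illustration and are not the actual WAED formula.

```python
# Hypothetical sketch of a WAED-style aggregation: combine error-based
# metrics computed over different datasets into one scalar via weights.

def weighted_average_error(errors: dict[str, float],
                           weights: dict[str, float]) -> float:
    """Weighted average of per-dataset error rates."""
    total_w = sum(weights[d] for d in errors)
    return sum(errors[d] * weights[d] for d in errors) / total_w

# Invented per-dataset error rates, weighted by (invented) dataset sizes.
errors  = {"dataset_A": 0.08, "dataset_B": 0.15, "dataset_C": 0.05}
weights = {"dataset_A": 1000, "dataset_B": 500, "dataset_C": 2000}
print(round(weighted_average_error(errors, weights), 4))  # 0.0729
```

A single summary value like this makes cross-dataset comparison of MAD systems a one-number ranking rather than a table-by-table inspection, which is the motivation the abstract gives.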

2023 Journal article

Scoring Enzootic Pneumonia-like Lesions in Slaughtered Pigs: Traditional vs. Artificial-Intelligence-Based Methods

Authors: Hattab, Jasmine; Porrello, Angelo; Romano, Anastasia; Rosamilia, Alfonso; Ghidini, Sergio; Bernabò, Nicola; Capobianco Dondona, Andrea; Corradi, Attilio; Marruchella, Giuseppe

Published in: PATHOGENS

Artificial-intelligence-based methods are regularly used in the biomedical sciences, mainly in the field of diagnostic imaging. Recently, convolutional neural networks have been trained to score pleurisy and pneumonia in slaughtered pigs. The aim of this study is to further evaluate the performance of a convolutional neural network compared with the gold standard (i.e., scores provided by a skilled operator along the slaughter chain through visual inspection and palpation). In total, 441 lungs (180 healthy and 261 diseased) were included in this study. Each lung was scored according to traditional methods, which represent the gold standard (Madec’s and Christensen’s grids). The same lungs were then photographed and scored by a trained convolutional neural network. Overall, the results show that the convolutional neural network is very specific (95.55%) and quite sensitive (85.05%), with a rather high correlation with the scores provided by a skilled veterinarian (Spearman’s coefficient = 0.831, p < 0.01). In summary, this study suggests that convolutional neural networks could be effectively used at slaughterhouses and stimulates further investigation in this field of research.
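The study's headline agreement figure is Spearman's rank correlation (0.831) between network scores and the veterinarian's scores. For readers unfamiliar with the statistic, a minimal implementation follows; tie handling is omitted for brevity, and the example scores are invented, not the study's data.

```python
# Minimal Spearman rank correlation (no-ties case), the statistic used to
# compare CNN lesion scores against the gold-standard operator scores.

def rank(values):
    """1-based ranks, assuming no ties among the values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(x, y):
    """Spearman's rho via the classic formula 1 - 6*sum(d^2) / (n*(n^2-1))."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank(x), rank(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Perfectly monotone score pairs give rho = 1.0.
print(spearman([1, 4, 7, 9], [10, 20, 30, 40]))  # 1.0
```

A rho of 0.831, as reported, indicates a strong but imperfect monotone agreement: the network tends to rank lesion severity the same way the veterinarian does.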

2023 Journal article

Sharing Cultural Heritage—The Case of the Lodovico Media Library

Authors: Al Kalak, Matteo; Baraldi, Lorenzo

Published in: MULTIMODAL TECHNOLOGIES AND INTERACTION

2023 Journal article

Spotting Virus from Satellites: Modeling the Circulation of West Nile Virus Through Graph Neural Networks

Authors: Bonicelli, Lorenzo; Porrello, Angelo; Vincenzi, Stefano; Ippoliti, Carla; Iapaolo, Federica; Conte, Annamaria; Calderara, Simone

Published in: IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

2023 Journal article

StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model

Authors: Xu, Z.; Sangineto, E.; Sebe, N.

Published in: PROCEEDINGS IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION

Despite the progress made on the style transfer task, most previous works focus on transferring only relatively simple features like color or texture, while missing more abstract concepts such as overall art expression or painter-specific traits. However, these abstract semantics can be captured by models like DALL-E or CLIP, which have been trained on huge datasets of images and textual documents. In this paper, we propose StylerDALLE, a style transfer method that exploits both of these models and uses natural language to describe abstract art styles. Specifically, we formulate the language-guided style transfer task as a non-autoregressive token sequence translation, i.e., from an input content image to an output stylized image, in the discrete latent space of a large-scale pretrained vector-quantized tokenizer, e.g., the discrete variational auto-encoder (dVAE) of DALL-E. To incorporate style information, we propose a Reinforcement Learning strategy with CLIP-based language supervision that ensures stylization and content preservation simultaneously. Experimental results demonstrate the superiority of our method, which can effectively transfer art styles using language instructions at different granularities. Code is available at https://github.com/zipengxuc/StylerDALLE.

2023 Conference paper

Superpixel Positional Encoding to Improve ViT-based Semantic Segmentation Models

Authors: Amoroso, Roberto; Tomei, Matteo; Baraldi, Lorenzo; Cucchiara, Rita

2023 Conference paper

SynthCap: Augmenting Transformers with Synthetic Data for Image Captioning

Authors: Caffagni, Davide; Barraco, Manuele; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Image captioning is a challenging task that combines Computer Vision and Natural Language Processing to generate accurate and descriptive captions for input images. Research efforts in this field mainly focus on developing novel architectural components to extend image captioning models and on using large-scale image-text datasets crawled from the web to boost final performance. In this work, we explore an alternative to web-crawled data and augment the training dataset with synthetic images generated by a latent diffusion model. In particular, we propose a simple yet effective synthetic data augmentation framework that is capable of significantly improving the quality of captions generated by a standard Transformer-based model, leading to competitive results on the COCO dataset.

2023 Conference paper

Towards Explainable Navigation and Recounting

Authors: Poppi, Samuele; Rawal, Niyati; Bigazzi, Roberto; Cornia, Marcella; Cascianelli, Silvia; Baraldi, Lorenzo; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Explainability and interpretability of deep neural networks have become of crucial importance over the years in Computer Vision, concurrently with the need to understand increasingly complex models. This necessity has fostered research on approaches that facilitate human comprehension of neural methods. In this work, we propose an explainable setting for visual navigation, in which an autonomous agent needs to explore an unseen indoor environment while portraying and explaining interesting scenes with natural language descriptions. We combine recent advances in ongoing research fields, employing an explainability method on images generated through agent-environment interaction. Our approach uses explainable maps to visualize model predictions and highlight the correlation between the observed entities and the generated words, to focus on prominent objects encountered during the environment exploration. The experimental section demonstrates that our approach can identify the regions of the images that the agent concentrates on to describe its point of view, improving explainability.

2023 Conference paper

TrackFlow: Multi-Object Tracking with Normalizing Flows

Authors: Mancusi, Gianluca; Panariello, Aniello; Porrello, Angelo; Fabbri, Matteo; Calderara, Simone; Cucchiara, Rita

Published in: PROCEEDINGS IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION

The field of multi-object tracking has recently seen renewed interest in the good old schema of tracking-by-detection, as its simplicity and strong priors spare it from the complex design and painful babysitting of tracking-by-attention approaches. In view of this, we aim to extend tracking-by-detection to multi-modal settings, where a comprehensive cost has to be computed from heterogeneous information, e.g., 2D motion cues, visual appearance, and pose estimates. More precisely, we follow a case study where a rough estimate of 3D information is also available and must be merged with other traditional metrics (e.g., the IoU). To achieve this, recent approaches resort to either simple rules or complex heuristics to balance the contribution of each cost. However, (i) they require careful tuning of tailored hyperparameters on a hold-out set, and (ii) they assume these costs to be independent, which does not hold in reality. We address these issues by building on an elegant probabilistic formulation, which considers the cost of a candidate association as the negative log-likelihood yielded by a deep density estimator, trained to model the conditional joint probability distribution of correct associations. Our experiments, conducted on both simulated and real benchmarks, show that our approach consistently enhances the performance of several tracking-by-detection algorithms.
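The core idea in the abstract, cost(association) = -log p(features | correct association), can be sketched without the learned flow. Below, a fixed 1-D Gaussian over a single compatibility score stands in for TrackFlow's deep density estimator; the mean, standard deviation, track names, and scores are all illustrative, not the paper's model.

```python
# Sketch of a probabilistic association cost: the negative log-likelihood
# of a detection-track compatibility score under a density model of
# correct associations (here, a hand-set 1-D Gaussian as a stand-in
# for the learned normalizing flow).
import math

def gaussian_nll(x: float, mean: float, std: float) -> float:
    """Negative log-likelihood of x under N(mean, std^2)."""
    return 0.5 * math.log(2 * math.pi * std**2) + (x - mean)**2 / (2 * std**2)

def association_cost(score: float) -> float:
    # Correct associations tend to have high compatibility (mean 0.9).
    return gaussian_nll(score, mean=0.9, std=0.1)

# Lower cost = more plausible match; pick the cheapest candidate track.
candidates = {"track_1": 0.85, "track_2": 0.30}
best = min(candidates, key=lambda t: association_cost(candidates[t]))
print(best)  # track_1
```

Because every cue contributes through one jointly modeled density rather than hand-weighted terms, the approach avoids both the hyperparameter tuning and the independence assumption criticized in the abstract.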

2023 Conference paper

Page 24 of 106 • Total publications: 1059