Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Tip: type @ to pick an author and # to pick a keyword.

A Second-Order Perspective on Model Compositionality and Incremental Learning

Authors: Porrello, Angelo; Bonicelli, Lorenzo; Buzzega, Pietro; Millunzi, Monica; Calderara, Simone; Cucchiara, Rita

2025 Relazione in Atti di Convegno

Accurate 3D Medical Image Segmentation with Mambas

Authors: Lumetti, Luca; Pipoli, Vittorio; Marchesini, Kevin; Ficarra, Elisa; Grana, Costantino; Bolelli, Federico

Published in: PROCEEDINGS INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING

CNNs and Transformer-based architectures are recently dominating the field of 3D medical segmentation. While CNNs face limitations in the local … (Read full abstract)

CNNs and Transformer-based architectures are recently dominating the field of 3D medical segmentation. While CNNs face limitations in the local receptive field, Transformers require significant memory and data, making them less suitable for analyzing large 3D medical volumes. Consequently, fully convolutional network models like U-Net are still leading the 3D segmentation scenario. Although efforts have been made to reduce the Transformers computational complexity, such optimized models still struggle with content-based reasoning. This paper examines Mamba, a Recurrent Neural Network (RNN) based on State Space Models (SSMs), which achieves linear complexity and has outperformed Transformers in long-sequence tasks. Specifically, we assess Mamba’s performance in 3D medical segmentation using three widely recognized and commonly employed datasets and propose architectural enhancements to improve its segmentation effectiveness by mitigating the primary shortcomings of existing Mamba-based solutions.

2025 Relazione in Atti di Convegno

Accurate and Efficient Low-Rank Model Merging in Core Space

Authors: Panariello, Aniello; Marczak, Daniel; Magistri, Simone; Porrello, Angelo; Twardowski, Bartłomiej; D Bagdanov, Andrew; Calderara, Simone; Van De Weijer, Joost

2025 Relazione in Atti di Convegno

AIGeN-Llama: An Adversarial Approach for Instruction Generation in VLN using Llama2 Model

Authors: Rawal, Niyati; Baraldi, Lorenzo; Cucchiara, Rita

Published in: CEUR WORKSHOP PROCEEDINGS

2025 Relazione in Atti di Convegno

Alfie: Democratising RGBA image generation with no $$$

Authors: Quattrini, Fabio; Pippi, Vittorio; Cascianelli, Silvia; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

2025 Relazione in Atti di Convegno

AMONOT: Synthetic Aging for Differential Morphing Attack Detection Systems

Authors: Spathis, Giuseppe; Di Domenico, Nicolò; Borghi, Guido; Franco, Annalisa; Maltoni, Davide

Published in: LECTURE NOTES IN COMPUTER SCIENCE

2025 Relazione in Atti di Convegno

An Attention-Based Representation Distillation Baseline for Multi-label Continual Learning

Authors: Menabue, Martin; Frascaroli, Emanuele; Boschini, Matteo; Bonicelli, Lorenzo; Porrello, Angelo; Calderara, Simone

Published in: LECTURE NOTES IN COMPUTER SCIENCE

The field of Continual Learning (CL) has inspired numerous researchers over the years, leading to increasingly advanced countermeasures to the … (Read full abstract)

The field of Continual Learning (CL) has inspired numerous researchers over the years, leading to increasingly advanced countermeasures to the issue of catastrophic forgetting. Most studies have focused on the single-class scenario, where each example comes with a single label. The recent literature has successfully tackled such a setting, with impressive results. Differently, we shift our attention to the multi-label scenario, as we feel it to be more representative of real-world open problems. In our work, we show that existing state-of-the-art CL methods fail to achieve satisfactory performance, thus questioning the real advance claimed in recent years. Therefore, we assess both old-style and novel strategies and propose, on top of them, an approach called Selective Class Attention Distillation (SCAD). It relies on a knowledge transfer technique that seeks to align the representations of the student network – which trains continuously and is subject to forgetting – with the teacher ones, which is pretrained and kept frozen. Importantly, our method is able to selectively transfer the relevant information from the teacher to the student, thereby preventing irrelevant information from harming the student’s performance during online training. To demonstrate the merits of our approach, we conduct experiments on two different multi-label datasets, showing that our method outperforms the current state-of-the-art Continual Learning methods. Our findings highlight the importance of addressing the unique challenges posed by multi-label environments in the field of Continual Learning. The code of SCAD is available at https://github.com/aimagelab/SCAD-LOD-2024.

2025 Relazione in Atti di Convegno

Architettura Software IoT per la Diagnosi e Identificazione dei Guasti a Misura d'Uomo

Authors: Bertoli, Annalisa

Negli ultimi anni, la complessità dei sistemi produttivi è aumentata significativamente a causa dei progressi nelle tecnologie derivanti dall'Industria 4.0, … (Read full abstract)

Negli ultimi anni, la complessità dei sistemi produttivi è aumentata significativamente a causa dei progressi nelle tecnologie derivanti dall'Industria 4.0, in particolare attraverso l'Internet of Things (IoT) e i big data. Questa evoluzione ha facilitato l'accesso senza precedenti a enormi quantità di dati, ma ha anche introdotto sfide nella raccolta dei dati e nella loro applicazione pratica per gli operatori che interagiscono con questi sistemi. Questa tesi presenta un'architettura IoT progettata per ambienti industriali reali, con l'obiettivo di dimostrare come i dati possano essere utilizzati efficacemente per monitorare le operazioni e i processi produttivi in tempo reale. L'approccio proposto migliora la capacità di rilevare e gestire i guasti, fornendo agli operatori le informazioni necessarie per prendere decisioni informate. Integrando sensori intelligenti e analisi avanzate, è possibile ottenere una visibilità dettagliata sullo stato del sistema, consentendo interventi di manutenzione tempestivi e preparando il terreno per future implementazioni di manutenzione predittiva. La ricerca include un'analisi di due casi di studio distinti, mostrando la versatilità dell'architettura in diverse applicazioni industriali. Illustra come l'utilizzo efficace dei dati possa ottimizzare l'efficienza operativa e ridurre i tempi di inattività, contribuendo così a una migliore gestione del sistema. Inoltre, questo approccio consente agli operatori umani di comprendere meglio i loro ambienti e di prendere decisioni autonome basate su informazioni in tempo reale.

2025

Augmenting and Mixing Transformers with Synthetic Data for Image Captioning

Authors: Caffagni, Davide; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: IMAGE AND VISION COMPUTING

Image captioning has attracted significant attention within the Computer Vision and Multimedia research domains, resulting in the development of effective … (Read full abstract)

Image captioning has attracted significant attention within the Computer Vision and Multimedia research domains, resulting in the development of effective methods for generating natural language descriptions of images. Concurrently, the rise of generative models has facilitated the production of highly realistic and high-quality images, particularly through recent advancements in latent diffusion models. In this paper, we propose to leverage the recent advances in Generative AI and create additional training data that can be effectively used to boost the performance of an image captioning model. Specifically, we combine real images with their synthetic counterparts generated by Stable Diffusion using a Mixup data augmentation technique to create novel training examples. Extensive experiments on the COCO dataset demonstrate the effectiveness of our solution in comparison to different baselines and state-of-the-art methods and validate the benefits of using synthetic data to augment the training stage of an image captioning model and improve the quality of the generated captions. Source code and trained models are publicly available at: https://github.com/aimagelab/synthcap_pp.

2025 Articolo su rivista

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering

Authors: Cocchi, Federico; Moratelli, Nicholas; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Multimodal LLMs (MLLMs) are the natural extension of large language models to handle multimodal inputs, combining text and image data. … (Read full abstract)

Multimodal LLMs (MLLMs) are the natural extension of large language models to handle multimodal inputs, combining text and image data. They have recently garnered attention due to their capability to address complex tasks involving both modalities. However, their effectiveness is limited to the knowledge acquired during training, which restricts their practical utility. In this work, we introduce a novel method to enhance the adaptability of MLLMs by integrating external knowledge sources. Our proposed model, Reflective LLaVA (ReflectiVA), utilizes reflective tokens to dynamically determine the need for external knowledge and predict the relevance of information retrieved from an external database. Tokens are trained following a two-stage two-model training recipe. This ultimately enables the MLLM to manage external knowledge while preserving fluency and performance on tasks where external knowledge is not needed. Through our experiments, we demonstrate the efficacy of ReflectiVA for knowledge-based visual question answering, highlighting its superior performance compared to existing methods. Source code and trained models are publicly available at https://github.com/aimagelab/ReflectiVA.

2025 Relazione in Atti di Convegno

Page 3 of 106 • Total publications: 1059