Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

DOLFIN: Balancing Stability and Plasticity in Federated Continual Learning

Authors: Moussadek, Omayma; Salami, Riccardo; Calderara, Simone

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Federated continual learning (FCL) enables models to learn new tasks across multiple distributed clients while preserving privacy and without forgetting previously acquired knowledge. However, current methods struggle to balance performance, privacy preservation, and communication efficiency. We introduce DOLFIN (Distributed Online LoRA for Federated INcremental learning), a novel approach that combines Vision Transformers with low-rank adapters to learn new tasks efficiently and stably in federated environments. Our method leverages LoRA to minimize communication overhead and incorporates Dual Gradient Projection Memory (DualGPM) to prevent forgetting. Evaluated on CIFAR-100, ImageNet-R, ImageNet-A, and CUB-200 under two Dirichlet heterogeneity settings, DOLFIN consistently surpasses six strong baselines in final average accuracy while matching their memory footprint. Orthogonal low-rank adapters offer an effective and scalable solution for privacy-preserving continual learning in federated settings.

2026 Conference Proceedings Paper
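
DOLFIN relies on DualGPM to prevent forgetting, which belongs to the gradient projection memory (GPM) family. The core idea of that family can be shown with a minimal sketch, which is not DOLFIN's implementation: given an orthonormal basis `basis` (a hypothetical stored memory) spanning the inputs a linear layer saw on earlier tasks, remove the part of the weight gradient that would change the layer's outputs on that subspace. The sketch also counts the parameters of a rank-`r` LoRA update, which is why exchanging adapters instead of full weights keeps communication small.

```python
import numpy as np


def project_gradient(grad: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """GPM-style projection (illustrative sketch, not DOLFIN's DualGPM code).

    grad:  (out_dim, in_dim) gradient of a linear layer's weight matrix.
    basis: (in_dim, k) orthonormal columns spanning inputs from past tasks
           (a hypothetical stored memory).
    The returned update leaves W @ x unchanged for any x in span(basis).
    """
    return grad - grad @ basis @ basis.T


def lora_param_count(out_dim: int, in_dim: int, rank: int) -> int:
    """Parameters of a LoRA update B @ A with B: (out_dim, rank), A: (rank, in_dim)."""
    return rank * (out_dim + in_dim)


rng = np.random.default_rng(0)
out_dim, in_dim, k = 4, 6, 2
# Orthonormal basis from a QR decomposition of random stand-in "past-task" inputs.
basis, _ = np.linalg.qr(rng.standard_normal((in_dim, k)))
grad = rng.standard_normal((out_dim, in_dim))
projected = project_gradient(grad, basis)

x_old = basis @ rng.standard_normal(k)  # an input lying in the protected subspace
print(np.abs(projected @ x_old).max())  # numerically ~0
print(lora_param_count(768, 768, 8), 768 * 768)  # 12288 589824
```

With a rank of 8 on a 768 x 768 projection (sizes chosen only for illustration), the adapter is about 2% of the full matrix.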

EARL: Embracing amnesic replay for learning with noisy labels

Authors: Millunzi, Monica; Bonicelli, Lorenzo; Porrello, Angelo; Credi, Jacopo; Kolm, Petter N.; Calderara, Simone

Published in: PATTERN RECOGNITION

Modern Deep Neural Networks struggle to retain knowledge in streaming data environments, often leading to forgetting during incremental training. Most Continual Learning (CL) approaches address this issue by rehearsing past data – stored in a replay buffer – while acquiring new knowledge. However, in practical scenarios, noisy labels can contaminate the replay buffer, undermining performance. This work builds upon the earlier paper “May the Forgetting Be with You”, which was designed to tackle Continual Learning with Noisy Labels (CLN). By leveraging the distinct learning dynamics of correctly and incorrectly labeled examples, that method induces targeted forgetting to identify and filter out noisy labels. We propose EARL, which improves on its predecessor by introducing i) a detailed analysis of the learning dynamics that occur in the presence of noise, ii) a robustness analysis under more realistic noise conditions, iii) an evaluation of performance using pre-trained backbones and modern prompt-based CL baselines, iv) a detailed study of the influence of different sampling strategies, and v) experiments on Natural Language Processing (NLP) benchmarks. This work unravels the motivations and findings of the previous research, shedding light on how each component contributes to achieving high performance and minimizing forgetting.

2026 Journal Article
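
EARL separates clean from noisy samples through their different learning dynamics. A common, simpler proxy for that idea is the small-loss criterion: after the model is made to forget and then briefly relearns the buffer, correctly labeled samples tend to be fit again faster and therefore show lower loss. The sketch below assumes such per-sample losses are already available (the values are made up) and keeps the lowest-loss fraction. It is a generic heuristic with a hypothetical `keep_ratio` parameter, not the paper's selection rule.

```python
def select_clean(losses: list[float], keep_ratio: float) -> list[int]:
    """Return indices of the `keep_ratio` fraction of samples with the smallest loss.

    Illustrative small-loss heuristic; `keep_ratio` is a hypothetical parameter,
    e.g. an estimate of the clean fraction of the replay buffer.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(losses) * keep_ratio))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:n_keep])


# Made-up losses measured after a forgetting-and-relearning phase:
# samples 1 and 3 are still poorly fit, so they are treated as likely noisy.
losses = [0.10, 2.30, 0.20, 1.90, 0.05]
print(select_clean(losses, keep_ratio=0.6))  # [0, 2, 4]
```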

Enabling 8B Bitwise Autoregressive Image Generation on Edge GPUs

Authors: Vezzali, Enrico; Bolelli, Federico; Grana, Costantino; Benini, Luca; Li, Yawei

Visual Autoregressive (VAR) models face a severe "Memory Wall" on edge devices due to their large model size and substantial KV-cache requirements. In this work, we analyze the Infinity VAR family (2B and 8B) and propose a compression pipeline for deployment on constrained NVIDIA Jetson systems. We diagnose two critical bottlenecks: activation outliers reaching 353x the median and channel-skewed cache variance. To address them, we propose a hybrid pipeline combining SVDQuant, which structurally decouples weight outliers, with Asymmetric Per-Channel KV8 quantization. Our approach reduces the Infinity-8B footprint by 64% (37.1 GB → 13.3 GB), fitting it on the mid-range Orin NX with a 4.1x speedup over Flux.1-dev (W4A4), while achieving superior aesthetic alignment (ImageReward 1.13 vs. 0.935). Crucially, we also make the Infinity-2B feasible on entry-level hardware, compressing it from 16.0 GB to 7.71 GB to enable deployment on the Orin Nano. These results establish a new efficiency standard for high-fidelity generative AI at the edge. The code is available at https://github.com/Henvezz95/deepcompressor.

2026 Conference Proceedings Paper
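
Asymmetric per-channel 8-bit quantization gives each channel its own scale and offset, derived from that channel's minimum and maximum. This keeps a few high-variance channels from dictating the step size for all the others, which matches the channel-skewed cache variance described above. The NumPy sketch below assumes a cache slice of shape `(tokens, channels)` and quantizes along the channel axis, storing the channel minimum as a float offset. The paper's exact grouping and kernels are not described in the abstract, so treat this as a generic illustration.

```python
import numpy as np


def quantize_per_channel(x: np.ndarray):
    """Asymmetric uint8 quantization with one (scale, offset) pair per channel.

    x: (tokens, channels) float array, e.g. a slice of a key or value cache
       (layout assumed for illustration).
    """
    x_min = x.min(axis=0, keepdims=True)
    x_max = x.max(axis=0, keepdims=True)
    scale = (x_max - x_min) / 255.0
    scale = np.where(scale == 0, 1.0, scale)  # constant channel: avoid divide-by-zero
    q = np.clip(np.round((x - x_min) / scale), 0, 255).astype(np.uint8)
    return q, scale, x_min


def dequantize(q: np.ndarray, scale: np.ndarray, offset: np.ndarray) -> np.ndarray:
    """Map uint8 codes back to floats using each channel's scale and offset."""
    return q.astype(np.float64) * scale + offset


rng = np.random.default_rng(0)
cache = rng.standard_normal((16, 4))
cache[:, 2] *= 50.0  # one channel with a much larger spread than the others
q, scale, offset = quantize_per_channel(cache)
err = np.abs(dequantize(q, scale, offset) - cache)
print(q.dtype, q.nbytes, cache.astype(np.float16).nbytes)  # uint8 64 128
# Rounding error stays within half a step of each channel's own scale.
print(bool((err <= scale / 2 + 1e-9).all()))  # True
```

Because the wide channel gets its own scale, the narrow channels keep fine resolution, and storage halves relative to an FP16 cache.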

The Evolution of Knowledge in Artificial Intelligence: Towards Robust and Modular Deep Neural Networks

Authors: Capitani, Giacomo

Deep neural networks have become a cornerstone of modern Artificial Intelligence thanks to their remarkable effectiveness and versatility. However, their generalization capabilities typically rely on the assumption that data are independent and identically distributed, a condition rarely met in real-world, dynamic, and evolving scenarios. When data distributions shift, models tend to exploit shortcuts (including spurious and implicit biases), suffer from catastrophic forgetting, and exhibit limited compositional abilities. This thesis explores how neural models can be guided to adapt, preserve, transfer, and compose their capabilities beyond mere data fitting. The first part focuses on mitigating bias in the absence of explicit protected attributes. Latent clusters are used to form proxy semantic groups that steer optimization away from shortcut learning, thereby improving robustness. The analysis is then extended to continual learning, where rehearsal-based strategies can introduce or amplify spurious correlations if debiasing signals are not handled properly. To address this problem, balanced rehearsal mechanisms are proposed that keep loss values balanced and mitigate spurious correlations under distribution shift. The second part investigates vision–language multimodal models, revealing that CLIP-like architectures exhibit implicit biases analogous to human ones. Lightweight prompt-steering techniques are introduced to reduce implicit bias in image retrieval and classification tasks.
Next, the parameter space is analyzed to determine when task vectors retain knowledge that is transferable between models trained on distinct datasets, and permutation-based alignment procedures are defined to enable knowledge transport between models. Finally, the thesis shows that geometric properties of the loss landscape, in particular its flatness, predict the compatibility of fine-tuned models derived from a common pretraining, with practical applications in 3D medical segmentation. Extensive experimental analyses across multiple datasets and learning paradigms support these findings. Overall, the contributions outline a four-axis framework of generalization in neural networks: (i) mitigating shortcut learning at the data and feature levels; (ii) preventing spurious correlations in continual learning; (iii) semantic disambiguation in multimodal alignment; (iv) manipulating parameter-space geometry for knowledge transfer and model merging. Through this lens, the thesis proposes principles and methodologies for developing adaptive neural systems whose capabilities can be robustly maintained, transferred, and composed.

2026 PhD Thesis
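
Permutation-based alignment works because reordering the hidden units of a layer (rows of the incoming weights, columns of the outgoing ones) leaves the network function unchanged. Two models can therefore be compared or merged only after their units are matched. The sketch below recovers such a permutation for a tiny one-hidden-layer network by maximizing weight similarity with a brute-force search. It illustrates the general weight-matching idea, not the thesis's procedure. Practical implementations solve the same matching with the Hungarian algorithm (e.g. SciPy's `scipy.optimize.linear_sum_assignment`) instead of enumerating permutations.

```python
import itertools

import numpy as np


def match_units(w_a: np.ndarray, w_b: np.ndarray) -> tuple[int, ...]:
    """Find perm maximizing sum_i <w_a[i], w_b[perm[i]]> by exhaustive search.

    w_a, w_b: (hidden, in) incoming weights of the same layer in two models.
    Only feasible for very small `hidden`; brute force is used for clarity.
    """
    hidden = w_a.shape[0]
    sim = w_a @ w_b.T  # sim[i, j]: similarity of unit i in A and unit j in B
    return max(
        itertools.permutations(range(hidden)),
        key=lambda perm: sum(sim[i, perm[i]] for i in range(hidden)),
    )


rng = np.random.default_rng(0)
w1_a = rng.standard_normal((4, 3))  # input -> hidden
w2_a = rng.standard_normal((2, 4))  # hidden -> output
true_perm = np.array([2, 0, 3, 1])
# Model B computes the same function as A, with its hidden units shuffled.
w1_b, w2_b = w1_a[true_perm], w2_a[:, true_perm]

perm = np.array(match_units(w1_a, w1_b))
# Unit i of A matches unit perm[i] of B, so reindexing B by perm undoes the shuffle.
print(np.allclose(w1_b[perm], w1_a), np.allclose(w2_b[:, perm], w2_a))  # True True
```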

Experience-dependent modulation of neural mechanisms underlying olfactory identification: preliminary evidence

Authors: Ricci, F.; Casadio, C.; Zanelli, V.; Carpentiero, O.; Caselli, M.; Nandi, A.; Masino, F.; Lui, F.; Benuzzi, F.

2026 Relazione in Atti di Convegno

FG-TRACER: Tracing Information Flow in Multimodal Large Language Models in Free-Form Generation

Authors: Saporita, Alessia; Pipoli, Vittorio; Bolelli, Federico; Baraldi, Lorenzo; Acquaviva, Andrea; Ficarra, Elisa

Multimodal Large Language Models (MLLMs) have achieved impressive performance across a variety of vision–language tasks. However, their internal working mechanisms remain largely underexplored. In this work, we introduce FG-TRACER, a framework designed to analyze the information flow between visual and textual modalities in MLLMs during free-form generation. Notably, our numerically stabilized computational method enables the first systematic analysis of multimodal information flow in underexplored settings such as image captioning and chain-of-thought (CoT) reasoning. We apply FG-TRACER to two state-of-the-art MLLMs (LLaMA 3.2-Vision and LLaVA 1.5) across three vision–language benchmarks (TextVQA, COCO 2014, and ChartQA), and we conduct a word-level analysis of multimodal integration. Our findings uncover distinct patterns of multimodal fusion across models and tasks, demonstrating that fusion dynamics are both model- and task-dependent. Overall, FG-TRACER offers a robust methodology for probing the internal mechanisms of MLLMs in free-form settings, providing new insights into their multimodal reasoning strategies. Our source code is publicly available at https://anonymous.4open.science/r/FG-TRACER-CB5A/.

2026 Conference Proceedings Paper
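
The abstract does not detail FG-TRACER's attribution method. A simpler, commonly used proxy for how much a generated word relies on each modality is the share of attention mass that falls on image tokens versus text tokens. The sketch below computes this for a single attention row and uses the max-subtraction trick for a numerically stable softmax. The token layout (the `is_image` flags) is a hypothetical input, and the proxy is not the paper's method.

```python
import math


def stable_softmax(logits: list[float]) -> list[float]:
    """Softmax that subtracts the max logit first, so exp() cannot overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def modality_share(logits: list[float], is_image: list[bool]) -> tuple[float, float]:
    """Fraction of one query's attention mass on image tokens vs. text tokens."""
    probs = stable_softmax(logits)
    image = sum(p for p, flag in zip(probs, is_image) if flag)
    return image, 1.0 - image


# A naive math.exp(1000.0) raises OverflowError; the stable version does not.
print(stable_softmax([1000.0, 1000.0]))  # [0.5, 0.5]
# Two image tokens with higher scores than two text tokens.
print(modality_share([2.0, 2.0, 0.0, 0.0], [True, True, False, False]))
```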

Generating Synthetic Data with Large Language Models for Low-Resource Sentence Retrieval

Authors: Caffagni, Davide; Cocchi, Federico; Mambelli, Anna; Tutrone, Fabio; Zanella, Marco; Cornia, Marcella; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Sentence similarity search is a fundamental task in information retrieval, enabling applications such as search engines, question answering, and textual analysis. However, retrieval systems often struggle when training data are scarce, as is the case for low-resource languages or specialized domains such as ancient texts. To address this challenge, we propose a novel paradigm for domain-specific sentence similarity search, where the embedding space is shaped by a combination of limited real data and a large amount of synthetic data generated by Large Language Models (LLMs). Specifically, we employ LLMs to generate domain-specific sentence pairs and fine-tune a sentence embedding model, effectively distilling knowledge from the LLM to the retrieval model. We validate our method through a case study on biblical intertextuality in Latin, demonstrating that synthetic data augmentation significantly improves retrieval effectiveness in a domain with scarce annotated resources. More broadly, our approach offers a scalable and adaptable framework for enhancing retrieval in domain-specific contexts. Source code and trained models are available at https://github.com/aimagelab/biblical-retrieval-synthesis.

2026 Conference Proceedings Paper
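
The abstract does not specify the objective used to fine-tune the sentence embedding model. A standard choice for learning from (anchor, positive) pairs, such as LLM-generated ones, is an in-batch contrastive (InfoNCE) loss: each anchor must single out its own positive among all positives in the batch. The minimal NumPy sketch below assumes pre-computed embeddings and a hypothetical temperature of 0.05.

```python
import numpy as np


def in_batch_contrastive_loss(anchors: np.ndarray, positives: np.ndarray,
                              temperature: float = 0.05) -> float:
    """InfoNCE with in-batch negatives over cosine similarities.

    anchors, positives: (batch, dim); row i of `positives` is the match for row i
    of `anchors`, and every other row serves as a negative.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


emb = np.eye(3)  # toy embeddings: three orthogonal "sentences"
aligned = in_batch_contrastive_loss(emb, emb)
# Rolling the rows pairs every anchor with the wrong positive.
shuffled = in_batch_contrastive_loss(emb, np.roll(emb, 1, axis=0))
print(aligned < shuffled)  # True
```

At retrieval time, the fine-tuned embeddings would then be ranked by cosine similarity to the query.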

Gradient-sign Masking for Task Vector Transport Across Pre-Trained Models

Authors: Rinaldi, Filippo; Panariello, Aniello; Salici, Giacomo; Liu, Fengyuan; Ciccone, Marco; Porrello, Angelo; Calderara, Simone

When a new release of a foundation model is published, practitioners typically need to repeat fine-tuning, even if the same task was already tackled in the previous version. A promising alternative is to reuse the parameter changes (i.e., task vectors) that capture how a model adapts to a specific task. However, these vectors often fail to transfer across different pre-trained models because their parameter spaces are misaligned. In this work, we show that successful transfer depends strongly on the gradient-sign structure of the new model. Based on this insight, we propose GradFix, which approximates the ideal sign structure and leverages it to transfer knowledge using only a handful of labeled samples. Notably, this requires no additional fine-tuning: we only compute a few target-model gradients without parameter updates and mask the source task vector accordingly. This yields an update that is locally aligned with the target loss landscape, effectively rebasing the task vector onto the new pre-training. We provide a theoretical guarantee that our method ensures first-order descent. Empirically, we demonstrate significant performance gains on vision and language benchmarks, consistently outperforming naive task vector addition and few-shot fine-tuning. We further show that transporting task vectors improves multi-task and multi-source model merging. Code is available at https://github.com/fillo-rinaldi/GradFix.

2026 Conference Proceedings Paper
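
One way to read GradFix's masking step is the following. Average a few target-model gradients, then keep only the task-vector coordinates whose sign agrees with the descent direction `-g`. Every kept coordinate then satisfies `tau_i * g_i < 0`, so the masked update has a non-positive inner product with the gradient, which is the first-order descent property the abstract mentions. The NumPy sketch below follows this reading under our own assumptions (flattened parameters, a plain mean over per-sample gradients, toy numbers). It is not taken from the GradFix repository.

```python
import numpy as np


def sign_mask_task_vector(task_vector: np.ndarray,
                          target_grads: np.ndarray) -> np.ndarray:
    """Keep task-vector entries whose sign matches the target descent direction.

    task_vector:  (n_params,) source fine-tuned minus source pre-trained weights.
    target_grads: (n_samples, n_params) loss gradients of the *target* model on a
                  few labeled samples; no parameter update is performed.
    """
    g = target_grads.mean(axis=0)
    agree = np.sign(task_vector) == np.sign(-g)
    return np.where(agree, task_vector, 0.0)


tau = np.array([0.5, -0.2, 0.3, -0.4])
grads = np.array([[-1.2, -0.4, 0.1, 0.2],
                  [-0.8, -0.6, 0.3, 0.0]])  # mean is about [-1.0, -0.5, 0.2, 0.1]
masked = sign_mask_task_vector(tau, grads)
print(masked)                       # entries 1 and 2 are zeroed out
print(masked @ grads.mean(axis=0))  # negative: a first-order descent direction
```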

GramSR: Visual Feature Conditioning for Diffusion-Based Super-Resolution

Authors: D'Oronzio, Fabio; Putamorsi, Federico; Zini, Leonardo; Cornia, Marcella; Baraldi, Lorenzo

Despite recent advances, single-image super-resolution (SR) remains challenging, especially in real-world scenarios with complex degradations. Diffusion-based SR methods, particularly those built on Stable Diffusion, leverage strong generative priors but commonly rely on text conditioning derived from semantic captioning. Such textual descriptions provide only high-level semantics and lack the spatially aligned visual information required for faithful restoration, leading to a representation gap between abstract semantics and spatially aligned visual details. To address this limitation, we propose GramSR, a one-step diffusion-based SR framework that replaces text conditioning with dense visual features extracted from the low-resolution input using a pre-trained DINOv3 encoder. GramSR adopts a three-stage LoRA architecture, where pixel-level, semantic-level, and texture-level LoRA modules are trained sequentially. The pixel-level module focuses on degradation removal using L2 loss, the semantic-level module enhances perceptual details via LPIPS and CSD losses, and the texture-level module enforces feature correlation consistency through a Gram matrix loss computed from DINOv3 features. At inference, independent guidance scales enable flexible control over degradation removal, semantic enhancement, and texture preservation. Extensive experiments on standard SR benchmarks demonstrate that GramSR consistently outperforms existing one-step diffusion-based methods, achieving superior structural fidelity and texture realism.

2026 Conference Proceedings Paper
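
A Gram matrix records which feature channels co-activate and discards where they occur, which is why it is a natural target for texture statistics. The sketch below computes a Gram loss between two sets of patch features. It assumes features of shape `(num_patches, dim)`, as a ViT encoder such as DINOv3 produces, and uses a mean-squared difference. The exact normalization and distance used in GramSR are not given in the abstract.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """(num_patches, dim) -> (dim, dim) channel co-activation matrix."""
    return features.T @ features / features.shape[0]


def gram_loss(pred_feats: np.ndarray, target_feats: np.ndarray) -> float:
    """Mean squared difference between the Gram matrices of two feature sets."""
    diff = gram_matrix(pred_feats) - gram_matrix(target_feats)
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
target = rng.standard_normal((64, 16))
# Reordering patches changes the spatial layout but not the channel statistics...
shuffled = target[rng.permutation(64)]
# ...whereas perturbing the features themselves does change them.
noisy = target + 0.5 * rng.standard_normal(target.shape)
print(gram_loss(shuffled, target) < 1e-12, gram_loss(noisy, target) > 0)  # True True
```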

Histological Brain Imaging Super-resolution with Frequency-guided Diffusion Models

Authors: Casari, Giovanni; Bolelli, Federico; Grana, Costantino

High-resolution histological imaging provides essential detail for quantitative brain modeling, yet acquiring whole-brain data at micrometer scale remains technically and economically challenging. This work introduces Brain-SR, a diffusion-based super-resolution framework designed to reconstruct high-resolution cortical sections from low-resolution BigBrain data. Building upon the InvSR paradigm, our method performs resolution enhancement in the latent space of a pretrained variational autoencoder, guided by a task-specific noise-predictor network. A key contribution is a frequency-domain supervision term that compares the magnitude spectra of predicted and target patches, enforcing spectral consistency while remaining robust to local misalignments. Quantitative evaluations show that Brain-SR achieves substantial improvements in LPIPS (-27%) and FID (-58%) compared to a baseline diffusion super-resolution model, while spectral analysis confirms accurate recovery of the frequency distribution. The resulting reconstructions preserve neuronal structures consistent with high-resolution references, offering a practical step toward large-scale, morphologically faithful brain histology reconstruction. The code is publicly available to support reproducibility: https://github.com/AImageLab-zip/Brain-SR.

2026 Conference Proceedings Paper
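
There is a simple reason why a magnitude-spectrum loss is robust to small misalignments. Shifting an image changes only the phase of its Fourier coefficients, not their magnitude, and this holds exactly for circular shifts. The sketch below assumes an L1 distance between the 2D FFT magnitudes of single-channel patches. The paper's precise formulation (windowing, log scaling, weighting) is not stated in the abstract.

```python
import numpy as np


def spectral_magnitude_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """L1 distance between the 2D FFT magnitude spectra of two (H, W) patches."""
    pred_mag = np.abs(np.fft.fft2(pred))
    target_mag = np.abs(np.fft.fft2(target))
    return float(np.mean(np.abs(pred_mag - target_mag)))


rng = np.random.default_rng(0)
patch = rng.random((32, 32))
# Circular shift by 2 rows and -3 columns: a pure misalignment.
shifted = np.roll(patch, shift=(2, -3), axis=(0, 1))
# Averaging with shifted copies acts as a blur that removes high-frequency detail.
blurred = (patch + np.roll(patch, 1, axis=0) + np.roll(patch, 1, axis=1)) / 3.0

print(np.mean(np.abs(shifted - patch)))          # a pixel-wise loss penalizes the shift
print(spectral_magnitude_loss(shifted, patch))   # the spectral loss is ~0
print(spectral_magnitude_loss(blurred, patch))   # but lost detail is still penalized
```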

Total publications: 1098