Publications - AImageLab

Evoluzione della Conoscenza nell’Intelligenza Artificiale: Verso Reti Neurali Profonde Robuste e Modulari

Authors: Capitani, Giacomo

Le reti neurali profonde sono diventate un pilastro fondamentale dell’Intelligenza Artificiale moderna grazie alla loro straordinaria efficacia e versatilità. Tuttavia, … (Read full abstract)

Le reti neurali profonde sono diventate un pilastro fondamentale dell’Intelligenza Artificiale moderna grazie alla loro straordinaria efficacia e versatilità. Tuttavia, le loro capacità di generalizzazione dipendono tipicamente dall’assunzione che i dati siano indipendenti e distribuiti in modo identico, una condizione raramente soddisfatta negli scenari reali, dinamici ed evolutivi. Quando le distribuzioni dei dati variano, i modelli tendono a sfruttare scorciatoie (inclusi bias spurî e impliciti), a soffrire di catastrophic forgetting e a mostrare capacità compositive limitate. La presente tesi esplora come i modelli neurali possano essere guidati ad adattare, preservare, trasferire e comporre le proprie capacità oltre il semplice data fitting. La prima parte si concentra sulla mitigazione del bias in assenza di attributi protetti espliciti. Si sfruttano cluster latenti per formare gruppi semantici proxy che orientano l’ottimizzazione lontano dall’apprendimento di scorciatoie, migliorando così la robustezza. L’analisi viene poi estesa al continual learning, dove le strategie basate su rehearsal possono introdurre o amplificare correlazioni spurie se i segnali di debiasing non vengono gestiti correttamente. Per affrontare tale problema, vengono proposti meccanismi di rehearsal bilanciati, capaci di mantenere l’equilibrio in termini di valori di loss e mitigare correlazioni spurie sotto cambiamenti di distribuzione. La seconda parte indaga i modelli multimodali visione–linguaggio, rivelando che architetture simili a CLIP manifestano bias impliciti analoghi a quelli umani. Si introducono tecniche leggere di prompt steering per ridurre i bias impliciti nei compiti di image retrieval e classificazione. Successivamente, viene analizzato lo spazio dei parametri per determinare quando i task vector mantengono conoscenza trasferibile tra modelli addestrati su dataset distinti, e vengono definite procedure di allineamento basate su permutazioni per consentire il trasporto di conoscenza tra modelli. Infine, si dimostra che le proprietà geometriche del loss landscape, in particolare la sua piattezza, predicono la compatibilità tra modelli fine-tuned derivati da un pretraining comune, con applicazioni pratiche nella segmentazione medica 3D. Analisi sperimentali approfondite su diversi dataset e paradigmi di apprendimento supportano questi risultati. Complessivamente, i contributi delineano un quadro a quattro assi della generalizzazione nelle reti neurali: (i) mitigazione dell’apprendimento di scorciatoie a livello di dati e feature; (ii) prevenzione delle correlazioni spurie nel continual learning; (iii) disambiguazione semantica nell’allineamento multimodale; (iv) manipolazione della geometria dello spazio dei parametri per il trasferimento di conoscenza e il model merging. Attraverso questa prospettiva, la tesi propone principi e metodologie per lo sviluppo di sistemi neurali adattivi le cui capacità possano essere mantenute, trasferite e composte in modo robusto.

2026 Tesi di dottorato

IRIS

Towards Unbiased Continual Learning: Avoiding Forgetting in the Presence of Spurious Correlations

Authors: Capitani, Giacomo; Bonicelli, Lorenzo; Porrello, Angelo; Bolelli, Federico; Calderara, Simone; Ficarra, Elisa

Published in: IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION

2025 Relazione in Atti di Convegno

DOI IRIS

U-Net Transplant: The Role of Pre-training for Model Merging in 3D Medical Segmentation

Authors: Lumetti, Luca; Capitani, Giacomo; Ficarra, Elisa; Grana, Costantino; Calderara, Simone; Porrello, Angelo; Bolelli, Federico

Despite their remarkable success in medical image segmentation, the life cycle of deep neural networks remains a challenge in clinical … (Read full abstract)

Despite their remarkable success in medical image segmentation, the life cycle of deep neural networks remains a challenge in clinical applications. These models must be regularly updated to integrate new medical data and customized to meet evolving diagnostic standards, regulatory requirements, commercial needs, and privacy constraints. Model merging offers a promising solution, as it allows working with multiple specialized networks that can be created and combined dynamically instead of relying on monolithic models. While extensively studied in standard 2D classification, the potential of model merging for 3D segmentation remains unexplored. This paper presents an efficient framework that allows effective model merging in the domain of 3D image segmentation. Our approach builds upon theoretical analysis and encourages wide minima during pre-training, which we demonstrate to facilitate subsequent model merging. Using U-Net 3D, we evaluate the method on distinct anatomical structures with the ToothFairy2 and BTCV Abdomen datasets. To support further research, we release the source code and all the model weights in a dedicated repository: https://github.com/LucaLumetti/UNetTransplant

2025 Relazione in Atti di Convegno

IRIS

Update Your Transformer to the Latest Release: Re-Basin of Task Vectors

Authors: Rinaldi, Filippo; Capitani, Giacomo; Bonicelli, Lorenzo; Crisostomi, Donato; Bolelli, Federico; Rodolà, Emanuele; Ficarra, Elisa; Calderara, Simone; Porrello, Angelo

Published in: PROCEEDINGS OF MACHINE LEARNING RESEARCH

Foundation models serve as the backbone for numerous specialized models developed through fine-tuning. However, when the underlying pretrained model is … (Read full abstract)

Foundation models serve as the backbone for numerous specialized models developed through fine-tuning. However, when the underlying pretrained model is updated or retrained (e.g., on larger and more curated datasets), the fine-tuned model becomes obsolete, losing its utility and requiring retraining. This raises the question: is it possible to transfer fine-tuning to a new release of the model? In this work, we investigate how to transfer fine-tuning to a new checkpoint without having to re-train, in a data-free manner. To do so, we draw principles from model re-basin and provide a recipe based on weight permutations to re-base the modifications made to the original base model, often called task vector. In particular, our approach tailors model re-basin for Transformer models, taking into account the challenges of residual connections and multi-head attention layers. Specifically, we propose a two-level method rooted in spectral theory, initially permuting the attention heads and subsequently adjusting parameters within select pairs of heads. Through extensive experiments on visual and textual tasks, we achieve the seamless transfer of fine-tuned knowledge to new pre-trained backbones without relying on a single training step or datapoint.

2025 Relazione in Atti di Convegno

IRIS

Beyond the Surface: Comprehensive Analysis of Implicit Bias in Vision-Language Models

Authors: Capitani, Giacomo; Lucarini, Alice; Bonicelli, Lorenzo; Bolelli, Federico; Calderara, Simone; Vezzali, Loris; Ficarra, Elisa

Implicit biases, subtle and unconscious attitudes, permeate various facets of human decision-making and are similarly pervasive in Artificial Intelligence (AI) … (Read full abstract)

Implicit biases, subtle and unconscious attitudes, permeate various facets of human decision-making and are similarly pervasive in Artificial Intelligence (AI) systems. These biases can stem from shortcut learning, where models rely on superficial patterns that do not capture the underlying phenomena. Inspired by social psychology studies, we introduce two novel metrics to analyze implicit biases in visual-language models. Our comprehensive analysis of 90 open-clip models reveals widespread anomalies related to ethnicity and gender. The first metric considers the cosine similarity between images and text prompts related to social stereotypes. The second metric adapts the Implicit Association Test (IAT), which evaluates prejudice and hidden discrimination within human behavior. Our findings illustrate that conventional text-based debiasing efforts can inadvertently amplify second-order biases instead of mitigating them. Furthermore, in expanding our evaluation to multimodal Large Language Models (LLMs), we demonstrate disparities in the tendency to generate semantically positive or negative outputs, depending on the ethnicity or gender of the individuals depicted in the input images.

2024 Relazione in Atti di Convegno

IRIS

ClusterFix: A Cluster-Based Debiasing Approach without Protected-Group Supervision

Authors: Capitani, Giacomo; Bolelli, Federico; Porrello, Angelo; Calderara, Simone; Ficarra, Elisa

The failures of Deep Networks can sometimes be ascribed to biases in the data or algorithmic choices. Existing debiasing approaches … (Read full abstract)

The failures of Deep Networks can sometimes be ascribed to biases in the data or algorithmic choices. Existing debiasing approaches exploit prior knowledge to avoid unintended solutions; we acknowledge that, in real-world settings, it could be unfeasible to gather enough prior information to characterize the bias, or it could even raise ethical considerations. We hence propose a novel debiasing approach, termed ClusterFix, which does not require any external hint about the nature of biases. Such an approach alters the standard empirical risk minimization and introduces a per-example weight, encoding how critical and far from the majority an example is. Notably, the weights consider how difficult it is for the model to infer the correct pseudo-label, which is obtained in a self-supervised manner by dividing examples into multiple clusters. Extensive experiments show that the misclassification error incurred in identifying the correct cluster allows for identifying examples prone to bias-related issues. As a result, our approach outperforms existing methods on standard benchmarks for bias removal and fairness.

2024 Relazione in Atti di Convegno

DOI IRIS

Publications by Giacomo Capitani

Evoluzione della Conoscenza nell’Intelligenza Artificiale: Verso Reti Neurali Profonde Robuste e Modulari

Towards Unbiased Continual Learning: Avoiding Forgetting in the Presence of Spurious Correlations

U-Net Transplant: The Role of Pre-training for Model Merging in 3D Medical Segmentation

Update Your Transformer to the Latest Release: Re-Basin of Task Vectors

Beyond the Surface: Comprehensive Analysis of Implicit Bias in Vision-Language Models

ClusterFix: A Cluster-Based Debiasing Approach without Protected-Group Supervision