Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Closed-Form Merging of Parameter-Efficient Modules for Federated Continual Learning

Authors: Salami, R.; Buzzega, P.; Mosconi, M.; Bonato, J.; Sabetta, L.; Calderara, S.

Model merging has emerged as a crucial technique in Deep Learning, enabling the integration of multiple models into a unified system while preserving performance and scalability. In this respect, the compositional properties of low-rank adaptation techniques (e.g., LoRA) have proven beneficial, as simply averaging LoRA modules yields a single model that mostly integrates the capabilities of all individual modules. Building on LoRA, we take a step further by imposing that the merged model matches the responses of all learned modules. Solving this objective in closed form yields an indeterminate system with A and B as unknown variables, indicating the existence of infinitely many closed-form solutions. To address this challenge, we introduce LoRM, an alternating optimization strategy that trains one LoRA matrix at a time. This allows solving for each unknown variable individually, thus finding a unique solution. We apply our proposed methodology to Federated Class-Incremental Learning (FCIL), ensuring alignment of model responses both between clients and across tasks. Our method demonstrates state-of-the-art performance across a range of FCIL scenarios. The code to reproduce our experiments is available at this http URL.
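For readers unfamiliar with closed-form merging, the Python sketch below illustrates alternating least-squares merging of LoRA factors. It is a minimal illustration under our own assumptions, not the LoRM implementation: we assume each client i supplies its LoRA factors A_i (r x d_in) and B_i (d_out x r) plus a Gram matrix G_i = X_i^T X_i of its layer inputs, and that "matching responses" means minimizing the summed squared difference between X_i A^T B^T and X_i A_i^T B_i^T. With one factor fixed, this objective is quadratic in the other, so each half-step has a closed-form solution. All function names are hypothetical.

```python
import numpy as np

# Hypothetical sketch: alternating closed-form merging of LoRA factors.
# Assumed objective: sum_i ||X_i A^T B^T - X_i A_i^T B_i^T||_F^2, expressed via G_i = X_i^T X_i.

def response_error(A, B, As, Bs, grams):
    """Summed squared response mismatch between the merged LoRA and each client's LoRA."""
    total = 0.0
    for A_i, B_i, G_i in zip(As, Bs, grams):
        D = A.T @ B.T - A_i.T @ B_i.T          # (d_in, d_out) difference of transposed deltas
        total += np.trace(D.T @ G_i @ D)       # equals ||X_i D||_F^2
    return float(total)

def solve_B(A, As, Bs, grams):
    """Closed-form B with A fixed: B^T = (sum_i A G_i A^T)^-1 sum_i A G_i A_i^T B_i^T."""
    lhs = sum(A @ G @ A.T for G in grams)                                      # (r, r)
    rhs = sum(A @ G @ A_i.T @ B_i.T for A_i, B_i, G in zip(As, Bs, grams))     # (r, d_out)
    return np.linalg.solve(lhs, rhs).T                                         # (d_out, r)

def solve_A(B, As, Bs, grams):
    """Closed-form A with B fixed: (sum_i G_i) A^T (B^T B) = sum_i G_i A_i^T B_i^T B."""
    G = sum(grams)                                                             # (d_in, d_in)
    R = sum(G_i @ A_i.T @ B_i.T @ B for A_i, B_i, G_i in zip(As, Bs, grams))   # (d_in, r)
    left = np.linalg.solve(G, R)                     # equals A^T (B^T B)
    return np.linalg.solve(B.T @ B, left.T)          # A = (B^T B)^-1 left^T, shape (r, d_in)

# Toy data: 3 clients with random LoRA factors and random layer inputs (placeholders).
rng = np.random.default_rng(0)
d_in, d_out, r, n_clients = 8, 6, 2, 3
As = [rng.normal(size=(r, d_in)) for _ in range(n_clients)]
Bs = [rng.normal(size=(d_out, r)) for _ in range(n_clients)]
Xs = [rng.normal(size=(50, d_in)) for _ in range(n_clients)]
grams = [X.T @ X for X in Xs]

A = sum(As) / n_clients   # start from plain averaging of the factors
B = sum(Bs) / n_clients
errors = [response_error(A, B, As, Bs, grams)]
for _ in range(5):
    B = solve_B(A, As, Bs, grams)              # exact minimization over B
    errors.append(response_error(A, B, As, Bs, grams))
    A = solve_A(B, As, Bs, grams)              # exact minimization over A
    errors.append(response_error(A, B, As, Bs, grams))
print([round(e, 2) for e in errors])
```

Because each half-step exactly minimizes over one factor, the printed errors should be non-increasing from the plain-averaging starting point. In this particular formulation, the updates need only the Gram matrices rather than raw client inputs; whether LoRM relies on the same statistics is not stated in the abstract.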

2025 Paper in Conference Proceedings

Context-guided Prompt Learning for Continual WSI Classification

Authors: Corso, Giulia; Miccolis, Francesca; Porrello, Angelo; Bolelli, Federico; Calderara, Simone; Ficarra, Elisa

Whole Slide Images (WSIs) are crucial in histological diagnostics, providing high-resolution insights into cellular structures. In addition to challenges like the gigapixel scale of WSIs and the lack of pixel-level annotations, privacy restrictions further complicate their analysis. For instance, in a hospital network, different facilities need to collaborate on WSI analysis without the possibility of sharing sensitive patient data. A more practical and secure approach involves sharing models capable of continual adaptation to new data. However, without proper measures, catastrophic forgetting can occur. Traditional continual learning techniques rely on storing previous data, which violates privacy restrictions. To address this issue, this paper introduces Context Optimization Multiple Instance Learning (CooMIL), a rehearsal-free continual learning framework explicitly designed for WSI analysis. It employs a WSI-specific prompt learning procedure to adapt classification models across tasks, efficiently preventing catastrophic forgetting. Evaluated on four public WSI datasets from TCGA projects, our model significantly outperforms state-of-the-art methods within the WSI-based continual learning framework. The source code is available at https://github.com/FrancescaMiccolis/CooMIL.

2025 Paper in Conference Proceedings

Continual Facial Features Transfer for Facial Expression Recognition

Authors: Maharjan, R. S.; Bonicelli, L.; Romeo, M.; Calderara, S.; Cangelosi, A.; Cucchiara, R.

Published in: IEEE Transactions on Affective Computing

2025 Journal Article

Decoding Facial Expressions in Video: A Multiple Instance Learning Perspective on Action Units

Authors: Del Gaudio, Livia; Cuculo, Vittorio; Cucchiara, Rita

Facial expression recognition (FER) in video sequences is a longstanding challenge in affective computing and computer vision, particularly due to the temporal complexity and subtlety of emotional expressions. In this paper, we propose a novel pipeline that leverages facial Action Units (AUs) as structured time series descriptors of facial muscle activity, enabling emotion classification in videos through a Multiple Instance Learning (MIL) framework. Our approach models each video as a bag of AU-based instances, capturing localized temporal patterns, and allows for robust learning even when only coarse video-level emotion labels are available. Crucially, the approach incorporates interpretability mechanisms that highlight the temporal segments most influential to the final prediction, supporting informed decision-making and facilitating downstream analysis. Experimental results on benchmark FER video datasets demonstrate that our method achieves competitive performance using only visual data, without requiring multimodal signals or frame-level supervision. This highlights its potential as an interpretable and efficient solution for weakly supervised emotion recognition in real-world scenarios.
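As a rough illustration of the bag-of-instances view, the sketch below splits an Action Unit intensity series into overlapping windows, scores each window with a linear model, and max-pools the scores into a video-level prediction; the highest-scoring window doubles as a simple interpretability cue. The windowing, the mean-pooled window features, the linear scorer, and max-pooling are all assumptions for illustration, not necessarily the aggregation used in the paper. The function names, toy data, and weights are hypothetical.

```python
import numpy as np

def make_instances(au_series, window, stride):
    """Split a (T, n_aus) AU intensity series into overlapping windows (MIL instances)."""
    T = au_series.shape[0]
    starts = list(range(0, T - window + 1, stride))
    bag = np.stack([au_series[s:s + window] for s in starts])   # (n_instances, window, n_aus)
    return bag, starts

def mil_predict(bag, weights, bias):
    """Score each instance on its mean AU profile, then max-pool over the bag."""
    feats = bag.mean(axis=1)                 # (n_instances, n_aus): temporal average per window
    logits = feats @ weights + bias          # (n_instances, n_classes)
    bag_logits = logits.max(axis=0)          # max-pooling: each class keeps its strongest instance
    predicted = int(bag_logits.argmax())
    key_instance = int(logits[:, predicted].argmax())   # window that drove the prediction
    return predicted, key_instance

# Hypothetical toy video: low background activity plus a burst in AU channel 11.
rng = np.random.default_rng(0)
T, n_aus, n_classes = 120, 17, 3
video = rng.uniform(0.0, 0.2, size=(T, n_aus))
video[60:75, 11] = 1.0
weights = np.zeros((n_aus, n_classes))
weights[11, 1] = 5.0                         # hypothetical: class 1 keyed on AU channel 11
bias = np.zeros(n_classes)

bag, starts = make_instances(video, window=15, stride=5)
label, idx = mil_predict(bag, weights, bias)
print(label, starts[idx])
```

Only the video-level label would be needed to train such a scorer, which matches the weak-supervision setting described above; the localized window index is what an interpretability view could highlight.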

2025 Paper in Conference Proceedings

Deep Learning for Classifying Anti-Shigella Opsono-Phagocytosis-Promoting Monoclonal Antibodies

Authors: Pianfetti, Elena; Cardamone, Dario; Roscioli, Emanuele; Ciano, Giorgio; Maccari, Giuseppe; Sala, Claudia; Micoli, Francesca; Rappuoli, Rino; Medini, Duccio; Ficarra, Elisa

Published in: Lecture Notes in Computer Science

Shigellosis is an acute small intestine infection caused by different species of Shigella. Worldwide, the emergence of antibiotic-resistant strains aggravates the impact of Shigella infections. In this context, human monoclonal antibodies (mAbs) offer an alternative to traditional antimicrobials. However, identifying a potent candidate mAb requires intense and meticulous efforts. Here, we show the potential of Deep Learning to screen mAbs rapidly. We measured the phagocytosis-promoting activity of mAbs by analyzing images collected with a high-throughput, high-content confocal fluorescence microscope. We acquired images of S. sonnei and S. flexneri infecting THP-1-derived macrophages and evaluated the effect of different mAbs and of a wide selection of Deep Learning tools. We found that our model can generalize to strains and mAbs not encountered in training. Importantly, our approach enables the screening and characterization of multiple anti-Shigella mAbs at the same time, facilitating the identification of potent antibacterial candidates. Our code is available on the GitHub repository vOPA_Shigella.

2025 Paper in Conference Proceedings

Depth-Based Privileged Information for Boosting 3D Human Pose Estimation on RGB

Authors: Simoni, A.; Marchetti, F.; Borghi, G.; Becattini, F.; Davoli, D.; Garattoni, L.; Francesca, G.; Seidenari, L.; Vezzani, R.

Published in: Lecture Notes in Computer Science

2025 Paper in Conference Proceedings

Diffusion Transformers for Tabular Data Time Series Generation

Authors: Garuti, Fabrizio; Sangineto, Enver; Luetto, Simone; Forni, Lorenzo; Cucchiara, Rita

2025 Paper in Conference Proceedings

DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection

Authors: Cappellino, Chiara; Mancusi, Gianluca; Mosconi, Matteo; Porrello, Angelo; Calderara, Simone; Cucchiara, Rita

2025 Paper in Conference Proceedings

DualPose: Dual-Block Transformer Decoder with Contrastive Denoising for Multi-Person Pose Estimation

Authors: Fincato, M.; Vezzani, R.

Published in: Sensors

Multi-person pose estimation is the task of detecting and regressing the keypoint coordinates of multiple people in a single image. Significant progress has been achieved in recent years, especially with the introduction of transformer-based end-to-end methods. In this paper, we present DualPose, a novel framework that enhances multi-person pose estimation by leveraging a dual-block transformer decoding architecture. Class prediction and keypoint estimation are split into parallel blocks so that each sub-task can be improved separately and the risk of interference is reduced. This architecture improves the precision of keypoint localization and the model's capacity to accurately classify individuals. To improve model performance, the Keypoint-Block processes self-attention in parallel, providing a novel strategy that improves keypoint localization accuracy and precision. Additionally, DualPose incorporates a contrastive denoising (CDN) mechanism, leveraging positive and negative samples to stabilize training and improve robustness. Thanks to CDN, a variety of training samples are created by introducing controlled noise into the ground truth, improving the model's ability to discern between valid and incorrect keypoints. DualPose achieves state-of-the-art results, outperforming recent end-to-end methods, as shown by extensive experiments on the MS COCO and CrowdPose datasets. The code and pretrained models are publicly available.
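The contrastive denoising idea can be sketched as follows, assuming a DINO-style scheme adapted to keypoints (our assumption; the abstract does not specify the noise model). Positive queries perturb each ground-truth keypoint by less than a threshold lam_pos and should be reconstructed to the ground truth; negative queries use larger offsets in [lam_pos, lam_neg) and should be classified as invalid. The function name, the per-keypoint noise, and the threshold values are hypothetical.

```python
import numpy as np

def make_cdn_queries(gt_keypoints, lam_pos, lam_neg, rng):
    """Create positive and negative noisy copies of ground-truth keypoints.

    gt_keypoints: (n_people, n_kpts, 2) array of normalized (x, y) coordinates.
    Positives: offset magnitude in [0, lam_pos)  -> target is the original keypoint.
    Negatives: offset magnitude in [lam_pos, lam_neg) -> target is "no valid person".
    """
    n_people, n_kpts, _ = gt_keypoints.shape
    direction = rng.normal(size=(2, n_people, n_kpts, 2))
    direction /= np.linalg.norm(direction, axis=-1, keepdims=True)   # random unit offsets
    r_pos = rng.uniform(0.0, lam_pos, size=(n_people, n_kpts, 1))
    r_neg = rng.uniform(lam_pos, lam_neg, size=(n_people, n_kpts, 1))
    positives = gt_keypoints + r_pos * direction[0]
    negatives = gt_keypoints + r_neg * direction[1]
    queries = np.concatenate([positives, negatives])                  # (2 * n_people, n_kpts, 2)
    labels = np.concatenate([np.ones(n_people), np.zeros(n_people)])  # 1 = valid, 0 = invalid
    return queries, labels

# Hypothetical example: 2 people with 17 keypoints each, in normalized image coordinates.
rng = np.random.default_rng(0)
gt = rng.uniform(0.0, 1.0, size=(2, 17, 2))
queries, labels = make_cdn_queries(gt, lam_pos=0.02, lam_neg=0.06, rng=rng)
print(queries.shape, labels)
```

Placing the two noise bands side by side is what makes the denoising contrastive: the decoder sees near-identical inputs on either side of lam_pos and must learn to refine one and reject the other.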

2025 Journal Article

ECoGNet: an EEG-based Effective Connectivity Graph Neural Network for Brain Disorder Detection

Authors: Burger, Jacopo; Cuculo, Vittorio; D'Amelio, Alessandro; Grossi, Giuliano; Lanzarotti, Raffaella

Alzheimer’s Disease (AD) and Frontotemporal Dementia (FTD), among the most prevalent neurodegenerative disorders, disrupt brain activity and connectivity, highlighting the need for tools that can effectively capture these alterations. Effective Connectivity Networks (ECNs), which model causal interactions between brain regions, offer a promising approach to characterizing AD- and FTD-related neural changes. In this study, we estimate ECNs from EEG traces using a state-of-the-art causal discovery method specifically designed for time-series data, to recover the causal structure of the interactions between brain areas. The recovered ECNs are integrated into a novel Graph Neural Network architecture (ECoGNet), where nodes represent brain regions and edge features encode causal relationships. Our method combines ECNs with features summarizing local brain dynamics to improve AD and FTD detection. Evaluated on a publicly available EEG dataset, the proposed approach demonstrates superior performance compared to models that either use non-causal connectivity networks or omit connectivity information entirely.
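To make the graph formulation concrete, here is a minimal NumPy sketch of one message-passing layer over a directed effective-connectivity graph, followed by a mean readout and a linear classifier. It simplifies the edge features to a single scalar causal strength per edge and uses random placeholder node features standing in for "local brain dynamics". The layer design, the function names, and the three-class output (e.g. control, AD, FTD) are assumptions, not the ECoGNet architecture.

```python
import numpy as np

def ecn_message_passing(node_feats, causal_adj, W_self, W_msg):
    """One message-passing layer on a directed effective-connectivity graph.

    causal_adj[j, i] is the (assumed scalar) causal strength from region j to region i,
    with 0 meaning no edge. Each region aggregates messages only from its causal parents.
    """
    messages = causal_adj.T @ (node_feats @ W_msg)           # row i: sum_j adj[j, i] * msg_j
    return np.maximum(0.0, node_feats @ W_self + messages)   # ReLU

def classify(node_feats, causal_adj, params):
    """Message passing, mean readout over regions, then a linear classifier."""
    h = ecn_message_passing(node_feats, causal_adj, params["W_self"], params["W_msg"])
    graph_emb = h.mean(axis=0)
    return graph_emb @ params["W_out"]                       # class logits

# Placeholder example: 19 regions (one per EEG channel), sparse random causal graph.
rng = np.random.default_rng(0)
n_regions, n_feats, hidden, n_classes = 19, 5, 8, 3
X = rng.uniform(0.0, 1.0, size=(n_regions, n_feats))
adj = (rng.uniform(size=(n_regions, n_regions)) < 0.1) * rng.uniform(size=(n_regions, n_regions))
np.fill_diagonal(adj, 0.0)
params = {
    "W_self": rng.normal(size=(n_feats, hidden)),
    "W_msg": rng.normal(size=(n_feats, hidden)),
    "W_out": rng.normal(size=(hidden, n_classes)),
}
print(classify(X, adj, params))
```

Using the transpose of the adjacency matrix keeps the aggregation directional: a region is influenced by its estimated causes, not by its effects, which is the property that separates an effective-connectivity graph from a symmetric correlation graph.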

2025 Paper in Conference Proceedings

Total publications: 1098