Publications - AImageLab

BarBeR: A Barcode Benchmarking Repository

Authors: Vezzali, E.; Bolelli, F.; Santi, S.; Grana, C.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Since their invention in 1949, barcodes have remained the preferred method for automatic data capture, playing a crucial role in supply chain management. To detect a barcode in an image, multiple algorithms have been proposed in the literature, with a significant increase of interest in the topic since the rise of deep learning. However, research in the field suffers from many limitations, including the scarcity of public datasets and code implementations, which hampers the reproducibility and reliability of published results. For this reason, we developed "BarBeR" (Barcode Benchmark Repository), a benchmark designed for testing and comparing barcode detection algorithms. This benchmark includes the code implementation of various detection algorithms for barcodes, along with a suite of useful metrics. It offers a range of test setups and can be expanded to include any localization algorithm. In addition, we provide a large, annotated dataset of 8748 barcode images, combining multiple public barcode datasets with standardized annotation formats for both detection and segmentation tasks. Finally, we share the results obtained from running the benchmark on our dataset, offering valuable insights into the performance of different algorithms.

2025 Conference proceedings paper

Bits2Bites: Intra-oral Scans Occlusal Classification

Authors: Borghi, Lorenzo; Lumetti, Luca; Cremonini, Francesca; Rizzo, Federico; Grana, Costantino; Lombardo, Luca; Bolelli, Federico

We introduce Bits2Bites, the first publicly available dataset for occlusal classification from intra-oral scans, comprising 200 paired upper and lower dental arches annotated across multiple clinically relevant dimensions (sagittal, vertical, transverse, and midline relationships). Leveraging this resource, we propose a multi-task learning benchmark that jointly predicts five occlusal traits from raw 3D point clouds using state-of-the-art point-based neural architectures. Our approach includes extensive ablation studies assessing the benefits of multi-task learning against single-task baselines, as well as the impact of automatically-predicted anatomical landmarks as input features. Results demonstrate the feasibility of directly inferring comprehensive occlusion information from unstructured 3D data, achieving promising performance across all tasks. Our entire dataset, code, and pretrained models are publicly released to foster further research in automated orthodontic diagnosis.

2025 Conference proceedings paper

Context-guided Prompt Learning for Continual WSI Classification

Authors: Corso, Giulia; Miccolis, Francesca; Porrello, Angelo; Bolelli, Federico; Calderara, Simone; Ficarra, Elisa

Whole Slide Images (WSIs) are crucial in histological diagnostics, providing high-resolution insights into cellular structures. In addition to challenges like the gigapixel scale of WSIs and the lack of pixel-level annotations, privacy restrictions further complicate their analysis. For instance, in a hospital network, different facilities need to collaborate on WSI analysis without the possibility of sharing sensitive patient data. A more practical and secure approach involves sharing models capable of continual adaptation to new data. However, without proper measures, catastrophic forgetting can occur. Traditional continual learning techniques rely on storing previous data, which violates privacy restrictions. To address this issue, this paper introduces Context Optimization Multiple Instance Learning (CooMIL), a rehearsal-free continual learning framework explicitly designed for WSI analysis. It employs a WSI-specific prompt learning procedure to adapt classification models across tasks, efficiently preventing catastrophic forgetting. Evaluated on four public WSI datasets from TCGA projects, our model significantly outperforms state-of-the-art methods within the WSI-based continual learning framework. The source code is available at https://github.com/FrancescaMiccolis/CooMIL.

2025 Conference proceedings paper

Enhancing Testicular Ultrasound Image Classification Through Synthetic Data and Pretraining Strategies

Authors: Morelli, Nicola; Marchesini, Kevin; Lumetti, Luca; Santi, Daniele; Grana, Costantino; Bolelli, Federico

Testicular ultrasound imaging is vital for assessing male infertility, with testicular inhomogeneity serving as a key biomarker. However, subjective interpretation and the scarcity of publicly available datasets pose challenges to automated classification. In this study, we explore supervised and unsupervised pretraining strategies using a ResNet-based architecture, supplemented by diffusion-based generative models to synthesize realistic ultrasound images. Our results demonstrate that pretraining significantly enhances classification performance compared to training from scratch, and synthetic data can effectively substitute real images in the pretraining process, alleviating data-sharing constraints. These methods offer promising advancements toward robust, clinically valuable automated analysis of male infertility. The source code is publicly available at https://github.com/AImageLab-zip/TesticulUS/.

2025 Conference proceedings paper

IM-Fuse: A Mamba-based Fusion Block for Brain Tumor Segmentation with Incomplete Modalities

Authors: Pipoli, Vittorio; Saporita, Alessia; Marchesini, Kevin; Grana, Costantino; Ficarra, Elisa; Bolelli, Federico

Brain tumor segmentation is a crucial task in medical imaging that involves the integrated modeling of four distinct imaging modalities to identify tumor regions accurately. Unfortunately, in real-life scenarios, all four modalities are often not available due to scanning cost, time, and patient condition. Consequently, several deep learning models have been developed to address the challenge of brain tumor segmentation under conditions of missing imaging modalities. However, the majority of these models have been evaluated using the 2018 version of the BraTS dataset, which comprises only 285 volumes. In this study, we reproduce and extensively analyze the most relevant models using BraTS2023, which includes 1,250 volumes, thereby providing a more comprehensive and reliable comparison of their performance. Furthermore, we propose and evaluate the adoption of Mamba as an alternative fusion mechanism for brain tumor segmentation in the presence of missing modalities. Experimental results demonstrate that transformer-based architectures achieve leading performance on BraTS2023, outperforming purely convolutional models that were instead superior in BraTS2018. Meanwhile, the proposed Mamba-based architecture exhibits promising performance in comparison to state-of-the-art models, competing with and even outperforming transformers. The source code of the proposed approach is publicly released alongside the benchmark developed for the evaluation: https://github.com/AImageLab-zip/IM-Fuse.

2025 Conference proceedings paper

Investigating the ABCDE Rule in Convolutional Neural Networks

Authors: Bolelli, Federico; Lumetti, Luca; Marchesini, Kevin; Candeloro, Ettore; Grana, Costantino

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Convolutional Neural Networks (CNNs) have been broadly employed in dermoscopic image analysis, mainly due to the large amount of data gathered by the International Skin Imaging Collaboration (ISIC). But where do neural networks look? Several authors have claimed that the ISIC dataset is affected by strong biases, i.e., spurious correlations between samples that machine learning models unfairly exploit while discarding the useful patterns they are expected to learn. These strong claims have been supported by showing that deep learning models maintain excellent performance even when "no information about the lesion remains" in the debased input images. With this paper, we explore the interpretability of CNNs in dermoscopic image analysis by analyzing which characteristics are considered by autonomous classification algorithms. Starting from a standard setting, experiments presented in this paper gradually conceal well-known crucial dermoscopic features and thoroughly investigate how CNN performance subsequently evolves. Experimental results carried out on two well-known CNNs, EfficientNet-B3 and ResNet-152, demonstrate that neural networks autonomously learn to extract features that are notoriously important for melanoma detection. Even when some of these features are removed, the others are still enough to achieve satisfactory classification performance. The results obtained demonstrate that literature claims on biases are not supported by the experiments carried out. Finally, to demonstrate the generalization capabilities of state-of-the-art CNN models for skin lesion classification, a large private dataset has been employed as an additional test set.

2025 Conference proceedings paper

Location Matters: Harnessing Spatial Information to Enhance the Segmentation of the Inferior Alveolar Canal in CBCTs

Authors: Lumetti, Luca; Pipoli, Vittorio; Bolelli, Federico; Ficarra, Elisa; Grana, Costantino

Published in: LECTURE NOTES IN COMPUTER SCIENCE

The segmentation of the Inferior Alveolar Canal (IAC) plays a central role in maxillofacial surgery, drawing significant attention in current research. Because of their outstanding results, deep learning methods are widely adopted in the segmentation of 3D medical volumes, including the IAC in Cone Beam Computed Tomography (CBCT) data. One of the main challenges when segmenting large volumes, including those obtained through CBCT scans, arises from the use of patch-based techniques, which are mandatory to fit memory constraints. Such training approaches compromise neural network performance due to a reduction in the global contextual information. Performance degradation is especially evident when the target objects are small with respect to the background, as happens with the inferior alveolar nerve, which develops across the mandible but involves only a few voxels of the entire scan. In order to target this issue and push state-of-the-art performance in the segmentation of the IAC, we propose an innovative approach that exploits spatial information of extracted patches and integrates it into a Transformer architecture. By incorporating prior knowledge about patch location, our model improves the state of the art by ~2 points on the Dice score when integrated with the standard U-Net architecture. The source code of our proposal is publicly released.

2025 Conference proceedings paper

Machine Learning-Based Prediction of Emergency Department Prolonged Length of Stay: A Case Study from Italy

Authors: Perliti Scorzoni, Paolo; Giovanetti, Anita; Bolelli, Federico; Grana, Costantino

Published in: AHFE INTERNATIONAL

Overcrowding in Emergency Departments (EDs) is a pressing concern driven by high patient demand and limited resources. Prolonged Length of Stay (pLOS), a major contributor to this congestion, may lead to adverse outcomes, including patients leaving without being seen, suboptimal clinical care, increased mortality rates, provider burnout, and escalating healthcare costs. This study investigates the application of various Machine Learning (ML) algorithms to predict both LOS and pLOS. A retrospective analysis examined 32,967 accesses at a northern Italian hospital’s ED between 2022 and 2024. Twelve classification algorithms were evaluated in forecasting pLOS, using clinically relevant thresholds. Two data variants were employed for model comparison: one containing only structured data (e.g., demographics and clinical information), and a second one that also includes features extracted from free-text nursing notes. To enhance the accuracy of LOS prediction, novel queue-based variables capturing the real-time state of the ED were incorporated as additional dynamic predictors. Compared to single-algorithm models, ensemble models demonstrated superior robustness in forecasting both ED-LOS and ED-pLOS. These findings highlight the potential for integrating ML into ED practices as auxiliary tools to provide valuable insights into patient flow. By identifying patients at high risk of pLOS, healthcare professionals can proactively implement strategies to expedite care, optimize resource allocation, and ultimately improve patient outcomes and ED efficiency, promoting a more effective and sustainable public healthcare delivery.

2025 Conference proceedings paper

MedShapeNet – a large-scale dataset of 3D medical shapes for computer vision

Authors: Li, Jianning; Zhou, Zongwei; Yang, Jiancheng; Pepe, Antonio; Gsaxner, Christina; Luijten, Gijs; Qu, Chongyu; Zhang, Tiezheng; Chen, Xiaoxi; Li, Wenxuan; Wodzinski, Marek Michal; Friedrich, Paul; Xie, Kangxian; Jin, Yuan; Ambigapathy, Narmada; Nasca, Enrico; Solak, Naida; Melito Gian, Marco; Duc Vu, Viet; Memon Afaque, R.; Schlachta, Christopher; De Ribaupierre, Sandrine; Patel, Rajnikant; Eagleson, Roy; Chen Xiaojun Mächler, Heinrich; Kirschke Jan, Stefan; De La Rosa, Ezequiel; Christ Patrick, Ferdinand; Hongwei Bran, Li; Ellis David, G.; Aizenberg Michele, R.; Gatidis, Sergios; Küstner, Thomas; Shusharina, Nadya; Heller, Nicholas; Rearczyk, Vincent; Depeursinge, Adrien; Hatt, Mathieu; Sekuboyina, Anjany; Löffler Maximilian, T.; Liebl, Hans; Dorent, Reuben; Vercauteren, Tom; Shapey, Jonathan; Kujawa, Aaron; Cornelissen, Stefan; Langenhuizen, Patrick; Ben-Hamadou, Achraf; Rekik, Ahmed; Pujades, Sergi; Boyer, Edmond; Bolelli, Federico; Grana, Costantino; Lumetti, Luca; Salehi, Hamidreza;

Published in: BIOMEDIZINISCHE TECHNIK

Objectives: The shape is commonly used to describe the objects. State-of-the-art algorithms in medical imaging are predominantly diverging from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is seen from the growing popularity of ShapeNet (51,300 models) and Princeton ModelNet (127,915 models). However, a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D models of surgical instruments is missing. Methods: We present MedShapeNet to translate data-driven vision algorithms to medical applications and to adapt state-of-the-art vision algorithms to medical problems. As a unique feature, we directly model the majority of shapes on the imaging data of real patients. We present use cases in classifying brain tumors, skull reconstructions, multi-class anatomy completion, education, and 3D printing. Results: By now, MedShapeNet includes 23 datasets with more than 100,000 shapes that are paired with annotations (ground truth). Our data is freely accessible via a web interface and a Python application programming interface and can be used for discriminative, reconstructive, and variational benchmarks as well as various applications in virtual, augmented, or mixed reality, and 3D printing. Conclusions: MedShapeNet contains medical shapes from anatomy and surgical instruments and will continue to collect data for benchmarks and applications. The project page is: https://medshapenet.ikim.nrw/.

2025 Journal article

MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models

Authors: Pipoli, Vittorio; Saporita, Alessia; Bolelli, Federico; Cornia, Marcella; Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita; Ficarra, Elisa

Recently, Multimodal Large Language Models (MLLMs) have emerged as a leading framework for enhancing the ability of Large Language Models (LLMs) to interpret non-linguistic modalities. Despite their impressive capabilities, the robustness of MLLMs under conditions where one or more modalities are missing remains largely unexplored. In this paper, we investigate the extent to which MLLMs can maintain performance when faced with missing modality inputs. Moreover, we propose a novel framework to mitigate the aforementioned issue called Retrieval-Augmented Generation for missing modalities (MissRAG). It consists of a novel multimodal RAG technique alongside a tailored prompt engineering strategy designed to enhance model robustness by mitigating the impact of absent modalities while preventing the burden of additional instruction tuning. To demonstrate the effectiveness of our techniques, we conducted comprehensive evaluations across five diverse datasets, covering tasks such as audio-visual question answering, audio-visual captioning, and multimodal sentiment analysis.

2025 Conference proceedings paper
