Publications by Federico Bolelli

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Active filter: Author: Federico Bolelli

Enabling 8B Bitwise Autoregressive Image Generation on Edge GPUs

Authors: Vezzali, Enrico; Bolelli, Federico; Grana, Costantino; Benini, Luca; Li, Yawei

Visual Autoregressive (VAR) models face a severe "Memory Wall" on edge devices due to their large model size and substantial KV-cache requirements. In this work, we analyze the Infinity VAR family (2B and 8B) and propose a compression pipeline for deployment on constrained NVIDIA Jetson systems. We diagnose two critical bottlenecks: activation outliers reaching 353x the median, and channel-skewed cache variance. To address them, we propose a hybrid pipeline that combines SVDQuant, which structurally decouples weight outliers, with Asymmetric Per-Channel KV8 quantization. Our approach reduces the Infinity-8B footprint by 64% (37.1 GB → 13.3 GB), fitting it on the mid-range Orin NX with a 4.1x speedup over Flux.1-dev (W4A4), while achieving superior aesthetic alignment (ImageReward 1.13 vs. 0.935). Crucially, we also unlock entry-level feasibility for Infinity-2B, compressing it from 16.0 GB to 7.71 GB to enable deployment on the Orin Nano. These results establish a new efficiency standard for high-fidelity generative AI at the edge. The code is available at https://github.com/Henvezz95/deepcompressor.

2026 Conference Paper
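The abstract names Asymmetric Per-Channel KV8 quantization without detailing it. The sketch below illustrates the generic technique under stated assumptions:

- A KV-cache slice is represented as a list of token rows.
- Each channel (column) gets its own scale and zero point, computed from its min/max range widened to include 0.
- Values are mapped to unsigned 8-bit codes.

The function names are hypothetical, and this is not the deepcompressor implementation.

```python
def quantize_kv_per_channel(x, n_bits=8):
    """Asymmetric per-channel quantization of a [tokens][channels] list (illustrative sketch)."""
    qmax = 2 ** n_bits - 1
    scales, zero_points = [], []
    for c in range(len(x[0])):
        col = [row[c] for row in x]
        # Widen the range to include 0 so the zero point fits in [0, qmax].
        lo, hi = min(min(col), 0.0), max(max(col), 0.0)
        scale = (hi - lo) / qmax if hi > lo else 1.0  # guard all-zero channels
        scales.append(scale)
        zero_points.append(round(-lo / scale))
    q = [[min(max(round(v / scales[c]) + zero_points[c], 0), qmax)
          for c, v in enumerate(row)] for row in x]
    return q, scales, zero_points


def dequantize(q, scales, zero_points):
    """Map 8-bit codes back to floats using each channel's scale and zero point."""
    return [[(v - zero_points[c]) * scales[c] for c, v in enumerate(row)] for row in q]


# Toy cache with strongly skewed channel ranges (values are made up).
kv = [[0.01, 120.0], [-0.02, -80.0], [0.03, 40.0]]
q, scales, zps = quantize_kv_per_channel(kv)
restored = dequantize(q, scales, zps)
```

Giving each channel its own range protects low-variance channels when a few channels are much wider, which is the channel-skewed variance the abstract describes. In the toy data, channel 0 gets a scale thousands of times finer than channel 1, and every reconstruction error stays within half of its channel's scale.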

FG-TRACER: Tracing Information Flow in Multimodal Large Language Models in Free-Form Generation

Authors: Saporita, Alessia; Pipoli, Vittorio; Bolelli, Federico; Baraldi, Lorenzo; Acquaviva, Andrea; Ficarra, Elisa

Multimodal Large Language Models (MLLMs) have achieved impressive performance across a variety of vision–language tasks. However, their internal working mechanisms remain largely underexplored. In this work, we introduce FG-TRACER, a framework designed to analyze the information flow between visual and textual modalities in MLLMs during free-form generation. Notably, our numerically stabilized computational method enables the first systematic analysis of multimodal information flow in underexplored settings such as image captioning and chain-of-thought (CoT) reasoning. We apply FG-TRACER to two state-of-the-art MLLMs (LLaMA 3.2-Vision and LLaVA 1.5) across three vision–language benchmarks (TextVQA, COCO 2014, and ChartQA), and we conduct a word-level analysis of multimodal integration. Our findings uncover distinct patterns of multimodal fusion across models and tasks, demonstrating that fusion dynamics are both model- and task-dependent. Overall, FG-TRACER offers a robust methodology for probing the internal mechanisms of MLLMs in free-form settings, providing new insights into their multimodal reasoning strategies. Our source code is publicly available at https://anonymous.4open.science/r/FG-TRACER-CB5A/.

2026 Conference Paper

Histological Brain Imaging Super-resolution with Frequency-guided Diffusion Models

Authors: Casari, Giovanni; Bolelli, Federico; Grana, Costantino

High-resolution histological imaging provides essential detail for quantitative brain modeling, yet acquiring whole-brain data at micrometer scale remains technically and economically challenging. This work introduces Brain-SR, a diffusion-based super-resolution framework designed to reconstruct high-resolution cortical sections from low-resolution BigBrain data. Building on the InvSR paradigm, our method performs resolution enhancement in the latent space of a pretrained variational autoencoder, guided by a task-specific noise-predictor network. A key contribution is a frequency-domain supervision term that compares the magnitude spectra of predicted and target patches, enforcing spectral consistency while remaining robust to local misalignments. Quantitative evaluations show that Brain-SR substantially outperforms a baseline diffusion super-resolution model, reducing LPIPS by 27% and FID by 58%, while spectral analysis confirms accurate recovery of the frequency distribution. The resulting reconstructions preserve neuronal structures consistent with high-resolution references, offering a practical step toward large-scale, morphologically faithful brain histology reconstruction. The code is publicly available to support reproducibility: https://github.com/AImageLab-zip/Brain-SR.

2026 Conference Paper
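The abstract says Brain-SR's frequency-domain term compares the magnitude spectra of predicted and target patches and is robust to local misalignments. The sketch below shows why magnitude spectra have that property. It assumes an L1 distance and a naive 2D DFT on tiny patches; the paper's exact distance, normalization and patch size are not stated here. A real training loop would use an FFT such as `torch.fft.fft2` instead. The function names are hypothetical.

```python
import cmath


def magnitude_spectrum(patch):
    """Naive 2D DFT magnitude |F(u, v)| of a small patch given as a list of rows."""
    h, w = len(patch), len(patch[0])
    return [[abs(sum(patch[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
                     for y in range(h) for x in range(w)))
             for v in range(w)] for u in range(h)]


def spectral_l1_loss(pred, target):
    """Mean absolute difference between the two magnitude spectra (assumed L1 form)."""
    mp, mt = magnitude_spectrum(pred), magnitude_spectrum(target)
    n = len(mp) * len(mp[0])
    return sum(abs(a - b) for rp, rt in zip(mp, mt) for a, b in zip(rp, rt)) / n


target = [[0, 1, 2, 3], [1, 0, 3, 2], [2, 3, 0, 1], [3, 2, 1, 0]]
# Circularly shift every row by one pixel: an idealized misalignment.
shifted = [row[-1:] + row[:-1] for row in target]
```

A shift only multiplies each DFT coefficient by a unit-modulus phase factor, so the magnitude loss between `shifted` and `target` is numerically zero even though their pixel-wise L1 distance is not. Real misalignments are not exactly circular, so in practice the invariance is only approximate.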

Multi-Structure Segmentation in CBCT Volumes: the ToothFairy2 Challenge

Authors: Bolelli, Federico; Lumetti, Luca; Van Nistelrooij, Niels; Vinayahalingam, Shankeeth; Di Bartolomeo, Mattia; Marchesini, Kevin; Pellacani, Arrigo; Candeloro, Ettore; Rosati, Gabriele; Xi, Tong; Isensee, Fabian; Kirchhoff, Yannick; Krämer, Lars; Rokuss, Maximilian; Ulrich, Constantin; Maier-Hein, Klaus; Jiang, Yuxian; Liu, Yusheng; Wang, Lisheng; Wang, Haoshen; Chen, Siyu; Cui, Zhiming; Shi, Pengcheng; Pan, Zhaohong; Liang, Xiaokun; Ma, Qi; Konukoglu, Ender; Wodzinski, Marek; Müller, Henning; Mai, Haipeng; Dang, Xiaobing; Bhandary, Shrajan; Grosu, Radu; Bergé, Stefaan; Anesi, Alexandre; Grana, Costantino

Published in: MEDICAL IMAGE ANALYSIS

Cone-beam computed tomography (CBCT) is widely used for dento-maxillofacial diagnostics and treatment planning, yet comprehensive multi-structure segmentation remains time-consuming, limiting large-scale, reproducible research. In this article, we present ToothFairy2, a MICCAI 2024 challenge on multi-structure segmentation in maxillofacial CBCT. The accompanying dataset comprises 530 CBCT volumes (480 public training, 50 hidden test) with expert 3D annotations of 42 classes, including maxilla, mandible, crowns, bridges, implants, inferior alveolar canals, maxillary sinuses, pharynx, and teeth labeled according to the FDI International Tooth Numbering System. Twenty-six international teams participated in ToothFairy2, and their methods were run and evaluated for voxel-wise multi-class segmentation under a standardized protocol. This report extends the evaluation of teeth to also investigate the current capabilities of tooth detection and FDI numbering. Furthermore, ranking stability was analyzed to assess the robustness of the final challenge outcome. Overall, challenge participants achieved consistently high performance on large, high-contrast structures such as jawbones, pharynx, and most teeth, while maxillary sinuses, dental restorations, and fine structures remained challenging due to class imbalance and metal artifacts. Analysis of tooth-related metrics further revealed that assigning correct FDI numbers was more challenging than delineating individual teeth. By releasing CBCT data, 3D annotations, baseline models, and evaluation code, ToothFairy2 establishes a long-term benchmark to drive the development of automated methods for robust, clinically meaningful multi-structure segmentation in maxillofacial CBCT.

2026 Journal Article
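The abstract reports that assigning correct FDI numbers was harder than delineating teeth. The FDI scheme for permanent teeth is a two-digit code:

- The first digit is the quadrant: 1 upper right, 2 upper left, 3 lower left, 4 lower right, from the patient's perspective.
- The second digit is the position counted from the midline, from 1 (central incisor) to 8 (third molar).

The sketch below decodes such codes. It also shows one plausible source of difficulty, offered as an illustration rather than a finding of the report: contralateral teeth share the same tooth type and differ only in the quadrant digit. The function names are hypothetical.

```python
QUADRANTS = {1: "upper right", 2: "upper left", 3: "lower left", 4: "lower right"}
TOOTH_TYPES = {1: "central incisor", 2: "lateral incisor", 3: "canine",
               4: "first premolar", 5: "second premolar",
               6: "first molar", 7: "second molar", 8: "third molar"}


def decode_fdi(code):
    """Decode a permanent-dentition FDI code (11-48) into (quadrant, tooth type)."""
    quadrant, position = divmod(code, 10)
    if quadrant not in QUADRANTS or position not in TOOTH_TYPES:
        raise ValueError(f"not a permanent-dentition FDI code: {code}")
    return QUADRANTS[quadrant], TOOTH_TYPES[position]


def mirror_fdi(code):
    """Return the contralateral tooth in the same jaw, e.g. 16 <-> 26 and 36 <-> 46."""
    decode_fdi(code)  # validate the code first
    quadrant, position = divmod(code, 10)
    return {1: 2, 2: 1, 3: 4, 4: 3}[quadrant] * 10 + position
```

A segmentation that outlines a first molar perfectly but confuses left and right yields code 26 instead of 16. That counts as a numbering error even though the delineation is correct.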

Ontology-Grounded Structured Prediction for Dental CBCT Reporting

Authors: Lumetti, Luca; Di Bartolomeo, Mattia; Pellacani, Arrigo; Anesi, Alex; Grana, Costantino; Bolelli, Federico

We present a dataset and baseline for ontology-grounded structured prediction from dental Cone-Beam Computed Tomography (CBCT) volumes. Building on the public ToothFairy3 benchmark (532 volumes with expert-level segmentations), we contribute (i) a total of 893 free-text clinical reports for 529 publicly available CBCT volumes, (ii) their conversion into validated RDF (Resource Description Framework) instances in Turtle syntax, aligned with a clinician-designed OWL (Web Ontology Language) ontology spanning 13 finding types and multiple qualifier axes, and (iii) a strong baseline demonstrating the effectiveness of our setup and establishing a foundation for future work. We formulate CBCT reporting as a three-stage structured prediction problem (finding detection, anatomical slot allocation, and property prediction) and introduce a hierarchical evaluation suite of six clinically interpretable metrics that decouple detection, localization, and characterization. A baseline model using frozen multi-scale VoxTell features, a structure-indexed encoder, and ontology-driven prediction heads achieves strong results under 5-fold cross-validation, with stage-decoupled analysis identifying presence detection as the primary deployment bottleneck. Dataset, ontology, and code are publicly released: https://github.com/AImageLab-zip/CBCT-Report.

2026 Conference Paper
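The abstract describes evaluation that decouples detection, localization and characterization, so that errors in an early stage do not contaminate later scores. The paper's six metrics are not reproduced here. The sketch below illustrates the decoupling idea with four hypothetical scores:

- Detection precision and recall over finding types.
- Localization, judged only on findings whose type was detected.
- Characterization, judged only on findings that were also correctly localized.

The `Finding` structure, the slot strings and the property tuples are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    kind: str                       # hypothetical finding type, e.g. "periapical_lesion"
    slot: str                       # anatomical slot, e.g. an FDI tooth number
    props: frozenset = frozenset()  # (qualifier, value) pairs


def staged_scores(pred, gold):
    """Score each stage only on items that passed the previous stage (illustrative)."""
    pred_kinds = {f.kind for f in pred}
    gold_kinds = {f.kind for f in gold}
    tp = len(pred_kinds & gold_kinds)
    # Localization: gold findings whose type was detected, checked for the right slot.
    pred_slots = {(f.kind, f.slot) for f in pred}
    detected = [g for g in gold if g.kind in pred_kinds]
    localized = [g for g in detected if (g.kind, g.slot) in pred_slots]
    # Characterization: correctly localized findings, checked for identical properties.
    pred_set = set(pred)
    return {
        "detection_precision": tp / len(pred_kinds) if pred_kinds else 1.0,
        "detection_recall": tp / len(gold_kinds) if gold_kinds else 1.0,
        "localization": len(localized) / len(detected) if detected else 1.0,
        "characterization": (sum(g in pred_set for g in localized) / len(localized)
                             if localized else 1.0),
    }


gold = [Finding("lesion", "36", frozenset({("size", "small")})), Finding("implant", "46")]
pred = [Finding("lesion", "36", frozenset({("size", "large")})), Finding("caries", "11")]
scores = staged_scores(pred, gold)
```

In this toy case, the missed implant and the spurious caries lower only the detection scores. The lesion is credited as perfectly localized, and only its wrong size lowers characterization.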

The paper has a GitHub, the GitHub has a README, the README has nothing: Reproducibility Signals for Review Support

Authors: Bolelli, Federico; Santoli, Davide; Marchesini, Kevin; Lumetti, Luca; Grana, Costantino

Reproducibility policies promise "checkable" medical-imaging science, yet many submissions still ship unverifiable artifacts. Our analysis of 3722 MICCAI papers shows code-linking rising from 51.8% (2021) to 72.5% (2025), but ~13% of linked repositories are inaccessible or empty. We present paper-snitch, a reviewer-facing decision-support tool that turns these signals into an evidence-grounded report. Paper-snitch parses PDFs, resolves and sanity-checks repositories, and applies policy-aware checklists aligned with MICCAI expectations. It produces a review-time verifiability score decomposed into interpretable sub-scores, together with criterion-linked excerpts and artifacts that reviewers can inspect. It never executes untrusted code or attempts GPU-heavy reproduction, focusing instead on bounded, verifiable checks. A comparison of paper-snitch with human annotators on 100 randomly sampled MICCAI 2025 papers, using shared evaluation criteria, indicates that automated, bounded checks can scale reproducibility screening while leaving final decisions to reviewers.

2026 Conference Paper
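The abstract says paper-snitch sanity-checks repositories without executing code and decomposes its score into interpretable sub-scores. The sketch below shows what a static, listing-only check in that spirit could look like. The input is a hypothetical mapping from file path to size in bytes. The criteria, the 300-byte README threshold, the file lists and the equal weighting are all assumptions, not paper-snitch's actual rules.

```python
import os

CODE_EXTS = {".py", ".ipynb", ".c", ".cpp", ".cu", ".m", ".r", ".jl", ".sh"}  # assumed
DEP_FILES = {"requirements.txt", "environment.yml", "pyproject.toml", "setup.py"}  # assumed


def repo_subscores(files, min_readme_bytes=300):
    """Score a repository from its file listing alone; nothing is downloaded or run."""
    names = [(os.path.basename(p).lower(), size) for p, size in files.items()]
    readme_bytes = max((s for n, s in names if n.startswith("readme")), default=0)
    sub = {
        "repo_nonempty": float(bool(files)),
        "readme_substantive": float(readme_bytes >= min_readme_bytes),
        "code_present": float(any(os.path.splitext(n)[1] in CODE_EXTS for n, _ in names)),
        "deps_declared": float(any(n in DEP_FILES for n, _ in names)),
    }
    # Equal-weight aggregate; each sub-score stays visible for the reviewer.
    sub["overall"] = sum(sub.values()) / len(sub)
    return sub


readme_only = repo_subscores({"README.md": 40})
complete = repo_subscores({"README.md": 2500, "src/train.py": 8000, "requirements.txt": 120})
```

The repository in the title (a GitHub link with a near-empty README and nothing else) scores 0.25 here. The per-criterion breakdown tells the reviewer exactly which checks failed.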

ToothFairy3: Scaling CBCT Maxillofacial Segmentation to 77 Classes with U-Mamba2

Authors: Lumetti, Luca; Tan, Zhi Qin; Borghi, Lorenzo; Van Nistelrooij, Niels; Rosati, Gabriele; Addison, Owen; Li, Yupeng; Vinayahalingam, Shankeeth; Grana, Costantino; Bolelli, Federico

Accurate delineation of maxillofacial anatomy in Cone-Beam Computed Tomography (CBCT) is essential for dental planning, but robust automated segmentation remains challenging due to limited public multi-structure datasets and the high computational burden of 3D deep learning models. We present and release ToothFairy3, a large-scale CBCT benchmark that extends ToothFairy2 with 102 additional fully annotated scans and an expanded taxonomy covering 77 classes, including 32 tooth-specific pulp cavities and small neurovascular structures. ToothFairy3 comprises 582 volumes (over 40,000 annotated objects), of which 532 are released with voxel-level labels and 50 are held out for leakage-free, server-side evaluation. We also introduce U-Mamba2, an efficient U-Net-style architecture that inserts a Mamba2 state-space block at the bottleneck to capture global context with favorable computational scaling. Our proposed domain-informed training further improves the learning of maxillofacial anatomies. Across CNN, Transformer, and Mamba baselines, U-Mamba2 achieves competitive Dice/HD95 scores with lower latency. Moreover, compared with training on other state-of-the-art public CBCT datasets, ToothFairy3-trained models generalize best to the hidden test set, particularly for maxillary structures.

2026 Conference Paper
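The ToothFairy3 results are reported as Dice/HD95 scores. For reference, per-class Dice for label maps is $2|P \cap G| / (|P| + |G|)$, where $P$ and $G$ are the voxels predicted as, and annotated as, that class. The minimal sketch below computes it on flattened label maps. The function name is hypothetical. Returning NaN for a class absent from both maps is one convention among several; benchmarks differ on this. HD95 is not shown.

```python
def dice_per_class(pred, gt, labels):
    """Per-class Dice = 2*|P & G| / (|P| + |G|) over flattened multi-class label maps."""
    scores = {}
    for c in labels:
        p = sum(1 for a in pred if a == c)
        g = sum(1 for b in gt if b == c)
        inter = sum(1 for a, b in zip(pred, gt) if a == c and b == c)
        scores[c] = 2 * inter / (p + g) if p + g else float("nan")
    return scores


pred = [0, 1, 1, 2, 2, 0]
gt = [0, 1, 2, 2, 2, 0]
scores = dice_per_class(pred, gt, labels=[1, 2])
```

Class 1 scores 2/3, because one of its two predicted voxels is wrong. Class 2 scores 0.8, because it misses one of three ground-truth voxels. This per-class view is why small structures with few voxels are penalized heavily by each error.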

A Deep-Learning-Based Method for Real-Time Barcode Segmentation on Edge CPUs

Authors: Vezzali, Enrico; Vorabbi, Lorenzo; Grana, Costantino; Bolelli, Federico

Barcodes are a critical technology in industrial automation, logistics, and retail, enabling fast and reliable data capture. While deep learning has significantly improved barcode localization accuracy, most modern architectures remain too computationally demanding for real-time deployment on embedded systems without dedicated hardware acceleration. In this work, we present BaFaLo (Barcode Fast Localizer), an ultra-lightweight segmentation-based neural network for barcode localization. Our model is specifically optimized for real-time performance on low-power CPUs while maintaining high localization accuracy for both 1D and 2D barcodes. It features a two-branch architecture, comprising a local feature extractor and a global context module, and is tailored to low-resolution inputs to further improve inference speed. We benchmark BaFaLo on the BarBeR dataset against several lightweight object detection and segmentation architectures, including YOLO Nano, Fast-SCNN, BiSeNet V2, and ContextNet. BaFaLo achieves the fastest inference time among all deep-learning models tested, running at 57.62 ms per frame on a single CPU core of a Raspberry Pi 3B+. Despite its compact design, it achieves a decoding rate nearly equivalent to YOLO Nano for 1D barcodes and only 3.5 percentage points lower for 2D barcodes, while being approximately nine times faster.

2025 Conference Paper
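The abstract does not describe BaFaLo's post-processing. A segmentation-based localizer must still turn its output mask into candidate regions for a decoder. One common way to do this is connected-components labeling followed by a bounding box per component; treat that as an assumption here, not the paper's documented method. The sketch below labels 4-connected foreground regions with a breadth-first search. The function name and the `min_area` filter are hypothetical.

```python
from collections import deque


def mask_to_boxes(mask, min_area=1):
    """Return inclusive (x_min, y_min, x_max, y_max) boxes of 4-connected foreground regions."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood-fill one component, tracking its area and extent.
            queue = deque([(y, x)])
            seen[y][x] = True
            area, x0, y0, x1, y1 = 0, x, y, x, y
            while queue:
                cy, cx = queue.popleft()
                area += 1
                x0, y0, x1, y1 = min(x0, cx), min(y0, cy), max(x1, cx), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:  # drop tiny blobs that are unlikely to be barcodes
                boxes.append((x0, y0, x1, y1))
    return boxes


mask = [[1, 1, 0, 0, 0],
        [1, 1, 0, 0, 1],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1]]
boxes = mask_to_boxes(mask)
```

Because BaFaLo works on low-resolution inputs, boxes found this way would need to be scaled back to the original frame before cropping for a decoder. That step is also an assumption.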

Accurate 3D Medical Image Segmentation with Mambas

Authors: Lumetti, Luca; Pipoli, Vittorio; Marchesini, Kevin; Ficarra, Elisa; Grana, Costantino; Bolelli, Federico

Published in: PROCEEDINGS INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING

CNNs and Transformer-based architectures have recently dominated the field of 3D medical segmentation. While CNNs are limited by their local receptive field, Transformers require significant memory and data, making them less suitable for analyzing large 3D medical volumes. Consequently, fully convolutional models such as U-Net still lead in 3D segmentation. Although efforts have been made to reduce the computational complexity of Transformers, such optimized models still struggle with content-based reasoning. This paper examines Mamba, a Recurrent Neural Network (RNN) based on State Space Models (SSMs), which achieves linear complexity and has outperformed Transformers in long-sequence tasks. Specifically, we assess Mamba's performance in 3D medical segmentation on three widely used datasets and propose architectural enhancements that improve its segmentation effectiveness by mitigating the primary shortcomings of existing Mamba-based solutions.

2025 Conference Paper
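The abstract credits State Space Models with linear complexity. The sketch below shows where that comes from: a diagonal linear SSM, discretized with zero-order hold, is a single recurrent pass costing O(L·N) for a sequence of length L and state size N. It is deliberately simplified:

- The parameters `A`, `B`, `C` and `delta` are fixed rather than input-dependent. Mamba's selective mechanism makes them functions of the input, which this sketch omits.
- It is not Mamba's parallel-scan kernel.
- It is not the architecture proposed in the paper.

The function name is hypothetical.

```python
import math


def ssm_scan(x, A, B, C, delta):
    """Diagonal linear SSM, ZOH-discretized, run as one O(L * N) recurrence.

    h_t = A_bar * h_{t-1} + B_bar * x_t,  y_t = C . h_t
    with A_bar = exp(delta * A) and B_bar = (exp(delta * A) - 1) / A * B (A != 0).
    """
    a_bar = [math.exp(delta * a) for a in A]
    b_bar = [(ab - 1.0) / a * b for ab, a, b in zip(a_bar, A, B)]
    h = [0.0] * len(A)
    ys = []
    for x_t in x:  # one pass over the sequence: cost grows linearly with its length
        h = [ab * h_n + bb * x_t for ab, h_n, bb in zip(a_bar, h, b_bar)]
        ys.append(sum(c * h_n for c, h_n in zip(C, h)))
    return ys


impulse = ssm_scan([1.0, 0.0, 0.0, 0.0], A=[-1.0], B=[1.0], C=[1.0], delta=0.5)
```

With a negative diagonal `A`, the impulse response decays geometrically by `exp(delta * A)` per step. The model carries context through a fixed-size state rather than attending over all previous positions, which is the source of the favorable scaling.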

BarBeR - Barcode Benchmark Repository: Implementation and Reproducibility Notes

Authors: Vezzali, Enrico; Bolelli, Federico; Santi, Stefano; Grana, Costantino

This paper describes in detail how to install, set up, and use "BarBeR" (Barcode Benchmark Repository) to reproduce the results presented in the ICPR 2024 paper "BarBeR: A Barcode Benchmarking Repository". It also details the tests available in the repository and explains how the configuration parameters affect experimental results.

2025 Conference Paper

Page 1 of 9 • Total publications: 89