Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

A Workflow for Cost- and Time-Aware Refueling Itinerary Optimization

Authors: Savarese, Marco; Zaccagnino, Carmine; De Blasi, Antonio; Salici, Giacomo; Cascianelli, Silvia; Vezzani, Roberto; Grazia, Carlo Augusto

We present the complete workflow of the RI-PIENO framework, a system for refueling itinerary optimization that extends the original PIENO design. While prior work introduced the conceptual modules of RI-PIENO, their operational pipeline was not described in detail. This study makes the workflow explicit, covering the end-to-end process from CAN Bus data acquisition and stop detection to the construction of daily trip graphs, refueling optimization, and mileage prediction. By clarifying the sequence of operations, the contribution provides a reproducible and extensible foundation for future research and development.
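The stop-detection stage of such a pipeline can be illustrated with a minimal sketch; the threshold and minimum duration below are hypothetical placeholders, not parameters from RI-PIENO:

```python
def detect_stops(speed_samples, threshold=1.0, min_duration=3):
    """Illustrative stop detector: a stop is a run of at least
    `min_duration` consecutive speed samples (e.g. read from the CAN Bus)
    below `threshold` km/h. Returns (start, end) index pairs, end exclusive.
    """
    stops, run_start = [], None
    for i, speed in enumerate(speed_samples):
        if speed < threshold:
            if run_start is None:
                run_start = i          # a candidate stop begins here
        else:
            if run_start is not None and i - run_start >= min_duration:
                stops.append((run_start, i))
            run_start = None
    # close a stop that runs to the end of the trace
    if run_start is not None and len(speed_samples) - run_start >= min_duration:
        stops.append((run_start, len(speed_samples)))
    return stops

# One long stop (indices 1..4); the later two-sample dip is too short.
print(detect_stops([30, 0, 0, 0, 0, 25, 0, 0, 20]))  # [(1, 5)]
```

Detected stops would then become the nodes of the daily trip graph over which refueling is optimized.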

2026 Conference proceedings paper

An Investigation on Incremental Learning from Unbalanced Streamed Data

Authors: Borghi, Guido; Graffieti, Gabriele; Vezzani, Roberto

Published in: LECTURE NOTES IN COMPUTER SCIENCE

2026 Conference proceedings paper

CAMNet: Leveraging Cooperative Awareness Messages for Vehicle Trajectory Prediction

Authors: Grasselli, Mattia; Porrello, Angelo; Grazia, Carlo Augusto

2026 Conference proceedings paper

DOLFIN: Balancing Stability and Plasticity in Federated Continual Learning

Authors: Moussadek, Omayma; Salami, Riccardo; Calderara, Simone

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Federated continual learning (FCL) enables models to learn new tasks across multiple distributed clients while preserving privacy and without forgetting previously acquired knowledge. However, current methods struggle to balance performance, privacy preservation, and communication efficiency. We introduce DOLFIN (Distributed Online LoRA for Federated INcremental learning), a novel approach combining Vision Transformers with low-rank adapters, designed to learn new tasks efficiently and stably in federated environments. Our method leverages LoRA for minimal communication overhead and incorporates Dual Gradient Projection Memory (DualGPM) to prevent forgetting. Evaluated on CIFAR-100, ImageNet-R, ImageNet-A, and CUB-200 under two Dirichlet heterogeneity settings, DOLFIN consistently surpasses six strong baselines in final average accuracy while matching their memory footprint. Orthogonal low-rank adapters offer an effective and scalable solution for privacy-preserving continual learning in federated settings.
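To illustrate why exchanging LoRA factors keeps communication small, here is a hypothetical FedAvg-style aggregation of clients' low-rank adapters; this is a simplified sketch with invented names, and DOLFIN's DualGPM-based scheme is more involved:

```python
import numpy as np

def lora_delta(A, B):
    """Low-rank weight update: delta_W = B @ A, rank r << min(d_out, d_in)."""
    return B @ A

def federated_average_adapters(client_adapters):
    """Average each client's LoRA factors element-wise. Only the small
    A (r x d_in) and B (d_out x r) matrices cross the network, never the
    full d_out x d_in weight matrix."""
    A_avg = np.mean([A for A, _ in client_adapters], axis=0)
    B_avg = np.mean([B for _, B in client_adapters], axis=0)
    return A_avg, B_avg

d_out, d_in, r = 64, 32, 4
rng = np.random.default_rng(0)
clients = [(rng.standard_normal((r, d_in)), rng.standard_normal((d_out, r)))
           for _ in range(3)]
A_avg, B_avg = federated_average_adapters(clients)
# per-round upload per client: r * (d_in + d_out) = 384 values vs 2048 full
```

Note that averaging the factors is itself an approximation: the mean of the products B @ A is generally not the product of the means, one of the subtleties federated LoRA methods must address.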

2026 Conference proceedings paper

FG-TRACER: Tracing Information Flow in Multimodal Large Language Models in Free-Form Generation

Authors: Saporita, Alessia; Pipoli, Vittorio; Bolelli, Federico; Baraldi, Lorenzo; Acquaviva, Andrea; Ficarra, Elisa

Multimodal Large Language Models (MLLMs) have achieved impressive performance across a variety of vision–language tasks. However, their internal working mechanisms remain largely underexplored. In this work, we introduce FG-TRACER, a framework designed to analyze the information flow between visual and textual modalities in MLLMs in free-form generation. Notably, our numerically stabilized computational method enables the first systematic analysis of multimodal information flow in underexplored domains such as image captioning and chain-of-thought (CoT) reasoning. We apply FG-TRACER to two state-of-the-art MLLMs—LLaMA 3.2-Vision and LLaVA 1.5—across three vision–language benchmarks—TextVQA, COCO 2014, and ChartQA—and we conduct a word-level analysis of multimodal integration. Our findings uncover distinct patterns of multimodal fusion across models and tasks, demonstrating that fusion dynamics are both model- and task-dependent. Overall, FG-TRACER offers a robust methodology for probing the internal mechanisms of MLLMs in free-form settings, providing new insights into their multimodal reasoning strategies. Our source code is publicly available at https://anonymous.4open.science/r/FG-TRACER-CB5A/.

2026 Conference proceedings paper

Generating Synthetic Data with Large Language Models for Low-Resource Sentence Retrieval

Authors: Caffagni, Davide; Cocchi, Federico; Mambelli, Anna; Tutrone, Fabio; Zanella, Marco; Cornia, Marcella; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Sentence similarity search is a fundamental task in information retrieval, enabling applications such as search engines, question answering, and textual analysis. However, retrieval systems often struggle when training data are scarce, as is the case for low-resource languages or specialized domains such as ancient texts. To address this challenge, we propose a novel paradigm for domain-specific sentence similarity search, where the embedding space is shaped by a combination of limited real data and a large amount of synthetic data generated by Large Language Models (LLMs). Specifically, we employ LLMs to generate domain-specific sentence pairs and fine-tune a sentence embedding model, effectively distilling knowledge from the LLM to the retrieval model. We validate our method through a case study on biblical intertextuality in Latin, demonstrating that synthetic data augmentation significantly improves retrieval effectiveness in a domain with scarce annotated resources. More broadly, our approach offers a scalable and adaptable framework for enhancing retrieval in domain-specific contexts. Source code and trained models are available at https://github.com/aimagelab/biblical-retrieval-synthesis.
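The distillation step — shaping the embedding space with synthetic pairs — can be sketched with an in-batch contrastive objective. The InfoNCE-style loss below is a common choice for this kind of training, not necessarily the exact objective used in the paper:

```python
import numpy as np

def info_nce_loss(query_emb, positive_emb, temperature=0.05):
    """In-batch contrastive loss: row i of `positive_emb` is the
    LLM-generated sentence paired with row i of `query_emb`; every other
    row in the batch acts as a negative."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = positive_emb / np.linalg.norm(positive_emb, axis=1, keepdims=True)
    logits = (q @ p.T) / temperature                 # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # diagonal = true pairs

rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 8))
# matching pairs score a near-zero loss; mismatched pairs are penalized
assert info_nce_loss(emb, emb) < info_nce_loss(emb, emb[::-1])
```

In practice the query and positive embeddings would come from the sentence encoder being fine-tuned, with the LLM only supplying the training pairs.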

2026 Conference proceedings paper

Gradient-sign Masking for Task Vector Transport Across Pre-Trained Models

Authors: Rinaldi, Filippo; Panariello, Aniello; Salici, Giacomo; Liu, Fengyuan; Ciccone, Marco; Porrello, Angelo; Calderara, Simone

When a new release of a foundation model is published, practitioners typically need to repeat fine-tuning, even if the same task was already tackled in the previous version. A promising alternative is to reuse the parameter changes (i.e., task vectors) that capture how a model adapts to a specific task. However, these vectors often fail to transfer across different pre-trained models because their parameter spaces are misaligned. In this work, we show that successful transfer depends strongly on the gradient-sign structure of the new model. Based on this insight, we propose GradFix, which approximates the ideal sign structure and leverages it to transfer knowledge using only a handful of labeled samples. Notably, this requires no additional fine-tuning: we only compute a few target-model gradients without parameter updates and mask the source task vector accordingly. This yields an update that is locally aligned with the target loss landscape, effectively rebasing the task vector onto the new pre-training. We provide a theoretical guarantee that our method ensures first-order descent. Empirically, we demonstrate significant performance gains on vision and language benchmarks, consistently outperforming naive task vector addition and few-shot fine-tuning. We further show that transporting task vectors improves multi-task and multi-source model merging. Code is available at https://github.com/fillo-rinaldi/GradFix.
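The sign-masking idea lends itself to a compact sketch. The rule below — keep a task-vector entry only when its sign matches the target model's descent direction — is our illustrative reading, not the paper's exact approximation:

```python
import numpy as np

def sign_masked_task_vector(task_vector, target_gradient):
    """Zero out task-vector entries whose sign disagrees with the descent
    direction (-gradient) of the target model; surviving entries point
    'downhill' for the target loss to first order."""
    descent_sign = -np.sign(target_gradient)
    mask = np.sign(task_vector) == descent_sign
    return task_vector * mask

tau = np.array([0.5, -0.3, 0.2, -0.1])      # toy source task vector
grad = np.array([-1.0, -2.0, 3.0, -0.5])    # a few target-model gradients
masked = sign_masked_task_vector(tau, grad)
# only the first entry survives; masked @ grad < 0, i.e. a descent direction
```

By construction the masked vector has a non-positive inner product with the target gradient, which is the intuition behind the paper's first-order descent guarantee.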

2026 Conference proceedings paper

Histological Brain Imaging Super-resolution with Frequency-guided Diffusion Models

Authors: Casari, Giovanni; Bolelli, Federico; Grana, Costantino

High-resolution histological imaging provides essential detail for quantitative brain modeling, yet acquiring whole-brain data at micrometer scale remains technically and economically challenging. This work introduces Brain-SR, a diffusion-based super-resolution framework designed to reconstruct high-resolution cortical sections from low-resolution BigBrain data. Building upon the InvSR paradigm, our method performs resolution enhancement in the latent space of a pretrained variational autoencoder, guided by a task-specific noise-predictor network. A key contribution is a frequency-domain supervision term that compares the magnitude spectra of predicted and target patches, enforcing spectral consistency while remaining robust to local misalignments. Quantitative evaluations demonstrate that Brain-SR achieves substantial improvements in LPIPS (-27%) and FID (-58%) compared to baseline diffusion Super-Resolution, while spectral analysis confirms accurate recovery of the frequency distribution. The resulting reconstructions preserve neuronal structures consistent with high-resolution references, offering a practical step toward large-scale, morphologically faithful brain histology reconstruction. The code is publicly available to support reproducibility: https://github.com/AImageLab-zip/Brain-SR.
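The spectral consistency term can be illustrated as follows; the exact form (here an L1 distance between magnitude spectra) is an assumption, but the key property — robustness to small spatial shifts, since translation only changes the phase of the spectrum — carries over:

```python
import numpy as np

def spectral_magnitude_loss(pred, target):
    """Compare the magnitude spectra of two image patches. Dropping the
    phase makes the loss insensitive to translations, hence robust to the
    local misalignments common in histological sections."""
    pred_mag = np.abs(np.fft.fft2(pred))
    target_mag = np.abs(np.fft.fft2(target))
    return float(np.mean(np.abs(pred_mag - target_mag)))

rng = np.random.default_rng(0)
patch = rng.random((8, 8))
shifted = np.roll(patch, 1, axis=1)   # circular shift: identical magnitudes
assert spectral_magnitude_loss(patch, shifted) < 1e-9
assert spectral_magnitude_loss(patch, rng.random((8, 8))) > 1e-3
```

A term like this would be added to the usual pixel- or latent-space reconstruction loss rather than replace it, since magnitudes alone do not pin down image content.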

2026 Conference proceedings paper

Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals

Authors: Lobba, Davide; Sanguigni, Fulvio; Ren, Bin; Cornia, Marcella; Cucchiara, Rita; Sebe, Nicu

Virtual try-on (VTON) has been widely explored for rendering garments onto person images, while its inverse task, virtual try-off (VTOFF), remains largely overlooked. VTOFF aims to recover standardized product images of garments directly from photos of clothed individuals. This capability is of great practical importance for e-commerce platforms, large-scale dataset curation, and the training of foundation models. Unlike VTON, which must handle diverse poses and styles, VTOFF naturally benefits from a consistent output format in the form of flat garment images. However, existing methods face two major limitations: (i) exclusive reliance on visual cues from a single photo often leads to ambiguity, and (ii) generated images usually suffer from loss of fine details, limiting their real-world applicability. To address these challenges, we introduce TEMU-VTOFF, a Text-Enhanced MUlti-category framework for VTOFF. Our architecture is built on a dual DiT-based backbone equipped with a multimodal attention mechanism that jointly exploits image, text, and mask information to resolve visual ambiguities and enable robust feature learning across garment categories. To explicitly mitigate detail degradation, we further design an alignment module that refines garment structures and textures, ensuring high-quality outputs. Extensive experiments on VITON-HD and Dress Code show that TEMU-VTOFF achieves new state-of-the-art performance, substantially improving both visual realism and consistency with target garments. Code and models are available at: https://temu-vtoff-page.github.io/.

2026 Conference proceedings paper

Modulation of Aerobic Glycolysis Genes During the Progression of Retinitis Pigmentosa

Authors: Adani, E.; Vasquez, S. S. V.; Lovino, M.; Bighinati, A.; Cappellino, L.; D'Alessandro, S.; Kalatzis, V.; Marigo, V.

Published in: INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE

PURPOSE. Photoreceptors are retinal cells with a high glucose metabolism, and retinal degeneration, specifically retinitis pigmentosa (RP), affects glycolysis. We aimed to evaluate changes in the expression of genes related to glucose metabolism in rod photoreceptors at different stages of retinal degeneration in murine models and human retinal organoids.

METHODS. RNA sequencing (RNA-seq) analysis was performed on a photoreceptor-like cell line induced to undergo degeneration and validated by real-time qPCR analysis of retinas from two murine models and one human organoid model of RP. Bioinformatic analysis was performed on published RNA-seq datasets from three murine RP models. Real-time qPCR analysis was also performed on retinas treated with an adeno-associated virus type 2 vector carrying the neurotrophic H105A peptide, derived from the pigment epithelium-derived factor.

RESULTS. The aerobic glycolysis genes Hk2, Pkm1, Pkm2, Ldha, and Slc6a6, together with other glucose metabolism genes, were found downregulated in the in vitro model of photoreceptor degeneration and in the in vivo RhoP23H/+, rd1, and rd10 models at early stages of the disease. The decreased expression of the aerobic glycolysis genes, except for PKM2, was confirmed in human organoids with mutations in the USH2A gene associated with RP. Expression was partially recovered in RhoP23H/+ retinas after treatment with the adeno-associated virus type 2 vector expressing the neurotrophic H105A peptide.

CONCLUSIONS. Glucose metabolism gene expression was found altered during the progression of RP in murine and human models of the disease. Expression was partially recovered as a molecular response to treatment with the neurotrophic factor H105A.

2026 Journal article

Page 1 of 106 • Total publications: 1059