Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering

Authors: Compagnoni, Alberto; Morini, Marco; Sarto, Sara; Cocchi, Federico; Caffagni, Davide; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Multimodal Large Language Models (MLLMs) have shown impressive capabilities in jointly understanding text, images, and videos, often evaluated via Visual Question Answering (VQA). However, even state-of-the-art MLLMs struggle with domain-specific or knowledge-intensive queries, where relevant information is underrepresented in pre-training data. Knowledge-based VQA (KB-VQA) addresses this by retrieving external documents to condition answer generation, but current retrieval-augmented approaches suffer from low precision, noisy passages, and limited reasoning. To address this, we propose ReAG, a novel Reasoning-Augmented Multimodal RAG approach that combines coarse- and fine-grained retrieval with a critic model that filters irrelevant passages, ensuring high-quality additional context. The model follows a multi-stage training strategy leveraging reinforcement learning to enhance reasoning over retrieved content, while supervised fine-tuning serves only as a cold start. Extensive experiments on Encyclopedic-VQA and InfoSeek demonstrate that ReAG significantly outperforms prior methods, improving answer accuracy and providing interpretable reasoning grounded in retrieved evidence. Our source code is publicly available at: https://github.com/aimagelab/ReAG.

2026 Conference proceedings paper

Scalare l’Intelligenza Artificiale per l’Analisi di Immagini Orali e Dentali [Scaling Artificial Intelligence for Oral and Dental Image Analysis]

Authors: Lumetti, Luca

Cone Beam Computed Tomography (CBCT) is central to contemporary dental and maxillofacial practice, but progress in automated analysis has been limited by the scarcity of publicly available datasets. This thesis addresses that bottleneck by creating an open, extensible ecosystem that combines datasets, annotation tools, and algorithmic advances, and shows how these elements interact cyclically to accelerate research and its translation into clinical products. The Maxillo dataset was the first of its kind, providing 91 densely annotated volumes and 256 sparsely annotated scans for Inferior Alveolar Canal annotation. The ToothFairy series, to which this thesis contributed, built on these foundations: the first version of ToothFairy increased the dense annotations to 156 volumes; ToothFairy2 expanded to 480 CBCT volumes, each with 42 semantic classes; and ToothFairy3 further extended the corpus to 532 volumes and 77 classes, while improving annotation quality and the diversity of scanners used. Complementing the CBCT data, the Bits2Bites dataset, also part of this thesis, provided 200 pairs of registered intraoral scans with multi-label occlusion annotations. All resources were released openly to enable reproducible benchmarking and follow-up work. To scale annotation without sacrificing clinical fidelity, I developed semi-automated annotation tools and a rigorous quality-control pipeline that combines predictive models with expert review. Fundamentally, dataset creation, tooling, and model development progressed cyclically: additional data enabled better models; better models fed faster and more accurate annotation tools; and improved tools in turn produced larger, higher-quality datasets, constituting the central intellectual contribution of this work. On this data foundation, I improved volumetric segmentation methods: transformer-based modules that explicitly encode spatial relationships between patches to preserve voxel-level detail while aggregating long-range context, and adaptations of the Mamba architecture for efficient, high-accuracy 3D segmentation. Finally, I introduced U-Net Transplant, a model-merging framework that proposes novel techniques for updating and specializing clinical models without full retraining, reducing redeployment costs, storage, and data-exposure risks. Overall, this ecosystem delivered the largest open CBCT benchmark for maxillofacial segmentation to date, together with a coherent set of methods and tools that substantially improved the accuracy, efficiency, and lifecycle management of clinical AI, enabling faster, safer, and more reproducible dental AI research and deployment.

2026 Doctoral thesis

Searching for New Possible Peripheral Biomarkers of Cognitive Decline in Down Syndrome: The Role of IL-18 Pathway and its Interaction with TGF-β1 and TNF-α

Authors: Grasso, M.; Fidilio, A.; L'Episcopo, F.; Recupero, M.; Barone, C.; Lovino, M.; Alboni, S.; Bacalini, M. G.; Caruso, G.; Greco, D.; Buono, S.; De La Torre, R.; Tascedda, F.; Blom, J. M.; Benatti, C.; Caraci, F.

Published in: NEUROMOLECULAR MEDICINE

Down syndrome (DS) represents one of the most common genetic disorders attributable to a partial or complete trisomy of chromosome 21 that affects about 1 in 700 individuals at birth. The diagnosis of Alzheimer's Disease (AD)-correlated cognitive decline in this population requires new approaches and new biomarkers that comprehensively assess health status and early cognitive decline. In this observational study, we explored for the first time the relation of IL-18, a cytokine member of IL-1 family involved in both innate and acquired immune responses, with DS associated cognitive decline. We observed that plasma total IL-18, in subjects with DS over 35 with and without AD-related cognitive decline, and plasma concentrations of its binding protein in subjects with DS (19-35 years) were correlated with lower plasma concentrations of Transforming Growth Factor (TGF-beta 1), which are linked to an increased rate of cognitive decline in adults with DS. In addition, we found a significant association between low baseline concentrations of Free IL-18, the active form of the cytokine, and an increased rate of cognitive decline at 12 months, calculated as delta of the Test for Severe Impairment (dTSI), in individuals with DS (19-35 years). Finally, we demonstrated a reduction of Free IL-18/TNF-alpha ratio, considered as a new possible double biomarker, in both young and older adult DS subjects without AD-related cognitive decline (area under the receiver operating curve (AUC) was 0.82 and 0.71, respectively), suggesting the advantage of the composite biomarkers in the discrimination of patients from healthy people over single biomarkers.

2026 Journal article

Segment-wise Anomaly Detection via Compression Tokens in Industrial Production Lines

Authors: Salici, Giacomo; Köhler, Stefan; Fiorina, Andrea; Zannella, Franco; Porrello, Angelo; Calderara, Simone

We present a predictive maintenance approach for industrial production lines based on multivariate segment-wise time-series analysis. To address the high cost of collecting anomalous samples, we propose a novelty detection framework in which a transformer autoencoder is trained in a semi-supervised fashion exclusively on nominal sequences, and anomaly scores are derived from reconstruction error at test time. We introduce a set of learnable “compression tokens” into the transformer encoder; these tokens serve as the bottleneck from which the decoder reconstructs the input. We compare this model against an MLP-based autoencoder baseline; the results show that the novelty-detection model remains strong, with near-perfect performance under time-aware and device-aware validation, which are the conditions that most faithfully simulate deployment.

2026 Conference proceedings paper

Sketch2Stitch: GANs for Abstract Sketch-Based Dress Synthesis

Authors: Farooq Khan, Faizan; Mohamed Bakr, Eslam; Morelli, Davide; Cornia, Marcella; Cucchiara, Rita; Elhoseiny, Mohamed

In the realm of creative expression, not everyone possesses the gift of effortlessly translating their imaginative visions into flawless sketches. More often than not, the outcome resembles an abstract, perhaps even slightly distorted representation. The art of producing impeccable sketches is not only challenging but also a time-consuming process. Our work is the first of this kind in transforming abstract, sometimes deformed garment sketches into photorealistic catalog images, to empower the everyday individual to become their own fashion designer. We create Sketch2Stitch, a dataset featuring over 65,000 abstract sketch images generated from garments of DressCode and VITONHD, two benchmark datasets in the virtual try-on task. Sketch2Stitch is the first dataset in the literature to provide abstract sketches in the fashion domain. We propose a StyleGAN-based generative framework that bridges freehand sketching with photorealistic garment synthesis. We demonstrate that our framework allows users to sketch rough outlines and optionally provide color hints, producing realistic designs in seconds. Experimental results demonstrate, both quantitatively and qualitatively, that the proposed framework achieves superior performance against various baselines and existing methods on both subsets of our dataset. Our work highlights a pathway toward AI-assisted fashion design tools, democratizing garment ideation for students, independent designers, and casual creators.

2026 Conference proceedings paper

Tecniche avanzate di Intelligenza Artificiale per l’apprendimento continuo e robusto su dati strutturati [Advanced Artificial Intelligence Techniques for Continual and Robust Learning on Structured Data]

Authors: Menabue, Martin

Artificial Intelligence methods have achieved remarkable results in many domains, but applying them effectively to dynamic, structured data remains a significant challenge. This thesis investigates advanced AI techniques for continual and robust learning in scenarios where data evolve over time and exhibit complex dependencies. The research explores several complementary directions to address the limitations of current models in terms of adaptability and resilience. First, continual learning methods are studied to allow neural networks to learn from sequential data streams without forgetting previously acquired knowledge. A distillation-based approach leveraging Vision Transformers is proposed, in which attention representations are transferred between teacher and student models, improving stability. In addition, a prompt learning strategy based on the embeddings of the CLIP model is developed, which dynamically selects task-specific prompts, improving performance. The second line of research concerns federated learning, a distributed setting in which structured information emerges naturally from collaboration among clients. A new defense mechanism against backdoor attacks is introduced, which exploits the spectral properties of local data representations to identify and mitigate malicious participants through data synthesis and alignment techniques. Finally, the thesis analyzes adaptive backdoor attacks and the corresponding defenses, highlighting how such vulnerabilities pose a critical threat to industrial processes and infrastructure. Overall, the work contributes to the design of AI models capable of continual adaptation, secure collaboration, and effective exploitation of structural information for real-world and industrial applications.

2026 Doctoral thesis

The aporetic dialogs of Modena on gender differences: Is it all about testosterone? Episode III: Mathematics

Authors: Brigante, G.; Costantino, F.; Bellelli, A.; Boni, S.; Furini, C.; Cucchiara, R.; Simoni, M.

Published in: ANDROLOGY

This report is the transcript of what was discussed in a convention at the Endocrinology Unit in Modena, Italy, in the form of the aporetic dialogs of ancient Greece. It is the third episode of a series of four discussions on the differences between males and females, with a multidisciplinary approach. In this work, the role of testosterone in gender differences in the aptitude for mathematics is explored. First, the definitions of mathematical abilities were provided together with any gender difference in the distribution of females and males in science, technology, engineering, and mathematics subjects. A clear predominance of males is evident at most science, technology, engineering, and mathematics education levels, especially in advanced academic careers. Then, the discussants were divided into two groups: group 1, which illustrated the thesis that testosterone promotes the development of logical‒mathematical skills, and group 2, which, in contrast, asserted the inconsistency of a direct role of testosterone in improving cognitive abilities and that socio-cultural factors should be considered on the basis of this gender gap. In the end, an expert referee (a female engineer) tried to resolve the aporia: are the two theories equivalent or is one superior?

2026 Journal article

The Biblical Heritage in Ancient Latin Christian Literature: Advancing Intertextual Mapping Through Sentence Embeddings

Authors: Mambelli, Anna; Bigoni, Laura; Dainese, Davide; Tutrone, Fabio; Caffagni, Davide; Cocchi, Federico; Zanella, Marco; Cornia, Marcella; Cucchiara, Rita

Published in: UMANISTICA DIGITALE

This study presents an interdisciplinary methodology for detecting biblical references in Latin patristic literature through an innovative combination of rigorous philological approach and Natural Language Processing (NLP) techniques. Focusing on one of the most influential ancient Christian commentaries on the Bible, Augustine of Hippo’s De Genesi ad litteram, and its relationship with Latin biblical texts (specifically, Jerome’s Vulgate and pre-Vulgate versions), this research introduces a token-based classification system for intertextual references, enriched with semantic annotations and supported by the INCEpTION platform. The first section shows how this numerical classification system accounts for exact matches, lemmatized forms, roots, synonyms, and other forms of semantic parallels (here referred to as “structures”), capturing a wide spectrum of textual similarity. To enhance automatic retrieval of these intertextual connections, we fine-tune BERT-based language models for Latin, incorporating contrastive learning and hard negative mining. In the second section, experimental results show that finetuned models significantly outperform baseline models at various levels of textual similarity. This work highlights the utility of computational models in overcoming the traditional dichotomy between explicit quotations and implicit allusions, embracing multiple intermediate nuances of similarity and offering a scalable approach to the study of intertextuality in ancient writings.

2026 Journal article

The olfactory functional network in the Alzheimer’s disease continuum: a resting state fMRI study

Authors: Ballotta, Daniela; Casadio, Claudia; Tondelli, Manuela; Zanelli, Vanessa; Ricci, Francesco; Carpentiero, Omar; Lui, Fausta; Filippini, Nicola; Chiari, Annalisa; Molinari, Maria Angela; Benuzzi, Francesca

Published in: FRONTIERS IN AGING NEUROSCIENCE

2026 Journal article

The paper has a GitHub, the GitHub has a README, the README has nothing: Reproducibility Signals for Review Support

Authors: Bolelli, Federico; Santoli, Davide; Marchesini, Kevin; Lumetti, Luca; Grana, Costantino

Reproducibility policies promise "checkable" medical-imaging science, yet many submissions still ship unverifiable artifacts. Our analysis of 3722 MICCAI papers shows code-linking rising from 51.8% (2021) to 72.5% (2025), but ~13% of linked repositories are inaccessible or empty. We present paper-snitch, a reviewer-facing decision-support tool that turns these signals into an evidence-grounded report. Paper-snitch parses PDFs, resolves and sanity-checks repositories, and applies policy-aware checklists aligned with MICCAI expectations, producing a review-time verifiability score decomposed into interpretable sub-scores plus criterion-linked excerpts and artifacts reviewers can inspect. It never executes untrusted code or attempts GPU-heavy reproduction, focusing instead on bounded, verifiable checks. We compare paper-snitch on 100 randomly sampled MICCAI 2025 papers with human annotators using shared evaluation criteria, indicating that automated, bounded checks can scale reproducibility screening while keeping final decisions with reviewers.

2026 Conference proceedings paper
