Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

PATHOS: Pathology attention framework for treatment response stratification in ovarian high-grade serous carcinomas following neoadjuvant chemotherapy on H&E images

Authors: Miccolis, F.; Lovino, M.; Lehtonen, O.; Hynninen, J.; Hautaniemi, S.; Virtanen, A.; Ficarra, E.

Published in: JOURNAL OF PATHOLOGY INFORMATICS

Ovarian high-grade serous carcinoma (ovarian HGSC) is a clinically challenging disease with a poor prognosis, particularly for patients receiving neoadjuvant chemotherapy (NACT) before debulking surgery. In this study, we evaluate the progression-free interval (PFI) after NACT based on hematoxylin and eosin-stained whole-slide images (WSIs) of omental tumor tissue. Digital pathology tools are emerging to assist pathologists in diagnosis and analysis; however, the distinguishing features associated with response to NACT remain elusive. Multiple instance learning (MIL) coupled with attention mechanisms has shown promise in predicting treatment response from WSIs. Additionally, segmentation tools can identify and delineate regions in WSIs. Although some efforts have been made to develop explainable models for clinical outcome, there remains a need for models that are genuinely interpretable by pathologists. This article introduces the PATHOS framework, a novel approach to explaining crucial features of treatment response, based on the PFI of NACT-treated patients, from WSIs. PATHOS is composed of three blocks: (1) an MIL block to identify informative regions, (2) a panoptic segmentation and downstream analysis block for feature computation, and (3) a classification block to predict the PFI. The results demonstrate that PATHOS enhances the interpretability of response to NACT in ovarian HGSC patients by highlighting pathologically significant features relevant to PFI prediction, such as tumor cell morphology, stromal abundance, and the spatial distribution of stromal regions. Furthermore, PATHOS identifies approximately 10% of the total WSI area as an informative region for clinical outcome.
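The MIL block described above can be illustrated with a generic attention-pooling sketch. This is not the PATHOS implementation: the per-patch attention scores are taken as given inputs (in a real model a learned network would produce them), and selecting the top 10% of patches by attention weight is only an assumption about how an "informative region" could be extracted. The function names `attention_pool` and `top_fraction` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(patch_features, attention_scores):
    """Aggregate patch features into one slide-level vector.

    Each patch (instance) contributes in proportion to its softmax-normalized
    attention score, as in generic attention-based MIL pooling.
    """
    weights = softmax(attention_scores)
    dim = len(patch_features[0])
    bag = [sum(w * f[d] for w, f in zip(weights, patch_features)) for d in range(dim)]
    return bag, weights


def top_fraction(weights, fraction=0.10):
    """Indices of the highest-weighted patches (assumed selection rule)."""
    k = max(1, round(fraction * len(weights)))
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:k])


# Toy bag of 10 patches with 2-dimensional features (illustrative values only).
features = [[float(i), 1.0] for i in range(10)]
scores = [0.0] * 9 + [3.0]  # the last patch receives the highest raw score
slide_vector, patch_weights = attention_pool(features, scores)
informative = top_fraction(patch_weights, fraction=0.10)
```

The attention weights serve two purposes in this sketch: they pool the patches into a slide-level representation, and they rank patches so that a small subset can be passed to downstream analysis.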

2026 Journal article

PopEYE - Infrared Ocular Image Dataset for Eye State and Gaze-Direction Classification

Authors: Gibertoni, Giovanni; Borghi, Guido; Rovati, Luigi

The PopEYE dataset is a specialized collection of 14,976 near-infrared (NIR) images of the human eye region, specifically designed to support the development and benchmarking of computer vision algorithms for eye-state detection and coarse gaze-direction classification. Each image is provided in a fixed resolution of 772 × 520 pixels in 8-bit grayscale PNG format. The acquisition was performed frontally using a custom-developed Maxwellian-view optical configuration, consisting of a board-level CMOS camera and a specialized lens system where the subject's eye is precisely positioned at the focal point. This setup ensures a high-contrast representation of the anterior segment, making the pupil, iris, limbus, and portions of the sclera and eyelids clearly distinguishable under stable 850 nm infrared illumination. The dataset is categorized into six mutually exclusive classes identified through manual annotation supported by fixed visual aids and an expert system algorithm. The classification includes a correct positioning class for eyes open and properly aligned for clinical measurements (8,160 images), a closed class representing full eye closures such as blinks or sustained lid closure (1,790 images), and four directional classes representing gaze shifts relative to the central optical axis, specifically up (1,379 images), down (1,015 images), left (1,296 images), and right (1,336 images). The data captures the natural anatomical variability of 22 subjects and incorporates common real-world artifacts such as specular reflections from NIR sources and partial pupil occlusions by eyelashes or eyelids. By providing standardized labels and high-resolution NIR imagery, PopEYE serves as a robust resource for training machine learning models intended for real-time patient monitoring during ophthalmic examinations.
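The class sizes quoted above can be checked against the stated total, and they show a marked class imbalance. The sketch below uses only the counts given in the abstract; the dictionary keys and the inverse-frequency weighting scheme are illustrative choices, not part of the dataset release.

```python
# Class sizes as reported in the PopEYE abstract (key names are hypothetical).
CLASS_COUNTS = {
    "correct_positioning": 8160,
    "closed": 1790,
    "up": 1379,
    "down": 1015,
    "left": 1296,
    "right": 1336,
}


def class_weights(counts):
    """Inverse-frequency weights: total / (number_of_classes * class_count).

    With this scaling, the weights averaged over all samples equal 1.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


total_images = sum(CLASS_COUNTS.values())
shares = {name: n / total_images for name, n in CLASS_COUNTS.items()}
weights = class_weights(CLASS_COUNTS)

for name in CLASS_COUNTS:
    print(f"{name:20s} share={shares[name]:.3f} weight={weights[name]:.3f}")
```

The six counts add up to the 14,976 images stated in the abstract, with the correct-positioning class accounting for more than half of them, so some form of reweighting or resampling is a reasonable consideration when training a classifier on this data.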

2026 Database

RaTA-Tool: Retrieval-based Tool Selection with Multimodal Large Language Models

Authors: Mattioli, Gabriele; Turri, Evelyn; Sarto, Sara; Baraldi, Lorenzo; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Tool learning with foundation models aims to endow AI systems with the ability to invoke external resources — such as APIs, computational utilities, and specialized models — to solve complex tasks beyond the reach of standalone language generation. While recent advances in Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have expanded their reasoning and perception capabilities, existing tool-use methods are predominantly limited to text-only inputs and closed-world settings. Consequently, they struggle to interpret multimodal user instructions and cannot generalize to tools unseen during training. In this work, we introduce RaTA-Tool, a novel framework for open-world multimodal tool selection. Rather than learning direct mappings from user queries to fixed tool identifiers, our approach enables an MLLM to convert a multimodal query into a structured task description and subsequently retrieve the most appropriate tool by matching this representation against semantically rich, machine-readable tool descriptions. This retrieval-based formulation naturally supports extensibility to new tools without retraining. To further improve alignment between task descriptions and tool selection, we incorporate a preference-based optimization stage using Direct Preference Optimization (DPO). To support research in this setting, we also introduce the first dataset for open-world multimodal tool use, featuring standardized tool descriptions derived from Hugging Face model cards. Extensive experiments demonstrate that our approach significantly improves tool-selection performance, particularly in open-world, multimodal scenarios.
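The retrieval-based formulation can be sketched in a few lines: a task description is matched against a catalogue of tool descriptions, and the best match is returned. This is a toy stand-in, not the RaTA-Tool method. Bag-of-words cosine similarity replaces the MLLM-generated task description and whatever retriever the paper uses, and the tool names and descriptions are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a crude substitute for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_tool(task_description, tool_descriptions):
    """Return the best-matching tool name and the score of every tool."""
    query = bag_of_words(task_description)
    scored = {
        name: cosine(query, bag_of_words(desc))
        for name, desc in tool_descriptions.items()
    }
    return max(scored, key=scored.get), scored


# Hypothetical tool catalogue, for illustration only.
TOOLS = {
    "image-captioner": "generate a text caption describing the content of an image",
    "speech-recognizer": "transcribe spoken audio into text",
    "depth-estimator": "estimate a depth map from a single image",
}

best, scores = select_tool("describe the content of this image in a caption", TOOLS)
print(best)
```

Because selection is a lookup over descriptions rather than a prediction over a fixed label set, adding a tool only requires adding an entry to the catalogue, which is the extensibility property the abstract highlights.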

2026 Conference proceedings paper

ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering

Authors: Compagnoni, Alberto; Morini, Marco; Sarto, Sara; Cocchi, Federico; Caffagni, Davide; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Multimodal Large Language Models (MLLMs) have shown impressive capabilities in jointly understanding text, images, and videos, often evaluated via Visual Question Answering (VQA). However, even state-of-the-art MLLMs struggle with domain-specific or knowledge-intensive queries, where relevant information is underrepresented in pre-training data. Knowledge-based VQA (KB-VQA) addresses this by retrieving external documents to condition answer generation, but current retrieval-augmented approaches suffer from low precision, noisy passages, and limited reasoning. To address this, we propose ReAG, a novel Reasoning-Augmented Multimodal RAG approach that combines coarse- and fine-grained retrieval with a critic model that filters irrelevant passages, ensuring high-quality additional context. The model follows a multi-stage training strategy leveraging reinforcement learning to enhance reasoning over retrieved content, while supervised fine-tuning serves only as a cold start. Extensive experiments on Encyclopedic-VQA and InfoSeek demonstrate that ReAG significantly outperforms prior methods, improving answer accuracy and providing interpretable reasoning grounded in retrieved evidence. Our source code is publicly available at: https://github.com/aimagelab/ReAG.

2026 Conference proceedings paper

Robust Zero-Shot Generalization for Open-Vocabulary Action Recognition via Task Arithmetic

Authors: Morandi, Francesca; Moussadek, Omayma; Venturini, Federico; Suardi, Mauro; Banzatti, Alessandro; Cannarile, Francesco; Porrello, Angelo; Calderara, Simone

2026 Conference proceedings paper

Scaling Artificial Intelligence for Oral and Dental Image Analysis (original title: Scalare l’Intelligenza Artificiale per l’Analisi di Immagini Orali e Dentali)

Authors: Lumetti, Luca

Cone Beam Computed Tomography (CBCT) is central to contemporary dental and maxillofacial practice, but progress in automated analysis has been limited by the scarcity of publicly available datasets. This thesis addresses that bottleneck by creating an open, extensible ecosystem that combines datasets, annotation tools, and algorithmic advances, and shows how these elements interact cyclically to accelerate research and its translation into clinical products. The Maxillo dataset was the first of its kind, providing 91 densely annotated volumes and 256 sparsely annotated scans for the annotation of the Inferior Alveolar Canal. The ToothFairy series, to which this thesis contributed, built on these foundations: the first version of ToothFairy increased the dense annotations to 156 volumes; ToothFairy2 expanded to 480 CBCT volumes, each with 42 semantic classes; and ToothFairy3 further enlarged the corpus to 532 volumes and 77 classes, while improving annotation quality and the diversity of the scanners used. Complementing the CBCT data, the Bits2Bites dataset, also part of this thesis, provided 200 pairs of registered intraoral scans with multi-label occlusion annotations. All resources were released openly to enable reproducible benchmarking and follow-up development. To scale annotation without sacrificing clinical fidelity, I developed semi-automated annotation tools and a rigorous quality-control pipeline that combines predictive models with expert review.
Crucially, dataset creation, tooling, and model development progressed cyclically: additional data enabled better models; better models powered faster and more accurate annotation tools; and improved tools in turn produced larger, higher-quality datasets. This cycle constitutes the central intellectual contribution of this work. On this data foundation, I improved volumetric segmentation methods: transformer-based modules that explicitly encode spatial relations between patches to preserve voxel-level detail while aggregating long-range context, and adaptations of the Mamba architecture for efficient, high-precision 3D segmentation. Finally, I introduced U-Net Transplant, a model-merging framework that proposes novel techniques for updating and specializing clinical models without full retraining, reducing redeployment costs, storage, and data-exposure risks. Overall, this ecosystem delivered the largest open CBCT benchmark for maxillofacial segmentation to date, together with a coherent set of methods and tools that substantially improved the accuracy, efficiency, and lifecycle management of clinical AI, enabling faster, safer, and more reproducible dental AI research and deployment.

2026 Doctoral thesis

Searching for New Possible Peripheral Biomarkers of Cognitive Decline in Down Syndrome: The Role of IL-18 Pathway and its Interaction with TGF-β1 and TNF-α

Authors: Grasso, M.; Fidilio, A.; L'Episcopo, F.; Recupero, M.; Barone, C.; Lovino, M.; Alboni, S.; Bacalini, M. G.; Caruso, G.; Greco, D.; Buono, S.; De La Torre, R.; Tascedda, F.; Blom, J. M.; Benatti, C.; Caraci, F.

Published in: NEUROMOLECULAR MEDICINE

Down syndrome (DS) represents one of the most common genetic disorders attributable to a partial or complete trisomy of chromosome 21 that affects about 1 in 700 individuals at birth. The diagnosis of Alzheimer's Disease (AD)-correlated cognitive decline in this population requires new approaches and new biomarkers that comprehensively assess health status and early cognitive decline. In this observational study, we explored for the first time the relation of IL-18, a cytokine member of the IL-1 family involved in both innate and acquired immune responses, with DS-associated cognitive decline. We observed that plasma total IL-18, in subjects with DS over 35 with and without AD-related cognitive decline, and plasma concentrations of its binding protein in subjects with DS (19-35 years) were correlated with lower plasma concentrations of Transforming Growth Factor (TGF-beta 1), which are linked to an increased rate of cognitive decline in adults with DS. In addition, we found a significant association between low baseline concentrations of Free IL-18, the active form of the cytokine, and an increased rate of cognitive decline at 12 months, calculated as delta of the Test for Severe Impairment (dTSI), in individuals with DS (19-35 years). Finally, we demonstrated a reduction of the Free IL-18/TNF-alpha ratio, considered as a new possible double biomarker, in both young and older adult DS subjects without AD-related cognitive decline (the area under the receiver operating characteristic curve (AUC) was 0.82 and 0.71, respectively), suggesting the advantage of composite biomarkers over single biomarkers in discriminating patients from healthy people.

2026 Journal article

Segment-wise Anomaly Detection via Compression Tokens in Industrial Production Lines

Authors: Salici, Giacomo; Köhler, Stefan; Fiorina, Andrea; Zannella, Franco; Porrello, Angelo; Calderara, Simone

We present a predictive maintenance approach for industrial production lines based on multivariate segment-wise time-series analysis. To address the high cost of collecting anomalous samples, we propose a novelty detection framework in which a transformer autoencoder is trained in a semi-supervised fashion exclusively on nominal sequences, and anomaly scores are derived from reconstruction error at test time. We introduce a set of learnable “compression tokens” into the transformer encoder; these tokens serve as the bottleneck from which the decoder reconstructs the input. We compare this model against an MLP-based autoencoder baseline; the results show that the novelty-detection model remains strong, with near-perfect performance under time-aware and device-aware validation, which are the conditions that most faithfully simulate deployment.
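The scoring step described above, where anomaly scores come from reconstruction error and the model only ever sees nominal data, can be sketched as follows. This is a minimal illustration under stated assumptions: `mean_reconstruct` is a deliberately crude stand-in for the transformer autoencoder with compression tokens, the 95th-percentile threshold is an assumed calibration rule, and all values are toy data.

```python
def mean_reconstruct(segment):
    """Stand-in 'autoencoder': rebuild each channel from its time average.

    A segment is a list of time steps, each a list of channel values.
    """
    n_steps = len(segment)
    n_channels = len(segment[0])
    means = [sum(step[c] for step in segment) / n_steps for c in range(n_channels)]
    return [list(means) for _ in segment]


def reconstruction_error(segment, reconstruction):
    """Mean squared error over all time steps and channels of one segment."""
    flat_x = [v for step in segment for v in step]
    flat_r = [v for step in reconstruction for v in step]
    return sum((x - r) ** 2 for x, r in zip(flat_x, flat_r)) / len(flat_x)


def threshold_from_nominal(nominal_scores, quantile=0.95):
    """Pick a decision threshold from scores of nominal segments only."""
    ordered = sorted(nominal_scores)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def is_anomalous(score, threshold):
    """Flag a segment whose reconstruction error exceeds the threshold."""
    return score > threshold


# Toy nominal segments: 3 time steps, 2 channels each (illustrative values).
nominal_segments = [
    [[1.0, 2.0], [1.1, 2.0], [0.9, 2.1]],
    [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]],
    [[0.9, 2.1], [1.0, 1.9], [1.1, 2.0]],
]
nominal_scores = [
    reconstruction_error(s, mean_reconstruct(s)) for s in nominal_segments
]
threshold = threshold_from_nominal(nominal_scores)

# A segment with a spike in the first channel reconstructs poorly.
test_segment = [[1.0, 2.0], [5.0, 2.0], [1.0, 2.0]]
test_score = reconstruction_error(test_segment, mean_reconstruct(test_segment))
flagged = is_anomalous(test_score, threshold)
```

No anomalous examples are needed to set the threshold, which matches the motivation in the abstract: anomalous samples are expensive to collect, so both training and calibration rely on nominal sequences.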

2026 Conference proceedings paper

Sketch2Stitch: GANs for Abstract Sketch-Based Dress Synthesis

Authors: Farooq Khan, Faizan; Mohamed Bakr, Eslam; Morelli, Davide; Cornia, Marcella; Cucchiara, Rita; Elhoseiny, Mohamed

In the realm of creative expression, not everyone possesses the gift of effortlessly translating their imaginative visions into flawless sketches. More often than not, the outcome resembles an abstract, perhaps even slightly distorted representation. Producing impeccable sketches is both challenging and time-consuming. Our work is the first of its kind to transform abstract, sometimes deformed garment sketches into photorealistic catalog images, empowering everyday individuals to become their own fashion designers. We create Sketch2Stitch, a dataset featuring over 65,000 abstract sketch images generated from garments of DressCode and VITONHD, two benchmark datasets in the virtual try-on task. Sketch2Stitch is the first dataset in the literature to provide abstract sketches in the fashion domain. We propose a StyleGAN-based generative framework that bridges freehand sketching with photorealistic garment synthesis. We demonstrate that our framework allows users to sketch rough outlines and optionally provide color hints, producing realistic designs in seconds. Experimental results demonstrate, both quantitatively and qualitatively, that the proposed framework achieves superior performance against various baselines and existing methods on both subsets of our dataset. Our work highlights a pathway toward AI-assisted fashion design tools, democratizing garment ideation for students, independent designers, and casual creators.

2026 Conference proceedings paper

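Advanced Artificial Intelligence Techniques for Continual and Robust Learning on Structured Data (original title: Tecniche avanzate di Intelligenza Artificiale per l’apprendimento continuo e robusto su dati strutturati)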

Authors: Menabue, Martin

Artificial Intelligence methods have achieved remarkable results in many domains, but applying them effectively to dynamic, structured data remains a significant challenge. This thesis investigates advanced AI techniques for continual and robust learning in scenarios where data evolve over time and exhibit complex dependencies. The research explores several complementary directions to address the limitations of current models in terms of adaptability and resilience. First, continual learning methods are studied to allow neural networks to learn from sequential data streams without forgetting previously acquired knowledge. A distillation-based approach leveraging Vision Transformers is proposed, in which attention representations are transferred between teacher and student models, improving stability. In addition, a prompt learning strategy based on CLIP embeddings is developed, which dynamically selects task-specific prompts, improving performance. The second line of research concerns federated learning, a distributed setting in which structured information emerges naturally from collaboration among clients. A new defense mechanism against backdoor attacks is introduced, which exploits the spectral properties of local data representations to identify and mitigate malicious participants through data synthesis and alignment techniques. Finally, the thesis analyzes adaptive backdoor attacks and the corresponding defenses, highlighting how such vulnerabilities pose a critical threat to industrial processes and infrastructure. Overall, the work contributes to the design of AI models capable of continual adaptation, secure collaboration, and effective exploitation of structural information for real-world and industrial applications.

2026 Doctoral thesis
