Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing

Authors: Baldrati, Alberto; Morelli, Davide; Cornia, Marcella; Bertini, Marco; Cucchiara, Rita

Published in: ACM TRANSACTIONS ON MULTIMEDIA COMPUTING, COMMUNICATIONS AND APPLICATIONS

Fashion illustration is a crucial medium for designers to convey their creative vision and transform design concepts into tangible representations that showcase the interplay between clothing and the human body. In the context of fashion design, computer vision techniques have the potential to enhance and streamline the design process. Departing from prior research primarily focused on virtual try-on, this paper tackles the task of multimodal-conditioned fashion image editing. Our approach aims to generate human-centric fashion images guided by multimodal prompts, including text, human body poses, garment sketches, and fabric textures. To address this problem, we propose extending latent diffusion models to incorporate these multiple modalities and modifying the structure of the denoising network, taking multimodal prompts as input. To condition the proposed architecture on fabric textures, we employ textual inversion techniques and let diverse cross-attention layers of the denoising network attend to textual and texture information, thus incorporating different granularity conditioning details. Given the lack of datasets for the task, we extend two existing fashion datasets, Dress Code and VITON-HD, with multimodal annotations. Experimental evaluations demonstrate the effectiveness of our proposed approach in terms of realism and coherence concerning the provided multimodal inputs.
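
The layer-wise multimodal conditioning described in the abstract can be sketched schematically. This is a toy illustration of routing different conditioning signals to different cross-attention layers, not the authors' code; the layer names and the coarse/fine routing scheme are our own assumptions.

```python
# Toy sketch: route multimodal conditioning to different cross-attention
# layers of a denoising network, so that layers see conditioning at
# different granularities (text everywhere, texture pseudo-tokens from
# textual inversion only in the finer blocks). Illustrative only.

def build_context(layer_name, text_emb, texture_emb):
    """Select which conditioning tokens a given cross-attention layer attends to."""
    # Assumption: coarse ("down") blocks attend to text only; finer ("up")
    # blocks also see texture pseudo-tokens, adding fine-grained detail.
    if layer_name.startswith("down"):
        return text_emb
    return text_emb + texture_emb  # concatenate the token sequences

text_emb = [[0.1, 0.2], [0.3, 0.4]]  # stand-in text token embeddings
texture_emb = [[0.9, 0.8]]           # stand-in texture pseudo-token

assert build_context("down_0", text_emb, texture_emb) == text_emb
assert len(build_context("up_0", text_emb, texture_emb)) == 3
```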

2026 Journal article

PopEYE - Infrared Ocular Image Dataset for Eye State and Gaze-Direction Classification

Authors: Gibertoni, Giovanni; Borghi, Guido; Rovati, Luigi

The PopEYE dataset is a specialized collection of 14,976 near-infrared (NIR) images of the human eye region, specifically designed to support the development and benchmarking of computer vision algorithms for eye-state detection and coarse gaze-direction classification. Each image is provided in a fixed resolution of 772 × 520 pixels in 8-bit grayscale PNG format. The acquisition was performed frontally using a custom-developed Maxwellian-view optical configuration, consisting of a board-level CMOS camera and a specialized lens system where the subject's eye is precisely positioned at the focal point. This setup ensures a high-contrast representation of the anterior segment, making the pupil, iris, limbus, and portions of the sclera and eyelids clearly distinguishable under stable 850 nm infrared illumination. The dataset is categorized into six mutually exclusive classes identified through manual annotation supported by fixed visual aids and an expert system algorithm. The classification includes a correct positioning class for eyes open and properly aligned for clinical measurements (8,160 images), a closed class representing full eye closures such as blinks or sustained lid closure (1,790 images), and four directional classes representing gaze shifts relative to the central optical axis, specifically up (1,379 images), down (1,015 images), left (1,296 images), and right (1,336 images). The data captures the natural anatomical variability of 22 subjects and incorporates common real-world artifacts such as specular reflections from NIR sources and partial pupil occlusions by eyelashes or eyelids. By providing standardized labels and high-resolution NIR imagery, PopEYE serves as a robust resource for training machine learning models intended for real-time patient monitoring during ophthalmic examinations.
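
The six-class label space above can be summarized programmatically; this is a small sketch using the class counts stated in the description (the short class-name strings are our own shorthand, not the dataset's official labels).

```python
# Class counts for the PopEYE dataset as stated in the description above.
# Names are our shorthand; only the counts come from the source.
POPEYE_CLASSES = {
    "correct": 8160,  # open, properly aligned for clinical measurement
    "closed": 1790,   # blinks or sustained lid closure
    "up": 1379,
    "down": 1015,
    "left": 1296,
    "right": 1336,
}

def class_frequencies(counts):
    """Per-class frequency, useful e.g. for class-weighted training on
    an imbalanced dataset like this one."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

freqs = class_frequencies(POPEYE_CLASSES)
assert sum(POPEYE_CLASSES.values()) == 14976  # matches the dataset size
assert abs(sum(freqs.values()) - 1.0) < 1e-9
```

The imbalance (the "correct" class covers more than half the images) is exactly the situation where such per-class frequencies feed into loss weighting.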

2026 Dataset

ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering

Authors: Compagnoni, Alberto; Morini, Marco; Sarto, Sara; Cocchi, Federico; Caffagni, Davide; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Multimodal Large Language Models (MLLMs) have shown impressive capabilities in jointly understanding text, images, and videos, often evaluated via Visual Question Answering (VQA). However, even state-of-the-art MLLMs struggle with domain-specific or knowledge-intensive queries, where relevant information is underrepresented in pre-training data. Knowledge-based VQA (KB-VQA) addresses this by retrieving external documents to condition answer generation, but current retrieval-augmented approaches suffer from low precision, noisy passages, and limited reasoning. To address this, we propose ReAG, a novel Reasoning-Augmented Multimodal RAG approach that combines coarse- and fine-grained retrieval with a critic model that filters irrelevant passages, ensuring high-quality additional context. The model follows a multi-stage training strategy leveraging reinforcement learning to enhance reasoning over retrieved content, while supervised fine-tuning serves only as a cold start. Extensive experiments on Encyclopedic-VQA and InfoSeek demonstrate that ReAG significantly outperforms prior methods, improving answer accuracy and providing interpretable reasoning grounded in retrieved evidence. Our source code is publicly available at: https://github.com/aimagelab/ReAG.
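
The retrieve-then-filter idea above (coarse retrieval proposes passages, a critic model discards irrelevant ones) can be sketched minimally. The interfaces below are placeholders, not the ReAG implementation: word-overlap scoring stands in for the retriever and a threshold stands in for the learned critic.

```python
# Minimal retrieve-then-critic sketch (assumed interfaces, not ReAG's code).

def overlap(a, b):
    """Stand-in relevance score: shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def retrieve(query, corpus, k=3):
    """Coarse retrieval: top-k passages by overlap score."""
    return sorted(corpus, key=lambda p: -overlap(query, p))[:k]

def critic_keep(query, passage, threshold=2):
    """Stand-in critic: keep a passage only if it is relevant enough."""
    return overlap(query, passage) >= threshold

corpus = [
    "The Eiffel Tower is in Paris and was completed in 1889.",
    "Bananas are rich in potassium.",
    "Paris is the capital of France.",
]
query = "When was the Eiffel Tower in Paris completed"
candidates = retrieve(query, corpus)
context = [p for p in candidates if critic_keep(query, p)]
assert corpus[0] in context      # relevant passage survives the critic
assert corpus[1] not in context  # noisy passage is filtered out
```

In the actual approach the critic is a learned model and retrieval is multi-stage; the sketch only shows where filtering sits in the pipeline.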

2026 Conference proceedings

Searching for New Possible Peripheral Biomarkers of Cognitive Decline in Down Syndrome: The Role of IL-18 Pathway and its Interaction with TGF-β1 and TNF-α

Authors: Grasso, M.; Fidilio, A.; L'Episcopo, F.; Recupero, M.; Barone, C.; Lovino, M.; Alboni, S.; Bacalini, M. G.; Caruso, G.; Greco, D.; Buono, S.; De La Torre, R.; Tascedda, F.; Blom, J. M.; Benatti, C.; Caraci, F.

Published in: NEUROMOLECULAR MEDICINE

Down syndrome (DS) represents one of the most common genetic disorders, attributable to a partial or complete trisomy of chromosome 21, and affects about 1 in 700 individuals at birth. The diagnosis of Alzheimer's Disease (AD)-correlated cognitive decline in this population requires new approaches and new biomarkers that comprehensively assess health status and early cognitive decline. In this observational study, we explored for the first time the relation of IL-18, a cytokine of the IL-1 family involved in both innate and acquired immune responses, with DS-associated cognitive decline. We observed that plasma total IL-18, in subjects with DS over 35 with and without AD-related cognitive decline, and plasma concentrations of its binding protein in subjects with DS (19-35 years) were correlated with lower plasma concentrations of Transforming Growth Factor (TGF-beta 1), which are linked to an increased rate of cognitive decline in adults with DS. In addition, we found a significant association between low baseline concentrations of Free IL-18, the active form of the cytokine, and an increased rate of cognitive decline at 12 months, calculated as the delta of the Test for Severe Impairment (dTSI), in individuals with DS (19-35 years). Finally, we demonstrated a reduction of the Free IL-18/TNF-alpha ratio, considered as a new possible double biomarker, in both young and older adult DS subjects without AD-related cognitive decline (area under the receiver operating characteristic curve (AUC) of 0.82 and 0.71, respectively), suggesting the advantage of composite biomarkers over single biomarkers in discriminating patients from healthy individuals.
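
The composite-biomarker evaluation above boils down to scoring each subject with a ratio and measuring class separation via the area under the ROC curve. This toy illustration uses entirely synthetic numbers (not study data) and a pure-Python rank-based AUC.

```python
# Toy illustration of a composite biomarker evaluated with ROC AUC.
# All measurement values below are synthetic, for demonstration only.

def auc(scores_pos, scores_neg):
    """Probability that a positive scores above a negative (ties 0.5);
    this pairwise formulation equals the area under the ROC curve."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical (Free IL-18, TNF-alpha) measurements per subject.
no_decline = [(40, 10), (35, 8), (50, 11)]  # higher ratio expected
decline = [(20, 12), (25, 15), (18, 10)]

ratio = lambda m: m[0] / m[1]  # the composite score: Free IL-18 / TNF-alpha
score = auc([ratio(m) for m in no_decline], [ratio(m) for m in decline])
assert 0.0 <= score <= 1.0
```

With real data the two groups overlap, which is why the study reports AUCs of 0.82 and 0.71 rather than perfect separation.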

2026 Journal article

Sketch2Stitch: GANs for Abstract Sketch-Based Dress Synthesis

Authors: Farooq Khan, Faizan; Mohamed Bakr, Eslam; Morelli, Davide; Cornia, Marcella; Cucchiara, Rita; Elhoseiny, Mohamed

In the realm of creative expression, not everyone possesses the gift of effortlessly translating their imaginative visions into flawless sketches. More often than not, the outcome resembles an abstract, perhaps even slightly distorted representation. The art of producing impeccable sketches is not only challenging but also a time-consuming process. Our work is the first of its kind in transforming abstract, sometimes deformed garment sketches into photorealistic catalog images, to empower the everyday individual to become their own fashion designer. We create Sketch2Stitch, a dataset featuring over 65,000 abstract sketch images generated from garments of Dress Code and VITON-HD, two benchmark datasets in the virtual try-on task. Sketch2Stitch is the first dataset in the literature to provide abstract sketches in the fashion domain. We propose a StyleGAN-based generative framework that bridges freehand sketching with photorealistic garment synthesis. We demonstrate that our framework allows users to sketch rough outlines and optionally provide color hints, producing realistic designs in seconds. Experimental results demonstrate, both quantitatively and qualitatively, that the proposed framework achieves superior performance against various baselines and existing methods on both subsets of our dataset. Our work highlights a pathway toward AI-assisted fashion design tools, democratizing garment ideation for students, independent designers, and casual creators.

2026 Conference proceedings

The aporetic dialogs of Modena on gender differences: Is it all about testosterone? Episode III: Mathematics

Authors: Brigante, G.; Costantino, F.; Bellelli, A.; Boni, S.; Furini, C.; Cucchiara, R.; Simoni, M.

Published in: ANDROLOGY

This report is the transcript of what was discussed at a convention at the Endocrinology Unit in Modena, Italy, in the form of the aporetic dialogs of ancient Greece. It is the third episode of a series of four discussions on the differences between males and females, with a multidisciplinary approach. In this work, the role of testosterone in gender differences in the aptitude for mathematics is explored. First, definitions of mathematical abilities were provided, together with the gender differences in the distribution of females and males across science, technology, engineering, and mathematics subjects. A clear predominance of males is evident at most science, technology, engineering, and mathematics education levels, especially in advanced academic careers. Then, the discussants were divided into two groups: group 1, which illustrated the thesis that testosterone promotes the development of logical-mathematical skills, and group 2, which, in contrast, argued against a direct role of testosterone in improving cognitive abilities and held that the socio-cultural factors underlying this gender gap should be considered instead. In the end, an expert referee (a female engineer) tried to resolve the aporia: are the two theories equivalent, or is one superior?

2026 Journal article

The olfactory functional network in the Alzheimer’s disease continuum: a resting state fMRI study

Authors: Ballotta, Daniela; Casadio, Claudia; Tondelli, Manuela; Zanelli, Vanessa; Ricci, Francesco; Carpentiero, Omar; Lui, Fausta; Filippini, Nicola; Chiari, Annalisa; Molinari, Maria Angela; Benuzzi, Francesca

Published in: FRONTIERS IN AGING NEUROSCIENCE

2026 Journal article

3D Pose Nowcasting: Forecast the future to improve the present

Authors: Simoni, A.; Marchetti, F.; Borghi, G.; Becattini, F.; Seidenari, L.; Vezzani, R.; Del Bimbo, A.

Published in: COMPUTER VISION AND IMAGE UNDERSTANDING

Technologies to enable safe and effective collaboration and coexistence between humans and robots have gained significant importance in the last few years. A critical component useful for realizing this collaborative paradigm is the understanding of human and robot 3D poses using non-invasive systems. Therefore, in this paper, we propose a novel vision-based system leveraging depth data to accurately establish the 3D locations of skeleton joints. Specifically, we introduce the concept of Pose Nowcasting, denoting the capability of the proposed system to enhance its current pose estimation accuracy by jointly learning to forecast future poses. The experimental evaluation is conducted on two different datasets, achieving accurate real-time performance and confirming the validity of the proposed method in both robotic and human scenarios.
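
The nowcasting idea above, improving the current estimate by jointly learning to forecast, can be sketched as a combined training objective. This is our own toy formulation (scalar joint coordinates, a simple weighted sum), not the paper's loss.

```python
# Schematic sketch of a joint "nowcasting" objective: current-pose error
# plus a weighted future-pose forecasting term. Toy formulation only.

def l2(a, b):
    """Squared L2 distance between two flat coordinate lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nowcasting_loss(pred_now, gt_now, pred_future, gt_future, alpha=0.5):
    """Joint objective: the forecasting term acts as an auxiliary task
    that regularizes and sharpens the current-pose estimate."""
    return l2(pred_now, gt_now) + alpha * l2(pred_future, gt_future)

# Perfect current estimate, unit error on the forecast, alpha = 0.5:
loss = nowcasting_loss([0.0, 1.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0])
assert loss == 0.5
```

The weight `alpha` (an assumed hyperparameter here) trades off how strongly the auxiliary forecasting task influences training.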

2025 Journal article

A Benchmark Study of Gene Fusion Prioritization Tools

Authors: Miccolis, F.; Lovino, M.; Ficarra, E.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

A gene fusion is a chromosomal aberration arising from the juxtaposition of separate genes. Since some gene fusions are involved in tumorigenesis, proper investigation and analysis of gene fusions are crucial. After DNA/RNA sample extraction, detecting gene fusions first requires gene fusion detection tools, which usually report many false positives. Given the high experimental cost of wet-lab validation of a single fusion, gene fusion prioritization tools have been made available over the years to significantly narrow down the candidate gene fusions for validation (e.g., Oncofuse, Pegasus, DEEPrior, ChimerDriver). Although a few reviews of gene fusion detection tools are available, a benchmark of prioritization tools is not yet available in the literature. The aim of this paper is twofold: (1) to provide a curated dataset for a fair evaluation of gene fusion prioritization tools, and (2) to develop a proper comparison based on time, resources, and tool confidence on selected gene fusions. Based on this benchmark, ChimerDriver emerges as the most reliable tool for prioritizing oncogenic fusions.
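
A comparison on time and tool confidence like the one above can be organized as a small benchmarking harness. The tool callables below are placeholders, not the real Oncofuse/Pegasus/DEEPrior/ChimerDriver interfaces.

```python
# Minimal benchmarking harness sketch: run each prioritization tool on the
# same fusion list, recording wall-clock time and mean confidence score.
# Tool implementations are stand-ins for illustration.

import time

def benchmark(tools, fusions):
    """Apply every tool to every fusion; collect runtime and mean score."""
    results = {}
    for name, tool in tools.items():
        start = time.perf_counter()
        scores = [tool(f) for f in fusions]
        results[name] = {
            "seconds": time.perf_counter() - start,
            "mean_confidence": sum(scores) / len(scores),
        }
    return results

fusions = ["BCR--ABL1", "EML4--ALK", "TMPRSS2--ERG"]
tools = {
    "tool_a": lambda f: 0.9,  # stand-in: constant high confidence
    "tool_b": lambda f: 0.6,  # stand-in: constant lower confidence
}
report = benchmark(tools, fusions)
assert report["tool_a"]["mean_confidence"] > report["tool_b"]["mean_confidence"]
```

A fair comparison additionally fixes the input dataset and measures peak resource usage, which this sketch omits.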

2025 Conference proceedings

A Deep-Learning-Based Method for Real-Time Barcode Segmentation on Edge CPUs

Authors: Vezzali, Enrico; Vorabbi, Lorenzo; Grana, Costantino; Bolelli, Federico

Barcodes are a critical technology in industrial automation, logistics, and retail, enabling fast and reliable data capture. While deep learning has significantly improved barcode localization accuracy, most modern architectures remain too computationally demanding for real-time deployment on embedded systems without dedicated hardware acceleration. In this work, we present BaFaLo (Barcode Fast Localizer), an ultra-lightweight segmentation-based neural network for barcode localization. Our model is specifically optimized for real-time performance on low-power CPUs while maintaining high localization accuracy for both 1D and 2D barcodes. It features a two-branch architecture, comprising a local feature extractor and a global context module, and is tailored for low-resolution inputs to further improve inference speed. We benchmark BaFaLo against several lightweight architectures for object detection or segmentation, including YOLO Nano, Fast-SCNN, BiSeNet V2, and ContextNet, using the BarBeR dataset. BaFaLo achieves the fastest inference time among all deep-learning models tested, operating at 57.62 ms per frame on a single CPU core of a Raspberry Pi 3B+. Despite its compact design, it achieves a decoding rate nearly equivalent to YOLO Nano for 1D barcodes and only 3.5 percentage points lower for 2D barcodes while being approximately nine times faster.
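
The two-branch layout described above can be sketched conceptually: a local branch preserves spatial detail while a global branch summarizes image-wide context, and the two are fused before the segmentation head. This is our own toy on nested lists, not BaFaLo's architecture.

```python
# Conceptual two-branch sketch: local features plus broadcast global context.
# Stand-in computations; real branches are convolutional networks.

def local_branch(image):
    """Stand-in local feature extractor: per-pixel features (identity here)."""
    return [[float(p) for p in row] for row in image]

def global_branch(image):
    """Stand-in global context module: image-wide mean as a single context value."""
    flat = [p for row in image for p in row]
    return sum(flat) / len(flat)

def fuse(local_feats, global_ctx):
    """Broadcast-add the global context onto the local feature map."""
    return [[v + global_ctx for v in row] for row in local_feats]

image = [[0, 2], [4, 6]]
out = fuse(local_branch(image), global_branch(image))
assert out == [[3.0, 5.0], [7.0, 9.0]]  # global mean 3.0 added everywhere
```

The design point this illustrates is that the global branch can run at very low resolution (here, a single scalar) without touching the local branch's spatial detail, which is what keeps such architectures cheap on edge CPUs.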

2025 Conference proceedings

Page 2 of 106 • Total publications: 1059