Publications - AImageLab

Empowering the Operator: Fault Diagnosis and Identification in an Industrial Environment Through a User-Friendly IoT Architecture

Authors: Bertoli, Annalisa; Fantuzzi, Cesare

Published in: COMPUTERS

In recent years, the increasing complexity of production systems driven by technological development has created new opportunities in the industrial … (Read full abstract)

In recent years, the increasing complexity of production systems driven by technological development has created new opportunities in the industrial world but has also brought challenges in the practical use of these systems by operators. One of the biggest changes is data existence and its accessibility. This work proposes an IoT architecture specifically designed for real-world industrial environments. The goal is to present a system that can be effectively implemented to monitor operations and production processes in real time. This solution improves fault detection and identification, giving the operators the critical information needed to make informed decisions. The IoT architecture is implemented in two different industrial applications, demonstrating the flexibility of the architecture across various industrial contexts. It highlights how the system is monitored to reduce downtime when a fault occurs, making clear the loss in performance and the fault that causes this loss. Additionally, this approach supports human operators in a deeper understanding of their working environment, enabling them to make decisions based on real-time data.

2025

DOI IRIS

Enhancing rPPG Pulse-Signal Recovery by Facial Sampling and PSD Clustering

Authors: Grossi, Giuliano; Boccignone, Giuseppe; Conte, Donatello; Cuculo, Vittorio; D'Amelio, Alessandro; Lanzarotti, Raffaella

Published in: BIOMEDICAL SIGNAL PROCESSING AND CONTROL

2025 Articolo su rivista

DOI IRIS

Enhancing Testicular Ultrasound Image Classification Through Synthetic Data and Pretraining Strategies

Authors: Morelli, Nicola; Marchesini, Kevin; Lumetti, Luca; Santi, Daniele; Grana, Costantino; Bolelli, Federico

Testicular ultrasound imaging is vital for assessing male infertility, with testicular inhomogeneity serving as a key biomarker. However, subjective interpretation … (Read full abstract)

Testicular ultrasound imaging is vital for assessing male infertility, with testicular inhomogeneity serving as a key biomarker. However, subjective interpretation and the scarcity of publicly available datasets pose challenges to automated classification. In this study, we explore supervised and unsupervised pretraining strategies using a ResNet-based architecture, supplemented by diffusion-based generative models to synthesize realistic ultrasound images. Our results demonstrate that pretraining significantly enhances classification performance compared to training from scratch, and synthetic data can effectively substitute real images in the pretraining process, alleviating data-sharing constraints. These methods offer promising advancements toward robust, clinically valuable automated analysis of male infertility. The source code is publicly available at https://github.com/AImageLab-zip/TesticulUS/.

2025 Relazione in Atti di Convegno

IRIS

Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation

Authors: Sanguigni, Fulvio; Morelli, Davide; Cornia, Marcella; Cucchiara, Rita

Published in: PROCEEDINGS OF ... INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS

In recent years, the fashion industry has increasingly adopted AI technologies to enhance customer experience, driven by the proliferation of … (Read full abstract)

In recent years, the fashion industry has increasingly adopted AI technologies to enhance customer experience, driven by the proliferation of e-commerce platforms and virtual applications. Among the various tasks, virtual try-on and multimodal fashion image editing – which utilizes diverse input modalities such as text, garment sketches, and body poses – have become a key area of research. Diffusion models have emerged as a leading approach for such generative tasks, offering superior image quality and diversity. However, most existing virtual try-on methods rely on having a specific garment input, which is often impractical in real-world scenarios where users may only provide textual specifications. To address this limitation, in this work we introduce Fashion Retrieval-Augmented Generation (Fashion-RAG), a novel method that enables the customization of fashion items based on user preferences provided in textual form. Our approach retrieves multiple garments that match the input specifications and generates a personalized image by incorporating attributes from the retrieved items. To achieve this, we employ textual inversion techniques, where retrieved garment images are projected into the textual embedding space of the Stable Diffusion text encoder, allowing seamless integration of retrieved elements into the generative process. Experimental results on the Dress Code dataset demonstrate that Fashion-RAG outperforms existing methods both qualitatively and quantitatively, effectively capturing fine-grained visual details from retrieved garments. To the best of our knowledge, this is the first work to introduce a retrieval-augmented generation approach specifically tailored for multimodal fashion image editing.

2025 Relazione in Atti di Convegno

DOI IRIS

Foundation Models for Hepatocellular Carcinoma: Challenges in Generalization under Data Scarcity

Authors: Corso, Giulia; Lovino, Marta; Akpinar, Reha; Di Tommaso, Luca; Ficarra, Elisa; Ranzini, Marta

Published in: PROCEEDINGS OF ... INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS

2025 Relazione in Atti di Convegno

DOI IRIS

From raw data to research-ready: A FHIR-based transformation pipeline in a real-world oncology setting

Authors: Carbonaro, Antonella; Giorgetti, Luca; Ridolfi, Lorenzo; Pasolini, Roberto; Pagliarani, Andrea; Cavallucci, Martina; Andalò, Alice; Del Gaudio, Livia; De Angelis, Paolo; Vespignani, Roberto; Gentili, Nicola

Published in: COMPUTERS IN BIOLOGY AND MEDICINE

The exponential growth of healthcare data, driven by advancements in medical research and digital health technologies, has underscored the critical … (Read full abstract)

The exponential growth of healthcare data, driven by advancements in medical research and digital health technologies, has underscored the critical need for interoperability and standardization. However, the heterogeneous nature of real-world clinical data poses significant challenges to ensuring seamless data exchange and secondary use for research purposes. These challenges include syntactic inconsistencies (e.g., variable use of terminologies like ICD-10 vs SNOMED CT), semantic mismatches (e.g., differing conceptualizations of disease staging across institutions), and structural fragmentation (e.g., laboratory results encoded in free text rather than structured fields). Fast Healthcare Interoperability Resources (FHIR) has emerged as a leading standard for structuring and harmonizing healthcare data, enabling integration across diverse systems. This work presents a FHIR-based transformation pipeline that leverages Resource Description Framework (RDF) to convert raw, conceptually heterogeneous oncology data into research-ready, semantically enriched datasets. By representing FHIR resources as RDF graphs, our approach enables semantic interoperability, enhances data linkage across heterogeneous sources, and supports automated reasoning through ontology-based queries and inference mechanisms. The pipeline employs a templated conversion strategy, allowing for the declarative definition of mappings that enable domain experts to focus on the data model. In Cancer Virtual Lab, we applied this methodology to a real-world oncology dataset comprising 36,335 anonymized patient records, successfully converting 1,093,705 clinical records into 1,151,559 distinct RDF-based FHIR resource types. The process incorporated syntactic and semantic validation, along with expert review, to ensure technical correctness and clinical relevance. Our results demonstrate the feasibility of semantically integrating oncology data using FHIR and RDF, fostering machine-readable, interoperable knowledge representation. This enriched representation supports data quality monitoring and improvement, data harmonization, longitudinal analysis, advanced analytics, and AI-driven decision support, promoting large-scale secondary use.

2025 Articolo su rivista

DOI IRIS

Hallucination Early Detection in Diffusion Models

Authors: Betti, Federico; Baraldi, Lorenzo; Baraldi, Lorenzo; Cucchiara, Rita; Sebe, Nicu

Published in: INTERNATIONAL JOURNAL OF COMPUTER VISION

2025 Articolo su rivista

IRIS

How to Train Your Metamorphic Deep Neural Network

Authors: Sommariva, Thomas; Calderara, Simone; Porrello, Angelo

2025 Relazione in Atti di Convegno

IRIS

Hyperbolic Safety-Aware Vision-Language Models

Authors: Poppi, Tobia; Kasarla, Tejaswi; Mettes, Pascal; Baraldi, Lorenzo; Cucchiara, Rita

2025 Relazione in Atti di Convegno

DOI IRIS

IM-Fuse: A Mamba-based Fusion Block for Brain Tumor Segmentation with Incomplete Modalities

Authors: Pipoli, Vittorio; Saporita, Alessia; Marchesini, Kevin; Grana, Costantino; Ficarra, Elisa; Bolelli, Federico

Brain tumor segmentation is a crucial task in medical imaging that involves the integrated modeling of four distinct imaging modalities … (Read full abstract)

Brain tumor segmentation is a crucial task in medical imaging that involves the integrated modeling of four distinct imaging modalities to identify tumor regions accurately. Unfortunately, in real-life scenarios, the full availability of such four modalities is often violated due to scanning cost, time, and patient condition. Consequently, several deep learning models have been developed to address the challenge of brain tumor segmentation under conditions of missing imaging modalities. However, the majority of these models have been evaluated using the 2018 version of the BraTS dataset, which comprises only $285$ volumes. In this study, we reproduce and extensively analyze the most relevant models using BraTS2023, which includes 1,250 volumes, thereby providing a more comprehensive and reliable comparison of their performance. Furthermore, we propose and evaluate the adoption of Mamba as an alternative fusion mechanism for brain tumor segmentation in the presence of missing modalities. Experimental results demonstrate that transformer-based architectures achieve leading performance on BraTS2023, outperforming purely convolutional models that were instead superior in BraTS2018. Meanwhile, the proposed Mamba-based architecture exhibits promising performance in comparison to state-of-the-art models, competing and even outperforming transformers. The source code of the proposed approach is publicly released alongside the benchmark developed for the evaluation: https://github.com/AImageLab-zip/IM-Fuse.

2025 Relazione in Atti di Convegno

DOI IRIS