Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models

Authors: Poppi, Samuele; Poppi, Tobia; Cocchi, Federico; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Large-scale vision-and-language models, such as CLIP, are typically trained on web-scale data, which can introduce inappropriate content and lead to the development of unsafe and biased behavior. This, in turn, hampers their applicability in sensitive and trustworthy contexts and could raise significant concerns in their adoption. Our research introduces a novel approach to enhancing the safety of vision-and-language models by diminishing their sensitivity to NSFW (not safe for work) inputs. In particular, our methodology seeks to sever "toxic" linguistic and visual concepts, unlearning the linkage between unsafe linguistic or visual items and unsafe regions of the embedding space. We show how this can be done by fine-tuning a CLIP model on synthetic data obtained from a large language model trained to convert between safe and unsafe sentences, and a text-to-image generator. We conduct extensive experiments on the resulting embedding space for cross-modal retrieval, text-to-image, and image-to-text generation, where we show that our model can be readily employed in combination with pre-trained generative models. Our source code and trained models are available at: https://github.com/aimagelab/safe-clip.
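As a rough illustration of the unlearning objective the abstract describes, the toy sketch below moves a single "unsafe" embedding toward a "safe" anchor by gradient descent on a cosine loss. This is not the paper's actual fine-tuning code: Safe-CLIP updates the encoder weights, and the loss weighting, synthetic data pipeline, and all function names here are our own illustrative assumptions.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def redirect(unsafe_emb, safe_anchor, lr=0.1, steps=50):
    """Nudge an 'unsafe' embedding toward a 'safe' anchor by gradient
    descent on the loss 1 - cos(e, anchor). In Safe-CLIP the update is
    applied to the encoder weights; here we move the embedding itself."""
    e = unsafe_emb.astype(float).copy()
    a = safe_anchor / np.linalg.norm(safe_anchor)
    for _ in range(steps):
        n = np.linalg.norm(e)
        # gradient of -cos(e, a) with respect to e (|a| = 1)
        grad = -(a / n - (e @ a) * e / n**3)
        e -= lr * grad
    return e

rng = np.random.default_rng(0)
unsafe = rng.normal(size=8)
safe = rng.normal(size=8)
before = cosine(unsafe, safe)
after = cosine(redirect(unsafe, safe), safe)
print(after > before)  # the embedding has moved toward the safe region
```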

2024 Conference Proceedings Paper

Saliency-driven Experience Replay for Continual Learning

Authors: Bellitto, Giovanni; Proietto Salanitri, Federica; Pennisi, Matteo; Boschini, Matteo; Bonicelli, Lorenzo; Porrello, Angelo; Calderara, Simone; Palazzo, Simone; Spampinato, Concetto

Published in: ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS

2024 Conference Proceedings Paper

SDFR: Synthetic Data for Face Recognition Competition

Authors: Shahreza, H. O.; Ecabert, C.; George, A.; Unnervik, A.; Marcel, S.; Di Domenico, N.; Borghi, G.; Maltoni, D.; Boutros, F.; Vogel, J.; Damer, N.; Sanchez-Perez, A.; Mas-Candela, E.; Calvo-Zaragoza, J.; Biesseck, B.; Vidal, P.; Granada, R.; Menotti, D.; Deandres-Tame, I.; La Cava, S. M.; Concas, S.; Melzi, P.; Tolosana, R.; Vera-Rodriguez, R.; Perelli, G.; Orru, G.; Marcialis, G. L.; Fierrez, J.

Large-scale face recognition datasets are collected by crawling the Internet without individuals' consent, raising legal, ethical, and privacy concerns. With recent advances in generative models, several works have proposed generating synthetic face recognition datasets to mitigate the concerns raised by web-crawled data. This paper presents a summary of the Synthetic Data for Face Recognition (SDFR) Competition, held in conjunction with the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024) and established to investigate the use of synthetic data for training face recognition models. The SDFR competition was split into two tasks, allowing participants to train face recognition systems using new synthetic datasets and/or existing ones. In the first task, the face recognition backbone was fixed and the dataset size was limited, while the second task provided almost complete freedom over the model backbone, the dataset, and the training pipeline. The submitted models were trained on both existing and new synthetic datasets and used clever methods to improve training with synthetic data. The submissions were evaluated and ranked on a diverse set of seven benchmarking datasets. The paper gives an overview of the submitted face recognition models and reports their performance compared to baseline models trained on real and synthetic datasets. Furthermore, the evaluation of submissions is extended to bias assessment across different demographic groups. Lastly, an outlook on the current state of research in training face recognition models using synthetic data is presented, and existing problems as well as potential future directions are discussed.

2024 Conference Proceedings Paper

Self-Labeling the Job Shop Scheduling Problem

Authors: Corsini, Andrea; Porrello, Angelo; Calderara, Simone; Dell'Amico, Mauro

Published in: ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS

This work proposes a self-supervised training strategy designed for combinatorial problems. An obstacle in applying supervised paradigms to such problems is the need for costly target solutions often produced with exact solvers. Inspired by semi- and self-supervised learning, we show that generative models can be trained by sampling multiple solutions and using the best one according to the problem objective as a pseudo-label. In this way, we iteratively improve the model generation capability by relying only on its self-supervision, eliminating the need for optimality information. We validate this Self-Labeling Improvement Method (SLIM) on the Job Shop Scheduling (JSP), a complex combinatorial problem that is receiving much attention from the neural combinatorial community. We propose a generative model based on the well-known Pointer Network and train it with SLIM. Experiments on popular benchmarks demonstrate the potential of this approach as the resulting models outperform constructive heuristics and state-of-the-art learning proposals for the JSP. Lastly, we prove the robustness of SLIM to various parameters and its generality by applying it to the Traveling Salesman Problem.
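The core self-labeling loop the abstract describes — sample several solutions, keep the best under the problem objective as a pseudo-label — can be sketched in a few lines. The toy below substitutes a uniform random sampler for the paper's Pointer Network and uses a small symmetric TSP instance; all names and numbers are illustrative, not from the paper.

```python
import random

def tour_length(tour, dist):
    """Objective: total length of a closed tour over the distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def self_label(dist, n_samples=64, rng=None):
    """One SLIM-style step: sample multiple candidate solutions from the
    (here: uniform) generative policy and keep the best one according to
    the problem objective as the pseudo-label for training."""
    rng = rng or random.Random(0)
    n = len(dist)
    candidates = []
    for _ in range(n_samples):
        tour = list(range(n))
        rng.shuffle(tour)
        candidates.append(tour)
    return min(candidates, key=lambda t: tour_length(t, dist))

# 5-city symmetric instance (distances are illustrative)
dist = [[0, 2, 9, 10, 7],
        [2, 0, 6, 4, 3],
        [9, 6, 0, 8, 5],
        [10, 4, 8, 0, 6],
        [7, 3, 5, 6, 0]]
pseudo_label = self_label(dist)
print(tour_length(pseudo_label, dist))
```

In the paper, the pseudo-label would then serve as the supervised target for the next training step, so the model improves using only its own samples.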

2024 Conference Proceedings Paper

Spatial Entropy as an Inductive Bias for Vision Transformers

Authors: Peruzzo, Elia; Sangineto, Enver; Liu, Yahui; De Nadai, Marco; Bi, Wei; Lepri, Bruno; Sebe, Nicu

Published in: MACHINE LEARNING

2024 Journal Article

SpectralCLIP: Preventing Artifacts in Text-Guided Style Transfer from a Spectral Perspective

Authors: Xu, Z.; Xing, S.; Sangineto, E.; Sebe, N.

Published in: IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION

2024 Conference Proceedings Paper

Spotting Culex pipiens from satellite: modeling habitat suitability in central Italy using Sentinel-2 and deep learning techniques

Authors: Ippoliti, Carla; Bonicelli, Lorenzo; De Ascentis, Matteo; Tora, Susanna; Di Lorenzo, Alessio; Gerardo D’Alessio, Silvio; Porrello, Angelo; Bonanni, Americo; Cioci, Daniela; Goffredo, Maria; Calderara, Simone; Conte, Annamaria

Published in: FRONTIERS IN VETERINARY SCIENCE

Culex pipiens, an important vector of many vector-borne diseases, is a species capable of feeding on a wide variety of hosts and adapting to different environments. To predict the potential distribution of Cx. pipiens in central Italy, this study integrated presence/absence data from a four-year entomological survey (2019-2022) carried out in the Abruzzo and Molise regions with a datacube of spectral bands acquired by Sentinel-2 satellites, as patches of 224 × 224 pixels at 20-meter spatial resolution around each site and for each satellite revisit time. We investigated three scenarios: the baseline model, which considers the environmental conditions at the time of collection; the multitemporal model, focusing on conditions in the 2 months preceding the collection; and the MultiAdjacency Graph Attention Network (MAGAT) model, which accounts for similarities in temperature and nearby sites using a graph architecture. For the baseline scenario, a deep convolutional neural network (DCNN) analyzed a single multi-band Sentinel-2 image. The DCNN in the multitemporal model extracted temporal patterns from a sequence of 10 multispectral images; the MAGAT model incorporated spatial and climatic relationships among sites through a graph neural network aggregation method. For all models, we also evaluated temporal lags between the acquisition date of the multi-band Earth Observation datacube and the mosquito collection, from 0 to 50 days. The study encompassed a total of 2,555 entomological collections and 108,064 images (patches) at 20-meter spatial resolution. The baseline model achieved an F1 score higher than 75.8% for any temporal lag, which increased up to 81.4% with the multitemporal model. The MAGAT model recorded the highest F1 score of 80.9%. The study confirms the widespread presence of Cx. pipiens throughout the majority of the surveyed area. Utilizing only Sentinel-2 spectral bands, the models effectively capture, well in advance, the temporal patterns of the mosquito population, offering valuable insights for directing surveillance activities during the vector season. The methodology developed in this study can be scaled up to the national territory and extended to other vectors, in order to support the Ministry of Health in surveillance and control strategies for the vectors and the diseases they transmit.
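A minimal sketch of the multitemporal input assembly described above: for one site, keep the Sentinel-2 revisits that fall within the temporal lag before the mosquito collection, then stack the most recent ones into a datacube for the DCNN. The dictionary layout, function names, and toy shapes are our assumptions (real patches are 224 × 224 at 20 m), not the paper's exact preprocessing.

```python
import numpy as np

def build_datacube(acquisitions, collection_day, lag_days=50, seq_len=10):
    """Assemble the multitemporal input for one site: keep revisits that
    fall within `lag_days` before the collection date, then stack the
    `seq_len` most recent patches into a (T, bands, H, W) datacube."""
    days = sorted(d for d in acquisitions
                  if 0 <= collection_day - d <= lag_days)
    days = days[-seq_len:]
    return np.stack([acquisitions[d] for d in days])

# toy acquisitions every 5 days, 4 bands, 8x8 patches (shapes illustrative)
acqs = {d: np.zeros((4, 8, 8)) for d in range(0, 100, 5)}
cube = build_datacube(acqs, collection_day=80)
print(cube.shape)  # (10, 4, 8, 8)
```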

2024 Journal Article

Sustainable Use of Resources in Hospitals: A Machine Learning-Based Approach to Predict Prolonged Length of Stay at the Time of Admission

Authors: Perliti Scorzoni, Paolo; Giovanetti, Anita; Bolelli, Federico; Grana, Costantino

Introduction. Length of Stay (LOS) and Prolonged Length of Stay (pLOS) are critical indicators of hospital efficiency. Reducing pLOS is crucial for patient safety, autonomy, and bed allocation. This study investigates different machine learning (ML) models to predict LOS and pLOS. Methods. We analyzed a dataset of patients discharged from a northern Italian hospital between 2022 and 2023 as a retrospective cohort study. We compared sixteen regression algorithms and twelve classification methods for predicting LOS as either a continuous or multi-class variable (1-3 days, 4-10 days, >10 days). We also evaluated pLOS prediction using the same models, with pLOS defined as any hospitalization with a LOS longer than 8 days. We further analyzed all models using two versions of the same dataset: one containing only structured data (e.g., demographics and clinical information), and a second that also contains features extracted from free-text diagnoses. Results. Our results indicate that ensemble models achieved the highest prediction accuracy for both LOS and pLOS, outperforming traditional single-algorithm models, particularly when using both structured and unstructured data extracted from diagnoses. Discussion. The integration of ML, particularly ensemble models, can significantly improve LOS prediction and identify patients at increased risk of pLOS. This information can guide healthcare professionals and bed managers in making informed decisions to enhance patient care and optimize resource allocation.
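The class boundaries and the pLOS threshold stated in the abstract can be encoded directly. The sketch below covers only the label construction, not the ML pipeline; function names and the example stays are ours, not the study's.

```python
def los_class(los_days):
    """Discretise Length of Stay into the study's three classes."""
    if los_days <= 3:
        return "1-3 days"
    if los_days <= 10:
        return "4-10 days"
    return ">10 days"

def is_plos(los_days, threshold=8):
    """Prolonged LOS: any hospitalization with LOS longer than `threshold` days."""
    return los_days > threshold

stays = [2, 5, 8, 9, 14]
print([los_class(d) for d in stays])
# ['1-3 days', '4-10 days', '4-10 days', '4-10 days', '>10 days']
print([is_plos(d) for d in stays])
# [False, False, False, True, True]
```

Note that an 8-day stay falls in the "4-10 days" class but is not prolonged, since pLOS requires strictly more than 8 days.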

2024 Conference Proceedings Paper

The Revolution of Multimodal Large Language Models: A Survey

Authors: Caffagni, Davide; Cocchi, Federico; Barsellotti, Luca; Moratelli, Nicholas; Sarto, Sara; Baraldi, Lorenzo; Baraldi, Lorenzo; Cornia, Marcella; Cucchiara, Rita

Published in: PROCEEDINGS OF THE CONFERENCE - ASSOCIATION FOR COMPUTATIONAL LINGUISTICS. MEETING

Connecting text and visual modalities plays an essential role in generative intelligence. For this reason, inspired by the success of large language models, significant research efforts are being devoted to the development of Multimodal Large Language Models (MLLMs). These models can seamlessly integrate visual and textual modalities, while providing a dialogue-based interface and instruction-following capabilities. In this paper, we provide a comprehensive review of recent visual-based MLLMs, analyzing their architectural choices, multimodal alignment strategies, and training techniques. We also conduct a detailed analysis of these models across a wide range of tasks, including visual grounding, image generation and editing, visual understanding, and domain-specific applications. Additionally, we compile and describe training datasets and evaluation benchmarks, conducting comparisons among existing models in terms of performance and computational requirements. Overall, this survey offers a comprehensive overview of the current state of the art, laying the groundwork for future MLLMs.

2024 Conference Proceedings Paper

Towards Federated Learning for Morphing Attack Detection

Authors: Robledo-Moreno, M.; Borghi, G.; Di Domenico, N.; Franco, A.; Raja, K.; Maltoni, D.

Through the Face Morphing attack, two different people can use the same legal document, destroying the unique biometric link between the document and its owner. In other words, a morphed face image can potentially bypass face verification-based security controls, thus representing a severe security threat. Unfortunately, the lack of public, extensive, and varied training datasets severely hampers the development of effective and robust Morphing Attack Detection (MAD) models, key tools for countering the Face Morphing attack since they can automatically detect the presence of morphed images. Indeed, privacy regulations limit the possibility of acquiring, storing, and transferring MAD-related data that contain personal information, such as faces. Therefore, in this paper, we investigate the use of Federated Learning (FL) to train a MAD model on local training samples across multiple sites, eliminating the need for a single centralized training dataset, as is common in Machine Learning, thus overcoming privacy limitations. Experimental results suggest that FL is a viable solution that will need to be considered in future research on MAD.
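One common way to realize the federated training described above is FedAvg-style aggregation: each site trains locally on its private MAD data and sends only model weights, and the server returns their sample-weighted average, so no face images ever leave a site. The sketch below is a generic FedAvg step under that assumption; the paper's exact aggregation rule, names, and shapes are not reproduced here.

```python
import numpy as np

def fed_avg(site_weights, site_sizes):
    """One FedAvg round: average each layer's weights across sites,
    weighting by the number of local training samples per site."""
    total = sum(site_sizes)
    return [sum(n / total * w[k] for w, n in zip(site_weights, site_sizes))
            for k in range(len(site_weights[0]))]

# three sites, each holding one toy weight vector per layer
w_a = [np.array([1.0, 1.0])]
w_b = [np.array([3.0, 3.0])]
w_c = [np.array([2.0, 2.0])]
global_w = fed_avg([w_a, w_b, w_c], site_sizes=[100, 100, 200])
print(global_w[0])  # [2. 2.]
```

The site with 200 samples contributes half the average, which is why the result sits at 2.0 rather than the unweighted mean of the three vectors.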

2024 Conference Proceedings Paper

Total publications: 1059