Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

ClusterFix: A Cluster-Based Debiasing Approach without Protected-Group Supervision

Authors: Capitani, Giacomo; Bolelli, Federico; Porrello, Angelo; Calderara, Simone; Ficarra, Elisa

The failures of Deep Networks can sometimes be ascribed to biases in the data or algorithmic choices. Existing debiasing approaches exploit prior knowledge to avoid unintended solutions; we acknowledge that, in real-world settings, it could be unfeasible to gather enough prior information to characterize the bias, or it could even raise ethical considerations. We hence propose a novel debiasing approach, termed ClusterFix, which does not require any external hint about the nature of biases. Such an approach alters the standard empirical risk minimization and introduces a per-example weight, encoding how critical and far from the majority an example is. Notably, the weights consider how difficult it is for the model to infer the correct pseudo-label, which is obtained in a self-supervised manner by dividing examples into multiple clusters. Extensive experiments show that the misclassification error incurred in identifying the correct cluster allows for identifying examples prone to bias-related issues. As a result, our approach outperforms existing methods on standard benchmarks for bias removal and fairness.

2024 Conference proceedings paper
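The ClusterFix abstract describes replacing the plain average in empirical risk minimization with a per-example weighted one, where the weight reflects how hard it is to predict an example's cluster pseudo-label. The snippet below is only a minimal sketch of that idea, not the authors' implementation: the softmax weighting, the `temperature` parameter, and the function names are all assumptions made for illustration.

```python
import math

def softmax_weights(cluster_losses, temperature=1.0):
    """Map per-example cluster-prediction losses to weights that sum to 1.

    Assumption: a softmax is used here purely for illustration; the
    abstract does not specify the actual weighting function.
    """
    m = max(cluster_losses)  # subtract the max for numerical stability
    exps = [math.exp((loss - m) / temperature) for loss in cluster_losses]
    z = sum(exps)
    return [e / z for e in exps]

def weighted_risk(task_losses, weights):
    """Weighted empirical risk: sum_i w_i * loss_i."""
    return sum(w * loss for w, loss in zip(weights, task_losses))

# Hypothetical values: the third example is hard to assign to its cluster,
# so it receives most of the weight in the training objective.
cluster_losses = [0.1, 0.1, 2.0]
task_losses = [0.2, 0.3, 1.5]

weights = softmax_weights(cluster_losses)
print(weighted_risk(task_losses, weights))
```

With uniform weights the objective reduces to the standard mean loss; the weighting only shifts emphasis toward examples whose pseudo-label is hard to infer.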

Compact High-Resolution Multi-Wavelength LED Light Source for Eye Stimulation

Authors: Gibertoni, Giovanni; Borghi, Guido; Rovati, Luigi

Published in: ELECTRONICS

Eye stimulation research plays a critical role in advancing our understanding of visual processing and developing new therapies for visual impairments. Despite its importance, researchers and clinicians still face challenges with the availability of cost-effective, precise, and versatile tools for conducting these studies. Therefore, this study introduces a high-resolution, compact, and budget-friendly multi-wavelength LED light source tailored for precise and versatile eye stimulation, addressing the aforementioned needs in medical research and visual science. Accommodating standard 3 mm or 5 mm package LEDs, the system boasts broad compatibility, while its integration with any microcontroller capable of PWM generation and supporting SPI and UART communication ensures adaptability across diverse applications. Operating at high resolution (18 bits or more) with great linearity, the LED light source offers nuanced control for sophisticated eye stimulation protocols. The simple 3D printable optical design allows the coupling of up to seven different wavelengths while ensuring the cost-effectiveness of the device. The system’s output has been designed to be fiber-coupled with standard SMA connectors to be compatible with most solutions. The proposed implementation significantly undercuts the cost of commercially available solutions, providing a viable, budget-friendly option for advancing eye stimulation research.

2024 Journal article

Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities

Authors: Baraldi, Lorenzo; Cocchi, Federico; Cornia, Marcella; Baraldi, Lorenzo; Nicolosi, Alessandro; Cucchiara, Rita

Discerning between authentic content and that generated by advanced AI methods has become increasingly challenging. While previous research primarily addresses the detection of fake faces, the identification of generated natural images has only recently surfaced. This prompted the recent exploration of solutions that employ foundation vision-and-language models, like CLIP. However, the CLIP embedding space is optimized for global image-to-text alignment and is not inherently designed for deepfake detection, neglecting the potential benefits of tailored training and local image features. In this study, we propose CoDE (Contrastive Deepfake Embeddings), a novel embedding space specifically designed for deepfake detection. CoDE is trained via contrastive learning by additionally enforcing global-local similarities. To sustain the training of our model, we generate a comprehensive dataset that focuses on images generated by diffusion models and encompasses a collection of 9.2 million images produced by using four different generators. Experimental results demonstrate that CoDE achieves state-of-the-art accuracy on the newly collected dataset, while also showing excellent generalization capabilities to unseen image generators. Our source code, trained models, and collected dataset are publicly available at: https://github.com/aimagelab/CoDE.

2024 Conference proceedings paper

D-SPDH: Improving 3D Robot Pose Estimation in Sim2Real Scenario via Depth Data

Authors: Simoni, A.; Borghi, G.; Garattoni, L.; Francesca, G.; Vezzani, R.

Published in: IEEE ACCESS

In recent years, there has been a notable surge in the significance attributed to technologies facilitating secure and efficient cohabitation and collaboration between humans and machines, with a particular interest in robotic systems. A pivotal element in actualizing this novel and challenging collaborative paradigm involves different technical tasks, including the comprehension of 3D poses exhibited by both humans and robots through the utilization of non-intrusive systems, such as cameras. In this scenario, vision-based systems capable of detecting the robot's pose in real time are needed as a first step towards a safe and effective interaction, for instance to avoid collisions. Therefore, in this work, we propose a vision-based system, referred to as D-SPDH, able to estimate the 3D robot pose. The system is based on a double-branch architecture and takes depth data as its only input; no additional information regarding the state of the internal encoders of the robot is required. The working scenario is Sim2Real, i.e., the system is trained only with synthetic data and then tested on real sequences, thus eliminating the time-consuming acquisition and annotation of real data, which are common phases in deep learning pipelines. Moreover, we introduce SimBa++, a dataset featuring both synthetic and real sequences with new real-world double-arm movements, which represents a challenging setting in which the proposed approach is tested. Experimental results show that our D-SPDH method achieves state-of-the-art and real-time performance, paving the way for future non-invasive systems to monitor human-robot interactions.

2024 Journal article

Differential Morphing Attack Detection via Triplet-Based Metric Learning and Artifact Extraction

Authors: Liu, Chengcheng; Ferrara, Matteo; Franco, Annalisa; Borghi, Guido; Zhong, Dexing

2024 Conference proceedings paper

Diffusion and Autoregressive Deep Learning models for Transactional Data Generation

Authors: Garuti, Fabrizio; Luetto, Simone; Sangineto Lorenzo Forni, Enver; Cucchiara, Rita

2024 Conference proceedings paper

Enabling On-Device Continual Learning with Binary Neural Networks and Latent Replay

Authors: Vorabbi, Lorenzo; Maltoni, Davide; Borghi, Guido; Santi, Stefano

On-device learning remains a formidable challenge, especially when dealing with resource-constrained devices that have limited computational capabilities. This challenge is primarily rooted in two key issues: first, the memory available on embedded devices is typically insufficient to accommodate the memory-intensive back-propagation algorithm, which often relies on floating-point precision. Second, the development of learning algorithms on models with extreme quantization levels, such as Binary Neural Networks (BNNs), is critical due to the drastic reduction in bit representation. In this study, we propose a solution that combines recent advancements in the field of Continual Learning (CL) and Binary Neural Networks to enable on-device training while maintaining competitive performance. Specifically, our approach leverages binary latent replay (LR) activations and a novel quantization scheme that significantly reduces the number of bits required for gradient computation. The experimental validation demonstrates a significant accuracy improvement in combination with a noticeable reduction in memory requirement, confirming the suitability of our approach in expanding the practical applications of deep learning in real-world scenarios.

2024 Conference proceedings paper

Enhancing Patch-Based Learning for the Segmentation of the Mandibular Canal

Authors: Lumetti, Luca; Pipoli, Vittorio; Bolelli, Federico; Ficarra, Elisa; Grana, Costantino

Published in: IEEE ACCESS

Segmentation of the Inferior Alveolar Canal (IAC) is a critical aspect of dentistry and maxillofacial imaging, garnering considerable attention in recent research endeavors. Deep learning techniques have shown promising results in this domain, yet their efficacy is still significantly hindered by the limited availability of 3D maxillofacial datasets. An inherent challenge is posed by the size of input volumes, which necessitates a patch-based processing approach that compromises the neural network performance due to the absence of global contextual information. This study introduces a novel approach that harnesses the spatial information within the extracted patches and incorporates it into a Transformer architecture, thereby enhancing the segmentation process through the use of prior knowledge about the patch location. Our method improves the Dice score by 4 points with respect to the previous work proposed by Cipriano et al., while also reducing the training steps required by the entire pipeline. By integrating spatial information and leveraging the power of Transformer architectures, this research not only advances the accuracy of IAC segmentation, but also streamlines the training process, offering a promising direction for improving dental and maxillofacial image analysis.

2024 Journal article
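For readers unfamiliar with the metric cited in the abstract above, the Dice score between a predicted and a reference binary mask is 2|A ∩ B| / (|A| + |B|). The snippet below is a small illustrative sketch on flattened 0/1 masks; the function name and the toy masks are assumptions and are not taken from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Toy example: 2 overlapping voxels, 3 foreground voxels in each mask.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(dice_score(pred, target))  # 2 * 2 / (3 + 3)
```

The score ranges from 0 (no overlap) to 1 (identical masks); an improvement quoted in "points" is usually read on the 0 to 100 scale, though the abstract does not state this explicitly.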

Face Restoration for Morphed Images Retouching

Authors: Di Domenico, Nicolò; Borghi, Guido; Franco, Annalisa; Maltoni, Davide

2024 Conference proceedings paper

Fault Diagnosis and Identification in AGVs System

Authors: Bertoli, A.; Battilani, N.; Fantuzzi, C.

Published in: IFAC PAPERSONLINE

This article describes a methodology for diagnosing failures in multi-AGV (Automatic Guided Vehicle) systems. Today, AGVs are establishing themselves in the most advanced automatic logistics solutions, providing performance and safety that cannot be achieved by handling solutions based on manual forklifts. Furthermore, thanks to the application of Industry 4.0 digital technologies, very advanced tools are available to monitor the performance and diagnose the faults of AGV fleets. In particular, studies on fault diagnosis have mainly focused on (1) the diagnosis of internal components of the automatic truck and (2) the identification of failures in the functionality of the AGV in its interaction with the surrounding environment. This paper presents an approach to fault diagnosis in multi-AGV systems that considers the interaction between each AGV and the environment, with the aim of helping the user increase system efficiency in an existing layout. The objective of the paper is to introduce and discuss a methodology to study the failures and the available recovery actions of the AGV navigation system. Moreover, the paper presents the AGV data acquisition and processing architecture deployed on the factory shop floor, as well as the results of the experimental study in a real industrial environment. Copyright (c) 2024 The Authors. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
