Publications - AImageLab

Enabling On-Device Continual Learning with Binary Neural Networks and Latent Replay

Authors: Vorabbi, Lorenzo; Maltoni, Davide; Borghi, Guido; Santi, Stefano

On-device learning remains a formidable challenge, especially when dealing with resource-constrained devices that have limited computational capabilities. This challenge is … (Read full abstract)

On-device learning remains a formidable challenge, especially when dealing with resource-constrained devices that have limited computational capabilities. This challenge is primarily rooted in two key issues: first, the memory available on embedded devices is typically insufficient to accommodate the memory-intensive back-propagation algorithm, which often relies on floating-point precision. Second, the development of learning algorithms on models with extreme quantization levels, such as Binary Neural Networks (BNNs), is critical due to the drastic reduction in bit representation. In this study, we propose a solution that combines recent advancements in the field of Continual Learning (CL) and Binary Neural Networks to enable on-device training while maintaining competitive performance. Specifically, our approach leverages binary latent replay (LR) activations and a novel quantization scheme that significantly reduces the number of bits required for gradient computation. The experimental validation demonstrates a significant accuracy improvement in combination with a noticeable reduction in memory requirement, confirming the suitability of our approach in expanding the practical applications of deep learning in real-world scenarios.

2024 Relazione in Atti di Convegno

DOI IRIS

Enhancing Patch-Based Learning for the Segmentation of the Mandibular Canal

Authors: Lumetti, Luca; Pipoli, Vittorio; Bolelli, Federico; Ficarra, Elisa; Grana, Costantino

Published in: IEEE ACCESS

Segmentation of the Inferior Alveolar Canal (IAC) is a critical aspect of dentistry and maxillofacial imaging, garnering considerable attention in … (Read full abstract)

Segmentation of the Inferior Alveolar Canal (IAC) is a critical aspect of dentistry and maxillofacial imaging, garnering considerable attention in recent research endeavors. Deep learning techniques have shown promising results in this domain, yet their efficacy is still significantly hindered by the limited availability of 3D maxillofacial datasets. An inherent challenge is posed by the size of input volumes, which necessitates a patch-based processing approach that compromises the neural network performance due to the absence of global contextual information. This study introduces a novel approach that harnesses the spatial information within the extracted patches and incorporates it into a Transformer architecture, thereby enhancing the segmentation process through the use of prior knowledge about the patch location. Our method significantly improves the Dice score by a factor of 4 points, with respect to the previous work proposed by Cipriano et al., while also reducing the training steps required by the entire pipeline. By integrating spatial information and leveraging the power of Transformer architectures, this research not only advances the accuracy of IAC segmentation, but also streamlines the training process, offering a promising direction for improving dental and maxillofacial image analysis.

2024 Articolo su rivista

DOI IRIS

Face Restoration for Morphed Images Retouching

Authors: Di Domenico, Nicolò; Borghi, Guido; Franco, Annalisa; Maltoni, Davide

2024 Relazione in Atti di Convegno

DOI IRIS

Fault Diagnosis and Identification in AGVs System

Authors: Bertoli, A.; Battilani, N.; Fantuzzi, C.

Published in: IFAC PAPERSONLINE

This article describes a methodology for the diagnosis of failures in multi-AGV (Automatic Guided Vehicles). Today, AGVs are establishing themselves … (Read full abstract)

This article describes a methodology for the diagnosis of failures in multi-AGV (Automatic Guided Vehicles). Today, AGVs are establishing themselves in the most advanced automatic logistics solutions, providing performance and safety that cannot be achieved with handling solutions with manual forklifts. Furthermore, thanks to the application of Industry 4.0 digital technologies, very advanced tools are available to monitor the performance and diagnose faults of fleets of AGV. In particular, studies on fault diagnosis have mainly focused on (1) the diagnosis of internal components of the automatic truck and (2) the identification of failures in the functionality of the AGV in its interaction with the surrounding environment. This paper shows an approach to fault diagnosis in multi-AGVs system, considering the interaction between each single AGV and the environment, with the scope to help the user increase the system efficiency in an existing layout. The objective of the paper is to introduce and discuss a methodology to study the failure and the available recovery actions of the AGV navigation system. Moreover, the paper presents the real AGV data acquisition and processing architecture actually deployed on the factory shop floor, as well as the result from the experimental study in a real industrial environment. Copyright (c) 2024 The Authors. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)

2024

DOI IRIS

Fluent and Accurate Image Captioning with a Self-Trained Reward Model

Authors: Moratelli, Nicholas; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Fine-tuning image captioning models with hand-crafted rewards like the CIDEr metric has been a classical strategy for promoting caption quality … (Read full abstract)

Fine-tuning image captioning models with hand-crafted rewards like the CIDEr metric has been a classical strategy for promoting caption quality at the sequence level. This approach, however, is known to limit descriptiveness and semantic richness and tends to drive the model towards the style of ground-truth sentences, thus losing detail and specificity. On the contrary, recent attempts to employ image-text models like CLIP as reward have led to grammatically incorrect and repetitive captions. In this paper, we propose Self-Cap a captioning approach that relies on a learnable reward model based on self-generated negatives that can discriminate captions based on their consistency with the image. Specifically, our discriminator is a fine-tuned contrastive image-text model trained to promote caption correctness while avoiding the aberrations that typically happen when training with a CLIP-based reward. To this end, our discriminator directly incorporates negative samples from a frozen captioner, which significantly improves the quality and richness of the generated captions but also reduces the fine-tuning time in comparison to using the CIDEr score as the sole metric for optimization. Experimental results demonstrate the effectiveness of our training strategy on both standard and zero-shot image captioning datasets.

2024 Relazione in Atti di Convegno

IRIS

FOSSIL: Free Open-Vocabulary Semantic Segmentation through Synthetic References Retrieval

Authors: Barsellotti, Luca; Amoroso, Roberto; Baraldi, Lorenzo; Cucchiara, Rita

2024 Relazione in Atti di Convegno

DOI IRIS

FRCSyn Challenge at WACV 2024:Face Recognition Challenge in the Era of Synthetic Data

Authors: Melzi, Pietro; Tolosana, Ruben; Vera-Rodriguez, Ruben; Kim, Minchul; Rathgeb, Christian; Liu, Xiaoming; DeAndres-Tame, Ivan; Morales, Aythami; Fierrez, Julian; Ortega-Garcia, Javier; Zhao, Weisong; Zhu, Xiangyu; Yan, Zheyu; Zhang, Xiao-Yu; Wu, Jinlin; Lei, Zhen; Tripathi, Suvidha; Kothari, Mahak; Haider Zama, Md; Deb, Debayan; Biesseck, Bernardo; Vidal, Pedro; Granada, Roger; Fickel, Guilherme; Führ, Gustavo; Menotti, David; Unnervik, Alexander; George, Anjith; Ecabert, Christophe; Otroshi Shahreza, Hatef; Rahimi, Parsa; Marcel, Sébastien; Sarridis, Ioannis; Koutlis, Christos; Baltsou, Georgia; Papadopoulos, Symeon; Diou, Christos; Di Domenico, Nicolò; Borghi, Guido; Pellegrini, Lorenzo; Mas-Candela, Enrique; Sánchez-Pérez, Ángela; Atzori, Andrea; Boutros, Fadi; Damer, Naser; Fenu, Gianni; Marras, Mirko

Despite the widespread adoption of face recognition technology around the world, and its remarkable performance on current benchmarks, there are … (Read full abstract)

Despite the widespread adoption of face recognition technology around the world, and its remarkable performance on current benchmarks, there are still several challenges that must be covered in more detail. This paper offers an overview of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at WACV 2024. This is the first international challenge aiming to explore the use of synthetic data in face recognition to address existing limitations in the technology. Specifically, the FRCSyn Challenge targets concerns related to data privacy issues, demographic biases, generalization to unseen scenarios, and performance limitations in challenging scenarios, including significant age disparities between enrollment and testing, pose variations, and occlusions. The results achieved in the FRCSyn Challenge, together with the proposed benchmark, contribute significantly to the application of synthetic data to improve face recognition technology.

2024 Relazione in Atti di Convegno

DOI IRIS

FRCSyn-onGoing: Benchmarking and comprehensive evaluation of real and synthetic data to improve face recognition systems

Authors: Melzi, Pietro; Tolosana, Ruben; Vera-Rodriguez, Ruben; Kim, Minchul; Rathgeb, Christian; Liu, Xiaoming; DeAndres-Tame, Ivan; Morales, Aythami; Fierrez, Julian; Ortega-Garcia, Javier; Zhao, Weisong; Zhu, Xiangyu; Yan, Zheyu; Zhang, Xiao-Yu; Wu, Jinlin; Lei, Zhen; Tripathi, Suvidha; Kothari, Mahak; Zama, Md Haider; Deb, Debayan; Biesseck, Bernardo; Vidal, Pedro; Granada, Roger; Fickel, Guilherme; Führ, Gustavo; Menotti, David; Unnervik, Alexander; George, Anjith; Ecabert, Christophe; Shahreza, Hatef Otroshi; Rahimi, Parsa; Marcel, Sébastien; Sarridis, Ioannis; Koutlis, Christos; Baltsou, Georgia; Papadopoulos, Symeon; Diou, Christos; Di Domenico, Nicolò; Borghi, Guido; Pellegrini, Lorenzo; Mas-Candela, Enrique; Sánchez-Pérez, Ángela; Atzori, Andrea; Boutros, Fadi; Damer, Naser; Fenu, Gianni; Marras, Mirko

Published in: INFORMATION FUSION

This article presents FRCSyn-onGoing, an ongoing challenge for face recognition where researchers can easily benchmark their systems against the state … (Read full abstract)

This article presents FRCSyn-onGoing, an ongoing challenge for face recognition where researchers can easily benchmark their systems against the state of the art in an open common platform using large-scale public databases and standard experimental protocols. FRCSyn-onGoing is based on the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at WACV 2024. This is the first face recognition international challenge aiming to explore the use of real and synthetic data independently, and also their fusion, in order to address existing limitations in the technology. Specifically, FRCSyn-onGoing targets concerns related to data privacy issues, demographic biases, generalization to unseen scenarios, and performance limitations in challenging scenarios, including significant age disparities between enrollment and testing, pose variations, and occlusions. To enhance face recognition performance, FRCSyn-onGoing strongly advocates for information fusion at various levels, starting from the input data, where a mix of real and synthetic domains is proposed for specific tasks of the challenge. Additionally, participating teams are allowed to fuse diverse networks within their proposed systems to improve the performance. In this article, we provide a comprehensive evaluation of the face recognition systems and results achieved so far in FRCSyn-onGoing. The results obtained in FRCSyn-onGoing, together with the proposed public ongoing benchmark, contribute significantly to the application of synthetic data to improve face recognition technology.

2024 Articolo su rivista

DOI IRIS

From One to Many Lorikeets: Discovering Image Analogies in the CLIP Space

Authors: Xing, S.; Peruzzo, E.; Sangineto, E.; Sebe, N.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Drawing analogies between two pairs of entities in the form of A:B::C:D (i.e. A is to B as C is to … (Read full abstract)

Drawing analogies between two pairs of entities in the form of A:B::C:D (i.e. A is to B as C is to D) is a hallmark of human intelligence, as evidenced by sufficient findings in cognitive science for the last decades. In recent years, this property has been found far beyond cognitive science. Notable examples are word2vec and GloVe models in natural language processing. Recent research in computer vision also found the property of analogies in the feature space of a pretrained ConvNet feature extractor. However, analogy mining in the semantic space of recent strong foundation models such as CLIP is still understudied, despite the fact that they have been successfully applied to a wide range of downstream tasks. In this work, we show that CLIP possesses the similar ability of analogical reasoning in the latent space, and propose a novel strategy to extract analogies between pairs of images in the CLIP space. We compute all the difference vectors of a pair of any two images that belong to the same class in the CLIP space, and employ k-means clustering to group the difference vectors into clusters irrespective of their classes. This procedure results in cluster centroids representative of class-agnostic semantic analogies between images. Through extensive analysis, we show that the property of drawing analogies between images also exists in the CLIP space, which are interpretable by humans through a visualisation of the learned clusters.

2024 Relazione in Atti di Convegno

DOI IRIS

From One to Many Lorikeets: Discovering Image Analogies in the CLIP Space

Authors: Xing, S.; Peruzzo, E.; Sangineto, E.; Sebe, N.

2024 Relazione in Atti di Convegno

IRIS