Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

The LAM Dataset: A Novel Benchmark for Line-Level Handwritten Text Recognition

Authors: Cascianelli, Silvia; Pippi, Vittorio; Maarand, Martin; Cornia, Marcella; Baraldi, Lorenzo; Kermorvant, Christopher; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

Handwritten Text Recognition (HTR) is an open problem at the intersection of Computer Vision and Natural Language Processing. The main challenges, when dealing with historical manuscripts, are due to the preservation of the paper support, the variability of the handwriting – even of the same author over a wide time-span – and the scarcity of data from ancient, poorly represented languages. With the aim of fostering the research on this topic, in this paper we present the Ludovico Antonio Muratori (LAM) dataset, a large line-level HTR dataset of Italian ancient manuscripts edited by a single author over 60 years. The dataset comes in two configurations: a basic splitting and a date-based splitting which takes into account the age of the author. The first setting is intended to study HTR on ancient documents in Italian, while the second focuses on the ability of HTR systems to recognize text written by the same writer in time periods for which training data are not available. For both configurations, we analyze quantitative and qualitative characteristics, also with respect to other line-level HTR benchmarks, and present the recognition performance of state-of-the-art HTR architectures. The dataset is available for download at https://aimagelab.ing.unimore.it/go/lam.
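
As a rough, purely illustrative sketch of the two splitting strategies described in the abstract; the column names and years below are assumptions, not the actual LAM schema:

```python
# Hypothetical sketch of the two LAM splitting strategies; "line_id" and "year"
# are assumed column names, not the dataset's actual schema.
import pandas as pd

lines = pd.DataFrame({
    "line_id": ["l0001", "l0002", "l0003", "l0004"],
    "year":    [1698, 1712, 1735, 1749],   # year the line was written
})

# Basic split: random train/test partition, ignoring when a line was written.
basic_train = lines.sample(frac=0.8, random_state=0)
basic_test = lines.drop(basic_train.index)

# Date-based split: train on early periods, test on later (unseen) ones,
# probing robustness to the author's handwriting drift over ~60 years.
date_train = lines[lines["year"] < 1740]
date_test = lines[lines["year"] >= 1740]
```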

2022 Conference Proceedings Paper

The Unreasonable Effectiveness of CLIP features for Image Captioning: an Experimental Analysis

Authors: Barraco, Manuele; Cornia, Marcella; Cascianelli, Silvia; Baraldi, Lorenzo; Cucchiara, Rita

Published in: IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS

Generating textual descriptions from visual inputs is a fundamental step towards machine intelligence, as it entails modeling the connections between the visual and textual modalities. For years, image captioning models have relied on pre-trained visual encoders and object detectors, trained on relatively small sets of data. Recently, it has been observed that large-scale multi-modal approaches like CLIP (Contrastive Language-Image Pre-training), trained on a massive amount of image-caption pairs, provide a strong zero-shot capability on various vision tasks. In this paper, we study the advantage brought by CLIP in image captioning, employing it as a visual encoder. Through extensive experiments, we show how CLIP can significantly outperform widely-used visual encoders and quantify its role under different architectures, variants, and evaluation protocols, ranging from classical captioning performance to zero-shot transfer.
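
A minimal sketch of the core idea (CLIP as a frozen visual encoder whose features condition a captioning decoder); this is not the paper's exact architecture, and the decoder is only hinted at:

```python
# Minimal sketch: CLIP as a frozen visual encoder for captioning.
# Requires the openai/CLIP package (pip install git+https://github.com/openai/CLIP.git).
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)
clip_model.eval()  # the encoder stays frozen

@torch.no_grad()
def encode_image(pil_image):
    """Return a CLIP visual feature on which a captioning decoder is conditioned."""
    x = preprocess(pil_image).unsqueeze(0).to(device)
    return clip_model.encode_image(x)  # (1, 512) for ViT-B/32

# A full captioning model would pass this feature (or the grid of patch features)
# to a Transformer decoder that autoregressively generates the caption tokens.
```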

2022 Conference Proceedings Paper

Transfer without Forgetting

Authors: Boschini, Matteo; Bonicelli, Lorenzo; Porrello, Angelo; Bellitto, Giovanni; Pennisi, Matteo; Palazzo, Simone; Spampinato, Concetto; Calderara, Simone

Published in: LECTURE NOTES IN COMPUTER SCIENCE

This work investigates the entanglement between Continual Learning (CL) and Transfer Learning (TL). In particular, we shed light on the widespread application of network pretraining, highlighting that it is itself subject to catastrophic forgetting. Unfortunately, this issue leads to the under-exploitation of knowledge transfer during later tasks. On this ground, we propose Transfer without Forgetting (TwF), a hybrid Continual Transfer Learning approach building upon a fixed pretrained sibling network, which continuously propagates the knowledge inherent in the source domain through a layer-wise loss term. Our experiments indicate that TwF steadily outperforms other CL methods across a variety of settings, averaging a 4.81% gain in Class-Incremental accuracy over a variety of datasets and different buffer sizes.
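
As a hedged illustration of the layer-wise transfer term mentioned above, here is plain L2 feature matching against the frozen sibling; the actual TwF loss and its weighting scheme are more sophisticated:

```python
# Sketch of a layer-wise loss that propagates knowledge from a fixed pretrained
# sibling network; plain MSE matching, omitting TwF's actual weighting scheme.
import torch.nn.functional as F

def layerwise_transfer_loss(student_feats, sibling_feats):
    """Both arguments: lists of per-layer feature maps with matching shapes."""
    loss = 0.0
    for fs, ft in zip(student_feats, sibling_feats):
        loss = loss + F.mse_loss(fs, ft.detach())  # the sibling is never updated
    return loss
```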

2022 Conference Proceedings Paper

Transform, Warp, and Dress: A New Transformation-Guided Model for Virtual Try-On

Authors: Fincato, Matteo; Cornia, Marcella; Landi, Federico; Cesari, Fabio; Cucchiara, Rita

Published in: ACM TRANSACTIONS ON MULTIMEDIA COMPUTING, COMMUNICATIONS AND APPLICATIONS

Virtual try-on has recently emerged in computer vision and multimedia communities with the development of architectures that can generate realistic images of a target person wearing a custom garment. This research interest is motivated by the large role played by e-commerce and online shopping in our society. Indeed, the virtual try-on task can offer many opportunities to improve the efficiency of preparing fashion catalogs and to enhance the online user experience. The problem is far from being solved: current architectures do not reach sufficient accuracy with respect to manually generated images and can only be trained on image pairs with a limited variety. Existing virtual try-on datasets have two main limits: they contain only female models, and all the images are available only in low resolution. This not only affects the generalization capabilities of the trained architectures but also makes deployment in real applications impractical. To overcome these issues, we present Dress Code, a new dataset for virtual try-on that contains high-resolution images of a large variety of upper-body clothes and both male and female models. Leveraging this enriched dataset, we propose a new model for virtual try-on capable of generating high-quality and photo-realistic images using a three-stage pipeline. The first two stages perform two different geometric transformations to warp the desired garment and make it fit the target person's body pose and shape. Then, we generate the new image of that same person wearing the try-on garment using a generative network. We test the proposed solution on the most widely used dataset for this task as well as on our newly collected dataset and demonstrate its effectiveness when compared to current state-of-the-art methods. Through extensive analyses on our Dress Code dataset, we show the adaptability of our model, which can generate try-on images even at higher resolutions.
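
A schematic skeleton of the three-stage pipeline described above; every module is a placeholder, and the paper's actual warping and generative networks are not reproduced here:

```python
# Placeholder skeleton of the three-stage try-on pipeline: two geometric warps
# of the garment followed by a generative network producing the final image.
import torch
import torch.nn as nn

class TryOnPipeline(nn.Module):
    def __init__(self, coarse_warp, fine_warp, generator):
        super().__init__()
        self.coarse_warp = coarse_warp   # 1st geometric transformation of the garment
        self.fine_warp = fine_warp       # 2nd, finer transformation (pose/shape fitting)
        self.generator = generator       # generative network rendering the try-on image

    def forward(self, person, garment):
        warped = self.fine_warp(self.coarse_warp(garment))
        return self.generator(torch.cat([person, warped], dim=1))

# Dry run with stand-in modules:
# pipeline = TryOnPipeline(nn.Identity(), nn.Identity(), nn.Identity())
```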

2022 Journal Article

Unsupervised Detection of Dynamic Hand Gestures from Leap Motion Data

Authors: D'Eusanio, A.; Pini, S.; Borghi, G.; Simoni, A.; Vezzani, R.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

The effective and reliable detection and classification of dynamic hand gestures is a key element for building Natural User Interfaces, systems that allow users to interact through free movements of their body instead of traditional mechanical tools. However, methods that temporally segment and classify dynamic gestures usually rely on a great amount of labeled data, including annotations regarding the class and the temporal segmentation of each gesture. In this paper, we propose an unsupervised approach to train a Transformer-based architecture that learns to detect dynamic hand gestures in a continuous temporal sequence. The input data is represented by the 3D position of the hand joints, along with their speed and acceleration, collected through a Leap Motion device. Experimental results show promising accuracy on both the detection and the classification tasks while requiring only limited computational power, confirming that the proposed method can be applied in real-world applications.
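
A small sketch of building the per-frame input representation described above (3D joint positions plus finite-difference speed and acceleration); the array layout and frame rate are assumptions:

```python
# Build per-frame features from a Leap Motion joint stream: position, speed,
# acceleration. The (T, J, 3) layout and the 60 fps rate are assumptions.
import numpy as np

def build_features(joints, dt=1.0 / 60.0):
    """joints: (T, J, 3) array of 3D joint positions over T frames."""
    velocity = np.gradient(joints, dt, axis=0)         # first temporal derivative
    acceleration = np.gradient(velocity, dt, axis=0)   # second temporal derivative
    return np.concatenate([joints, velocity, acceleration], axis=-1)  # (T, J, 9)
```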

2022 Conference Proceedings Paper

Unsupervised High-Resolution Portrait Gaze Correction and Animation

Authors: Zhang, J.; Chen, J.; Tang, H.; Sangineto, E.; Wu, P.; Yan, Y.; Sebe, N.; Wang, W.

Published in: IEEE TRANSACTIONS ON IMAGE PROCESSING

This paper proposes a gaze correction and animation method for high-resolution, unconstrained portrait images, which can be trained without gaze angle and head pose annotations. Common gaze-correction methods usually require annotating training data with precise gaze and head pose information. Solving this task with an unsupervised method remains an open problem, especially for high-resolution face images in the wild, which are not easy to annotate with gaze and head pose labels. To address this issue, we first create two new portrait datasets: CelebGaze (256 × 256) and high-resolution CelebHQGaze (512 × 512). Second, we formulate the gaze correction task as an image inpainting problem, addressed using a Gaze Correction Module (GCM) and a Gaze Animation Module (GAM). Moreover, we propose an unsupervised training strategy, i.e., Synthesis-As-Training, to learn the correlation between the eye region features and the gaze angle. As a result, we can use the learned latent space for gaze animation with semantic interpolation in this space. Furthermore, to alleviate both the memory and the computational costs in the training and the inference stage, we propose a Coarse-to-Fine Module (CFM) integrated with GCM and GAM. Extensive experiments validate the effectiveness of our method for both the gaze correction and the gaze animation tasks on both low- and high-resolution face datasets in the wild and demonstrate the superiority of our method with respect to the state of the art.
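
A toy illustration of the inpainting formulation (mask out the eye region and let a generator fill it in); the generator below is a placeholder standing in for the Gaze Correction Module, not its actual architecture:

```python
# Toy view of "gaze correction as inpainting": erase the eye region and let a
# generator (placeholder here, not the actual GCM) synthesize corrected eyes.
import torch
import torch.nn as nn

generator = nn.Identity()  # placeholder for the Gaze Correction Module

def correct_gaze(face, eye_mask):
    """face: (B, 3, H, W) image; eye_mask: (B, 1, H, W), 1 inside the eye region."""
    masked = face * (1.0 - eye_mask)                         # remove the original eyes
    return generator(torch.cat([masked, eye_mask], dim=1))   # inpaint the masked area
```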

2022 Journal Article

Warp and Learn: Novel Views Generation for Vehicles and Other Objects

Authors: Palazzi, Andrea; Bergamini, Luca; Calderara, Simone; Cucchiara, Rita

Published in: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

In this work we introduce a new self-supervised, semi-parametric approach for synthesizing novel views of a vehicle starting from a single monocular image. Differently from parametric (i.e., entirely learning-based) methods, we show how a-priori geometric knowledge about the object and the 3D world can be successfully integrated into a deep learning based image generation framework. As this geometric component is not learnt, we call our approach semi-parametric. In particular, we exploit man-made object symmetry and piece-wise planarity to integrate rich a-priori visual information into the novel viewpoint synthesis process. An Image Completion Network (ICN) is then trained to generate a realistic image starting from this geometric guidance. This blend between parametric and non-parametric components allows us to i) operate in a real-world scenario, ii) preserve high-frequency visual information such as textures, iii) handle truly arbitrary 3D roto-translations of the input, and iv) perform shape transfer to completely different 3D models. Finally, we show that our approach can be easily complemented with synthetic data and extended to other rigid objects with completely different topology, even in the presence of concave structures and holes. A comprehensive experimental analysis against state-of-the-art competitors shows the efficacy of our method from both a quantitative and a perceptual point of view.
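
A hedged sketch of the non-learned, geometric half of the approach: warping a roughly planar patch toward the target viewpoint with a homography, before a learned Image Completion Network fills in the missing regions. The point correspondences below are dummy values for illustration only:

```python
# Warp a (roughly) planar patch of the source view toward the target viewpoint
# with a homography; a learned Image Completion Network would then complete the
# missing/occluded regions. Correspondences here are dummy values.
import cv2
import numpy as np

src_pts = np.float32([[10, 10], [200, 12], [205, 150], [8, 148]])   # patch in source view
dst_pts = np.float32([[40, 30], [180, 20], [190, 160], [35, 170]])  # patch in target view

H, _ = cv2.findHomography(src_pts, dst_pts)
source = np.zeros((256, 256, 3), dtype=np.uint8)       # placeholder source image
warped = cv2.warpPerspective(source, H, (256, 256))    # geometric guidance for the ICN
```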

2022 Journal Article

Wind Turbine Power Curve Monitoring Based on Environmental and Operational Data

Authors: Cascianelli, S.; Astolfi, D.; Castellani, F.; Cucchiara, R.; Fravolini, M. L.

Published in: IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS

The power produced by a wind turbine depends on environmental conditions, working parameters, and interactions with nearby turbines. However, these aspects are often neglected in the design of data-driven models for wind farm performance analysis. In this article, we propose to predict the active power and to provide reliable prediction intervals via ensembles of multivariate polynomial regression models that exploit a higher number of inputs (compared to most approaches in the literature), including operational and thermal variables. We present two main strategies: the former considers the environmental measurements collected at the other wind turbines in the farm as additional modeling information for the turbine under analysis; the latter combines multiple models relative to different operative conditions. We validate our approach on real data from the SCADA system of a wind farm in Italy and obtain an MAE on the order of 1.0% of the rated power of the turbine. Moreover, due to the structure of our approach, we can gain quantitative insights into the covariates most frequently selected depending on the working region of the wind turbines.
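
A minimal, illustrative sketch of the modeling ingredients (multivariate polynomial regression plus a naive bootstrap ensemble yielding rough prediction intervals); the features and data are invented, and the paper's covariate selection and interval construction are more elaborate:

```python
# Toy multivariate polynomial regression of active power with a bootstrap
# ensemble giving rough prediction intervals; data and features are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))   # e.g. wind speed, ambient temperature, rotor speed, pitch
y = 0.5 * X[:, 0] ** 3 + 0.1 * X[:, 2] + rng.normal(scale=0.05, size=500)  # toy "power"

preds = []
for _ in range(50):              # bootstrap ensemble of polynomial models
    idx = rng.integers(0, len(X), len(X))
    model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    model.fit(X[idx], y[idx])
    preds.append(model.predict(X))

preds = np.stack(preds)
mean_pred = preds.mean(axis=0)
lower, upper = np.percentile(preds, [5, 95], axis=0)   # rough 90% prediction band
```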

2022 Journal Article

A Bayesian approach to Expert Gate Incremental Learning

Authors: Mieuli, V.; Ponzio, F.; Mascolini, A.; Macii, E.; Ficarra, E.; Di Cataldo, S.

Published in: PROCEEDINGS OF ... INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS

Incremental learning involves Machine Learning paradigms that dynamically adjust their previous knowledge whenever new training samples emerge. To address the problem of multi-task incremental learning without storing any samples of the previous tasks, the so-called Expert Gate paradigm was proposed, which consists of a Gate and a downstream network of task-specific CNNs, a.k.a. the Experts. The gate forwards the input to a certain expert, based on the decision made by a set of autoencoders. Unfortunately, as a CNN is intrinsically incapable of dealing with inputs of a class it was not specifically trained on, the activation of the wrong expert invariably ends in a classification error. To address this issue, we propose a probabilistic extension of the classic Expert Gate paradigm. Exploiting the prediction uncertainty estimations provided by Bayesian Convolutional Neural Networks (B-CNNs), the proposed paradigm is able to either reduce, or correct at a later stage, wrong decisions of the gate. The effectiveness of our approach is shown through experimental comparisons with state-of-the-art incremental learning methods.
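
A toy sketch of the underlying idea (predictive uncertainty used to flag unreliable gate decisions); here the uncertainty comes from Monte Carlo dropout rather than the paper's B-CNNs, and the gate below is a placeholder, not the autoencoder-based gate:

```python
# Toy MC-dropout gate: route to an expert, but flag decisions whose predictive
# uncertainty is high so they can be reduced or corrected at a later stage.
import torch
import torch.nn as nn

gate = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.5), nn.Linear(64, 5))

def route_with_uncertainty(x, n_samples=20, threshold=0.3):
    gate.train()                                   # keep dropout active (MC dropout)
    with torch.no_grad():
        probs = torch.stack([gate(x).softmax(-1) for _ in range(n_samples)])
    mean_probs = probs.mean(0)
    uncertainty = probs.std(0).max(-1).values      # simple dispersion-based score
    expert_id = mean_probs.argmax(-1)
    needs_review = uncertainty > threshold         # decisions to re-check downstream
    return expert_id, needs_review

expert_id, flagged = route_with_uncertainty(torch.randn(1, 128))
```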

2021 Conference Proceedings Paper

A Cone Beam Computed Tomography Annotation Tool for Automatic Detection of the Inferior Alveolar Nerve Canal

Authors: Mercadante, Cristian; Cipriano, Marco; Bolelli, Federico; Pollastri, Federico; Di Bartolomeo, Mattia; Anesi, Alexandre; Grana, Costantino

In recent years, deep learning has been employed in several medical fields, achieving impressive results. Unfortunately, these algorithms require a huge amount of annotated data to ensure the correct learning process. When dealing with medical imaging, collecting and annotating data can be cumbersome and expensive. This is mainly related to the nature of data, often three-dimensional, and to the need for well-trained expert technicians. In maxillofacial imagery, recent works have been focused on the detection of the Inferior Alveolar Nerve (IAN), since its position is of great relevance for avoiding severe injuries during surgery operations such as third molar extraction or implant installation. In this work, we introduce a novel tool for analyzing and labeling the alveolar nerve from Cone Beam Computed Tomography (CBCT) 3D volumes.

2021 Conference Proceedings Paper
