Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.


VITON-GT: An Image-based Virtual Try-On Model with Geometric Transformations

Authors: Fincato, Matteo; Landi, Federico; Cornia, Marcella; Cesari, Fabio; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

The wide spread of online shopping has led computer vision researchers to develop different solutions for the fashion domain that could improve the online user experience and make the preparation of fashion catalogs more efficient. Among them, image-based virtual try-on has recently attracted a lot of attention, resulting in several architectures that can generate a new image of a person wearing an input try-on garment in a plausible and realistic way. In this paper, we present VITON-GT, a new model for virtual try-on that generates high-quality and photo-realistic images thanks to multiple geometric transformations. In particular, our model is composed of a two-stage geometric transformation module that performs two different projections on the input garment, and a transformation-guided try-on module that synthesizes the new image. We experimentally validate the proposed solution on the most common dataset for this task, containing mainly t-shirts, and we demonstrate its effectiveness compared to different baselines and previous methods. Additionally, we assess the generalization capabilities of our model on a new set of fashion items composed of upper-body clothes from different categories. To the best of our knowledge, we are the first to test virtual try-on architectures in this challenging experimental setting.

2021 Conference proceedings paper

Watch Your Strokes: Improving Handwritten Text Recognition with Deformable Convolutions

Authors: Cojocaru, Iulian; Cascianelli, Silvia; Baraldi, Lorenzo; Corsini, Massimiliano; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

Handwritten Text Recognition (HTR) in free-layout pages is a valuable yet challenging task which aims to automatically understand handwritten texts. State-of-the-art approaches in this field usually encode input images with Convolutional Neural Networks, whose kernels are typically defined on a fixed grid and focus on all input pixels independently. However, this is in contrast with the sparse nature of handwritten pages, in which only pixels representing the ink of the writing are useful for the recognition task. Furthermore, the standard convolution operator is not explicitly designed to take into account the great variability in shape, scale, and orientation of handwritten characters. To overcome these limitations, we investigate the use of deformable convolutions for handwriting recognition. This type of convolution deforms the convolution kernel according to the content of the neighborhood, and can therefore adapt better to geometric variations and other deformations of the text. Experiments conducted on the IAM and RIMES datasets demonstrate that the use of deformable convolutions is a promising direction for the design of novel architectures for handwritten text recognition.

2021 Conference proceedings paper

Whitening for Self-Supervised Representation Learning

Authors: Ermolov, A.; Siarohin, A.; Sangineto, E.; Sebe, N.

Published in: PROCEEDINGS OF MACHINE LEARNING RESEARCH

2021 Conference proceedings paper

Working Memory Connections for LSTM

Authors: Landi, Federico; Baraldi, Lorenzo; Cornia, Marcella; Cucchiara, Rita

Published in: NEURAL NETWORKS

Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of gating mechanisms to mitigate exploding and vanishing gradients when learning long-term dependencies. For this reason, LSTMs and other gated RNNs are widely adopted, being the de facto standard for many sequence modeling tasks. Although the memory cell inside the LSTM contains essential information, it is not allowed to influence the gating mechanism directly. In this work, we improve the gate potential by including information coming from the internal cell state. The proposed modification, named Working Memory Connection, consists of adding a learnable nonlinear projection of the cell content into the network gates. This modification can fit into the classical LSTM gates without any assumption on the underlying task, and is particularly effective when dealing with longer sequences. Previous research efforts in this direction, which go back to the early 2000s, could not bring a consistent improvement over the vanilla LSTM. As part of this paper, we identify a key issue tied to previous connections that heavily limits their effectiveness, hence preventing a successful integration of the knowledge coming from the internal cell state. We show through extensive experimental evaluation that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks. Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.

2021 Journal article
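
The "Working Memory Connections for LSTM" abstract above describes adding a learnable nonlinear projection of the cell content to the LSTM gates. The following is a minimal sketch of one reading of that sentence, not the authors' implementation. It assumes the nonlinearity is tanh, that the projected term is added to each gate's pre-activation, and that the output gate reads the updated cell state; the function and parameter names (`lstm_wmc_step`, `init_params`, `W_ci`, and so on) are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_params(input_size, hidden_size, seed=0):
    """Small random weights; all names here are hypothetical."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    params = {}
    for gate in ("i", "f", "o", "g"):
        params["W_x" + gate] = mat(hidden_size, input_size)
        params["W_h" + gate] = mat(hidden_size, hidden_size)
        params["b_" + gate] = [0.0] * hidden_size
    # Extra projections of the cell state, one per gate (the assumed addition).
    for gate in ("i", "f", "o"):
        params["W_c" + gate] = mat(hidden_size, hidden_size)
    return params


def lstm_wmc_step(x, h_prev, c_prev, params):
    """One LSTM step with a tanh-projected cell term added to each gate."""

    def pre(gate):
        # Standard LSTM pre-activation: W_x x + W_h h_prev + b
        wx = matvec(params["W_x" + gate], x)
        wh = matvec(params["W_h" + gate], h_prev)
        return [a + b + c for a, b, c in zip(wx, wh, params["b_" + gate])]

    def cell_term(gate, cell):
        # Assumed form of the projection: tanh(W_c * cell)
        return [math.tanh(v) for v in matvec(params["W_c" + gate], cell)]

    i = [sigmoid(a + b) for a, b in zip(pre("i"), cell_term("i", c_prev))]
    f = [sigmoid(a + b) for a, b in zip(pre("f"), cell_term("f", c_prev))]
    g = [math.tanh(a) for a in pre("g")]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # Assumption: the output gate sees the updated cell state.
    o = [sigmoid(a + b) for a, b in zip(pre("o"), cell_term("o", c))]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

Calling `lstm_wmc_step([1.0, 0.5, -0.5], [0.0] * 4, [0.0] * 4, init_params(3, 4))` returns the next hidden and cell states as two lists of length 4. Removing the `cell_term` additions gives back a plain LSTM step.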

[123I]Metaiodobenzylguanidine (MIBG) Cardiac Scintigraphy and Automated Classification Techniques in Parkinsonian Disorders

Authors: Nuvoli, Susanna; Spanu, Angela; Fravolini, Mario Luca; Bianconi, Francesco; Cascianelli, Silvia; Madeddu, Giuseppe; Palumbo, Barbara

Published in: MOLECULAR IMAGING AND BIOLOGY

Purpose: To provide reliable and reproducible heart/mediastinum (H/M) ratio cut-off values for parkinsonian disorders using two machine learning techniques, Support Vector Machines (SVM) and Random Forest (RF) classifiers, applied to [123I]MIBG cardiac scintigraphy. Procedures: We studied 85 subjects: 50 with idiopathic Parkinson’s disease (PD), 26 with atypical parkinsonian syndromes (P), and 9 with essential tremor (ET). All patients underwent planar early and delayed cardiac scintigraphy after intravenous injection of [123I]MIBG (111 MBq). Images were evaluated both qualitatively and quantitatively, the latter by the early and delayed H/M ratios obtained from regions of interest (ROIt1 and ROIt2) drawn on planar images. SVM and RF classifiers were then used to obtain the correct cut-off value. Results: SVM and RF produced excellent classification performance: the SVM classifier achieved perfect classification, and RF also attained very good accuracy. The best cut-off for the H/M ratio was 1.55, which was the same for both ROIt1 and ROIt2. This value correctly separated PD from P and ET: patients with an H/M ratio below 1.55 were classified as PD, while those with values above 1.55 were considered affected by parkinsonism and/or ET. No difference was found when early and late H/M ratios were considered separately, suggesting that a single early evaluation could be sufficient to obtain the final diagnosis. Conclusions: Our results show that SVM and RF make it possible to define the best cut-off value for H/M ratios in both the early and the delayed phase, underlining the role of [123I]MIBG cardiac scintigraphy and the effectiveness of the H/M ratio in differentiating PD from other parkinsonisms or ET. Moreover, early scans alone could be used for a reliable diagnosis, since no difference was found between early and late scans. A larger series of cases is needed to confirm these data.

2020 Journal article

25th International Conference on Pattern Recognition

Authors: Cucchiara, R.; Del Bimbo, A.; Sclaroff, S.

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

2020 Edited volume

A Transformer-Based Network for Dynamic Hand Gesture Recognition

Authors: D’Eusanio, Andrea; Simoni, Alessandro; Pini, Stefano; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Transformer-based neural networks represent a successful self-attention mechanism that achieves state-of-the-art results in language understanding and sequence modeling. However, their application to visual data and, in particular, to the dynamic hand gesture recognition task has not yet been deeply investigated. In this paper, we propose a transformer-based architecture for the dynamic hand gesture recognition task. We show that using a single active depth sensor, specifically depth maps and the surface normals estimated from them, achieves state-of-the-art results, outperforming all the methods available in the literature on two automotive datasets, namely NVidia Dynamic Hand Gesture and Briareo. Moreover, we test the method with other data types available with common RGB-D devices, such as infrared and color data. We also assess the performance in terms of inference time and number of parameters, showing that the proposed framework is suitable for an online in-car infotainment system.

2020 Conference proceedings paper

A Unified Cycle-Consistent Neural Model for Text and Image Retrieval

Authors: Cornia, Marcella; Baraldi, Lorenzo; Tavakoli, Hamed R.; Cucchiara, Rita

Published in: MULTIMEDIA TOOLS AND APPLICATIONS

Text-image retrieval has recently become an active research field, thanks to the development of deeply-learnable architectures which can retrieve visual items given textual queries and vice versa. The key idea of many state-of-the-art approaches has been that of learning a joint multi-modal embedding space in which text and images could be projected and compared. Here we take a different approach and reformulate the problem of text-image retrieval as that of learning a translation between the textual and visual domains. Our proposal leverages an end-to-end trainable architecture that can translate text into image features and vice versa, and regularizes this mapping with a cycle-consistency criterion. Experimental evaluations for text-to-image and image-to-text retrieval, conducted on small, medium and large-scale datasets, show consistent improvements over the baselines, thus confirming the appropriateness of using a cycle-consistency constraint for the text-image matching task.

2020 Journal article

A Warp Speed Chain-Code Algorithm Based on Binary Decision Trees

Authors: Allegretti, Stefano; Bolelli, Federico; Grana, Costantino

Contour extraction, also known as chain-code extraction, is one of the most common algorithms in binary image processing. Although raster scanning is the most cache-friendly and, consequently, the fastest way to scan an image, the most commonly used chain-code algorithms perform contour tracing, and therefore tend to be fairly inefficient. In this paper, we took a rarely used algorithm that extracts contours in raster scan, and optimized its execution time through template functions, look-up tables and decision trees, in order to reduce code branches and the average number of load/store operations required. The result is a very fast solution that outperforms the state-of-the-art contour extraction algorithm implemented in OpenCV on a collection of real-world datasets. Contribution: This paper significantly improves the performance of existing chain-code algorithms by introducing decision trees to reduce code branches and the average number of load/store operations required.

2020 Conference proceedings paper

AI4AR: An AI-based Mobile Application for the Automatic Generation of AR Contents

Authors: Pierdicca, R.; Paolanti, M.; Frontoni, E.; Baraldi, L.

Published in: LECTURE NOTES IN ARTIFICIAL INTELLIGENCE

Augmented reality (AR) is the process of using technology to superimpose images, text or sounds on top of what a person can already see. Art galleries and museums have started to develop AR applications to increase engagement and provide an entirely new kind of exploration experience. However, creating content is a very time-consuming process, requiring ad-hoc development for each painting to be augmented. In fact, to create an AR experience for any painting, it is necessary to choose the points of interest, create the digital content and then develop the application. While this is affordable for the great masterpieces of an art gallery, it would be impracticable for an entire collection. In this context, the idea of this paper is to develop AR applications based on Artificial Intelligence. In particular, automatic captioning techniques are the core of the implementation of an AR application for improving the user experience in front of a painting or an artwork in general. The study demonstrates the feasibility of the approach through a proof-of-concept application, implemented for handheld devices, and adds to the body of knowledge on mobile AR applications, as this approach has not been applied in this field before.

2020 Conference proceedings paper
