Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

DeepFakes Have No Heart: A Simple rPPG-Based Method to Reveal Fake Videos

Authors: Boccignone, Giuseppe; Bursic, Sathya; Cuculo, Vittorio; D’Amelio, Alessandro; Grossi, Giuliano; Lanzarotti, Raffaella; Patania, Sabrina

Published in: LECTURE NOTES IN COMPUTER SCIENCE

We present a simple yet general method to detect fake videos of human subjects generated via deep learning techniques. The method relies on gauging the complexity of heart rate dynamics derived from facial video streams through remote photoplethysmography (rPPG). The analyzed features have a clear semantics with respect to this physiological behaviour, so the approach is explainable both in terms of the underlying context model and of the entailed computational steps. Most importantly, when compared to more complex state-of-the-art detection methods, the results achieved so far give evidence of its ability to cope with datasets produced by different deepfake models.
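The abstract gives no code, but the core idea lends itself to a short illustration. The Python sketch below assumes the rPPG signal has already been extracted from the face video, and uses sample entropy as one example measure of heart rate dynamics complexity; the paper's actual features and classifier may differ.

```python
import numpy as np

def sample_entropy(x: np.ndarray, m: int = 2, r: float = 0.2) -> float:
    """Sample entropy of a 1D signal: lower values mean more regular dynamics."""
    x = (x - x.mean()) / (x.std() + 1e-8)  # tolerance r is in units of std
    n = len(x)

    def count_matches(mm: int) -> float:
        templates = np.array([x[i:i + mm] for i in range(n - mm)])
        # Chebyshev distance between every pair of templates
        d = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=-1)
        # drop self-matches on the diagonal, count each pair once
        return (np.sum(d <= r) - len(templates)) / 2

    b, a = count_matches(m), count_matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

# Toy check: a clean periodic pulse vs. an irregular surrogate signal.
t = np.linspace(0, 10, 300)
periodic = np.sin(2 * np.pi * 1.2 * t)                  # ~72 bpm oscillation
irregular = np.random.default_rng(0).normal(size=300)
print(sample_entropy(periodic), sample_entropy(irregular))
```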

2022 Conference Proceedings Paper

Differential Diagnosis of Alzheimer Disease vs. Mild Cognitive Impairment Based on Left Temporal Lateral Lobe Hypometabolism on 18F-FDG PET/CT and Automated Classifiers

Authors: Nuvoli, S.; Bianconi, F.; Rondini, M.; Lazzarato, A.; Marongiu, A.; Fravolini, M. L.; Cascianelli, S.; Amici, S.; Filippi, L.; Spanu, A.; Palumbo, B.

Published in: DIAGNOSTICS

Purpose: We evaluate the ability of artificial intelligence, via automatic classification methods applied to semi-quantitative data from brain 18F-FDG PET/CT, to improve the differential diagnosis between Alzheimer Disease (AD) and Mild Cognitive Impairment (MCI). Procedures: We retrospectively analyzed a total of 150 consecutive patients who underwent diagnostic evaluation for suspected AD (n = 67) or MCI (n = 83). All patients received brain 18F-FDG PET/CT according to international guidelines, and images were analyzed both qualitatively (QL) and quantitatively (QN), the latter by fully automated post-processing software that produced a z score metabolic map of 25 anatomically different cortical regions. A subset of n = 122 cases with a diagnosis of AD (n = 53) or MCI (n = 69) confirmed by 18-24-month clinical follow-up was finally included in the study. Univariate analysis and three automated classification models (classification tree, ClT; ridge classifier, RC; and linear Support Vector Machine, lSVM) were used to estimate the ability of the z scores to discriminate between AD and MCI cases. Results: The univariate analysis returned 14 areas where the z scores differed significantly between the AD and MCI groups, and classification accuracy ranged between 74.59% and 76.23%, with ClT and RC providing the best results. The best classification strategy consisted of a single split with a cut-off value of approximately -2.0 on the z score from the left temporal lateral area: cases below this threshold were classified as AD and those above it as MCI. Conclusions: Our findings confirm the usefulness of brain 18F-FDG PET/CT QL and QN analyses in differentiating AD from MCI. Moreover, the combined use of automated classification models can improve the diagnostic process, since it allows identification of a specific hypometabolic area involved in AD cases with respect to MCI. This improves traditional 18F-FDG PET/CT image interpretation and the diagnostic assessment of cognitive disorders.
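For illustration only, the single-split rule reported above can be written as a two-line classifier. The function below is a hypothetical sketch of that decision rule, not the authors' software; the full study fits classification trees, ridge classifiers, and linear SVMs on z scores from 25 cortical regions.

```python
def classify_patient(z_temporal_lateral_left: float, cutoff: float = -2.0) -> str:
    # z score below the cut-off -> hypometabolism pattern labelled AD
    return "AD" if z_temporal_lateral_left < cutoff else "MCI"

print(classify_patient(-2.7))  # AD
print(classify_patient(-0.9))  # MCI
```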

2022 Journal Abstract

Dress Code: High-Resolution Multi-Category Virtual Try-On

Authors: Morelli, Davide; Fincato, Matteo; Cornia, Marcella; Landi, Federico; Cesari, Fabio; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Image-based virtual try-on strives to transfer the appearance of a clothing item onto the image of a target person. Prior work focuses mainly on upper-body clothes (e.g. t-shirts, shirts, and tops) and neglects full-body or lower-body items. This shortcoming arises from one main factor: current publicly available datasets for image-based virtual try-on do not account for this variety, thus limiting progress in the field. To address this deficiency, we introduce Dress Code, which contains images of multi-category clothes. Dress Code is more than 3x larger than publicly available datasets for image-based virtual try-on and features high-resolution paired images (1024×768) with front-view, full-body reference models. To generate HD try-on images with high visual quality and rich details, we propose to learn fine-grained discriminating features. Specifically, we leverage a semantic-aware discriminator that makes predictions at the pixel level instead of the image or patch level. Extensive experimental evaluation demonstrates that the proposed approach surpasses the baselines and state-of-the-art competitors in terms of visual quality and quantitative results. The Dress Code dataset is publicly available at https://github.com/aimagelab/dress-code.
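As a rough sketch of the discriminator idea described above (dense predictions at the pixel level rather than a single image- or patch-level score), consider the hypothetical PyTorch module below. It is not the authors' exact architecture; the layer sizes and the number of semantic classes are made-up placeholders.

```python
import torch
import torch.nn as nn

class PixelLevelDiscriminator(nn.Module):
    """Outputs one prediction per pixel instead of a single image-level score."""

    def __init__(self, in_ch: int = 3, base: int = 64, n_semantic_classes: int = 18):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base, base, 3, padding=1), nn.LeakyReLU(0.2),
            # one logit per semantic class plus a "fake" class, for every pixel
            nn.Conv2d(base, n_semantic_classes + 1, 1),
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.net(img)  # (B, n_semantic_classes + 1, H, W)

out = PixelLevelDiscriminator()(torch.randn(1, 3, 256, 192))
print(out.shape)  # torch.Size([1, 19, 256, 192])
```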

2022 Conference Proceedings Paper

Dress Code: High-Resolution Multi-Category Virtual Try-On

Authors: Morelli, Davide; Fincato, Matteo; Cornia, Marcella; Landi, Federico; Cesari, Fabio; Cucchiara, Rita

Published in: IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS

Image-based virtual try-on strives to transfer the appearance of a clothing item onto the image of a target person. Existing literature focuses mainly on upper-body clothes (e.g. t-shirts, shirts, and tops) and neglects full-body or lower-body items. This shortcoming arises from one main factor: current publicly available datasets for image-based virtual try-on do not account for this variety, thus limiting progress in the field. In this work, we introduce Dress Code, a novel dataset which contains images of multi-category clothes. Dress Code is more than 3x larger than publicly available datasets for image-based virtual try-on and features high-resolution paired images (1024×768) with front-view, full-body reference models. To generate HD try-on images with high visual quality and rich details, we propose to learn fine-grained discriminating features. Specifically, we leverage a semantic-aware discriminator that makes predictions at the pixel level instead of the image or patch level. The Dress Code dataset is publicly available at https://github.com/aimagelab/dress-code.

2022 Conference Proceedings Paper

Dual-Branch Collaborative Transformer for Virtual Try-On

Authors: Fenocchi, Emanuele; Morelli, Davide; Cornia, Marcella; Baraldi, Lorenzo; Cesari, Fabio; Cucchiara, Rita

Published in: IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS

Image-based virtual try-on has recently gained a lot of attention in both the scientific and fashion industry communities due to its challenging setting and practical real-world applications. While pure convolutional approaches have been explored to solve the task, Transformer-based architectures have not received significant attention yet. Following the intuition that self- and cross-attention operators can deal with long-range dependencies and hence improve generation, in this paper we extend a Transformer-based virtual try-on model by adding a dual-branch collaborative module that can exploit cross-modal information at generation time. We perform experiments on the VITON dataset, which is the standard benchmark for the task, and on a recently collected virtual try-on dataset with multi-category clothing, Dress Code. Experimental results demonstrate the effectiveness of our solution over previous methods and show that Transformer-based architectures can be a viable alternative for virtual try-on.
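To make the cross-attention intuition concrete, here is a minimal PyTorch sketch of two branches exchanging information through cross-attention. The module and tensor names are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class DualBranchCrossAttention(nn.Module):
    """Each branch queries the other: long-range, cross-modal exchange."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.a_to_b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.b_to_a = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a: torch.Tensor, b: torch.Tensor):
        a2, _ = self.b_to_a(query=a, key=b, value=b)
        b2, _ = self.a_to_b(query=b, key=a, value=a)
        return a + a2, b + b2  # residual update of both branches

person = torch.randn(1, 196, 256)   # tokens from the person image branch
garment = torch.randn(1, 196, 256)  # tokens from the clothing image branch
p, g = DualBranchCrossAttention()(person, garment)
print(p.shape, g.shape)
```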

2022 Conference Proceedings Paper

Effects of Auxiliary Knowledge on Continual Learning

Authors: Bellitto, Giovanni; Pennisi, Matteo; Palazzo, Simone; Bonicelli, Lorenzo; Boschini, Matteo; Calderara, Simone; Spampinato, Concetto

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

In Continual Learning (CL), a neural network is trained on a stream of data whose distribution changes over time. In this context, the main problem is how to learn new information without forgetting old knowledge (i.e., Catastrophic Forgetting). Most existing CL approaches focus on finding solutions to preserve acquired knowledge, thus working on the past of the model. However, we argue that since the model has to continually learn new tasks, it is also important to focus on present knowledge that could improve the learning of subsequent tasks. In this paper we propose a new, simple CL algorithm that focuses on solving the current task in a way that might facilitate the learning of the next ones. More specifically, our approach combines the main data stream with a secondary, diverse and uncorrelated stream from which the network can draw auxiliary knowledge. This helps the model from different perspectives, since auxiliary data may contain features useful for the current and future tasks, and incoming task classes can be mapped onto auxiliary classes. Furthermore, adding data to the current task implicitly makes the classifier more robust, as it forces the extraction of more discriminative features. Our method can outperform existing state-of-the-art models on the most common CL image classification benchmarks.
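A minimal sketch of the auxiliary-stream idea, under the assumption that auxiliary samples are mapped onto their own label range: each step optimizes the current-task loss plus a weighted loss on a batch from the secondary, uncorrelated stream. This illustrates the general recipe, not the authors' exact algorithm.

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, task_batch, aux_batch, aux_weight=0.5):
    x, y = task_batch          # current-task images and labels
    x_aux, y_aux = aux_batch   # auxiliary samples, mapped onto auxiliary classes
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)                              # learn the task
    loss = loss + aux_weight * F.cross_entropy(model(x_aux), y_aux)  # auxiliary knowledge
    loss.backward()
    optimizer.step()
    return loss.item()

# toy usage: 10 task classes and 10 auxiliary classes share one linear head
model = torch.nn.Linear(32, 20)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
task = (torch.randn(8, 32), torch.randint(0, 10, (8,)))
aux = (torch.randn(8, 32), torch.randint(10, 20, (8,)))
print(training_step(model, opt, task, aux))
```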

2022 Conference Proceedings Paper

Embodied Navigation at the Art Gallery

Authors: Bigazzi, Roberto; Landi, Federico; Cascianelli, Silvia; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Embodied agents, trained to explore and navigate indoor photorealistic environments, have achieved impressive results on standard datasets and benchmarks. So far, experiments and evaluations have involved domestic and working scenes such as offices, flats, and houses. In this paper, we build and release a new 3D space with unique characteristics: that of a complete art museum. We name this environment ArtGallery3D (AG3D). Compared with existing 3D scenes, the collected space is ampler, richer in visual features, and provides very sparse occupancy information. This is challenging for occupancy-based agents, which are usually trained in crowded domestic environments with plenty of occupancy information. Additionally, we annotate the coordinates of the main points of interest inside the museum, such as paintings, statues, and other items. Thanks to this manual process, we deliver a new benchmark for PointGoal navigation inside this new space. Trajectories in this dataset are far more complex and lengthy than existing ground-truth paths for navigation in Gibson and Matterport3D. We carry out an extensive experimental evaluation using our new space and show that existing methods hardly adapt to this scenario. As such, we believe that the availability of this 3D model will foster future research and help improve existing solutions.

2022 Conference Proceedings Paper

Explaining Transformer-based Image Captioning Models: An Empirical Analysis

Authors: Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: AI COMMUNICATIONS

Image Captioning is the task of translating an input image into a textual description. As such, it connects Vision and Language in a generative fashion, with applications ranging from multi-modal search engines to assistive tools for visually impaired people. Although recent years have witnessed an increase in the accuracy of such models, this has also brought increasing complexity and challenges in interpretability and visualization. In this work, we focus on Transformer-based image captioning models and provide qualitative and quantitative tools to increase interpretability and to assess the grounding and temporal alignment capabilities of such models. First, we employ attribution methods to visualize what the model concentrates on in the input image at each step of the generation. Furthermore, we propose metrics to evaluate the temporal alignment between model predictions and attribution scores, which allows measuring the grounding capabilities of the model and spotting hallucination flaws. Experiments are conducted on three different Transformer-based architectures, employing both traditional and Vision Transformer-based visual features.
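One simple way to realize step-wise attribution is sketched below; the paper evaluates several attribution methods, and this gradient-based variant and the decoder interface are assumptions made for illustration. At each decoding step, the gradient of the chosen token's logit with respect to the visual features yields one saliency value per image region.

```python
import torch

def stepwise_attribution(decoder_logits_fn, visual_feats, token_ids):
    """decoder_logits_fn(feats, prefix) must return logits over the vocabulary."""
    feats = visual_feats.clone().requires_grad_(True)
    scores = []
    for t in range(1, len(token_ids)):
        logits = decoder_logits_fn(feats, token_ids[:t])  # condition on the prefix
        grad, = torch.autograd.grad(logits[token_ids[t]], feats)
        scores.append(grad.norm(dim=-1))  # one saliency value per visual region
    return torch.stack(scores)            # (steps, n_regions)

# toy usage with a stand-in "decoder" that ignores the prefix
W = torch.randn(100, 512)                                  # vocabulary of 100 tokens
fake_decoder = lambda feats, prefix: W @ feats.mean(dim=0)
saliency = stepwise_attribution(fake_decoder, torch.randn(49, 512),
                                torch.tensor([1, 5, 7]))
print(saliency.shape)  # torch.Size([2, 49])
```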

2022 Journal Article

Exploiting generative self-supervised learning for the assessment of biological images with lack of annotations

Authors: Mascolini, Alessio; Cardamone, Dario; Ponzio, Francesco; Di Cataldo, Santa; Ficarra, Elisa

Published in: BMC BIOINFORMATICS

Computer-aided analysis of biological images typically requires extensive training on large-scale annotated datasets, which is not viable in many situations. In this paper, we present the Generative Adversarial Network Discriminator Learner (GAN-DL), a novel self-supervised learning paradigm based on the StyleGAN2 architecture, which we employ for image representation learning on fluorescent biological images.
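As a hedged illustration of the general pattern the name suggests, with a toy stand-in rather than the StyleGAN2 discriminator GAN-DL actually builds on: after adversarial training, the discriminator's intermediate activations can be reused as image embeddings for downstream analysis.

```python
import torch
import torch.nn as nn

class TinyDiscriminator(nn.Module):
    """Toy discriminator whose intermediate features double as embeddings."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.real_fake = nn.Linear(64, 1)  # adversarial head, unused at embedding time

    def embed(self, x: torch.Tensor) -> torch.Tensor:
        return self.features(x)  # (B, 64) self-supervised representation

emb = TinyDiscriminator().embed(torch.randn(4, 3, 64, 64))
print(emb.shape)  # torch.Size([4, 64])
```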

2022 Journal Article

Fine-Grained Human Analysis Under Occlusions and Perspective Constraints in Multimedia Surveillance

Authors: Cucchiara, Rita; Fabbri, Matteo

Published in: ACM TRANSACTIONS ON MULTIMEDIA COMPUTING, COMMUNICATIONS AND APPLICATIONS

2022 Journal Article

Page 27 of 106 • Total publications: 1059