Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

DeepFakes Have No Heart: A Simple rPPG-Based Method to Reveal Fake Videos

Authors: Boccignone, Giuseppe; Bursic, Sathya; Cuculo, Vittorio; D’Amelio, Alessandro; Grossi, Giuliano; Lanzarotti, Raffaella; Patania, Sabrina

Published in: LECTURE NOTES IN COMPUTER SCIENCE

We present a simple yet general method to detect fake videos of human subjects generated via deep learning techniques. The method relies on gauging the complexity of heart rate dynamics derived from facial video streams through remote photoplethysmography (rPPG). The analyzed features have a clear semantics with respect to this physiological behaviour, so the approach is explainable both in terms of the underlying context model and of the entailed computational steps. Most importantly, when compared to more complex state-of-the-art detection methods, the results achieved so far give evidence of its ability to cope with datasets produced by different deepfake models.
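The abstract gives no code, but the core idea lends itself to a short illustration. The Python sketch below assumes the rPPG signal has already been extracted from the face video, and uses sample entropy as one example measure of heart rate dynamics complexity; the paper's actual features and classifier may differ.

```python
import numpy as np

def sample_entropy(x: np.ndarray, m: int = 2, r: float = 0.2) -> float:
    """Sample entropy of a 1D signal: lower values mean more regular dynamics."""
    x = (x - x.mean()) / (x.std() + 1e-8)  # tolerance r is in units of std
    n = len(x)

    def count_matches(mm: int) -> float:
        templates = np.array([x[i:i + mm] for i in range(n - mm)])
        # Chebyshev distance between every pair of templates
        d = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=-1)
        # drop self-matches on the diagonal, count each pair once
        return (np.sum(d <= r) - len(templates)) / 2

    b, a = count_matches(m), count_matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

# Toy check: a clean periodic pulse vs. an irregular surrogate signal.
t = np.linspace(0, 10, 300)
periodic = np.sin(2 * np.pi * 1.2 * t)                  # ~72 bpm oscillation
irregular = np.random.default_rng(0).normal(size=300)
print(sample_entropy(periodic), sample_entropy(irregular))
```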

2022 Conference Proceedings Paper

Differential Diagnosis of Alzheimer Disease vs. Mild Cognitive Impairment Based on Left Temporal Lateral Lobe Hypometabolism on 18F-FDG PET/CT and Automated Classifiers

Authors: Nuvoli, S.; Bianconi, F.; Rondini, M.; Lazzarato, A.; Marongiu, A.; Fravolini, M. L.; Cascianelli, S.; Amici, S.; Filippi, L.; Spanu, A.; Palumbo, B.

Published in: DIAGNOSTICS

Purpose: We evaluate the ability of artificial intelligence, via automatic classification methods applied to semi-quantitative data from brain 18F-FDG PET/CT, to improve the differential diagnosis between Alzheimer Disease (AD) and Mild Cognitive Impairment (MCI). Procedures: We retrospectively analyzed a total of 150 consecutive patients who underwent diagnostic evaluation for suspected AD (n = 67) or MCI (n = 83). All patients received brain 18F-FDG PET/CT according to international guidelines, and images were analyzed both qualitatively (QL) and quantitatively (QN), the latter by fully automated post-processing software that produced a z score metabolic map of 25 anatomically different cortical regions. A subset of n = 122 cases with a diagnosis of AD (n = 53) or MCI (n = 69) confirmed by 18-24-month clinical follow-up was finally included in the study. Univariate analysis and three automated classification models (classification tree, ClT; ridge classifier, RC; and linear Support Vector Machine, lSVM) were used to estimate the ability of the z scores to discriminate between AD and MCI cases. Results: The univariate analysis returned 14 areas where the z scores differed significantly between the AD and MCI groups, and classification accuracy ranged between 74.59% and 76.23%, with ClT and RC providing the best results. The best classification strategy consisted of a single split with a cut-off value of approximately -2.0 on the z score from the left temporal lateral area: cases below this threshold were classified as AD and those above it as MCI. Conclusions: Our findings confirm the usefulness of brain 18F-FDG PET/CT QL and QN analyses in differentiating AD from MCI. Moreover, the combined use of automated classification models can improve the diagnostic process, since it allows identification of a specific hypometabolic area involved in AD cases with respect to MCI. This improves traditional 18F-FDG PET/CT image interpretation and the diagnostic assessment of cognitive disorders.
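For illustration only, the single-split rule reported above can be written as a two-line classifier. The function below is a hypothetical sketch of that decision rule, not the authors' software; the full study fits classification trees, ridge classifiers, and linear SVMs on z scores from 25 cortical regions.

```python
def classify_patient(z_temporal_lateral_left: float, cutoff: float = -2.0) -> str:
    # z score below the cut-off -> hypometabolism pattern labelled AD
    return "AD" if z_temporal_lateral_left < cutoff else "MCI"

print(classify_patient(-2.7))  # AD
print(classify_patient(-0.9))  # MCI
```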

2022 Journal Abstract

Dress Code: High-Resolution Multi-Category Virtual Try-On

Authors: Morelli, Davide; Fincato, Matteo; Cornia, Marcella; Landi, Federico; Cesari, Fabio; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Image-based virtual try-on strives to transfer the appearance of a clothing item onto the image of a target person. Prior work focuses mainly on upper-body clothes (e.g. t-shirts, shirts, and tops) and neglects full-body or lower-body items. This shortcoming arises from one main factor: current publicly available datasets for image-based virtual try-on do not account for this variety, thus limiting progress in the field. To address this deficiency, we introduce Dress Code, which contains images of multi-category clothes. Dress Code is more than 3x larger than publicly available datasets for image-based virtual try-on and features high-resolution paired images (1024×768) with front-view, full-body reference models. To generate HD try-on images with high visual quality and rich details, we propose to learn fine-grained discriminating features. Specifically, we leverage a semantic-aware discriminator that makes predictions at the pixel level instead of the image or patch level. Extensive experimental evaluation demonstrates that the proposed approach surpasses the baselines and state-of-the-art competitors in terms of visual quality and quantitative results. The Dress Code dataset is publicly available at https://github.com/aimagelab/dress-code.
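As a rough sketch of the discriminator idea described above (dense predictions at the pixel level rather than a single image- or patch-level score), consider the hypothetical PyTorch module below. It is not the authors' exact architecture; the layer sizes and the number of semantic classes are made-up placeholders.

```python
import torch
import torch.nn as nn

class PixelLevelDiscriminator(nn.Module):
    """Outputs one prediction per pixel instead of a single image-level score."""

    def __init__(self, in_ch: int = 3, base: int = 64, n_semantic_classes: int = 18):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base, base, 3, padding=1), nn.LeakyReLU(0.2),
            # one logit per semantic class plus a "fake" class, for every pixel
            nn.Conv2d(base, n_semantic_classes + 1, 1),
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.net(img)  # (B, n_semantic_classes + 1, H, W)

out = PixelLevelDiscriminator()(torch.randn(1, 3, 256, 192))
print(out.shape)  # torch.Size([1, 19, 256, 192])
```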

2022 Conference Proceedings Paper

Dress Code: High-Resolution Multi-Category Virtual Try-On

Authors: Morelli, Davide; Fincato, Matteo; Cornia, Marcella; Landi, Federico; Cesari, Fabio; Cucchiara, Rita

Published in: IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS

Image-based virtual try-on strives to transfer the appearance of a clothing item onto the image of a target person. Existing literature focuses mainly on upper-body clothes (e.g. t-shirts, shirts, and tops) and neglects full-body or lower-body items. This shortcoming arises from one main factor: current publicly available datasets for image-based virtual try-on do not account for this variety, thus limiting progress in the field. In this work, we introduce Dress Code, a novel dataset which contains images of multi-category clothes. Dress Code is more than 3x larger than publicly available datasets for image-based virtual try-on and features high-resolution paired images (1024×768) with front-view, full-body reference models. To generate HD try-on images with high visual quality and rich details, we propose to learn fine-grained discriminating features. Specifically, we leverage a semantic-aware discriminator that makes predictions at the pixel level instead of the image or patch level. The Dress Code dataset is publicly available at https://github.com/aimagelab/dress-code.

2022 Conference Proceedings Paper

Dual-Branch Collaborative Transformer for Virtual Try-On

Authors: Fenocchi, Emanuele; Morelli, Davide; Cornia, Marcella; Baraldi, Lorenzo; Cesari, Fabio; Cucchiara, Rita

Published in: IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS

Image-based virtual try-on has recently gained a lot of attention in both the scientific and fashion industry communities due to its challenging setting and practical real-world applications. While pure convolutional approaches have been explored to solve the task, Transformer-based architectures have not received significant attention yet. Following the intuition that self- and cross-attention operators can deal with long-range dependencies and hence improve generation, in this paper we extend a Transformer-based virtual try-on model by adding a dual-branch collaborative module that can exploit cross-modal information at generation time. We perform experiments on the VITON dataset, which is the standard benchmark for the task, and on a recently collected virtual try-on dataset with multi-category clothing, Dress Code. Experimental results demonstrate the effectiveness of our solution over previous methods and show that Transformer-based architectures can be a viable alternative for virtual try-on.
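To make the cross-attention intuition concrete, here is a minimal PyTorch sketch of two branches exchanging information through cross-attention. The module and tensor names are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class DualBranchCrossAttention(nn.Module):
    """Each branch queries the other: long-range, cross-modal exchange."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.a_to_b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.b_to_a = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a: torch.Tensor, b: torch.Tensor):
        a2, _ = self.b_to_a(query=a, key=b, value=b)
        b2, _ = self.a_to_b(query=b, key=a, value=a)
        return a + a2, b + b2  # residual update of both branches

person = torch.randn(1, 196, 256)   # tokens from the person image branch
garment = torch.randn(1, 196, 256)  # tokens from the clothing image branch
p, g = DualBranchCrossAttention()(person, garment)
print(p.shape, g.shape)
```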

2022 Conference Proceedings Paper

Effects of Auxiliary Knowledge on Continual Learning

Authors: Bellitto, Giovanni; Pennisi, Matteo; Palazzo, Simone; Bonicelli, Lorenzo; Boschini, Matteo; Calderara, Simone; Spampinato, Concetto

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

In Continual Learning (CL), a neural network is trained on a stream of data whose distribution changes over time. In this context, the main problem is how to learn new information without forgetting old knowledge (i.e., Catastrophic Forgetting). Most existing CL approaches focus on finding solutions to preserve acquired knowledge, thus working on the past of the model. However, we argue that since the model has to continually learn new tasks, it is also important to focus on present knowledge that could improve the learning of subsequent tasks. In this paper we propose a new, simple CL algorithm that focuses on solving the current task in a way that might facilitate the learning of the next ones. More specifically, our approach combines the main data stream with a secondary, diverse and uncorrelated stream from which the network can draw auxiliary knowledge. This helps the model from different perspectives, since auxiliary data may contain features useful for the current and future tasks, and incoming task classes can be mapped onto auxiliary classes. Furthermore, adding data to the current task implicitly makes the classifier more robust, as it forces the extraction of more discriminative features. Our method can outperform existing state-of-the-art models on the most common CL image classification benchmarks.
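A minimal sketch of the auxiliary-stream idea, under the assumption that auxiliary samples are mapped onto their own label range: each step optimizes the current-task loss plus a weighted loss on a batch from the secondary, uncorrelated stream. This illustrates the general recipe, not the authors' exact algorithm.

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, task_batch, aux_batch, aux_weight=0.5):
    x, y = task_batch          # current-task images and labels
    x_aux, y_aux = aux_batch   # auxiliary samples, mapped onto auxiliary classes
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)                              # learn the task
    loss = loss + aux_weight * F.cross_entropy(model(x_aux), y_aux)  # auxiliary knowledge
    loss.backward()
    optimizer.step()
    return loss.item()

# toy usage: 10 task classes and 10 auxiliary classes share one linear head
model = torch.nn.Linear(32, 20)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
task = (torch.randn(8, 32), torch.randint(0, 10, (8,)))
aux = (torch.randn(8, 32), torch.randint(10, 20, (8,)))
print(training_step(model, opt, task, aux))
```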

2022 Conference Proceedings Paper

Embodied Navigation at the Art Gallery

Authors: Bigazzi, Roberto; Landi, Federico; Cascianelli, Silvia; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Embodied agents, trained to explore and navigate indoor photorealistic environments, have achieved impressive results on standard datasets and benchmarks. So far, experiments and evaluations have involved domestic and working scenes such as offices, flats, and houses. In this paper, we build and release a new 3D space with unique characteristics: that of a complete art museum. We name this environment ArtGallery3D (AG3D). Compared with existing 3D scenes, the collected space is ampler, richer in visual features, and provides very sparse occupancy information. This is challenging for occupancy-based agents, which are usually trained in crowded domestic environments with plenty of occupancy information. Additionally, we annotate the coordinates of the main points of interest inside the museum, such as paintings, statues, and other items. Thanks to this manual process, we deliver a new benchmark for PointGoal navigation inside this new space. Trajectories in this dataset are far more complex and lengthy than existing ground-truth paths for navigation in Gibson and Matterport3D. We carry out an extensive experimental evaluation using our new space and show that existing methods hardly adapt to this scenario. As such, we believe that the availability of this 3D model will foster future research and help improve existing solutions.

2022 Conference Proceedings Paper

Explaining Transformer-based Image Captioning Models: An Empirical Analysis

Authors: Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: AI COMMUNICATIONS

Image Captioning is the task of translating an input image into a textual description. As such, it connects Vision and Language in a generative fashion, with applications ranging from multi-modal search engines to assistive tools for visually impaired people. Although recent years have witnessed an increase in the accuracy of such models, this has also brought increasing complexity and challenges in interpretability and visualization. In this work, we focus on Transformer-based image captioning models and provide qualitative and quantitative tools to increase interpretability and to assess the grounding and temporal alignment capabilities of such models. First, we employ attribution methods to visualize what the model concentrates on in the input image at each step of the generation. Furthermore, we propose metrics to evaluate the temporal alignment between model predictions and attribution scores, which allows measuring the grounding capabilities of the model and spotting hallucination flaws. Experiments are conducted on three different Transformer-based architectures, employing both traditional and Vision Transformer-based visual features.
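One simple way to realize step-wise attribution is sketched below; the paper evaluates several attribution methods, and this gradient-based variant and the decoder interface are assumptions made for illustration. At each decoding step, the gradient of the chosen token's logit with respect to the visual features yields one saliency value per image region.

```python
import torch

def stepwise_attribution(decoder_logits_fn, visual_feats, token_ids):
    """decoder_logits_fn(feats, prefix) must return logits over the vocabulary."""
    feats = visual_feats.clone().requires_grad_(True)
    scores = []
    for t in range(1, len(token_ids)):
        logits = decoder_logits_fn(feats, token_ids[:t])  # condition on the prefix
        grad, = torch.autograd.grad(logits[token_ids[t]], feats)
        scores.append(grad.norm(dim=-1))  # one saliency value per visual region
    return torch.stack(scores)            # (steps, n_regions)

# toy usage with a stand-in "decoder" that ignores the prefix
W = torch.randn(100, 512)                                  # vocabulary of 100 tokens
fake_decoder = lambda feats, prefix: W @ feats.mean(dim=0)
saliency = stepwise_attribution(fake_decoder, torch.randn(49, 512),
                                torch.tensor([1, 5, 7]))
print(saliency.shape)  # torch.Size([2, 49])
```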

2022 Journal Article

Exploiting generative self-supervised learning for the assessment of biological images with lack of annotations

Authors: Mascolini, Alessio; Cardamone, Dario; Ponzio, Francesco; Di Cataldo, Santa; Ficarra, Elisa

Published in: BMC BIOINFORMATICS

Computer-aided analysis of biological images typically requires extensive training on large-scale annotated datasets, which is not viable in many situations. In this paper, we present the Generative Adversarial Network Discriminator Learner (GAN-DL), a novel self-supervised learning paradigm based on the StyleGAN2 architecture, which we employ for image representation learning on fluorescent biological images.
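As a hedged illustration of the general pattern the name suggests, with a toy stand-in rather than the StyleGAN2 discriminator GAN-DL actually builds on: after adversarial training, the discriminator's intermediate activations can be reused as image embeddings for downstream analysis.

```python
import torch
import torch.nn as nn

class TinyDiscriminator(nn.Module):
    """Toy discriminator whose intermediate features double as embeddings."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.real_fake = nn.Linear(64, 1)  # adversarial head, unused at embedding time

    def embed(self, x: torch.Tensor) -> torch.Tensor:
        return self.features(x)  # (B, 64) self-supervised representation

emb = TinyDiscriminator().embed(torch.randn(4, 3, 64, 64))
print(emb.shape)  # torch.Size([4, 64])
```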

2022 Journal Article

Fine-Grained Human Analysis Under Occlusions and Perspective Constraints in Multimedia Surveillance

Authors: Cucchiara, Rita; Fabbri, Matteo

Published in: ACM TRANSACTIONS ON MULTIMEDIA COMPUTING, COMMUNICATIONS AND APPLICATIONS

2022 Journal Article

Page 27 of 106 • Total publications: 1059