Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Tip: type @ to pick an author and # to pick a keyword.

Intelligent Multimodal Artificial Agents that Talk and Express Emotions

Authors: Rawal, Niyati; Maharjan, Rahul Singh; Romeo, Marta; Bigazzi, Roberto; Baraldi, Lorenzo; Cucchiara, Rita; Cangelosi, Angelo

2024 Relazione in Atti di Convegno

Is Multiple Object Tracking a Matter of Specialization?

Authors: Mancusi, Gianluca; Bernardi, Mattia; Panariello, Aniello; Porrello, Angelo; Cucchiara, Rita; Calderara, Simone

Published in: ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS

End-to-end transformer-based trackers have achieved remarkable performance on most human-related datasets. However, training these trackers in heterogeneous scenarios poses significant … (Read full abstract)

End-to-end transformer-based trackers have achieved remarkable performance on most human-related datasets. However, training these trackers in heterogeneous scenarios poses significant challenges, including negative interference - where the model learns conflicting scene-specific parameters - and limited domain generalization, which often necessitates expensive fine-tuning to adapt the models to new domains. In response to these challenges, we introduce Parameter-efficient Scenario-specific Tracking Architecture (PASTA), a novel framework that combines Parameter-Efficient Fine-Tuning (PEFT) and Modular Deep Learning (MDL). Specifically, we define key scenario attributes (e.g, camera-viewpoint, lighting condition) and train specialized PEFT modules for each attribute. These expert modules are combined in parameter space, enabling systematic generalization to new domains without increasing inference time. Extensive experiments on MOTSynth, along with zero-shot evaluations on MOT17 and PersonPath22 demonstrate that a neural tracker built from carefully selected modules surpasses its monolithic counterpart. We release models and code.

2024 Relazione in Atti di Convegno

KRONC: Keypoint-based Robust Camera Optimization for 3D Car Reconstruction

Authors: Di Nucci, Davide; Simoni, Alessandro; Tomei, Matteo; Ciuffreda, Luca; Vezzani, Roberto; Cucchiara, Rita

2024 Relazione in Atti di Convegno

Large-Scale Transformer models for Transactional Data

Authors: Garuti, F.; Luetto, S.; Sangineto, E.; Cucchiara, R.

Published in: CEUR WORKSHOP PROCEEDINGS

Following the spread of digital channels for everyday activities and electronic payments, huge collections of online transactions are available from … (Read full abstract)

Following the spread of digital channels for everyday activities and electronic payments, huge collections of online transactions are available from financial institutions. These transactions are usually organized as time series, i.e., a time-dependent sequence of tabular data, where each element of the series is a collection of heterogeneous fields (e.g., dates, amounts, categories, etc.). Transactions are usually evaluated by automated or semi-automated procedures to address financial tasks and gain insights into customers’ behavior. In the last years, many Trees-based Machine Learning methods (e.g., RandomForest, XGBoost) have been proposed for financial tasks, but they do not fully exploit in an end-to-end pipeline all the information richness of individual transactions, neither they fully model the underling temporal patterns. Instead, Deep Learning approaches have proven to be very effective in modeling complex data by representing them in a semantic latent space. In this paper, inspired by the multi-modal Deep Learning approaches used in Computer Vision and NLP, we propose UniTTab, an end-to-end Deep Learning Transformer model for transactional time series which can uniformly represent heterogeneous time-dependent data in a single embedding. Given the availability of large sets of tabular transactions, UniTTab defines a pre-training self-supervised phase to learn useful representations which can be employed to solve financial tasks such as churn prediction and loan default prediction. A strength of UniTTab is its flexibility since it can be adopted to represent time series of arbitrary length and composed of different data types in the fields. The flexibility of our model in solving different types of tasks (e.g., detection, classification, regression) and the possibility of varying the length of the input time series, from a few to hundreds of transactions, makes UniTTab a general-purpose Transformer architecture for bank transactions.

2024 Relazione in Atti di Convegno

Latent spectral regularization for continual learning

Authors: Frascaroli, Emanuele; Benaglia, Riccardo; Boschini, Matteo; Moschella, Luca; Fiorini, Cosimo; Rodolà, Emanuele; Calderara, Simone

Published in: PATTERN RECOGNITION LETTERS

While biological intelligence grows organically as new knowledge is gathered throughout life, Artificial Neural Networks forget catastrophically whenever they face … (Read full abstract)

While biological intelligence grows organically as new knowledge is gathered throughout life, Artificial Neural Networks forget catastrophically whenever they face a changing training data distribution. Rehearsal-based Continual Learning (CL) approaches have been established as a versatile and reliable solution to overcome this limitation; however, sudden input disruptions and memory constraints are known to alter the consistency of their predictions. We study this phenomenon by investigating the geometric characteristics of the learner’s latent space and find that replayed data points of different classes increasingly mix up, interfering with classification. Hence, we propose a geometric regularizer that enforces weak requirements on the Laplacian spectrum of the latent space, promoting a partitioning behavior. Our proposal, called Continual Spectral Regularizer for Incremental Learning (CaSpeR-IL), can be easily combined with any rehearsal-based CL approach and improves the performance of SOTA methods on standard benchmarks.

2024 Articolo su rivista

Mapping High-level Semantic Regions in Indoor Environments without Object Recognition

Authors: Bigazzi, Roberto; Baraldi, Lorenzo; Kousik, Shreyas; Cucchiara, Rita; Pavone, Marco

Published in: IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION

2024 Relazione in Atti di Convegno

May the Forgetting Be with You: Alternate Replay for Learning with Noisy Labels

Authors: Millunzi, Monica; Bonicelli, Lorenzo; Porrello, Angelo; Credi, Jacopo; Kolm, Petter N.; Calderara, Simone

Forgetting presents a significant challenge during incremental training, making it particularly demanding for contemporary AI systems to assimilate new knowledge … (Read full abstract)

Forgetting presents a significant challenge during incremental training, making it particularly demanding for contemporary AI systems to assimilate new knowledge in streaming data environments. To address this issue, most approaches in Continual Learning (CL) rely on the replay of a restricted buffer of past data. However, the presence of noise in real-world scenarios, where human annotation is constrained by time limitations or where data is automatically gathered from the web, frequently renders these strategies vulnerable. In this study, we address the problem of CL under Noisy Labels (CLN) by introducing Alternate Experience Replay (AER), which takes advantage of forgetting to maintain a clear distinction between clean, complex, and noisy samples in the memory buffer. The idea is that complex or mislabeled examples, which hardly fit the previously learned data distribution, are most likely to be forgotten. To grasp the benefits of such a separation, we equip AER with Asymmetric Balanced Sampling (ABS): a new sample selection strategy that prioritizes purity on the current task while retaining relevant samples from the past. Through extensive computational comparisons, we demonstrate the effectiveness of our approach in terms of both accuracy and purity of the obtained buffer, resulting in a remarkable average gain of 4.71% points in accuracy with respect to existing loss-based purification strategies. Code is available at https://github.com/aimagelab/mammoth

2024 Relazione in Atti di Convegno

MONOT: High-Quality Privacy-compliant Morphed Synthetic Images for Everyone

Authors: Borghi, Guido; Domenico, Nicolò Di; Ferrara, Matteo; Franco, Annalisa; Latif, Uzma; Maltoni, Davide

2024 Relazione in Atti di Convegno

Multi-Class Unlearning for Image Classification via Weight Filtering

Authors: Poppi, Samuele; Sarto, Sara; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: IEEE INTELLIGENT SYSTEMS

Machine Unlearning is an emerging paradigm for selectively removing the impact of training datapoints from a network. Unlike existing methods … (Read full abstract)

Machine Unlearning is an emerging paradigm for selectively removing the impact of training datapoints from a network. Unlike existing methods that target a limited subset or a single class, our framework unlearns all classes in a single round. We achieve this by modulating the network's components using memory matrices, enabling the network to demonstrate selective unlearning behavior for any class after training. By discovering weights that are specific to each class, our approach also recovers a representation of the classes which is explainable by design. We test the proposed framework on small- and medium-scale image classification datasets, with both convolution- and Transformer-based backbones, showcasing the potential for explainable solutions through unlearning.

2024 Articolo su rivista

ONOT: a High-Quality ICAO-compliant Synthetic Mugshot Dataset

Authors: Di Domenico, N.; Borghi, G.; Franco, A.; Maltoni, D.

Nowadays, state-of-the-art AI-based generative models represent a viable solution to overcome privacy issues and biases in the collection of datasets … (Read full abstract)

Nowadays, state-of-the-art AI-based generative models represent a viable solution to overcome privacy issues and biases in the collection of datasets containing personal information, such as faces. Following this intuition, in this paper we introduce ONOT11One, No one and One hundred Thousand (L. Pirandello, 1926), a synthetic dataset specifically focused on the generation of high-quality faces in adherence to the requirements of the ISO/IEC 39794-5 standards that, following the guidelines of the International Civil Aviation Organization (ICAO), defines the interchange formats of face images in electronic Machine-Readable Travel Documents (eMRTD). The strictly controlled and varied mugshot images included in ONOT are useful in research fields related to the analysis of face images in eMRTD, such as Morphing Attack Detection and Face Quality Assessment. The dataset is publicly released2https://miatbiolab.csr.unibo.it/icao-synthetic-dataset, in combination with the generation procedure details in order to improve the reproducibility and enable future extensions.

2024 Relazione in Atti di Convegno

Page 16 of 106 • Total publications: 1059