Publications by Simone Calderara

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Transfer without Forgetting

Authors: Boschini, Matteo; Bonicelli, Lorenzo; Porrello, Angelo; Bellitto, Giovanni; Pennisi, Matteo; Palazzo, Simone; Spampinato, Concetto; Calderara, Simone

Published in: LECTURE NOTES IN COMPUTER SCIENCE

This work investigates the entanglement between Continual Learning (CL) and Transfer Learning (TL). In particular, we shed light on the widespread application of network pretraining, highlighting that it is itself subject to catastrophic forgetting. Unfortunately, this issue leads to the under-exploitation of knowledge transfer during later tasks. On this ground, we propose Transfer without Forgetting (TwF), a hybrid Continual Transfer Learning approach building upon a fixed pretrained sibling network, which continuously propagates the knowledge inherent in the source domain through a layer-wise loss term. Our experiments indicate that TwF steadily outperforms other CL methods across a variety of settings, averaging a 4.81% gain in Class-Incremental accuracy over a variety of datasets and different buffer sizes.
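The paper's layer-wise loss term amounts to a feature-level distillation between the continually trained network and its frozen pretrained sibling. A minimal NumPy sketch of that idea (the function name, the plain mean-squared penalty and the per-layer weights are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def layerwise_distillation_loss(student_feats, sibling_feats, weights=None):
    """Sum of per-layer mean-squared distances between the continually
    trained network's activations and those of a frozen pretrained
    sibling. Both arguments are lists of same-shape activation arrays."""
    if weights is None:
        weights = [1.0] * len(student_feats)
    loss = 0.0
    for w, s, t in zip(weights, student_feats, sibling_feats):
        loss += w * float(np.mean((s - t) ** 2))
    return loss
```

Matching activations give zero loss, so the term only penalizes drift away from the pretrained features while the main task loss drives learning.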

2022 Relazione in Atti di Convegno

Warp and Learn: Novel Views Generation for Vehicles and Other Objects

Authors: Palazzi, Andrea; Bergamini, Luca; Calderara, Simone; Cucchiara, Rita

Published in: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

In this work we introduce a new self-supervised, semi-parametric approach for synthesizing novel views of a vehicle starting from a single monocular image. Differently from parametric (i.e. entirely learning-based) methods, we show how a-priori geometric knowledge about the object and the 3D world can be successfully integrated into a deep learning based image generation framework. As this geometric component is not learnt, we call our approach semi-parametric. In particular, we exploit man-made object symmetry and piece-wise planarity to integrate rich a-priori visual information into the novel viewpoint synthesis process. An Image Completion Network (ICN) is then trained to generate a realistic image starting from this geometric guidance. This blend between parametric and non-parametric components allows us to i) operate in a real-world scenario, ii) preserve high-frequency visual information such as textures, iii) handle truly arbitrary 3D roto-translations of the input and iv) perform shape transfer to completely different 3D models. Finally, we show that our approach can be easily complemented with synthetic data and extended to other rigid objects with completely different topology, even in the presence of concave structures and holes. A comprehensive experimental analysis against state-of-the-art competitors shows the efficacy of our method both from a quantitative and a perceptive point of view.
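The symmetry prior can be pictured as mirroring observed texture onto the unseen half of the object before the Image Completion Network fills what remains. A toy NumPy sketch (the flat array layout and the simple left/right reflection are assumptions for illustration, not the paper's actual warping procedure):

```python
import numpy as np

def mirror_fill(texture, mask):
    """Fill unobserved pixels (mask == 0) with their horizontally
    mirrored counterparts, exploiting the object's left/right symmetry.
    A toy stand-in for the geometric guidance fed to the ICN."""
    mirrored = texture[:, ::-1]
    mirrored_mask = mask[:, ::-1]
    out = texture.copy()
    fill = (mask == 0) & (mirrored_mask == 1)  # unseen here, seen there
    out[fill] = mirrored[fill]
    return out
```

Pixels that are unobserved on both sides stay empty, which is precisely what the learned completion network is then asked to hallucinate.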

2022 Articolo su rivista

AC-VRNN: Attentive Conditional-VRNN for multi-future trajectory prediction

Authors: Bertugli, A.; Calderara, S.; Coscia, P.; Ballan, L.; Cucchiara, R.

Published in: COMPUTER VISION AND IMAGE UNDERSTANDING

Anticipating human motion in crowded scenarios is essential for developing intelligent transportation systems, social-aware robots and advanced video surveillance applications. A key challenge is the inherently multi-modal nature of human paths: when human interactions are involved, several distinct futures can all be socially acceptable. To this end, we propose a generative architecture for multi-future trajectory prediction based on Conditional Variational Recurrent Neural Networks (C-VRNNs). Conditioning mainly relies on prior belief maps, representing the most likely moving directions and forcing the model to consider past observed dynamics in generating future positions. Human interactions are modelled with a graph-based attention mechanism enabling an online attentive refinement of the recurrent hidden state. To corroborate our model, we perform extensive experiments on publicly available datasets (e.g., ETH/UCY, Stanford Drone Dataset, STATS SportVU NBA, Intersection Drone Dataset and TrajNet++) and demonstrate its effectiveness in crowded scenes compared to several state-of-the-art methods.
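The graph-based attentive refinement can be sketched as each agent's hidden state being mixed with an attention-weighted summary of the other agents' states. A toy NumPy version (the dot-product scores and the fixed residual mix are illustrative assumptions, not the paper's learned attention):

```python
import numpy as np

def attentive_refinement(hidden):
    """Refine each agent's recurrent hidden state with an
    attention-weighted mix of every other agent's state.
    `hidden` is an (n_agents, dim) array with n_agents >= 2."""
    scores = hidden @ hidden.T                       # pairwise affinities
    np.fill_diagonal(scores, -np.inf)                # no self-attention
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over neighbours
    context = weights @ hidden                       # neighbour summary
    return 0.5 * (hidden + context)                  # residual refinement
```

In the actual model this refinement happens online at every recurrent step, so each agent's next prediction already accounts for its neighbours.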

2021 Articolo su rivista

Avalanche: An end-to-end library for continual learning

Authors: Lomonaco, V.; Pellegrini, L.; Cossu, A.; Carta, A.; Graffieti, G.; Hayes, T. L.; De Lange, M.; Masana, M.; Pomponi, J.; Van De Ven, G. M.; Mundt, M.; She, Q.; Cooper, K.; Forest, J.; Belouadah, E.; Calderara, S.; Parisi, G. I.; Cuzzolin, F.; Tolias, A. S.; Scardapane, S.; Antiga, L.; Ahmad, S.; Popescu, A.; Kanan, C.; Van De Weijer, J.; Tuytelaars, T.; Bacciu, D.; Maltoni, D.

Published in: IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS

Learning continually from non-stationary data streams is a long-standing goal and a challenging problem in machine learning. Recently, we have witnessed a renewed and fast-growing interest in continual learning, especially within the deep learning community. However, algorithmic solutions are often difficult to re-implement, evaluate and port across different settings, where even results on standard benchmarks are hard to reproduce. In this work, we propose Avalanche, an open-source end-to-end library for continual learning research based on PyTorch. Avalanche is designed to provide a shared and collaborative codebase for fast prototyping, training, and reproducible evaluation of continual learning algorithms.
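The core pattern such a library standardizes is a loop over "experiences": train on each one in order, then evaluate on every test split seen so far. A plain-Python sketch of that pattern (this is not Avalanche's actual API, whose benchmark and strategy objects are considerably richer):

```python
def run_continual(strategy, train_stream, test_stream):
    """Experience-based continual-learning loop: train on each
    experience in sequence, then evaluate on all test splits seen
    so far. `strategy` is any object exposing train()/eval()."""
    results = []
    for i, experience in enumerate(train_stream):
        strategy.train(experience)
        seen_so_far = test_stream[: i + 1]
        results.append([strategy.eval(t) for t in seen_so_far])
    return results
```

The growing evaluation matrix this produces is what makes forgetting measurable: performance on early splits can be tracked as later experiences arrive.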

2021 Relazione in Atti di Convegno

DAG-Net: Double Attentive Graph Neural Network for Trajectory Forecasting

Authors: Monti, Alessio; Bertugli, Alessia; Calderara, Simone; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

Understanding human motion behaviour is a critical task for several possible applications, like self-driving cars or social robots, and in general for all those settings where an autonomous agent has to navigate inside a human-centric environment. This is non-trivial because human motion is inherently multi-modal: given a history of human motion paths, there are many plausible ways by which people could move in the future. Additionally, people's activities are often driven by goals, e.g. reaching particular locations or interacting with the environment. We address the aforementioned aspects by proposing a new recurrent generative model that considers both single agents' future goals and interactions between different agents. The model exploits a double attention-based graph neural network to collect information about the mutual influences among different agents and to integrate it with data about agents' possible future objectives. Our proposal is general enough to be applied to different scenarios: the model achieves state-of-the-art results in both urban environments and sports applications.

2021 Relazione in Atti di Convegno

Extracting accurate long-term behavior changes from a large pig dataset

Authors: Bergamini, L.; Pini, S.; Simoni, A.; Vezzani, R.; Calderara, S.; Eath, R. B. D.; Fisher, R. B.

Visual observation of uncontrolled real-world behavior leads to noisy observations, complicated by occlusions, ambiguity, variable motion rates, detection and tracking errors, slow transitions between behaviors, etc. We show in this paper that reliable estimates of long-term trends can be extracted given enough data, even though estimates from individual frames may be noisy. We validate this concept using a new public dataset of more than 20 million daytime pig observations over 6 weeks of their main growth stage, and we provide annotations for various tasks, including 5 individual behaviors. Our pipeline chains detection, tracking and behavior classification, combining deep and shallow computer vision techniques. While individual detections may be noisy, we show that long-term behavior changes can still be extracted reliably, and we validate these results qualitatively on the full dataset. Ultimately, starting from raw RGB video data, we are able both to tell what the pigs' main daily activities are and how these change over time.
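The central claim, that noisy per-frame estimates still yield reliable long-term trends once aggregated over enough data, is essentially averaging the noise away. A toy NumPy sketch (the quantities and frame counts below are illustrative, not taken from the dataset):

```python
import numpy as np

def daily_activity_trend(frame_estimates, frames_per_day):
    """Aggregate noisy per-frame behaviour estimates (e.g. the fraction
    of pigs feeding) into one mean value per day; with enough frames
    per day the per-frame noise averages out and the trend emerges."""
    n_days = len(frame_estimates) // frames_per_day
    daily = np.asarray(frame_estimates[: n_days * frames_per_day])
    return daily.reshape(n_days, frames_per_day).mean(axis=1)
```

With thousands of frames per day, the standard error of each daily mean shrinks far below the per-frame noise, which is why the six-week trends remain recoverable.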

2021 Relazione in Atti di Convegno

Future Urban Scenes Generation Through Vehicles Synthesis

Authors: Simoni, Alessandro; Bergamini, Luca; Palazzi, Andrea; Calderara, Simone; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

In this work we propose a deep learning pipeline to predict the visual future appearance of an urban scene. Despite recent advances, generating the entire scene in an end-to-end fashion is still far from being achieved. Instead, here we follow a two-stage approach, where interpretable information is included in the loop and each actor is modelled independently. We leverage a per-object novel view synthesis paradigm, i.e. generating a synthetic representation of an object undergoing a geometrical roto-translation in 3D space. Our model can easily be conditioned with constraints (e.g. input trajectories) provided by state-of-the-art tracking methods or by the user. This allows us to generate a set of diverse, realistic futures starting from the same input in a multi-modal fashion. We visually and quantitatively show the superiority of this approach over traditional end-to-end scene-generation methods on CityFlow, a challenging real-world dataset.
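The geometrical roto-translation that conditions each vehicle's synthesized future can be written as a rigid motion applied to the object's 3D points. A minimal NumPy sketch (rotation about the vertical axis only, as an illustrative simplification of a general rigid motion):

```python
import numpy as np

def roto_translate(points, yaw, t):
    """Apply a rigid 3D motion to `points` (an (n, 3) array):
    rotate about the vertical z-axis by `yaw` radians, then
    translate by the 3-vector `t`."""
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return points @ R.T + np.asarray(t)
```

Feeding a sequence of such motions (e.g. sampled along a predicted trajectory) to the synthesis model is what produces the set of diverse futures.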

2021 Relazione in Atti di Convegno

MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?

Authors: Fabbri, Matteo; Braso, Guillem; Maugeri, Gianluca; Cetintas, Orcun; Gasparini, Riccardo; Osep, Aljosa; Calderara, Simone; Leal-Taixe, Laura; Cucchiara, Rita

Published in: PROCEEDINGS IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION

2021 Relazione in Atti di Convegno

RMS-Net: Regression and Masking for Soccer Event Spotting

Authors: Tomei, Matteo; Baraldi, Lorenzo; Calderara, Simone; Bronzin, Simone; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

2021 Relazione in Atti di Convegno

The color out of space: learning self-supervised representations for Earth Observation imagery

Authors: Vincenzi, Stefano; Porrello, Angelo; Buzzega, Pietro; Cipriano, Marco; Fronte, Pietro; Cuccu, Roberto; Ippoliti, Carla; Conte, Annamaria; Calderara, Simone

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

The recent growth in the number of satellite images fosters the development of effective deep-learning techniques for Remote Sensing (RS). However, their full potential is untapped due to the lack of large annotated datasets. Such a problem is usually countered by fine-tuning a feature extractor that is previously trained on the ImageNet dataset. Unfortunately, the domain of natural images differs from the RS one, which hinders the final performance. In this work, we propose to learn meaningful representations from satellite imagery, leveraging its high-dimensional spectral bands to reconstruct the visible colors. We conduct experiments on land cover classification (BigEarthNet) and West Nile Virus detection, showing that colorization is a solid pretext task for training a feature extractor. Furthermore, we qualitatively observe that guesses based on natural images and colorization rely on different parts of the input. This paves the way to an ensemble model that eventually outperforms both the above-mentioned techniques.
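The colorization pretext task splits each multispectral patch into an input/target pair: the network sees the non-visible bands and must reconstruct the visible colors. A toy NumPy sketch (the band layout and RGB indices below are assumptions; actual Sentinel-2 band ordering depends on the product):

```python
import numpy as np

def colorization_pairs(patch, rgb_idx=(3, 2, 1)):
    """Split a multispectral patch of shape (bands, H, W) into a
    pretext-task pair: the visible RGB bands become the reconstruction
    target and the remaining spectral bands become the input."""
    rgb = set(rgb_idx)
    other = [b for b in range(patch.shape[0]) if b not in rgb]
    inputs = patch[other]          # non-visible bands -> network input
    target = patch[list(rgb_idx)]  # visible colors -> reconstruction target
    return inputs, target
```

Because the labels come for free from the imagery itself, every unannotated patch becomes a training example for the feature extractor.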

2021 Relazione in Atti di Convegno

Page 6 of 16 • Total publications: 157