Publications by Simone Calderara

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Transfer without Forgetting

Authors: Boschini, Matteo; Bonicelli, Lorenzo; Porrello, Angelo; Bellitto, Giovanni; Pennisi, Matteo; Palazzo, Simone; Spampinato, Concetto; Calderara, Simone

Published in: LECTURE NOTES IN COMPUTER SCIENCE

This work investigates the entanglement between Continual Learning (CL) and Transfer Learning (TL). In particular, we shed light on the widespread application of network pretraining, highlighting that it is itself subject to catastrophic forgetting. Unfortunately, this issue leads to the under-exploitation of knowledge transfer during later tasks. On this ground, we propose Transfer without Forgetting (TwF), a hybrid Continual Transfer Learning approach building upon a fixed pretrained sibling network, which continuously propagates the knowledge inherent in the source domain through a layer-wise loss term. Our experiments indicate that TwF steadily outperforms other CL methods across a variety of settings, averaging a 4.81% gain in Class-Incremental accuracy over a variety of datasets and different buffer sizes.
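The paper's layer-wise loss term amounts to a feature-level distillation between the continually trained network and its frozen pretrained sibling. A minimal NumPy sketch of that idea (the function name, the plain mean-squared penalty and the per-layer weights are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def layerwise_distillation_loss(student_feats, sibling_feats, weights=None):
    """Sum of per-layer mean-squared distances between the continually
    trained network's activations and those of a frozen pretrained
    sibling. Both arguments are lists of same-shape activation arrays."""
    if weights is None:
        weights = [1.0] * len(student_feats)
    loss = 0.0
    for w, s, t in zip(weights, student_feats, sibling_feats):
        loss += w * float(np.mean((s - t) ** 2))
    return loss
```

Matching activations give zero loss, so the term only penalizes drift away from the pretrained features while the main task loss drives learning.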

2022 Relazione in Atti di Convegno

Warp and Learn: Novel Views Generation for Vehicles and Other Objects

Authors: Palazzi, Andrea; Bergamini, Luca; Calderara, Simone; Cucchiara, Rita

Published in: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

In this work we introduce a new self-supervised, semi-parametric approach for synthesizing novel views of a vehicle starting from a single monocular image. Differently from parametric (i.e. entirely learning-based) methods, we show how a-priori geometric knowledge about the object and the 3D world can be successfully integrated into a deep learning based image generation framework. As this geometric component is not learnt, we call our approach semi-parametric. In particular, we exploit man-made object symmetry and piece-wise planarity to integrate rich a-priori visual information into the novel viewpoint synthesis process. An Image Completion Network (ICN) is then trained to generate a realistic image starting from this geometric guidance. This blend between parametric and non-parametric components allows us to i) operate in a real-world scenario, ii) preserve high-frequency visual information such as textures, iii) handle truly arbitrary 3D roto-translations of the input and iv) perform shape transfer to completely different 3D models. Finally, we show that our approach can be easily complemented with synthetic data and extended to other rigid objects with completely different topology, even in the presence of concave structures and holes. A comprehensive experimental analysis against state-of-the-art competitors shows the efficacy of our method both from a quantitative and a perceptive point of view.
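The symmetry prior can be pictured as mirroring observed texture onto the unseen half of the object before the Image Completion Network fills what remains. A toy NumPy sketch (the flat array layout and the simple left/right reflection are assumptions for illustration, not the paper's actual warping procedure):

```python
import numpy as np

def mirror_fill(texture, mask):
    """Fill unobserved pixels (mask == 0) with their horizontally
    mirrored counterparts, exploiting the object's left/right symmetry.
    A toy stand-in for the geometric guidance fed to the ICN."""
    mirrored = texture[:, ::-1]
    mirrored_mask = mask[:, ::-1]
    out = texture.copy()
    fill = (mask == 0) & (mirrored_mask == 1)  # unseen here, seen there
    out[fill] = mirrored[fill]
    return out
```

Pixels that are unobserved on both sides stay empty, which is precisely what the learned completion network is then asked to hallucinate.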

2022 Articolo su rivista

AC-VRNN: Attentive Conditional-VRNN for multi-future trajectory prediction

Authors: Bertugli, A.; Calderara, S.; Coscia, P.; Ballan, L.; Cucchiara, R.

Published in: COMPUTER VISION AND IMAGE UNDERSTANDING

Anticipating human motion in crowded scenarios is essential for developing intelligent transportation systems, social-aware robots and advanced video surveillance applications. A key challenge is the inherently multi-modal nature of human paths: when human interactions are involved, several distinct futures can all be socially acceptable. To this end, we propose a generative architecture for multi-future trajectory prediction based on Conditional Variational Recurrent Neural Networks (C-VRNNs). Conditioning mainly relies on prior belief maps, representing the most likely moving directions and forcing the model to consider past observed dynamics in generating future positions. Human interactions are modelled with a graph-based attention mechanism enabling an online attentive refinement of the recurrent hidden state. To corroborate our model, we perform extensive experiments on publicly available datasets (e.g., ETH/UCY, Stanford Drone Dataset, STATS SportVU NBA, Intersection Drone Dataset and TrajNet++) and demonstrate its effectiveness in crowded scenes compared to several state-of-the-art methods.
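The graph-based attentive refinement can be sketched as each agent's hidden state being mixed with an attention-weighted summary of the other agents' states. A toy NumPy version (the dot-product scores and the fixed residual mix are illustrative assumptions, not the paper's learned attention):

```python
import numpy as np

def attentive_refinement(hidden):
    """Refine each agent's recurrent hidden state with an
    attention-weighted mix of every other agent's state.
    `hidden` is an (n_agents, dim) array with n_agents >= 2."""
    scores = hidden @ hidden.T                       # pairwise affinities
    np.fill_diagonal(scores, -np.inf)                # no self-attention
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over neighbours
    context = weights @ hidden                       # neighbour summary
    return 0.5 * (hidden + context)                  # residual refinement
```

In the actual model this refinement happens online at every recurrent step, so each agent's next prediction already accounts for its neighbours.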

2021 Articolo su rivista

Avalanche: An end-to-end library for continual learning

Authors: Lomonaco, V.; Pellegrini, L.; Cossu, A.; Carta, A.; Graffieti, G.; Hayes, T. L.; De Lange, M.; Masana, M.; Pomponi, J.; Van De Ven, G. M.; Mundt, M.; She, Q.; Cooper, K.; Forest, J.; Belouadah, E.; Calderara, S.; Parisi, G. I.; Cuzzolin, F.; Tolias, A. S.; Scardapane, S.; Antiga, L.; Ahmad, S.; Popescu, A.; Kanan, C.; Van De Weijer, J.; Tuytelaars, T.; Bacciu, D.; Maltoni, D.

Published in: IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS

Learning continually from non-stationary data streams is a long-standing goal and a challenging problem in machine learning. Recently, we have witnessed a renewed and fast-growing interest in continual learning, especially within the deep learning community. However, algorithmic solutions are often difficult to re-implement, evaluate and port across different settings, where even results on standard benchmarks are hard to reproduce. In this work, we propose Avalanche, an open-source end-to-end library for continual learning research based on PyTorch. Avalanche is designed to provide a shared and collaborative codebase for fast prototyping, training, and reproducible evaluation of continual learning algorithms.
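The core pattern such a library standardizes is a loop over "experiences": train on each one in order, then evaluate on every test split seen so far. A plain-Python sketch of that pattern (this is not Avalanche's actual API, whose benchmark and strategy objects are considerably richer):

```python
def run_continual(strategy, train_stream, test_stream):
    """Experience-based continual-learning loop: train on each
    experience in sequence, then evaluate on all test splits seen
    so far. `strategy` is any object exposing train()/eval()."""
    results = []
    for i, experience in enumerate(train_stream):
        strategy.train(experience)
        seen_so_far = test_stream[: i + 1]
        results.append([strategy.eval(t) for t in seen_so_far])
    return results
```

The growing evaluation matrix this produces is what makes forgetting measurable: performance on early splits can be tracked as later experiences arrive.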

2021 Relazione in Atti di Convegno

DAG-Net: Double Attentive Graph Neural Network for Trajectory Forecasting

Authors: Monti, Alessio; Bertugli, Alessia; Calderara, Simone; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

Understanding human motion behaviour is a critical task for several possible applications, like self-driving cars or social robots, and in general for all those settings where an autonomous agent has to navigate inside a human-centric environment. This is non-trivial because human motion is inherently multi-modal: given a history of human motion paths, there are many plausible ways by which people could move in the future. Additionally, people's activities are often driven by goals, e.g. reaching particular locations or interacting with the environment. We address the aforementioned aspects by proposing a new recurrent generative model that considers both single agents' future goals and interactions between different agents. The model exploits a double attention-based graph neural network to collect information about the mutual influences among different agents and to integrate it with data about agents' possible future objectives. Our proposal is general enough to be applied to different scenarios: the model achieves state-of-the-art results in both urban environments and sports applications.

2021 Relazione in Atti di Convegno

Extracting accurate long-term behavior changes from a large pig dataset

Authors: Bergamini, L.; Pini, S.; Simoni, A.; Vezzani, R.; Calderara, S.; Eath, R. B. D.; Fisher, R. B.

Visual observation of uncontrolled real-world behavior leads to noisy observations, complicated by occlusions, ambiguity, variable motion rates, detection and tracking errors, slow transitions between behaviors, etc. We show in this paper that reliable estimates of long-term trends can be extracted given enough data, even though estimates from individual frames may be noisy. We validate this concept using a new public dataset of more than 20 million daytime pig observations over 6 weeks of their main growth stage, and we provide annotations for various tasks, including 5 individual behaviors. Our pipeline chains detection, tracking and behavior classification, combining deep and shallow computer vision techniques. While individual detections may be noisy, we show that long-term behavior changes can still be extracted reliably, and we validate these results qualitatively on the full dataset. Ultimately, starting from raw RGB video data, we are able both to tell what the pigs' main daily activities are and how these change over time.
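The central claim, that noisy per-frame estimates still yield reliable long-term trends once aggregated over enough data, is essentially averaging the noise away. A toy NumPy sketch (the quantities and frame counts below are illustrative, not taken from the dataset):

```python
import numpy as np

def daily_activity_trend(frame_estimates, frames_per_day):
    """Aggregate noisy per-frame behaviour estimates (e.g. the fraction
    of pigs feeding) into one mean value per day; with enough frames
    per day the per-frame noise averages out and the trend emerges."""
    n_days = len(frame_estimates) // frames_per_day
    daily = np.asarray(frame_estimates[: n_days * frames_per_day])
    return daily.reshape(n_days, frames_per_day).mean(axis=1)
```

With thousands of frames per day, the standard error of each daily mean shrinks far below the per-frame noise, which is why the six-week trends remain recoverable.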

2021 Relazione in Atti di Convegno

Future Urban Scenes Generation Through Vehicles Synthesis

Authors: Simoni, Alessandro; Bergamini, Luca; Palazzi, Andrea; Calderara, Simone; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

In this work we propose a deep learning pipeline to predict the visual future appearance of an urban scene. Despite recent advances, generating the entire scene in an end-to-end fashion is still far from being achieved. Instead, here we follow a two-stage approach, where interpretable information is included in the loop and each actor is modelled independently. We leverage a per-object novel view synthesis paradigm, i.e. generating a synthetic representation of an object undergoing a geometrical roto-translation in 3D space. Our model can easily be conditioned with constraints (e.g. input trajectories) provided by state-of-the-art tracking methods or by the user. This allows us to generate a set of diverse, realistic futures starting from the same input in a multi-modal fashion. We visually and quantitatively show the superiority of this approach over traditional end-to-end scene-generation methods on CityFlow, a challenging real-world dataset.
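The geometrical roto-translation that conditions each vehicle's synthesized future can be written as a rigid motion applied to the object's 3D points. A minimal NumPy sketch (rotation about the vertical axis only, as an illustrative simplification of a general rigid motion):

```python
import numpy as np

def roto_translate(points, yaw, t):
    """Apply a rigid 3D motion to `points` (an (n, 3) array):
    rotate about the vertical z-axis by `yaw` radians, then
    translate by the 3-vector `t`."""
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return points @ R.T + np.asarray(t)
```

Feeding a sequence of such motions (e.g. sampled along a predicted trajectory) to the synthesis model is what produces the set of diverse futures.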

2021 Relazione in Atti di Convegno

MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?

Authors: Fabbri, Matteo; Braso, Guillem; Maugeri, Gianluca; Cetintas, Orcun; Gasparini, Riccardo; Osep, Aljosa; Calderara, Simone; Leal-Taixe, Laura; Cucchiara, Rita

Published in: PROCEEDINGS IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION

2021 Relazione in Atti di Convegno

RMS-Net: Regression and Masking for Soccer Event Spotting

Authors: Tomei, Matteo; Baraldi, Lorenzo; Calderara, Simone; Bronzin, Simone; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

2021 Relazione in Atti di Convegno

The color out of space: learning self-supervised representations for Earth Observation imagery

Authors: Vincenzi, Stefano; Porrello, Angelo; Buzzega, Pietro; Cipriano, Marco; Fronte, Pietro; Cuccu, Roberto; Ippoliti, Carla; Conte, Annamaria; Calderara, Simone

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

The recent growth in the number of satellite images fosters the development of effective deep-learning techniques for Remote Sensing (RS). However, their full potential is untapped due to the lack of large annotated datasets. Such a problem is usually countered by fine-tuning a feature extractor that is previously trained on the ImageNet dataset. Unfortunately, the domain of natural images differs from the RS one, which hinders the final performance. In this work, we propose to learn meaningful representations from satellite imagery, leveraging its high-dimensional spectral bands to reconstruct the visible colors. We conduct experiments on land cover classification (BigEarthNet) and West Nile Virus detection, showing that colorization is a solid pretext task for training a feature extractor. Furthermore, we qualitatively observe that guesses based on natural images and colorization rely on different parts of the input. This paves the way to an ensemble model that eventually outperforms both the above-mentioned techniques.
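The colorization pretext task splits each multispectral patch into an input/target pair: the network sees the non-visible bands and must reconstruct the visible colors. A toy NumPy sketch (the band layout and RGB indices below are assumptions; actual Sentinel-2 band ordering depends on the product):

```python
import numpy as np

def colorization_pairs(patch, rgb_idx=(3, 2, 1)):
    """Split a multispectral patch of shape (bands, H, W) into a
    pretext-task pair: the visible RGB bands become the reconstruction
    target and the remaining spectral bands become the input."""
    rgb = set(rgb_idx)
    other = [b for b in range(patch.shape[0]) if b not in rgb]
    inputs = patch[other]          # non-visible bands -> network input
    target = patch[list(rgb_idx)]  # visible colors -> reconstruction target
    return inputs, target
```

Because the labels come for free from the imagery itself, every unannotated patch becomes a training example for the feature extractor.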

2021 Relazione in Atti di Convegno

Page 6 of 16 • Total publications: 157