Publications - AImageLab

Future Urban Scenes Generation Through Vehicles Synthesis

Authors: Simoni, Alessandro; Bergamini, Luca; Palazzi, Andrea; Calderara, Simone; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

In this work we propose a deep learning pipeline to predict the future visual appearance of an urban scene. Despite recent advances, generating the entire scene in an end-to-end fashion is still far from being achieved. Instead, here we follow a two-stage approach, where interpretable information is included in the loop and each actor is modelled independently. We leverage a per-object novel view synthesis paradigm, i.e. generating a synthetic representation of an object undergoing a geometrical roto-translation in 3D space. Our model can be easily conditioned with constraints (e.g. input trajectories) provided by state-of-the-art tracking methods or by the user. This allows us to generate a set of diverse realistic futures starting from the same input in a multi-modal fashion. We visually and quantitatively show the superiority of this approach over traditional end-to-end scene-generation methods on CityFlow, a challenging real-world dataset.

2021 Conference proceedings paper

MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?

Authors: Fabbri, Matteo; Braso, Guillem; Maugeri, Gianluca; Cetintas, Orcun; Gasparini, Riccardo; Osep, Aljosa; Calderara, Simone; Leal-Taixe, Laura; Cucchiara, Rita

Published in: PROCEEDINGS IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION

2021 Conference proceedings paper

RMS-Net: Regression and Masking for Soccer Event Spotting

Authors: Tomei, Matteo; Baraldi, Lorenzo; Calderara, Simone; Bronzin, Simone; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

2021 Conference proceedings paper

The color out of space: learning self-supervised representations for Earth Observation imagery

Authors: Vincenzi, Stefano; Porrello, Angelo; Buzzega, Pietro; Cipriano, Marco; Fronte, Pietro; Cuccu, Roberto; Ippoliti, Carla; Conte, Annamaria; Calderara, Simone

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

The recent growth in the number of satellite images fosters the development of effective deep-learning techniques for Remote Sensing (RS). However, their full potential is untapped due to the lack of large annotated datasets. Such a problem is usually countered by fine-tuning a feature extractor that is previously trained on the ImageNet dataset. Unfortunately, the domain of natural images differs from the RS one, which hinders the final performance. In this work, we propose to learn meaningful representations from satellite imagery, leveraging its high-dimensionality spectral bands to reconstruct the visible colors. We conduct experiments on land cover classification (BigEarthNet) and West Nile Virus detection, showing that colorization is a solid pretext task for training a feature extractor. Furthermore, we qualitatively observe that guesses based on natural images and colorization rely on different parts of the input. This paves the way to an ensemble model that eventually outperforms both the above-mentioned techniques.

2021 Conference proceedings paper

Training convolutional neural networks to score pneumonia in slaughtered pigs

Authors: Bonicelli, L.; Trachtman, A. R.; Rosamilia, A.; Liuzzo, G.; Hattab, J.; Alcaraz, E. M.; Del Negro, E.; Vincenzi, S.; Dondona, A. C.; Calderara, S.; Marruchella, G.

Published in: ANIMALS

The slaughterhouse can act as a valid checkpoint to estimate the prevalence and the economic impact of diseases in farm animals. At present, scoring lesions is a challenging and time‐consuming activity, which is carried out by veterinarians serving the slaughter chain. Over recent years, artificial intelligence (AI) has gained traction in many fields of research, including livestock production. In particular, AI‐based methods appear able to solve highly repetitive tasks and to consistently analyze large amounts of data, such as those collected by veterinarians during postmortem inspection in high‐throughput slaughterhouses. The present study aims to develop an AI‐based method capable of recognizing and quantifying enzootic pneumonia‐like lesions on digital images captured from slaughtered pigs under routine abattoir conditions. Overall, the data indicate that the AI‐based method proposed herein could properly identify and score enzootic pneumonia‐like lesions without interfering with the slaughter chain routine. According to European legislation, the application of such a method avoids the handling of carcasses and organs, decreasing the risk of microbial contamination, and could provide further alternatives in the field of food hygiene.

2021 Journal article

Vehicle and method for inspecting a railway line

Authors: Avizzano, Carlo Alberto; Borghi, Guido; Calderara, Simone; Cucchiara, Rita; Fedeli, Eugenio; Ermini, Mirko; Gonnelli, Mirco; Labanca, Giacomo; Frisoli, Antonio; Gasparini, Riccardo; Solazzi, Massimiliano; Tiseni, Luca; Leonardis, Daniele; Satler, Massimo

2021 Patent

Video action detection by learning graph-based spatio-temporal interactions

Authors: Tomei, Matteo; Baraldi, Lorenzo; Calderara, Simone; Bronzin, Simone; Cucchiara, Rita

Published in: COMPUTER VISION AND IMAGE UNDERSTANDING

Action Detection is a complex task that aims to detect and classify human actions in video clips. Typically, it has been addressed by processing fine-grained features extracted from a video classification backbone. Recently, thanks to the robustness of object and people detectors, a deeper focus has been added on relationship modelling. Following this line, we propose a graph-based framework to learn high-level interactions between people and objects, in both space and time. In our formulation, spatio-temporal relationships are learned through self-attention on a multi-layer graph structure which can connect entities from consecutive clips, thus considering long-range spatial and temporal dependencies. The proposed module is backbone independent by design and does not require end-to-end training. Extensive experiments are conducted on the AVA dataset, where our model demonstrates state-of-the-art results and consistent improvements over baselines built with different backbones. Code is publicly available at https://github.com/aimagelab/STAGE_action_detection.

2021 Journal article

Anomaly Detection for Vision-based Railway Inspection

Authors: Gasparini, Riccardo; Pini, Stefano; Borghi, Guido; Scaglione, Giuseppe; Calderara, Simone; Fedeli, Eugenio; Cucchiara, Rita

Published in: COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE

2020 Conference proceedings paper

Anomaly Detection, Localization and Classification for Railway Inspection

Authors: Gasparini, Riccardo; D'Eusanio, Andrea; Borghi, Guido; Pini, Stefano; Scaglione, Giuseppe; Calderara, Simone; Fedeli, Eugenio; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

2020 Conference proceedings paper

Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation

Authors: Fabbri, Matteo; Lanzi, Fabio; Calderara, Simone; Alletto, Stefano; Cucchiara, Rita

Published in: PROCEEDINGS IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION

In this paper we present a novel approach for bottom-up multi-person 3D human pose estimation from monocular RGB images. We propose to use high-resolution volumetric heatmaps to model joint locations, devising a simple and effective compression method to drastically reduce the size of this representation. At the core of the proposed method lies our Volumetric Heatmap Autoencoder, a fully-convolutional network tasked with the compression of ground-truth heatmaps into a dense intermediate representation. A second model, the Code Predictor, is then trained to predict these codes, which can be decompressed at test time to re-obtain the original representation. Our experimental evaluation shows that our method performs favorably when compared to the state of the art on both multi-person and single-person 3D human pose estimation datasets and, thanks to our novel compression strategy, can process full-HD images at the constant runtime of 8 fps regardless of the number of subjects in the scene.

2020 Conference proceedings paper
