Publications - AImageLab

Conditional Channel Gated Networks for Task-Aware Continual Learning

Authors: Abati, Davide; Tomczak, Jakub; Blankevoort, Tijmen; Calderara, Simone; Cucchiara, Rita; Bejnordi, Babak Ehteshami

Published in: PROCEEDINGS - IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION

2020 Conference proceedings paper

DOI IRIS

Dark Experience for General Continual Learning: a Strong, Simple Baseline

Authors: Buzzega, Pietro; Boschini, Matteo; Porrello, Angelo; Abati, Davide; Calderara, Simone

Published in: ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS

Continual Learning has inspired a plethora of approaches and evaluation settings; however, the majority of them overlook the properties of a practical scenario, where the data stream cannot be shaped as a sequence of tasks and offline training is not viable. We work towards General Continual Learning (GCL), where task boundaries blur and the domain and class distributions shift either gradually or suddenly. We address it by mixing rehearsal with knowledge distillation and regularization; our simple baseline, Dark Experience Replay, matches the network's logits sampled throughout the optimization trajectory, thus promoting consistency with its past. By conducting an extensive analysis on both standard benchmarks and a novel GCL evaluation setting (MNIST-360), we show that such a seemingly simple baseline outperforms consolidated approaches and leverages limited resources. We further explore the generalization capabilities of our objective, showing that its regularization is beneficial beyond mere performance.

2020 Conference proceedings paper

IRIS
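
As a rough illustration of the logit-matching idea described in the Dark Experience Replay abstract above, the sketch below stores past examples together with the logits the network produced for them, then penalizes the current network for drifting away from those stored logits. The abstract does not specify these details: the reservoir-sampling buffer, the squared-error penalty, the weight `alpha`, and every name in the code are assumptions made for illustration only.

```python
import random


class LogitReservoir:
    """Fixed-size memory of (example, logits) pairs.

    Reservoir sampling is an assumed insertion policy: each example seen
    so far is kept with equal probability, with no need for task boundaries.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []          # stored (example, logits) pairs
        self.seen = 0            # number of examples offered so far
        self.rng = random.Random(seed)

    def add(self, example, logits):
        if len(self.items) < self.capacity:
            # Buffer not full yet: always store.
            self.items.append((example, list(logits)))
        else:
            # Buffer full: overwrite a random slot with probability capacity / (seen + 1).
            slot = self.rng.randrange(self.seen + 1)
            if slot < self.capacity:
                self.items[slot] = (example, list(logits))
        self.seen += 1


def logit_matching_penalty(stored_logits, current_logits):
    """Mean squared difference between stored and current logits."""
    diffs = [(s - c) ** 2 for s, c in zip(stored_logits, current_logits)]
    return sum(diffs) / len(diffs)


def replay_objective(task_loss, stored_logits, current_logits, alpha=0.5):
    """Loss on the incoming data plus a weighted consistency term (alpha is hypothetical)."""
    return task_loss + alpha * logit_matching_penalty(stored_logits, current_logits)
```

In a real training loop the stored logits would be recorded when an example first enters the buffer, and the current logits would come from a forward pass of the network being trained on the same buffered example.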

Deep learning-based method for vision-guided robotic grasping of unknown objects

Authors: Bergamini, L.; Sposato, M.; Pellicciari, M.; Peruzzini, M.; Calderara, S.; Schmidt, J.

Published in: ADVANCED ENGINEERING INFORMATICS

Nowadays, robots are heavily used in factories for different tasks, most of which involve grasping and manipulating generic objects in unstructured scenarios. To better mimic a human operator performing a grasp, who must identify the object and find an optimal grasp from visual information, a widely adopted sensing solution is Artificial Vision. Nonetheless, state-of-the-art applications need long training and fine-tuning to manually build the object model used at run-time during normal operations, which reduces the overall operational throughput of the robotic system. To overcome these limits, the paper presents a framework based on Deep Convolutional Neural Networks (DCNN) that predicts both single and multiple grasp poses for multiple objects at once, using a single RGB image as input. Thanks to a novel loss function, our framework is trained end-to-end and matches state-of-the-art accuracy with a substantially smaller architecture, which gives unprecedented real-time performance during experimental tests and makes the application reliable on real robots. The system has been implemented using the ROS framework and tested on a Baxter collaborative robot.

2020 Journal article

DOI IRIS

Face-from-Depth for Head Pose Estimation on Depth Images

Authors: Borghi, Guido; Fabbri, Matteo; Vezzani, Roberto; Calderara, Simone; Cucchiara, Rita

Published in: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Depth cameras make it possible to set up reliable solutions for people monitoring and behavior understanding, especially when unstable or poor illumination conditions make common RGB sensors unusable. We therefore propose a complete framework for estimating the head and shoulder pose based on depth images only. A head detection and localization module is also included, in order to develop a complete end-to-end system. The core element of the framework is a Convolutional Neural Network, called POSEidon+, that receives three types of images as input and provides the 3D angles of the pose as output. Moreover, a Face-from-Depth component based on a Deterministic Conditional GAN model is able to hallucinate a face from the corresponding depth image. We empirically demonstrate that this improves system performance. We test the proposed framework on two public datasets, namely Biwi Kinect Head Pose and ICT-3DHP, and on Pandora, a new challenging dataset mainly inspired by the automotive setup. Experimental results show that our method outperforms several recent state-of-the-art works based on both intensity and depth input data, running in real time at more than 30 frames per second.

2020 Journal article

DOI IRIS

Predicting WNV circulation in Italy using earth observation data and extreme gradient boosting model

Authors: Candeloro, L.; Ippoliti, C.; Iapaolo, F.; Monaco, F.; Morelli, D.; Cuccu, R.; Fronte, P.; Calderara, S.; Vincenzi, S.; Porrello, A.; D'Alterio, N.; Calistri, P.; Conte, A.

Published in: REMOTE SENSING

West Nile Disease (WND) is one of the most widespread zoonoses in Italy and Europe and is caused by a vector-borne virus. Its transmission cycle is well understood, with birds acting as the primary hosts and mosquito vectors transmitting the virus to other birds, while humans and horses are occasional dead-end hosts. Identifying suitable environmental conditions across large areas containing multiple species of potential hosts and vectors can be difficult. The recent and massive availability of Earth Observation data and the continuous development of innovative Machine Learning methods can help to automatically identify patterns in large datasets and to identify areas at risk with high accuracy. In this paper, we investigated West Nile Virus (WNV) circulation in relation to Land Surface Temperature, Normalized Difference Vegetation Index and Surface Soil Moisture collected during the 160 days before the infection took place, with the aim of evaluating the predictive capacity of lagged remotely sensed variables in identifying areas at risk for WNV circulation. WNV detections in mosquitoes, birds and horses in 2017, 2018 and 2019 were collected from the National Information System for Animal Disease Notification. An Extreme Gradient Boosting model was trained with data from 2017 and 2018 and tested on the 2019 epidemic, predicting the spatio-temporal WNV circulation two weeks in advance with an overall accuracy of 0.84. This work lays the basis for a future early warning system that could alert public authorities when climatic and environmental conditions become favourable to the onset and spread of WNV.

2020 Journal article

DOI IRIS
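
The evaluation protocol summarized in the WNV abstract above (features taken from the 160 days before an event, training on 2017 and 2018, testing on 2019) can be sketched in a few lines. This is only a schematic under stated assumptions: the record layout with a `year` key, the day-indexed series, and all function names are hypothetical, and the gradient boosting model itself is left out.

```python
def lagged_window(daily_values, event_day, lag_days=160):
    """Return the values observed in the `lag_days` days before `event_day`.

    `daily_values` is a list indexed by day number; the event day itself
    is excluded so that only past observations feed the predictor.
    """
    start = max(0, event_day - lag_days)
    return daily_values[start:event_day]


def split_by_year(records, train_years=(2017, 2018), test_year=2019):
    """Temporal split: earlier seasons for training, the last one held out."""
    train = [r for r in records if r["year"] in train_years]
    test = [r for r in records if r["year"] == test_year]
    return train, test


def accuracy(predictions, labels):
    """Fraction of predictions that match the observed labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)
```

Splitting by year rather than at random keeps every test observation strictly later than the training data, which is the situation an early warning system would face in practice.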

Rethinking Experience Replay: a Bag of Tricks for Continual Learning

Authors: Buzzega, Pietro; Boschini, Matteo; Porrello, Angelo; Calderara, Simone

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

In Continual Learning, a Neural Network is trained on a stream of data whose distribution shifts over time. Under these assumptions, it is especially challenging to improve on classes appearing later in the stream while remaining accurate on previous ones. This is due to the infamous problem of catastrophic forgetting, which causes a quick performance degradation when the classifier focuses on learning new categories. Recent literature proposed various approaches to tackle this issue, often resorting to very sophisticated techniques. In this work, we show that naïve rehearsal can be patched to achieve similar performance. We point out some shortcomings that restrain Experience Replay (ER) and propose five tricks to mitigate them. Experiments show that ER, thus enhanced, displays an accuracy gain of 51.2 and 26.9 percentage points on the CIFAR-10 and CIFAR-100 datasets respectively (memory buffer size 1000). As a result, it surpasses current state-of-the-art rehearsal-based methods.

2020 Conference proceedings paper

DOI IRIS

Robust Re-Identification by Multiple Views Knowledge Distillation

Authors: Porrello, Angelo; Bergamini, Luca; Calderara, Simone

Published in: LECTURE NOTES IN COMPUTER SCIENCE

To achieve robustness in Re-Identification, standard methods leverage tracking information in a Video-To-Video fashion. However, these solutions face a large drop in performance for single image queries (e.g., Image-To-Video setting). Recent works address this severe degradation by transferring temporal information from a Video-based network to an Image-based one. In this work, we devise a training strategy that allows the transfer of superior knowledge, arising from a set of views depicting the target object. Our proposal, Views Knowledge Distillation (VKD), pins this visual variety as a supervision signal within a teacher-student framework, where the teacher educates a student who observes fewer views. As a result, the student outperforms not only its teacher but also the current state-of-the-art in Image-To-Video by a wide margin (6.3% mAP on MARS, 8.6% on Duke-Video-ReId and 5% on VeRi-776). A thorough analysis on Person, Vehicle and Animal Re-ID investigates the properties of VKD from both a qualitative and a quantitative perspective.

2020 Conference proceedings paper

DOI IRIS
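
To make the teacher-student idea in the VKD abstract above more concrete, here is a minimal sketch of a generic distillation term: the Kullback-Leibler divergence between temperature-softened teacher and student class distributions. This is the standard knowledge distillation loss, shown as an assumed stand-in; the abstract does not state the exact objective used by VKD, and the function names and the `temperature` default are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between softened class distributions.

    The teacher logits are assumed to come from a network that saw many
    views of the target, the student logits from one that saw fewer.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, so minimizing it pushes the student toward the teacher's predictions.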

Scoring pleurisy in slaughtered pigs using convolutional neural networks

Authors: Trachtman, A. R.; Bergamini, L.; Palazzi, A.; Porrello, A.; Capobianco Dondona, A.; Del Negro, E.; Paolini, A.; Vignola, G.; Calderara, S.; Marruchella, G.

Published in: VETERINARY RESEARCH

Diseases of the respiratory system are known to negatively impact the profitability of the pig industry worldwide. Considering the relatively short lifespan of pigs, lesions can still be evident at slaughter, where they can be usefully recorded and scored. Therefore, the slaughterhouse represents a key check-point to assess the health status of pigs, providing unique and valuable feedback to the farm, as well as an important source of data for epidemiological studies. Although relevant, scoring lesions in slaughtered pigs is a very time-consuming and costly activity, which makes their systematic recording difficult. The present study has been carried out to train a convolutional neural network-based system to automatically score pleurisy in slaughtered pigs. The automation of such a process would be extremely helpful to enable a systematic examination of all slaughtered livestock. Overall, our data indicate that the proposed system is well able to differentiate half carcasses affected with pleurisy from healthy ones, with an overall accuracy of 85.5%. The system was better able to recognize severely affected half carcasses than those showing less severe lesions. Training convolutional neural networks to identify and score pneumonia, on the one hand, and carrying out trials in large-capacity slaughterhouses, on the other, represent the natural continuation of the present study. As a result, convolutional neural network-based technologies could provide a fast and cheap tool to systematically record lesions in slaughtered pigs, thus supplying an enormous amount of useful data to all stakeholders in the pig industry.

2020 Journal article

DOI IRIS

A Deep-learning-based approach to VM behavior Identification in Cloud Systems

Authors: Stefanini, M.; Lancellotti, R.; Baraldi, L.; Calderara, S.

2019 Conference proceedings paper

DOI IRIS

Can adversarial networks hallucinate occluded people with a plausible aspect?

Authors: Fulgeri, F.; Fabbri, Matteo; Alletto, Stefano; Calderara, S.; Cucchiara, R.

Published in: COMPUTER VISION AND IMAGE UNDERSTANDING

When you see a person in a crowd, occluded by other people, you miss visual information that could be used to recognize, re-identify or simply classify him or her. You can imagine the person's appearance given your experience, nothing more. Similarly, AI solutions can try to hallucinate missing information with specific deep learning architectures, suitably trained on images of people with and without occlusions. The goal of this work is to generate a complete image of a person, given an occluded version as input, that should be a) without occlusion, b) similar at pixel level to a completely visible person shape, and c) able to preserve the visual attributes (e.g. male/female) of the original. To this end, we propose a new approach that integrates state-of-the-art neural network architectures, namely U-nets and GANs, as well as discriminative attribute classification nets, into an architecture specifically designed to de-occlude people shapes. The network is trained to optimize a loss function that takes the aforementioned objectives into account. We also propose two datasets for testing our solution: the first, occluded RAP, was created automatically by occluding real shapes from the RAP dataset by Li et al. (2016) (which also collects attributes of people's appearance); the second is a large synthetic dataset, AiC, generated in computer graphics with data extracted from the GTA video game, which contains 3D data of occluded objects by construction. Results are impressive and outperform any other previous proposal. This result could be an initial step toward much further research on recognizing people and their behavior in an open, crowded world.

2019 Journal article

DOI IRIS
