Publications by Simone Calderara

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

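Comportamento non verbale intergruppi “oggettivo”: una replica dello studio di Dovidio, Kawakami e Gaertner (2002) [“Objective” intergroup nonverbal behaviour: a replication of the study by Dovidio, Kawakami and Gaertner (2002)]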

Authors: Di Bernardo, Gian Antonio; Vezzali, Loris; Giovannini, Dino; Palazzi, Andrea; Calderara, Simone; Bicocchi, Nicola; Zambonelli, Franco; Cucchiara, Rita; Cadamuro, Alessia; Cocco, Veronica Margherita

There is a long tradition of research on nonverbal behaviour, including in the context of intergroup relations. These studies usually rely on ratings by external coders, which are nonetheless subjective and open to bias. We conducted a study modelled on the well-known study by Dovidio, Kawakami and Gaertner (2002), with some modifications, considering relations between Whites and Blacks. White participants, after completing measures of explicit and implicit prejudice, met (in counterbalanced order) a White collaborator and a Black collaborator. With each of them, they talked for three minutes about a neutral topic and about a topic salient to the group distinction (in counterbalanced order). The interactions were recorded with a Kinect camera, which can capture the three-dimensional component of movement. The results revealed several points of interest. First, drawing on an analysis of the literature, we created objective indices, some of which cannot be detected by external coders, such as interpersonal distance and the volume of space between people. The results highlighted several relevant aspects: (1) implicit attitude is associated with various indices of nonverbal behaviour, which in turn mediate the evaluations of the participants provided by the collaborators; (2) interactions should be considered dynamically, taking into account that they unfold over time; (3) what may matter is overall nonverbal behaviour, rather than specific indices predetermined by the experimenters.

2018 Abstract in Conference Proceedings

Domain Translation with Conditional GANs: from Depth to RGB Face-to-Face

Authors: Fabbri, Matteo; Borghi, Guido; Lanzi, Fabio; Vezzani, Roberto; Calderara, Simone; Cucchiara, Rita

Can faces acquired by low-cost depth sensors be useful to see some characteristic details of the faces? Typically the answer is no. However, new deep architectures can generate RGB images from data acquired in a different modality, such as depth data. In this paper we propose a new Deterministic Conditional GAN, trained on annotated RGB-D face datasets, that is effective for face-to-face translation from depth to RGB. Although the network cannot reconstruct the exact somatic features of unknown individual faces, it can reconstruct plausible faces; their appearance is accurate enough to be used in many pattern recognition tasks. We test the network's ability to hallucinate with some Perceptual Probes, such as face aspect classification or landmark detection. Depth faces can be used in place of the corresponding RGB images, which are often unavailable because of darkness or difficult lighting conditions. Experimental results are very promising and far better than previously proposed approaches: this domain translation can constitute a new way to exploit depth data in future applications.

2018 Paper in Conference Proceedings

Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World

Authors: Fabbri, Matteo; Lanzi, Fabio; Calderara, Simone; Palazzi, Andrea; Vezzani, Roberto; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Multi-People Tracking in an open-world setting requires a special effort in precise detection. Moreover, temporal continuity in the detection phase gains more importance when scene clutter introduces the challenging problem of occluded targets. For this purpose, we propose a deep network architecture that jointly extracts people's body parts and associates them across short temporal spans. Our model explicitly deals with occluded body parts by hallucinating plausible solutions for non-visible joints. We propose a new end-to-end architecture composed of four branches (visible heatmaps, occluded heatmaps, part affinity fields and temporal affinity fields) fed by a time linker feature extractor. To overcome the lack of surveillance data with tracking, body part and occlusion annotations, we created a Computer Graphics dataset for people tracking in urban scenarios by exploiting a photorealistic videogame. To date it is the largest dataset (about 500,000 frames, almost 10 million body poses) of human body parts for people tracking in urban scenarios. Our architecture, trained on virtual data, also generalizes well to public real tracking benchmarks when image resolution and sharpness are high enough, producing reliable tracklets useful for further batch data association or re-id modules.

2018 Paper in Conference Proceedings

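Metodo e sistema per il riconoscimento biometrico univoco di un animale, basati sull'utilizzo di tecniche di deep learning [Method and system for the unique biometric recognition of an animal, based on deep learning techniques]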

Authors: Calderara, Simone; Bergamini, Luca; Capobianco Dondona, Andrea; Del Negro, Ercole; Di Tondo, Francesco

The present invention describes a method and system for the unique biometric recognition of an animal, based on deep learning techniques. The method is characterized by the following phases: a. a training phase on a human domain and an animal domain, to obtain animal embeddings in a latent space homologous to the human one by means of convolutional neural networks; b. storage of the resulting animal embeddings in a database; c. recognition of an animal identity by means of convolutional neural networks. The present invention also comprises a system for the unique biometric recognition of an animal that uses the method described above.

2018 Patent
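
Phases b and c of the patent abstract above (storing animal embeddings, then recognizing an identity) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how embeddings are compared, so the cosine-similarity matching, the `threshold` value, the `EmbeddingDatabase` class, and the toy two-dimensional vectors are hypothetical, and the convolutional network that would produce real embeddings is left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class EmbeddingDatabase:
    """Hypothetical in-memory store mapping an animal ID to its embedding."""

    def __init__(self):
        self._store = {}

    def enroll(self, animal_id, embedding):
        # Phase b: store the embedding produced for a known animal.
        self._store[animal_id] = list(embedding)

    def identify(self, query, threshold=0.8):
        # Phase c: return the best-matching ID, or None if no stored
        # embedding is similar enough (threshold is an assumed parameter).
        best_id, best_score = None, -1.0
        for animal_id, embedding in self._store.items():
            score = cosine_similarity(query, embedding)
            if score > best_score:
                best_id, best_score = animal_id, score
        if best_score >= threshold:
            return best_id, best_score
        return None, best_score


db = EmbeddingDatabase()
db.enroll("cow_001", [1.0, 0.0])
db.enroll("cow_002", [0.0, 1.0])

known_id, known_score = db.identify([0.9, 0.1])
unknown_id, unknown_score = db.identify([1.0, 1.0])
```

In this toy setup the first query falls close to `cow_001`, while the second is equally far from both stored embeddings and is rejected by the threshold.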

Multi-views Embedding for Cattle Re-identification

Authors: Bergamini, Luca; Porrello, Angelo; Capobianco Dondona, Andrea; Del Negro, Ercole; Mattioli, Mauro; D’Alterio, Nicola; Calderara, Simone

The people re-identification task has seen enormous improvements in recent years, mainly due to better image feature extraction from deep Convolutional Neural Networks (CNNs) and the availability of large datasets. However, little research has been conducted on animal identification and re-identification, even though this knowledge may be useful in a rich variety of scenarios. Here, we tackle cattle re-identification with deep CNNs and show how this task is poorly related to the human one, presenting unique challenges that make it far from being solved. We present various baselines, based either on deep architectures or on standard machine learning algorithms, and compare them with our solution. Finally, a rich ablation study has been conducted to further investigate the peculiarities of this task.

2018 Paper in Conference Proceedings

Unsupervised vehicle re-identification using triplet networks

Authors: Marin-Reyes, P. A.; Bergamini, L.; Lorenzo-Navarro, J.; Palazzi, A.; Calderara, S.; Cucchiara, R.

Published in: IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS

Vehicle re-identification plays a major role in modern smart surveillance systems. Specifically, the task requires the capability to predict the identity of a given vehicle, given a dataset of known associations collected from different views and surveillance cameras. Generally, it can be cast as a ranking problem: given a probe image of a vehicle, the model needs to rank all database images based on their similarity w.r.t. the probe image. In line with recent research, we devise a metric learning model that employs a supervision based on local constraints. In particular, we leverage pairwise and triplet constraints for training a network capable of assigning a high degree of similarity to samples sharing the same identity, while keeping different identities distant in feature space. Finally, we show how vehicle tracking can be exploited to automatically generate a weakly labelled dataset that can be used to train the deep network for the task of vehicle re-identification. Learning and evaluation are carried out on the NVIDIA AI City Challenge videos.

2018 Paper in Conference Proceedings
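
The ranking formulation and the triplet constraint mentioned in the vehicle re-identification abstract above can be sketched as follows. This is an illustrative example under stated assumptions, not the paper's implementation: it uses the standard margin-based triplet loss with Euclidean distance (so a smaller distance stands in for a higher similarity), and the function names, the `margin` value, the gallery keys, and the two-dimensional embeddings are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss for one (anchor, positive, negative) triplet.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (an assumed hyperparameter).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def rank_gallery(probe, gallery):
    """Return gallery keys sorted from most to least similar to the probe.

    `gallery` maps an image name to its embedding; similarity is taken
    to be inversely related to Euclidean distance.
    """
    return sorted(gallery, key=lambda name: euclidean(probe, gallery[name]))


# Toy embeddings: two views of vehicle A and one view of vehicle B.
probe = [0.0, 0.0]
gallery = {
    "car_a_cam2": [0.1, 0.0],
    "car_b_cam1": [3.0, 4.0],
    "car_a_cam3": [0.0, 0.5],
}

ranking = rank_gallery(probe, gallery)

# Constraint satisfied: the same-identity sample is much closer than the other identity.
loss_satisfied = triplet_loss(probe, gallery["car_a_cam2"], gallery["car_b_cam1"])
# Constraint violated: the roles of positive and negative are swapped.
loss_violated = triplet_loss(probe, gallery["car_b_cam1"], gallery["car_a_cam2"])
```

In a training loop, the loss would be averaged over many triplets and minimized with respect to the parameters of the network that produces the embeddings; here the embeddings are fixed so that only the constraint itself is shown.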

Using Kinect camera for investigating intergroup non-verbal human interactions

Authors: Vezzali, Loris; Di Bernardo, Gian Antonio; Cadamuro, Alessia; Cocco, Veronica Margherita; Crapolicchio, Eleonora; Bicocchi, Nicola; Calderara, Simone; Giovannini, Dino; Zambonelli, Franco; Cucchiara, Rita

A long tradition in social psychology has focused on nonverbal behaviour displayed during dyadic interactions, generally relying on evaluations from external coders. However, in addition to the fact that external coders may be biased, they may not capture certain types of behavioural indices. We designed three studies examining explicit and implicit prejudice as predictors of nonverbal behaviour as reflected in objective indices provided by Kinect cameras. In the first study, we considered White-Black relations from the perspective of 36 White participants. Results revealed that implicit prejudice was associated with a reduction in interpersonal distance and in the volume of space between Whites and Blacks (vs. Whites and Whites), which in turn were associated with evaluations by collaborators taking part in the interaction. In the second study, 37 non-HIV participants interacted with HIV individuals. We found that implicit prejudice was associated with reduced volume of space between interactants over time (a process of bias overcorrection) only when they tried hard to control their behaviour (as captured by a Stroop test). In the third study, 35 non-disabled children interacted with disabled children. Results revealed that implicit prejudice was associated with reduced interpersonal distance over time.

2018 Abstract in Conference Proceedings

A new era in the study of intergroup nonverbal behaviour: Studying intergroup dyadic interactions “online”

Authors: Di Bernardo, Gian Antonio; Vezzali, Loris; Palazzi, Andrea; Calderara, Simone; Bicocchi, Nicola; Zambonelli, Franco; Cucchiara, Rita; Cadamuro, Alessia

We examined predictors and consequences of intergroup nonverbal behaviour by relying on new technologies and new objective indices. In three studies, both in the laboratory and in the field with children, behaviour was a function of implicit prejudice.

2017 Abstract in Conference Proceedings

Attentive Models in Vision: Computing Saliency Maps in the Deep Learning Era

Authors: Cornia, Marcella; Abati, Davide; Baraldi, Lorenzo; Palazzi, Andrea; Calderara, Simone; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Estimating the focus of attention of a person looking at an image or a video is a crucial step which can enhance many vision-based inference mechanisms: image segmentation and annotation, video captioning, and autonomous driving are some examples. The early stages of attentive behavior are typically bottom-up; reproducing the same mechanism means finding the saliency embodied in the images, i.e. which parts of an image pop out of a visual scene. This process has been studied for decades in neuroscience and in terms of computational models for reproducing the human cortical process. In the last few years, early models have been replaced by deep learning architectures, which outperform any early approach when compared on public datasets. In this paper, we discuss why convolutional neural networks (CNNs) are so accurate in saliency prediction. We present our DL architectures, which combine both bottom-up cues and higher-level semantics, and incorporate the concept of time in the attentional process through LSTM recurrent architectures. Finally, we present a video-specific architecture based on the C3D network, which can extract spatio-temporal features by means of 3D convolutions to model task-driven attentive behaviors. The merit of this work is to show how these deep networks are not mere brute-force methods tuned on massive amounts of data, but represent well-defined architectures which recall very closely the early saliency models, although improved with the semantics learned from human ground truth.

2017 Paper in Conference Proceedings

From Groups to Leaders and Back. Exploring Mutual Predictability Between Social Groups and Their Leaders

Authors: Solera, Francesco; Calderara, Simone; Cucchiara, Rita

Recently, social theories and empirical observations identified small groups and leaders as the basic elements which shape a crowd. This leads to an intermediate level of abstraction that is placed between the crowd as a flow of people, and the crowd as a collection of individuals. Consequently, automatic analysis of crowds in computer vision is also experiencing a shift in focus from individuals to groups and from small groups to their leaders. In this chapter, we present state-of-the-art solutions to the groups and leaders detection problem, which are able to account for physical factors as well as for sociological evidence observed over short time windows. The presented algorithms are framed as structured learning problems over the set of individual trajectories. However, the way trajectories are exploited to predict the structure of the crowd is not fixed but rather learned from recorded and annotated data, enabling the method to adapt these concepts to different scenarios, densities, cultures, and other unobservable complexities. Additionally, we investigate the relation between leaders and their groups and propose the first attempt to exploit leadership as prior knowledge for group detection.

2017 Chapter/Essay
