Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

What was Monet seeing while painting? Translating artworks to photo-realistic images

Authors: Tomei, Matteo; Baraldi, Lorenzo; Cornia, Marcella; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

State of the art Computer Vision techniques exploit the availability of large-scale datasets, most of which consist of images captured from the world as it is. This leads to an incompatibility between such methods and digital data from the artistic domain, on which current techniques under-perform. A possible solution is to reduce the domain shift at the pixel level, translating artistic images into realistic copies. In this paper, we present a model capable of translating paintings to photo-realistic images, trained without paired examples. The idea is to enforce a patch-level similarity between real and generated images, aiming to reproduce photo-realistic details from a memory bank of real images. This is then adopted in the context of an unpaired image-to-image translation framework, mapping each image from one distribution to a new one belonging to the other distribution. Qualitative and quantitative results are presented on the translation of Monet, Cezanne, and Van Gogh paintings, showing that our approach increases the realism of generated images with respect to the CycleGAN approach.
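The patch-level matching idea can be illustrated with a toy sketch. Assumptions: raw pixel patches and squared distances stand in for the deep features and similarity measure actually used during GAN training, and all function names are hypothetical:

```python
import numpy as np

def extract_patches(img, size=4, stride=4):
    """Split a (H, W, C) image into flattened non-overlapping patches."""
    h, w, c = img.shape
    patches = []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patches.append(img[y:y + size, x:x + size].ravel())
    return np.stack(patches)

def patch_similarity_loss(generated, memory_bank, size=4):
    """For each generated patch, find the nearest real patch in the
    memory bank and average the squared distances (lower = more realistic)."""
    gen = extract_patches(generated, size)
    # pairwise squared distances: (num_generated, num_bank)
    d2 = ((gen[:, None, :] - memory_bank[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).mean()

rng = np.random.default_rng(0)
real = rng.random((16, 16, 3))
bank = extract_patches(real)                      # memory bank of real patches
loss_real = patch_similarity_loss(real, bank)     # its patches are in the bank
loss_fake = patch_similarity_loss(rng.random((16, 16, 3)), bank)
print(loss_real, loss_fake)  # 0.0 for the real image, positive otherwise
```

Minimizing such a term pushes generated patches toward the distribution of real detail stored in the bank.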

2019 Conference Paper

Whitening and coloring batch transform for GANs

Authors: Siarohin, A.; Sangineto, E.; Sebe, N.

Batch Normalization (BN) is a common technique used to speed up and stabilize training. On the other hand, the learnable parameters of BN are commonly used in conditional Generative Adversarial Networks (cGANs) to represent class-specific information through conditional Batch Normalization (cBN). In this paper we propose to generalize both BN and cBN using a Whitening and Coloring based batch normalization. We show that our conditional Coloring can represent categorical conditioning information, which markedly improves cGAN qualitative results. Moreover, we show that full-feature whitening is important in the general GAN scenario, in which the training process is known to be highly unstable. We test our approach on different datasets and using different GAN networks and training protocols, showing a consistent improvement in all the tested frameworks. Our conditional CIFAR-10 results surpass all previous work on this dataset.
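A minimal sketch of the two steps on a 2-D feature batch: ZCA-style whitening via eigendecomposition, followed by an illustrative coloring. In the paper the coloring parameters are learnable (and class-conditional in the cGAN case); here they are fixed random matrices, purely for demonstration:

```python
import numpy as np

def whiten(x, eps=1e-5):
    """Full-feature whitening: zero mean, (approximately) identity covariance."""
    mu = x.mean(axis=0)
    xc = x - mu
    cov = xc.T @ xc / x.shape[0]
    # inverse square root of the covariance via eigendecomposition
    vals, vecs = np.linalg.eigh(cov)
    w = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return xc @ w

def color(x_white, coloring_matrix, beta):
    """Re-correlate and shift the whitened features; in a cGAN these
    parameters would be learned per class."""
    return x_white @ coloring_matrix + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(512, 4)) @ rng.normal(size=(4, 4))  # correlated features
xw = whiten(x)
cov_w = xw.T @ xw / xw.shape[0]
print(np.round(cov_w, 3))  # close to the 4x4 identity matrix
y = color(xw, rng.normal(size=(4, 4)), rng.normal(size=4))
```

Whitening generalizes BN's per-feature standardization by also removing cross-feature correlations; coloring generalizes BN's per-feature scale and shift to a full linear map.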

2019 Conference Paper

Worldly eyes on video: Learnt vs. reactive deployment of attention to dynamic stimuli

Authors: Cuculo, V.; D'Amelio, A.; Grossi, G.; Lanzarotti, R.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Computational visual attention is a hot topic in computer vision. However, most efforts are devoted to modelling saliency, whilst the actual eye guidance problem, which brings into play the sequence of gaze shifts characterising overt attention, is overlooked. Further, in those cases where the generation of gaze behaviour is considered, stimuli of interest are by and large static (still images) rather than dynamic ones (videos). Under such circumstances, the work described in this note has a twofold aim: (i) addressing the problem of estimating and generating visual scan paths, that is the sequences of gaze shifts over videos; (ii) investigating the effectiveness, in scan path generation, of features dynamically learned on the basis of human observers' attention dynamics, as opposed to bottom-up derived features. To this end, a probabilistic model is proposed. Using a publicly available dataset, our approach is compared against a model of scan path simulation that does not rely on a learning step.

2019 Conference Paper

A Hierarchical Quasi-Recurrent approach to Video Captioning

Authors: Bolelli, Federico; Baraldi, Lorenzo; Grana, Costantino

Video captioning has attracted considerable attention thanks to the use of Recurrent Neural Networks, since they can be used both to encode the input video and to generate the corresponding description. In this paper, we present a recurrent video encoding scheme which can discover and exploit the layered structure of the video. Unlike the established encoder-decoder approach, in which a video is encoded continuously by a recurrent layer, we propose to employ Quasi-Recurrent Neural Networks, extending their basic cell with a boundary detector which can recognize discontinuity points between frames or segments and accordingly modify the temporal connections of the encoding layer. We assess our approach on a large-scale dataset, the Montreal Video Annotation dataset. Experiments demonstrate that our approach can discover suitable levels of representation of the input information while reducing the computational requirements.
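The boundary-aware encoding idea, severing temporal connections where a discontinuity is detected, can be sketched with a toy, non-learned recurrence (the real model uses learned QRNN cells and a learned detector; all names here are hypothetical):

```python
import numpy as np

def boundary_aware_encode(features, boundaries):
    """Run a toy recurrence over frame features, resetting the hidden
    state whenever the boundary detector fires, so that each shot or
    segment is summarised independently."""
    h = np.zeros_like(features[0])
    segment_codes = []
    for f, is_boundary in zip(features, boundaries):
        if is_boundary:            # discontinuity: close the current segment
            segment_codes.append(h)
            h = np.zeros_like(f)   # sever the temporal connection
        h = 0.5 * h + 0.5 * f      # toy recurrent update
    segment_codes.append(h)
    return segment_codes

# two frames of one shot, then a detected boundary, then a new shot
frames = [np.full(3, v, dtype=float) for v in (1, 1, 9, 9)]
codes = boundary_aware_encode(frames, [False, False, True, False])
print(len(codes))  # 2 segment-level encodings
```

Because the state is reset at boundaries, each returned code summarises only its own segment, giving the decoder a segment-level (hierarchical) view of the video.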

2018 Conference Paper

Aligning Text and Document Illustrations: towards Visually Explainable Digital Humanities

Authors: Baraldi, Lorenzo; Cornia, Marcella; Grana, Costantino; Cucchiara, Rita

While several approaches to bring vision and language together are emerging, none of them has yet addressed the digital humanities domain, which, nevertheless, is a rich source of visual and textual data. To foster research in this direction, we investigate the learning of visual-semantic embeddings for historical document illustrations, devising both supervised and semi-supervised approaches. We exploit the joint visual-semantic embeddings to automatically align illustrations and textual elements, thus providing an automatic annotation of the visual content of a manuscript. Experiments are performed on the Borso d'Este Holy Bible, one of the most sophisticated illuminated manuscripts of the Renaissance, which we manually annotate by aligning every illustration with textual commentaries written by experts. Experimental results quantify the domain shift between ordinary visual-semantic datasets and the proposed one, validate the proposed strategies, and outline future work along the same lines.

2018 Conference Paper

Attentive Models in Vision: Computing Saliency Maps in the Deep Learning Era

Authors: Cornia, Marcella; Abati, Davide; Baraldi, Lorenzo; Palazzi, Andrea; Calderara, Simone; Cucchiara, Rita

Published in: INTELLIGENZA ARTIFICIALE

Estimating the focus of attention of a person looking at an image or a video is a crucial step which can enhance many vision-based inference mechanisms: image segmentation and annotation, video captioning, and autonomous driving are some examples. The early stages of attentive behavior are typically bottom-up; reproducing the same mechanism means finding the saliency embodied in the images, i.e. which parts of an image pop out of a visual scene. This process has been studied for decades both in neuroscience and in terms of computational models for reproducing the human cortical process. In the last few years, early models have been replaced by deep learning architectures that outperform the early approaches on public datasets. In this paper, we discuss the effectiveness of convolutional neural network (CNN) models in saliency prediction. We present a set of Deep Learning architectures developed by us, which can combine both bottom-up cues and higher-level semantics, and extract spatio-temporal features by means of 3D convolutions to model task-driven attentive behaviors. We show how these deep networks closely recall the early saliency models, albeit improved with the semantics learned from human ground-truth. Finally, we present a use case in which saliency prediction is used to improve the automatic description of images.

2018 Journal Article

Automatic Image Cropping and Selection using Saliency: an Application to Historical Manuscripts

Authors: Cornia, Marcella; Pini, Stefano; Baraldi, Lorenzo; Cucchiara, Rita

Published in: COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE

Automatic image cropping techniques are particularly important to improve the visual quality of cropped images and can be applied to a wide range of applications such as photo editing, image compression, and thumbnail selection. In this paper, we propose a saliency-based image cropping method which produces meaningful cropped images by relying only on the corresponding saliency maps. Experiments on standard image cropping datasets demonstrate the benefit of the proposed solution with respect to other cropping methods. Moreover, we present an image selection method that can be effectively applied to automatically select the most representative pages of historical manuscripts, thus improving the navigation of historical digital libraries.
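A minimal sketch of saliency-driven cropping, under the simplifying assumption that the crop is a fixed-size window maximizing summed saliency (the paper's method is more elaborate; the function name is hypothetical):

```python
import numpy as np

def best_crop(saliency, crop_h, crop_w):
    """Return the top-left corner of the crop window with maximal summed
    saliency (exhaustive search; an integral image would make each
    window O(1) to score)."""
    h, w = saliency.shape
    best, best_yx = -1.0, (0, 0)
    for y in range(h - crop_h + 1):
        for x in range(w - crop_w + 1):
            s = saliency[y:y + crop_h, x:x + crop_w].sum()
            if s > best:
                best, best_yx = s, (y, x)
    return best_yx

sal = np.zeros((10, 10))
sal[6:9, 6:9] = 1.0          # salient region in the bottom-right corner
corner = best_crop(sal, 4, 4)
print(corner)  # (5, 5): the first window fully covering the salient region
```

The same scoring can rank whole pages for the selection task: pages whose saliency mass is larger or more concentrated are better thumbnail candidates.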

2018 Conference Paper

Colorectal Cancer Classification using Deep Convolutional Networks. An Experimental Study

Authors: Ponzio, Francesco; Macii, Enrico; Ficarra, Elisa; Di Cataldo, Santa

The analysis of histological samples is of paramount importance for the early diagnosis of colorectal cancer (CRC). The traditional visual assessment is time-consuming and highly unreliable because of the subjectivity of the evaluation. On the other hand, automated analysis is extremely challenging due to the variability of the architectural and colouring characteristics of the histological images. In this work, we propose a deep learning technique based on Convolutional Neural Networks (CNNs) to differentiate adenocarcinomas from healthy tissues and benign lesions. Fully training the CNN on a large set of annotated CRC samples provides good classification accuracy (around 90% in our tests), but has the drawback of a very computationally intensive training procedure. Hence, in our work we also investigate the use of transfer learning approaches, based on CNN models pre-trained on a completely different dataset (i.e. ImageNet). In our results, transfer learning considerably outperforms the CNN fully trained on CRC samples, obtaining an accuracy of about 96% on the same test dataset.
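The transfer-learning recipe, frozen pre-trained features plus a lightweight classifier, can be sketched as below. This is purely illustrative: a fixed random projection stands in for the ImageNet-pre-trained CNN, and a nearest-centroid head stands in for the actual classifier used in such studies:

```python
import numpy as np

def pretrained_features(images):
    """Placeholder for a frozen, ImageNet-pre-trained CNN feature
    extractor; a fixed random projection plus ReLU stands in for it."""
    rng = np.random.default_rng(42)          # fixed weights = frozen network
    w = rng.normal(size=(images.shape[1], 8))
    return np.maximum(images @ w, 0)

def fit_centroids(feats, labels):
    """Cheapest possible head on top of frozen features: one centroid
    per class (a linear or SVM head is the more common choice)."""
    return {c: feats[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(feats, centroids):
    classes = sorted(centroids)
    d = np.stack([np.linalg.norm(feats - centroids[c], axis=1) for c in classes])
    return np.array(classes)[d.argmin(axis=0)]

# synthetic stand-in for "healthy" vs "adenocarcinoma" feature vectors
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0, 1, (20, 16)), rng.normal(3, 1, (20, 16))])
y = np.array([0] * 20 + [1] * 20)
f = pretrained_features(x)
pred = predict(f, fit_centroids(f, y))
acc = (pred == y).mean()
print(acc)
```

Only the small head is fit to the target data, which is why transfer learning avoids the costly full training the abstract mentions.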

2018 Conference Paper

"Objective" intergroup non-verbal behaviour: a replication of the study by Dovidio, Kawakami and Gaertner (2002)

Authors: Di Bernardo, Gian Antonio; Vezzali, Loris; Giovannini, Dino; Palazzi, Andrea; Calderara, Simone; Bicocchi, Nicola; Zambonelli, Franco; Cucchiara, Rita; Cadamuro, Alessia; Cocco, Veronica Margherita

There is a long research tradition analysing non-verbal behaviour, also in intergroup relations. These studies usually rely on ratings by external coders, which are, however, subjective and open to bias. We conducted a study modelled on the well-known study by Dovidio, Kawakami and Gaertner (2002), with some modifications, considering the relationship between White and Black people. White participants, after completing measures of explicit and implicit prejudice, met (in counterbalanced order) a White confederate and a Black one. With each of them, they talked for three minutes about a neutral topic and about a topic salient to the group distinction (in counterbalanced order). These interactions were recorded with a Kinect camera, which can capture the three-dimensional component of movement. The results revealed several points of interest. First of all, objective indices were constructed on the basis of an analysis of the literature, some of which cannot be detected by external coders, such as interpersonal distance and the volume of space between the interactants. The results highlighted some relevant aspects: (1) implicit attitude is associated with several indices of non-verbal behaviour, which mediate the evaluations of the participants provided by the confederates; (2) interactions should be considered dynamically, taking into account that they unfold over time; (3) what may matter is global non-verbal behaviour, rather than specific indices pre-determined by the experimenters.

2018 Conference Abstract

Connected Components Labeling on DRAGs

Authors: Bolelli, Federico; Baraldi, Lorenzo; Cancilla, Michele; Grana, Costantino

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

In this paper we introduce a new Connected Components Labeling (CCL) algorithm which exploits a novel approach to model decision problems as Directed Acyclic Graphs with a root, which we call Directed Rooted Acyclic Graphs (DRAGs). This structure supports the use of sets of equivalent actions, as required by CCL, and optimally leverages these equivalences to reduce the number of nodes (decision points). The advantage of this representation is that a DRAG, differently from the decision trees usually exploited by state-of-the-art algorithms, contains only the minimum number of nodes required to reach the leaf corresponding to a set of condition values. This combines the benefits of binary decision trees with a reduction of the machine code size. Experiments show a consistent improvement of the execution time when the model is applied to CCL.
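For context, the decisions being optimized are the per-pixel labeling choices of classic two-pass CCL. A plain union-find implementation of that baseline (not the paper's DRAG-compiled version) looks like:

```python
def label_components(grid):
    """Two-pass 4-connectivity labeling of a binary grid with union-find;
    the paper's contribution is compiling the per-pixel decision logic
    into a minimal DRAG, not this underlying algorithm."""
    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    h, w = len(grid), len(grid[0])
    labels = [[0] * w for _ in range(h)]
    nxt = 1
    for y in range(h):                     # first pass: provisional labels
        for x in range(w):
            if not grid[y][x]:
                continue
            up = labels[y - 1][x] if y else 0
            left = labels[y][x - 1] if x else 0
            if up and left:
                labels[y][x] = up
                union(up, left)            # record the label equivalence
            elif up or left:
                labels[y][x] = up or left
            else:
                labels[y][x] = parent[nxt] = nxt
                nxt += 1
    for y in range(h):                     # second pass: resolve equivalences
        for x in range(w):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
    return labels

grid = [[1, 1, 0],
        [0, 1, 0],
        [1, 0, 1]]
out = label_components(grid)
n = len({v for row in out for v in row if v})
print(n)  # 3 connected components
```

The inner `if up and left / elif / else` branching is exactly the decision problem that decision trees, and now DRAGs, compile into minimal machine code.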

2018 Conference Paper
