Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Layout analysis and content enrichment of digitized books

Authors: Grana, Costantino; Serra, Giuseppe; Manfredi, Marco; Coppi, Dalia; Cucchiara, Rita

Published in: MULTIMEDIA TOOLS AND APPLICATIONS

In this paper we describe a system for automatically analyzing old documents and creating hyperlinks between different epochs, thus opening ancient documents to young people and making them available on the web alongside current content. We propose a supervised learning approach to segment text and illustrations in digitized old documents, using a texture feature based on local correlation that detects the repeating patterns of text regions and differentiates them from pictorial elements. Moreover, we present a solution to help the user find contemporary content connected to what is automatically extracted from the ancient documents.
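
The abstract does not spell out the feature's definition, but the idea of a local-correlation response that fires on the repeating patterns of text can be sketched with a toy periodicity score; the function name, window size, and stripe pattern below are illustrative assumptions, not the paper's actual feature:

```python
import numpy as np

def periodicity_score(patch):
    """Toy stand-in for a local-correlation text detector: text regions
    repeat vertically at the line pitch, so the autocorrelation of the
    row-projection profile shows a strong secondary peak; pictorial
    regions generally do not."""
    profile = patch.mean(axis=1) - patch.mean()   # row "ink" profile
    ac = np.correlate(profile, profile, mode="full")[len(profile) - 1:]
    ac = ac / (ac[0] + 1e-9)                      # normalize by lag 0
    return ac[2:].max()                           # best non-trivial lag

# Synthetic "text" (horizontal stripes with an 8-pixel line pitch)
# scores far higher than a random "picture" patch.
text = np.tile((np.arange(64) % 8 < 3).astype(float)[:, None], (1, 64))
picture = np.random.default_rng(0).random((64, 64))
```

Thresholding such a score per window would yield a crude text/illustration mask; the paper instead feeds its texture feature to a supervised classifier.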

2016 Journal article

Learning Personalized Models for Facial Expression Analysis and Gesture Recognition

Authors: Zen, Gloria; Porzi, Lorenzo; Sangineto, Enver; Ricci, Elisa; Sebe, Niculae

Published in: IEEE TRANSACTIONS ON MULTIMEDIA

Facial expression and gesture recognition algorithms are key enabling technologies for human-computer interaction (HCI) systems. State-of-the-art approaches for the automatic detection of body movements and the analysis of emotions from facial features rely heavily on advanced machine learning algorithms. Most of these methods are designed for the average user, but the "one-size-fits-all" assumption ignores diversity in cultural background, gender, ethnicity, and personal behavior, and limits their applicability in real-world scenarios. A possible solution is to build personalized interfaces, which in practice implies learning person-specific classifiers and usually collecting a significant amount of labeled samples for each novel user. As data annotation is a tedious and time-consuming process, in this paper we present a framework for personalizing classification models which does not require labeled target data. Personalization is achieved through a novel transfer learning approach. Specifically, we propose a regression framework which exploits auxiliary (source) annotated data to learn the relation between person-specific sample distributions and the parameters of the corresponding classifiers. Then, when considering a new target user, the classification model is computed by simply feeding the associated (unlabeled) sample distribution into the learned regression function. We evaluate the proposed approach in different applications: pain recognition and action unit detection using visual data, and gesture classification using inertial measurements, demonstrating the generality of our method with respect to different input data types and base classifiers. We also show the advantages of our approach in terms of accuracy and computational time with respect both to user-independent approaches and to previous personalization techniques.
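
On synthetic data, the distribution-to-parameters regression at the heart of this approach can be sketched in a few lines; the toy generative model, ridge solver, and sample-mean descriptor below are illustrative assumptions, not the authors' actual formulation:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_linear(X, y, lam=1e-2):
    """Ridge least-squares classifier with a bias term (labels in {-1, +1})."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)

# Source users (labeled): each user's samples are shifted by a
# person-specific offset, and the decision boundary shifts with it.
dists, weights = [], []
for _ in range(40):
    shift = rng.uniform(-3, 3, size=2)
    X = rng.normal(size=(200, 2)) + shift
    y = np.sign(X[:, 0] - shift[0])
    dists.append(X.mean(axis=0))        # sample-distribution descriptor
    weights.append(fit_linear(X, y))    # person-specific classifier
D = np.hstack([np.array(dists), np.ones((40, 1))])

# Regression from sample distribution to classifier parameters.
R = np.linalg.lstsq(D, np.array(weights), rcond=None)[0]

# New target user: no labels, only the (unlabeled) sample distribution.
shift = rng.uniform(-3, 3, size=2)
Xt = rng.normal(size=(200, 2)) + shift
yt = np.sign(Xt[:, 0] - shift[0])
w = np.array([*Xt.mean(axis=0), 1.0]) @ R          # personalized weights
acc = (np.sign(np.hstack([Xt, np.ones((200, 1))]) @ w) == yt).mean()
```

The target user's classifier is obtained without a single target label, which is the point of the framework.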

2016 Journal article

Multi-Level Net: a Visual Saliency Prediction Model

Authors: Cornia, Marcella; Baraldi, Lorenzo; Serra, Giuseppe; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

State-of-the-art approaches for saliency prediction are based on Fully Convolutional Networks, in which saliency maps are built using only the last layer. In contrast, we present a novel model that predicts saliency maps by exploiting a non-linear combination of features coming from different layers of the network. We also present a new loss function to deal with the class imbalance in saliency masks. Extensive results on three public datasets demonstrate the robustness of our solution. Our model outperforms the state of the art on SALICON, the largest unconstrained dataset available, and obtains competitive results on the MIT300 and CAT2000 benchmarks.
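
The paper's loss is not reproduced here, but the generic idea of counteracting imbalance, weighting the rare salient pixels more than the abundant background, can be sketched with a class-balanced cross-entropy (an illustrative choice, not the paper's actual loss):

```python
import numpy as np

def balanced_bce(pred, target, eps=1e-7):
    """Binary cross-entropy with inverse-frequency class weights.

    Salient pixels are typically a small fraction of a saliency mask,
    so an unweighted loss is dominated by the background; here each
    class is weighted by the frequency of the other one.
    """
    pos = target.mean()                  # fraction of salient pixels
    w_pos, w_neg = 1.0 - pos, pos        # inverse-frequency weights
    pred = np.clip(pred, eps, 1.0 - eps)
    loss = -(w_pos * target * np.log(pred)
             + w_neg * (1.0 - target) * np.log(1.0 - pred))
    return loss.mean()
```

A perfect prediction drives the loss toward zero, while an all-background prediction is penalized in proportion to the missed salient mass.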

2016 Conference proceedings paper

Novel fusion transcripts identified by RNAseq cooperate with somatic mutations in the pathogenesis of acute myeloid leukemia

Authors: Padella, Antonella; Simonetti, Giorgia; Ferrari, Anna; Paciello, Giulia; Zago, Elisa; Baldazzi, Carmen; Guadagnuolo, Viviana; Papayannidis, Cristina; Robustelli, Valentina; Imbrogno, Enrica; Testoni, Nicoletta; Delledonne, Massimo; Iacobucci, Ilaria; Storlazzi, Tiziana Clelia; Ficarra, Elisa; Lollini, Pier Luigi; Martinelli, Giovanni

Published in: CANCER RESEARCH

2016 Journal abstract

Optimized Connected Components Labeling with Pixel Prediction

Authors: Grana, Costantino; Baraldi, Lorenzo; Bolelli, Federico

Published in: LECTURE NOTES IN COMPUTER SCIENCE

In this paper we propose a new paradigm for connected components labeling, which employs a general approach to minimizing the number of memory accesses: it exploits the information provided by already-seen pixels, removing the need to check them again. The scan phase of the proposed algorithm is ruled by a forest of decision trees connected into a single graph, where every tree derives from a reduction of the complete optimal decision tree. Experimental results demonstrate that on low-density images our method is slightly faster than the fastest conventional labeling algorithms.
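
For reference, a plain two-pass union-find labeling, the conventional baseline this line of work improves on, can be sketched as follows; note it re-checks every neighbor at every pixel, which is precisely the redundant work the proposed pixel-prediction strategy removes:

```python
import numpy as np

def label_components(img):
    """Naive two-pass, 4-connectivity connected components labeling
    with union-find; an illustrative baseline, not the paper's
    decision-tree-optimized algorithm."""
    h, w = img.shape
    labels = np.zeros((h, w), dtype=int)
    parent = [0]                        # union-find; index 0 = background
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    next_label = 1
    # First scan: provisional labels plus equivalence recording.
    for y in range(h):
        for x in range(w):
            if not img[y, x]:
                continue
            left = labels[y, x - 1] if x > 0 else 0
            up = labels[y - 1, x] if y > 0 else 0
            if left and up:
                a, b = find(left), find(up)
                labels[y, x] = a
                if a != b:
                    parent[b] = a       # merge equivalent labels
            elif left or up:
                labels[y, x] = left or up
            else:
                labels[y, x] = next_label
                parent.append(next_label)
                next_label += 1
    # Second scan: flatten equivalences into consecutive final labels.
    remap = {}
    for y in range(h):
        for x in range(w):
            if labels[y, x]:
                r = find(labels[y, x])
                labels[y, x] = remap.setdefault(r, len(remap) + 1)
    return labels, len(remap)
```

4-connectivity keeps the sketch short; work in this area, including the paper, typically targets the larger 8-connectivity neighborhood, where the decision-tree machinery pays off most.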

2016 Conference proceedings paper

Optimizing image registration for interactive applications

Authors: Gasparini, Riccardo; Alletto, Stefano; Serra, Giuseppe; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

With the spread of wearable and mobile devices, the demand for interactive augmented reality applications is in constant growth. Among the different possibilities, we focus on the cultural heritage domain, where a key step in developing applications for augmented cultural experiences is obtaining a precise localization of the user, i.e. the 6 degrees of freedom of the camera acquiring the images used by the application. Current state-of-the-art methods perform this task by extracting local descriptors from a query image and exhaustively matching them to a sparse 3D model of the environment. While this procedure obtains good localization performance, the vast search space involved in the retrieval of 2D-3D correspondences often makes it infeasible in real-time and interactive environments. In this paper we hence propose to perform descriptor quantization to reduce the search space, and to employ multiple KD-trees combined with principal component analysis for dimensionality reduction to enable an efficient search. We experimentally show that our solution can halve the computational requirements of the correspondence search with respect to the state of the art while maintaining similar accuracy levels.
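
A minimal sketch of the reduction-plus-tree part of this pipeline, PCA followed by a KD-tree search, could read as follows; the random descriptor data, the single tree instead of multiple, and the chosen dimensions are all illustrative assumptions:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
# Stand-ins for the real data: 128-D descriptors of a sparse 3D model
# and noisy re-observations of some of them in a query image.
model_desc = rng.normal(size=(5000, 128))
query_desc = model_desc[:100] + 0.01 * rng.normal(size=(100, 128))

# PCA dimensionality reduction fitted on the model descriptors.
mean = model_desc.mean(axis=0)
centered = model_desc - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
proj = vt[:16].T                       # keep 16 principal components
model_low = centered @ proj
query_low = (query_desc - mean) @ proj

# KD-tree search in the reduced space replaces exhaustive 128-D matching.
tree = cKDTree(model_low)
_, idx = tree.query(query_low, k=1)    # nearest model point per query
```

Reducing the descriptors to a handful of principal components is what makes the tree effective, since KD-tree search degrades quickly as dimensionality grows.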

2016 Conference proceedings paper

Performance measures and a data set for multi-target, multi-camera tracking

Authors: Ristani, E.; Solera, F.; Zou, R.; Cucchiara, R.; Tomasi, C.

Published in: LECTURE NOTES IN ARTIFICIAL INTELLIGENCE

To help accelerate progress in multi-target, multi-camera tracking systems, we present (i) a new pair of precision-recall measures of performance that treat errors of all types uniformly and emphasize correct identification over sources of error; (ii) the largest fully-annotated and calibrated data set to date, with more than 2 million frames of 1080p, 60 fps video taken by 8 cameras observing more than 2,700 identities over 85 minutes; and (iii) a reference software system as a comparison baseline. We show that (i) our measures properly account for bottom-line identity match performance in the multi-camera setting; (ii) our data set poses realistic challenges to current trackers; and (iii) the performance of our system is comparable to the state of the art.
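
Identity-based measures of this kind can be sketched as a bipartite matching between ground-truth and predicted identities; the frame-set simplification below (each track reduced to a set of frame indices) is an illustrative assumption, not the full box-overlap definition a real benchmark would use:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def id_measures(gt_tracks, pred_tracks):
    """Identity precision, recall and F1 under a one-to-one matching of
    ground-truth to predicted identities that maximizes matched frames."""
    gt_ids, pred_ids = list(gt_tracks), list(pred_tracks)
    # overlap[i, j] = frames on which GT identity i and predicted
    # identity j cover the same observation.
    overlap = np.array([[len(gt_tracks[g] & pred_tracks[p])
                         for p in pred_ids] for g in gt_ids])
    rows, cols = linear_sum_assignment(-overlap)   # maximize total overlap
    idtp = overlap[rows, cols].sum()               # identity true positives
    n_gt = sum(len(t) for t in gt_tracks.values())
    n_pred = sum(len(t) for t in pred_tracks.values())
    idp, idr = idtp / n_pred, idtp / n_gt
    return idp, idr, 2 * idtp / (n_gt + n_pred)
```

A tracker that splits one person across two identities gets credit only for the fragment the matching keeps, which reflects the measures' emphasis on correct identification over individual error types.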

2016 Conference proceedings paper

Quick, accurate, smart: 3D computer vision technology helps assessing confined animals' behaviour

Authors: Barnard, Shanis; Calderara, Simone; Pistocchi, Simone; Cucchiara, Rita; Podaliri Vulpiani, Michele; Messori, Stefano; Ferri, Nicola

Published in: PLOS ONE

Mankind directly controls the environment and lifestyles of several domestic species for purposes ranging from production and research to conservation and companionship. These environments and lifestyles may not offer these animals the best quality of life. Behaviour is a direct reflection of how an animal is coping with its environment, and behavioural indicators are thus among the preferred parameters for assessing welfare. However, behavioural recording (usually from video) can be very time consuming, and the accuracy and reliability of the output depend on the experience and background of the observers. The advent of new video technology and computer image processing provides the basis for promising solutions. In this pilot study, we present a new prototype software able to automatically infer the behaviour of dogs housed in kennels from 3D visual data and through structured machine learning frameworks. Depth information acquired through 3D features, body part detection and training are the key elements that allow the machine to recognise postures, trajectories inside the kennel and patterns of movement that can later be labelled at convenience. The main innovation of the software is its ability to automatically cluster frequently observed temporal patterns of movement without any pre-set ethogram. Conversely, when common patterns are defined through training, a deviation from normal behaviour in time or between individuals can be assessed. The software's accuracy in correctly detecting the dogs' behaviour was checked through a validation process. An automatic behaviour recognition system, independent of human subjectivity, could add scientific knowledge on animals' quality of life in confinement as well as save time and resources. This 3D framework was designed to be invariant to the dog's shape and size and could be extended to farm, laboratory and zoo quadrupeds in artificial housing. The computer vision technique applied in this software is innovative in non-human animal behaviour science. Further improvements and validation are needed, and future applications and limitations are discussed.

2016 Journal article

Scene-driven Retrieval in Edited Videos using Aesthetic and Semantic Deep Features

Authors: Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita

This paper presents a novel retrieval pipeline for video collections, which aims to retrieve the most significant parts of an edited video for a given query, and represent them with thumbnails which are at the same time semantically meaningful and aesthetically remarkable. Videos are first segmented into coherent and story-telling scenes, then a retrieval algorithm based on deep learning is proposed to retrieve the most significant scenes for a textual query. A ranking strategy based on deep features is finally used to tackle the problem of visualizing the best thumbnail. Qualitative and quantitative experiments are conducted on a collection of edited videos to demonstrate the effectiveness of our approach.
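
Once scenes and the query are embedded in a common space, the retrieval step reduces to a nearest-neighbor ranking; the random features below stand in for real deep features, and the cosine-similarity scoring is an illustrative assumption, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-ins for precomputed deep features: one embedding per scene and
# one embedding for the textual query, assumed to share a space.
scene_feats = rng.normal(size=(50, 256))
query_feat = scene_feats[7] + 0.1 * rng.normal(size=256)

# Rank scenes by cosine similarity to the query embedding.
scores = (scene_feats @ query_feat
          / (np.linalg.norm(scene_feats, axis=1) * np.linalg.norm(query_feat)))
ranking = np.argsort(-scores)          # most relevant scene first
```

The thumbnail-selection stage described in the abstract would then re-rank the keyframes of the top-ranked scene by a separate aesthetic score.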

2016 Conference proceedings paper

Shot, scene and keyframe ordering for interactive video re-use

Authors: Baraldi, L.; Grana, C.; Borghi, G.; Vezzani, R.; Cucchiara, R.

This paper presents a complete system for shot and scene detection in broadcast videos, as well as a method to select the best representative keyframes, which could be used in new interactive interfaces for accessing large collections of edited videos. The final goal is to enable improved access to video footage and the re-use of video content through the direct management of user-selected video clips.

2016 Conference proceedings paper

Page 63 of 109 • Total publications: 1084