Publications
Explore our research publications: papers, articles, and conference proceedings from AImageLab.
Hand-designed local image descriptors vs. off-the-shelf CNN-based features for texture classification: an experimental comparison
Authors: Bello-Cerezo, Raquel; Bianconi, Francesco; Cascianelli, Silvia; Fravolini, Mario Luca; Di Maria, Francesco; Smeraldi, Fabrizio
Hands on the wheel: a Dataset for Driver Hand Detection and Tracking
Authors: Borghi, Guido; Frigieri, Elia; Vezzani, Roberto; Cucchiara, Rita
The ability to detect, localize and track the hands is crucial in many applications that require understanding a person's behavior, attitude and interactions. This is particularly true in the automotive context, where hand analysis makes it possible to predict preparatory movements for maneuvers or to investigate the driver's attention level. Moreover, thanks to the recent diffusion of cameras inside new car cockpits, hand gestures can be used to develop new Human-Car Interaction systems that are more user-friendly and safe. In this paper, we propose a new dataset, called Turms, consisting of infrared images of the driver's hands collected from the back of the steering wheel, an innovative point of view. The Leap Motion device was selected for the recordings thanks to its stereo capabilities and wide viewing angle. In addition, we introduce a method to detect the presence and location of the driver's hands on the steering wheel during driving tasks.
Head Detection with Depth Images in the Wild
Authors: Ballotta, Diego; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita
Head detection and localization is a demanding task and a key element for many computer vision applications, such as video surveillance, Human Computer Interaction and face analysis. The impressive amount of work on detecting faces in RGB images, together with the availability of huge face datasets, has enabled very effective systems in that domain. However, due to illumination issues, infrared or depth cameras may be required in real applications. In this paper, we introduce a novel method for head detection on depth images that exploits the classification ability of deep learning approaches. In addition to reducing the dependency on external illumination, depth images implicitly embed useful information about the scale of the target objects. Two public datasets have been exploited: the first, called Pandora, is used to train a deep binary classifier with face and non-face images; the second, collected by Cornell University, is used to perform a cross-dataset test during daily activities in unconstrained environments. Experimental results show that the proposed method outperforms state-of-the-art methods working on depth images.
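The scale cue mentioned in the abstract can be made concrete with a quick pinhole-camera estimate (an illustrative sketch, not the paper's method; the focal length and physical head size below are assumed values):

```python
def head_window_px(depth_mm, focal_px=570.0, head_mm=200.0):
    """Expected projected size (in pixels) of a head of physical size
    head_mm, seen at distance depth_mm by a pinhole camera with focal
    length focal_px. The depth value alone fixes the search scale."""
    return focal_px * head_mm / depth_mm

# A head at 1 m spans ~114 px; at 2 m, half of that.
```

This is why a depth value at a candidate location immediately constrains the window size a detector needs to examine, something RGB-only detectors must recover by scanning multiple scales.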
Improving Skin Lesion Segmentation with Generative Adversarial Networks
Authors: Pollastri, Federico; Bolelli, Federico; Paredes, Roberto; Grana, Costantino
Published in: PROCEEDINGS IEEE INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS
This paper proposes a novel strategy that employs Generative Adversarial Networks (GANs) to augment data for image segmentation, and a Convolutional-Deconvolutional Neural Network (CDNN) to automatically generate lesion segmentation masks from dermoscopic images. Training the CDNN with our GAN-generated data effectively improves on the state of the art.
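The augmentation strategy can be sketched as follows (a minimal illustration, not the paper's actual model: `generator` is a hypothetical stand-in for a trained GAN that yields paired dermoscopic images and lesion masks):

```python
import numpy as np

def augment_with_synthetic(real_imgs, real_masks, generator, n_synth, seed=0):
    """Extend a segmentation training set with GAN-generated
    (image, mask) pairs, then shuffle real and synthetic together."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_synth, 64))          # latent noise vectors
    synth_imgs, synth_masks = generator(z)      # assumed paired outputs
    imgs = np.concatenate([real_imgs, synth_imgs])
    masks = np.concatenate([real_masks, synth_masks])
    perm = rng.permutation(len(imgs))           # mix real and synthetic
    return imgs[perm], masks[perm]
```

The segmentation network is then trained on the enlarged, shuffled set exactly as it would be on real data alone.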
LAMV: Learning to align and match videos with kernelized temporal layers
Authors: Baraldi, Lorenzo; Douze, Matthijs; Cucchiara, Rita; Jégou, Hervé
Published in: PROCEEDINGS - IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION
This paper considers a learnable approach for comparing and aligning videos. Our architecture builds upon and revisits temporal match kernels within neural networks: we propose a new temporal layer that finds temporal alignments by maximizing the scores between two sequences of vectors, according to a time-sensitive similarity metric parametrized in the Fourier domain. We learn this layer with a temporal proposal strategy, in which we minimize a triplet loss that takes into account both the localization accuracy and the recognition rate. We evaluate our approach on video alignment, copy detection and event retrieval. Our approach outperforms the state of the art on temporal video alignment and video copy detection datasets in comparable setups. It also attains the best reported results for particular event search, while precisely aligning videos.
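The core idea of scoring temporal alignments between two descriptor sequences can be illustrated with a toy version that uses a plain dot-product frame similarity in place of the paper's learned, Fourier-parametrized kernel:

```python
import numpy as np

def best_temporal_alignment(x, y):
    """Score every temporal offset between two per-frame descriptor
    sequences x (m, d) and y (n, d), and return the offset with the
    highest mean alignment score. Offset delta means x[i] is matched
    against y[i + delta] (delta < 0: y starts later inside x)."""
    s = x @ y.T                               # (m, n) frame similarities
    m, n = s.shape
    scores = {}
    for delta in range(-(m - 1), n):          # every diagonal of s
        diag = np.diagonal(s, offset=delta)   # s[i, i + delta]
        scores[delta] = diag.sum() / len(diag)  # length-normalized
    best = max(scores, key=scores.get)
    return best, scores[best]
```

If `y` is a suffix of `x` starting at frame 5, the maximizing diagonal recovers that 5-frame shift; LAMV replaces the exhaustive diagonal scan with a temporal layer evaluated efficiently in the Fourier domain and trained end to end.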
Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World
Authors: Fabbri, Matteo; Lanzi, Fabio; Calderara, Simone; Palazzi, Andrea; Vezzani, Roberto; Cucchiara, Rita
Published in: LECTURE NOTES IN COMPUTER SCIENCE
Multi-People Tracking in an open-world setting requires a special effort in precise detection. Moreover, temporal continuity in the detection phase gains importance when scene clutter introduces the challenging problem of occluded targets. To this end, we propose a deep network architecture that jointly extracts people's body parts and associates them across short temporal spans. Our model explicitly deals with occluded body parts by hallucinating plausible solutions for joints that are not visible. We propose a new end-to-end architecture composed of four branches (visible heatmaps, occluded heatmaps, part affinity fields and temporal affinity fields) fed by a time-linker feature extractor. To overcome the lack of surveillance data with tracking, body part and occlusion annotations, we created the largest Computer Graphics dataset to date for people tracking in urban scenarios (about 500,000 frames, almost 10 million body poses) by exploiting a photorealistic videogame. Our architecture, trained on virtual data, exhibits good generalization capabilities on public real tracking benchmarks when image resolution and sharpness are high enough, producing reliable tracklets useful for further batch data association or re-identification modules.
Learning to Generate Facial Depth Maps
Authors: Pini, Stefano; Grazioli, Filippo; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita
In this paper, an adversarial architecture for facial depth map estimation from monocular intensity images is presented. Following an image-to-image approach, we combine the advantages of supervised learning and adversarial training, proposing a conditional Generative Adversarial Network that effectively learns to translate intensity face images into the corresponding depth maps. Two public datasets, namely the Biwi and Pandora datasets, are exploited to demonstrate that the proposed model generates high-quality synthetic depth images, both in terms of visual appearance and informative content. Furthermore, we show that the model can predict distinctive facial details by testing the generated depth maps through a deep model trained on authentic depth maps for the face verification task.
Low-cost pupillometry for human-computer interface
Authors: Goddi, A.; Ponzio, F.; Ficarra, E.; Di Cataldo, S.; Roatta, S.
Changes in pupil size are governed by the autonomic nervous system but may also be systematically driven by voluntarily shifting the gaze in depth. Thus, the pupil accommodative response (PAR) that accompanies voluntary gaze shifts from a far (3 m distance) to a near (30 cm) visual target might be exploited as a simple human-computer interface (HCI), bypassing the somato-motor system.
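As a rough illustration of how such an interface could work (a toy sketch under assumed units and thresholds, not the authors' pipeline): treat a sustained pupil constriction below a resting baseline as a "switch press".

```python
def detect_par_events(diameter_mm, baseline_mm, drop_mm=0.3, min_len=5):
    """Flag sustained pupil constrictions: intervals where the measured
    diameter stays at least drop_mm below baseline for at least
    min_len consecutive samples. Returns (start, end) index pairs."""
    events, start = [], None
    for i, d in enumerate(diameter_mm):
        if d < baseline_mm - drop_mm:
            if start is None:
                start = i                     # constriction begins
        else:
            if start is not None and i - start >= min_len:
                events.append((start, i))     # long enough: a "press"
            start = None
    if start is not None and len(diameter_mm) - start >= min_len:
        events.append((start, len(diameter_mm)))
    return events
```

The minimum-duration check rejects brief autonomic fluctuations, so only deliberate near-target gaze shifts register as commands.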
MDM2 and Aurora Kinase a Contribute to SETD2 Loss of Function in Advanced Systemic Mastocytosis: Implications for Pathogenesis and Treatment
Authors: Mancini, Manuela; Monaldi, Cecilia; De Santis, Sara; Papayannidis, Cristina; Rondoni, Michela; Bavaro, Luana; Martelli, Margherita; Maria Chiara, Abbenante; Curti, Antonio; Ficarra, Elisa; Paciello, Giulia; Chiara Fontana, Maria; Zanotti, Roberta; Bonifacio, Massimiliano; Scaffidi, Luigi; Pagano, Livio; Criscuolo, Marianna; Albano, Francesco; Ciceri, Fabio; Elena, Chiara; Tosi, Patrizia; Delledonne, Massimo; Avanzato, Carla; Xumerle, Luciano; Valent, Peter; Martinelli, Giovanni; Cavo, Michele; Soverini, Simona
Published in: BLOOD