Publications
Explore our research publications: papers, articles, and conference proceedings from AImageLab.
Similarity-Based Retrieval with MPEG-7 3D Descriptors: Performance Evaluation on the Princeton Shape Benchmark
Authors: Grana, Costantino; Davolio, M.; Cucchiara, Rita
In this work, we describe in detail the new MPEG-7 Perceptual 3D Shape Descriptor and provide a set of tests with different 3D object databases, mainly with the Princeton Shape Benchmark. For this purpose we created a function library called Retrieval-3D and fixed some bugs of the MPEG-7 eXperimentation Model (XM). We explain how to match the Attributed Relational Graph (ARG) of every 3D model with the modified nested Earth Mover’s Distance (mnEMD). Finally we compare our results with the best found in the literature, including the first MPEG-7 3D descriptor, i.e. the Shape Spectrum Descriptor.
Sports Video Annotation Using Enhanced HSV Histograms in Multimedia Ontologies
Authors: Bertini, M.; Del Bimbo, A.; Torniai, C.; Grana, Costantino; Vezzani, Roberto; Cucchiara, Rita
This paper presents multimedia ontologies, where multimedia data and traditional textual ontologies are merged. A solution for their implementation for the soccer video domain and a method to perform automatic soccer video annotation using these extended ontologies are shown. HSV is a widely adopted space in image and video retrieval, but its quantization for histogram generation can create misleading errors in the classification of achromatic and low-saturation colors. In this paper we propose an Enhanced HSV Histogram with achromatic point detection based on a single Hue and Saturation parameter that can correct this limitation. The more general concepts of the sport domain (e.g. play/break, crowd, etc.) are put in correspondence with the more general visual features of the video, like color and texture, while the more specific concepts of the soccer domain (e.g. highlights such as attack actions) are put in correspondence with domain-specific visual features like the soccer playfield and the players. Experimental results for annotation of soccer videos using generic concepts are presented.
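The idea of keeping achromatic colors out of the hue bins can be illustrated with a minimal sketch. The abstract does not give the actual detection rule, so everything here is an assumption made for illustration: the function name, the bin counts, and the plain saturation threshold `sat_threshold`, which stands in for the paper's Hue and Saturation parameter. Pixels are assumed to be (h, s, v) tuples with each component in [0, 1].

```python
def enhanced_hsv_histogram(pixels, h_bins=8, s_bins=4, v_bins=4,
                           gray_bins=4, sat_threshold=0.1):
    """Count HSV pixels into chromatic bins plus separate gray bins.

    Hypothetical sketch: low-saturation pixels have an unreliable hue,
    so they are binned by value only instead of polluting the hue bins.
    """
    def bin_index(value, bins):
        # Map a value in [0, 1] to a bin index in [0, bins - 1].
        return min(int(value * bins), bins - 1)

    chromatic_size = h_bins * s_bins * v_bins
    hist = [0] * (chromatic_size + gray_bins)

    for h, s, v in pixels:
        if s < sat_threshold:
            # Achromatic pixel: ignore hue, use a dedicated gray bin.
            hist[chromatic_size + bin_index(v, gray_bins)] += 1
        else:
            # Chromatic pixel: regular joint H-S-V quantization.
            idx = (bin_index(h, h_bins) * s_bins
                   + bin_index(s, s_bins)) * v_bins + bin_index(v, v_bins)
            hist[idx] += 1
    return hist
```

With the default bin counts the histogram has 128 chromatic bins followed by 4 gray bins, so a nearly white pixel and a saturated red pixel with the same hue value end up in different parts of the histogram.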
Using a Wireless Sensor Network to Enhance Video Surveillance
Authors: Cucchiara, Rita; Prati, Andrea; Vezzani, Roberto; Benini, L.; Farella, E.; Zappi, P.
Published in: JOURNAL OF UBIQUITOUS COMPUTING AND INTELLIGENCE
To enhance video surveillance systems, multi-modal sensor integration can be a successful strategy. In this work, a computer vision system able to detect and track people from multiple cameras is integrated with a wireless sensor network mounting passive Pyroelectric InfraRed sensors. The two subsystems are briefly described and possible cases in which computer vision algorithms are likely to fail are discussed. Then, simple but reliable outputs from the sensor nodes are exploited to improve the accuracy of the vision system. In particular, two case studies are reported: the first uses the presence detection of sensors to disambiguate between an open door and a moving person, while the second handles motion direction changes during occlusions. Preliminary results are reported and demonstrate the usefulness of the integration of the two subsystems.
Video Shots Comparison using the Mallows Distance
Authors: Grana, Costantino; Borghesani, Daniele; Cucchiara, Rita
Published in: PROCEEDINGS - INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS
In this work, we focus on two aspects of the comparison of video shots. We present a new approach to extract a variable number of key frames from a shot, using hierarchical clustering with automatic level selection, in order to provide optimal allocation of features on different parts of the shot. We then employ the Mallows distance as an effective technique to compare the discrete distributions of features, independently of the features selected for the specific application. Results and comparisons on a soccer documentary video are provided.
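As a rough illustration of the Mallows distance, the sketch below covers only the simplest special case: two one-dimensional samples of equal size with uniform weights, where the optimal matching pairs the sorted values. The paper compares general discrete feature distributions, which would require solving a transportation problem; the function name and the restriction to this case are assumptions made here.

```python
def mallows_distance_1d(xs, ys, p=2):
    """Mallows distance of order p between two equal-size 1D samples.

    Assumes uniform weights on both samples; in that case the optimal
    coupling matches the i-th smallest value of xs with that of ys.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    a, b = sorted(xs), sorted(ys)
    total = sum(abs(u - v) ** p for u, v in zip(a, b))
    return (total / len(a)) ** (1.0 / p)
```

Because the samples are sorted before matching, the result does not depend on the order in which the feature values are listed, so two key-frame sets with the same values in a different order have distance zero.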
Video transcoding and streaming for mobile applications
Authors: Gualdi, G.; Prati, A.; Cucchiara, R.
Published in: LECTURE NOTES IN COMPUTER SCIENCE
The present work shows a system for compressing and streaming live videos over networks with low bandwidth (radio mobile networks), with the objective of designing an effective solution for mobile video access. We present a mobile ready-to-use streaming system that encodes video using the H.264 codec (offering good quality and frame rate at very low bit-rates) and streams it over the network using the UDP protocol. A dynamic frame rate control has been implemented in order to obtain the best trade-off between playback fluency and latency. © Springer-Verlag Berlin Heidelberg 2007.
Visor: Video Surveillance Online Repository
Authors: Vezzani, Roberto; Cucchiara, Rita
The aim of the Visor Project [1] is to gather and make freely available a repository of surveillance and video footage for the research community on pattern recognition and multimedia retrieval. The goal is to create an open forum and a free repository to exchange, compare and discuss results of many problems in video surveillance and retrieval. Together with the videos, the repository contains metadata annotation, both manually annotated as ground-truth and automatically obtained by video surveillance systems. Annotation refers to a large ontology of concepts on surveillance and security related objects and events. The ontology has been defined including concepts from the LSCOM and MediaMill ontologies. As well as videos and annotations, Visor provides tools for enriching the ontology, annotating new videos, searching by textual queries, and composing and downloading videos.
3-D Virtual Environments on Mobile Devices for Remote Surveillance
Authors: Vezzani, Roberto; Cucchiara, Rita; Malizia, A.; Cinque, L.
In this paper we present a distributed video surveillance framework. Our aim is the remote monitoring of the behavior of people moving in a scene, exploiting a virtual reconstruction on low-capability devices, like PDAs and cell phones. The main novelty of this system is the effective integration of the computer vision and computer graphics modules. The first, using a probabilistic framework, can detect the position, the trajectory and the posture of people moving in the scene. The second exploits both the new availability of standard 3D graphics libraries on mobile devices (namely JSR184 and the M3G graphic format) and the processing capability of new PDAs in order to reconstruct the remote surveillance data in real-time.
A 3D geometric approach to face detection and facial expression recognition
Authors: Gaeta, Matteo; Iovane, Gerardo; Sangineto, Enver
Published in: JOURNAL OF DISCRETE MATHEMATICAL SCIENCES & CRYPTOGRAPHY
Face detection and facial expression recognition are research areas with important application possibilities. Although the two problems are usually dealt with using different approaches, we show in this paper how the same recognition process can be used to recognize both a generic “class-face” in a given, possibly complex image, and a specific facial expression. The approach we propose is based on two steps. In the former we use alignment techniques in order to overlap the 3D representations of the main face components with the 2D image elements. In the latter we compare the candidate groups of localized components with a set of structural models, each of which represents a facial expression. Expression-independent face detection is achieved using the same approach with a model built by generalizing over a set of face examples with different expressions.
A Distributed Domotic Surveillance System
Authors: Cucchiara, Rita; Grana, Costantino; Prati, Andrea; Vezzani, Roberto
Distributed video surveillance has a direct application in intelligent home automation or domotics (from the Latin word domus, which means “home”, and informatics); in particular, in-house video surveillance can provide good support for people with some difficulties (e.g., elderly or disabled people) living alone and with limited autonomy. New hardware technologies for surveillance are now affordable and provide high reliability. Problems related to reliable software solutions are not completely solved, especially concerning the application of general-purpose computer vision techniques in indoor environments. Indeed, assuming the objective is to detect the presence of people, track them, and recognize dangerous behaviours by means of abrupt changes in their posture, robust techniques must cope with non-trivial difficulties. In particular, luminance changes and shadows must be taken into account, frequent posture changes must be faced, and large and long-lasting occlusions are common due to the vicinity of the cameras and the presence of furniture and doors that can often hide parts of the person’s body. These problems are analyzed and solutions based on background suppression, appearance-based probabilistic tracking, and probabilistic reasoning for posture recognition are described.