Publications - AImageLab

Metodi di Deep Learning Efficienti e Adattivi per Sistemi di Automatic Data Capture (Efficient and Adaptive Deep Learning Methods for Automatic Data Capture Systems)

Authors: Vezzali, Enrico

Automatic Data Capture (ADC) systems are a fundamental technology for modern logistics, retail, and manufacturing, enabling traceability, automation, and process monitoring through the rapid acquisition of visual or encoded information. Among these technologies, barcodes remain one of the most widespread and inexpensive solutions for product identification. Despite their maturity, however, recognizing codes and symbols is still difficult in real industrial conditions, where lighting variations, blur, long distances, or low resolution reduce readability. Traditional computer vision algorithms, based on geometric analysis, morphological operators, or the Hough transform, are reliable in controlled settings but not when acquisition conditions deviate from nominal parameters. Deep learning techniques, by contrast, offer greater flexibility and robustness but require substantial computational resources, which limits their use on embedded platforms. Bridging this gap between accuracy and efficiency is therefore essential for the next generation of intelligent ADC systems. The thesis analyzes benchmarking, optimization, and deployment strategies for efficient deep learning models in industrial ADC applications. The work, carried out in collaboration with Datalogic S.p.A., focuses on integrating adaptive neural architectures into constrained, real-time environments. The first part addresses the shortage of open-source data and reproducible benchmarks for barcode localization. To this end, BarBeR (Barcode Benchmark Repository) was developed: a public framework with 8,748 annotated images that unifies classical approaches and deep learning methods under common protocols, ensuring fair comparisons and reproducibility.
The tests confirmed that, although deep models outperform traditional ones in accuracy, their computational cost remains an obstacle to real-time execution on embedded devices. To overcome this limit, BaFaLo was proposed: a lightweight segmentation-based localizer optimized to run on CPUs without accelerators. Inspired by the Fast-SCNN paradigm, BaFaLo balances speed and precision, detecting small or degraded codes in difficult conditions while maintaining real-time performance. Because localization alone is not enough, and codes must also be read in adverse conditions, Mosaic-SR was introduced: an adaptive multi-pass super-resolution method that allocates computational resources to the most complex regions. Guided by an uncertainty estimate, Mosaic-SR improves accuracy and latency over uniform approaches, enabling high-quality reconstructions on embedded hardware. The last part, carried out at the Integrated Systems Laboratory of ETH Zurich, concerns the quantization and deployment of generative models. By combining advanced strategies such as SVDQuant and cache quantization, the required memory was reduced by more than 50% without compromising quality or stability. These results pave the way for using generative models on resource-constrained platforms and for creating synthetic datasets when real or open-source data are insufficient. In summary, the thesis shows how efficient and adaptive deep learning makes advanced visual capabilities accessible to real-time ADC systems. Through benchmarking, optimization, and deployment of neural architectures for detection, enhancement, and generation, the work contributes to the evolution of industrial vision: from rigid, rule-based pipelines to flexible, data-driven solutions that remain reliable in real operating conditions.

2026 Doctoral thesis

Video Frame Synthesis combining Conventional and Event Cameras

Authors: Pini, Stefano; Borghi, Guido; Vezzani, Roberto

Published in: INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE

Event cameras are biologically-inspired sensors that gather the temporal evolution of the scene. They capture pixel-wise brightness variations and output a corresponding stream of asynchronous events. Despite having multiple advantages with respect to conventional cameras, their use is limited by the poor compatibility of asynchronous event streams with traditional data processing and vision algorithms. In this regard, we present a framework that synthesizes RGB frames from the output stream of an event camera and an initial or a periodic set of color key-frames. The deep learning-based frame synthesis framework consists of an adversarial image-to-image architecture and a recurrent module. Two public event-based datasets, DDD17 and MVSEC, are used to obtain qualitative and quantitative per-pixel and perceptual results. In addition, we converted two additional well-known datasets, namely KITTI and Cityscapes, into event frames in order to present semantic results in terms of object detection and semantic segmentation accuracy. Extensive experimental evaluation confirms the quality of the proposed approach and its capability to synthesize frame sequences from color key-frames and sequences of intermediate events.

2021 Journal article

Self Paced Deep Learning for Weakly Supervised Object Detection

Authors: Sangineto, E.; Nabi, M.; Culibrk, D.; Sebe, N.

Published in: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

In a weakly-supervised scenario, object detectors need to be trained using image-level annotation alone. Since bounding-box-level ground truth is not available, most of the solutions proposed so far are based on an iterative Multiple Instance Learning framework in which the current classifier is used to select the highest-confidence boxes in each image, which are treated as pseudo-ground truth in the next training iteration. However, the errors of an immature classifier can make the process drift, usually introducing many false positives into the training dataset. To alleviate this problem, we propose in this paper a training protocol based on the self-paced learning paradigm. The main idea is to iteratively select a subset of images and boxes that are the most reliable, and use them for training. While in the past few years similar strategies have been adopted for SVMs and other classifiers, we are the first to show that a self-paced approach can be used with deep-network-based classifiers in an end-to-end training pipeline. The method we propose is built on the fully-supervised Fast-RCNN architecture and can be applied to similar architectures which represent the input image as a bag of boxes. We show state-of-the-art results on Pascal VOC 2007, Pascal VOC 2010 and ILSVRC 2013. On ILSVRC 2013 our results based on a low-capacity AlexNet network outperform even those weakly-supervised approaches which are based on much higher-capacity networks.
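
The selection step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the ranking by top box score, and the linear growth schedule are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def select_pseudo_ground_truth(
    scored_boxes: Dict[str, List[Tuple[Box, float]]],
    keep_ratio: float,
) -> Dict[str, Box]:
    """Keep the top-scoring box of the most confident images only."""
    # Step 1: in every image, take the highest-confidence box.
    best = {
        image_id: max(candidates, key=lambda item: item[1])
        for image_id, candidates in scored_boxes.items()
        if candidates
    }
    # Step 2: rank images by that confidence and keep the easiest fraction.
    ranked = sorted(best.items(), key=lambda item: item[1][1], reverse=True)
    n_keep = max(1, round(keep_ratio * len(ranked)))
    return {image_id: box for image_id, (box, _score) in ranked[:n_keep]}


def keep_ratio_schedule(iteration: int, start: float = 0.5, step: float = 0.25) -> float:
    """Hypothetical schedule: grow the selected fraction at each iteration."""
    return min(1.0, start + step * iteration)
```

In a training loop, the detector would be retrained on the returned boxes and then used to rescore all candidates before the next selection, so that harder images enter the training set only once the classifier has matured.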

2019 Journal article

Learning Non-Target Items for Interesting Clothes Segmentation in Fashion Images

Authors: Grana, Costantino; Calderara, Simone; Borghesani, Daniele; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

In this paper we propose a color-based approach for skin detection and interest garment selection aimed at an automatic segmentation of pieces of clothing. For both purposes, the color description is extracted by an iterative energy minimization approach, and an automatic initialization strategy is proposed by learning geometric constraints and shape cues. Experiments confirm the good performance of this technique both in the context of skin removal and in the context of classification of garments.

2012 Conference proceedings paper

Multistage Particle Windows for Fast and Accurate Object Detection

Authors: Gualdi, G.; Prati, A.; Cucchiara, Rita

Published in: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

The common paradigm employed for object detection is the sliding window (SW) search. This approach generates grid-distributed patches, at all possible positions and sizes, which are evaluated by a binary classifier: the trade-off between computational burden and detection accuracy is the real critical point of sliding windows; several methods have been proposed to speed up the search such as adding complementary features. We propose a paradigm that differs from any previous approach, since it casts object detection into a statistical-based search using a Monte Carlo sampling for estimating the likelihood density function with Gaussian kernels. The estimation relies on a multi-stage strategy where the proposal distribution is progressively refined by taking into account the feedback of the classifiers. The method can be easily plugged in a Bayesian-recursive framework to exploit the temporal coherency of the target objects in videos. Several tests on pedestrian and face detection, both on images and videos, with different types of classifiers (cascade of boosted classifiers, soft cascades and SVM) and features (covariance matrices, Haar-like features, integral channel features and histogram of oriented gradients) demonstrate that the proposed method provides higher detection rates and accuracy as well as a lower computational burden w.r.t. sliding window detection.
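
The idea of replacing an exhaustive grid with a proposal distribution that is refined by classifier feedback can be sketched in Python as follows. This is a toy under stated assumptions, not the published algorithm: windows are reduced to their centers, `toy_score` is a hypothetical stand-in for a real classifier, and the particle counts and Gaussian widths are arbitrary.

```python
import math
import random
from typing import Callable, List, Tuple

Window = Tuple[float, float]  # window center (x, y); scale is omitted for brevity


def multistage_search(
    score: Callable[[Window], float],
    width: float,
    height: float,
    particles_per_stage: List[int],
    sigmas: List[float],
    rng: random.Random,
) -> List[Tuple[Window, float]]:
    """Evaluate windows stage by stage, resampling around promising ones."""
    # Stage 0: uniform proposal, since nothing is known about the image yet.
    windows = [
        (rng.uniform(0, width), rng.uniform(0, height))
        for _ in range(particles_per_stage[0])
    ]
    evaluated = [(w, score(w)) for w in windows]

    for n_particles, sigma in zip(particles_per_stage[1:], sigmas):
        centers = [w for w, _ in evaluated]
        weights = [s for _, s in evaluated]
        if sum(weights) <= 0:
            break  # the classifier responded nowhere; nothing to refine
        # Proposal = mixture of Gaussians centred on earlier windows,
        # weighted by the classifier response (the feedback).
        parents = rng.choices(centers, weights=weights, k=n_particles)
        windows = [
            (
                min(max(rng.gauss(px, sigma), 0.0), width),
                min(max(rng.gauss(py, sigma), 0.0), height),
            )
            for px, py in parents
        ]
        evaluated += [(w, score(w)) for w in windows]
    return evaluated


def toy_score(window: Window, target: Window = (400.0, 300.0)) -> float:
    """Hypothetical stand-in for a classifier: response peaks at `target`."""
    d2 = (window[0] - target[0]) ** 2 + (window[1] - target[1]) ** 2
    return math.exp(-d2 / (2 * 60.0 ** 2))


results = multistage_search(
    toy_score, 640.0, 480.0, [200, 100, 100], [30.0, 10.0], random.Random(0)
)
best_window, best_score = max(results, key=lambda item: item[1])
```

Here only 400 windows are scored in total, and the later stages concentrate them where the earlier responses were strongest, which is the source of the computational saving over a dense grid.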

2012 Journal article

A Fast Multi-model Approach for Object Duplicate Extraction

Authors: Piccinini, Paolo; Prati, Andrea; Cucchiara, Rita

This paper presents an innovative approach for localizing and segmenting duplicate objects for industrial applications. The working conditions are challenging, with complex heavily-occluded objects, arranged at random in the scene. To account for high flexibility and processing speed, this approach exploits SIFT keypoint extraction and mean shift clustering to efficiently partition the correspondences between the object model and the duplicates onto the different object instances. The re-projection (by means of a Euclidean transform) of some delimiting points onto the current image is used to segment the object shapes. This procedure is compared in terms of accuracy with existing homography-based solutions which make use of RANSAC to eliminate outliers in the homography estimation. Moreover, in order to improve the extraction in the case of reflective or transparent objects, multiple object models are used and fused together. Experimental results on different and challenging kinds of objects are reported.

2009 Conference proceedings paper

Detecting objects, shadows and ghosts in video streams by exploiting color and motion information

Authors: Cucchiara, Rita; Grana, Costantino; Piccardi, M.; Prati, A.

Many approaches to moving object detection for traffic monitoring and video surveillance proposed in the literature are based on background suppression methods. How to correctly and efficiently update the background model and how to deal with shadows are two of the most distinguishing and challenging features of such approaches. This work presents a general-purpose method for segmentation of moving visual objects (MVOs) based on an object-level classification into MVOs, ghosts and shadows. Background suppression needs a background model to be estimated and updated: we use motion and shadow information to selectively exclude MVOs and their shadows from the background model, while retaining ghosts. The color information (in the HSV color space) is exploited for shadow suppression and, consequently, to enhance both MVO segmentation and background update.
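
The selective update described here can be sketched in Python as follows. The running-average update and the `rate` parameter are assumptions chosen for brevity and may differ from the estimator used in the paper; only the rule of freezing pixels labelled as MVO or shadow, while still updating ghosts, comes from the abstract.

```python
from typing import List

# Per-pixel labels produced by an upstream object-level classification.
BACKGROUND, MVO, SHADOW, GHOST = "background", "mvo", "shadow", "ghost"


def update_background(
    model: List[float],
    frame: List[float],
    labels: List[str],
    rate: float = 0.1,
) -> List[float]:
    """Blend the new frame into the model, except under MVOs and shadows."""
    updated = []
    for old_value, new_value, label in zip(model, frame, labels):
        if label in (MVO, SHADOW):
            # A real object or its shadow covers the background: keep the old value.
            updated.append(old_value)
        else:
            # Plain background and ghosts are absorbed into the model.
            updated.append((1.0 - rate) * old_value + rate * new_value)
    return updated
```

Updating under ghosts matters because a ghost marks a region where the model is stale, for example where a parked car has driven away, so absorbing the current frame there corrects the model instead of corrupting it.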

2001 Conference proceedings paper

Improving shadow suppression in moving object detection with HSV color information

Authors: Cucchiara, Rita; Grana, Costantino; Piccardi, M.; Prati, A.; Sirotti, S.

Video-surveillance and traffic analysis systems can be greatly improved using vision-based techniques able to extract, manage and track objects in the scene. However, problems arise due to shadows. In particular, moving shadows can affect the correct localization, measurement and detection of moving objects. This work presents a technique for shadow detection and suppression used in a system for moving visual object detection and tracking. The major novelty of the shadow detection technique is the analysis carried out in the HSV color space to improve the accuracy in detecting shadows. The signal processing and optical motivations of the proposed approach are described. The integration and exploitation of the shadow detection module into the system are outlined, and experimental results are shown and evaluated.
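
A per-pixel shadow test in HSV space can be sketched in Python as below. The abstract does not give the decision rule, so this is one plausible form under the common assumption that a shadow darkens a surface without changing its hue and saturation much; the thresholds `alpha`, `beta`, `tau_s` and `tau_h` are illustrative values, not the paper's.

```python
import colorsys
from typing import Tuple

RGB = Tuple[int, int, int]


def is_shadow(
    pixel: RGB,
    background: RGB,
    alpha: float = 0.4,
    beta: float = 0.9,
    tau_s: float = 0.15,
    tau_h: float = 0.1,
) -> bool:
    """Return True if `pixel` looks like a shadowed version of `background`."""
    h_p, s_p, v_p = colorsys.rgb_to_hsv(*(c / 255.0 for c in pixel))
    h_b, s_b, v_b = colorsys.rgb_to_hsv(*(c / 255.0 for c in background))
    if v_b == 0.0:
        return False  # a black background cannot get darker
    # Hue is an angle, so measure the distance around the circle.
    hue_diff = min(abs(h_p - h_b), 1.0 - abs(h_p - h_b))
    darker = alpha <= v_p / v_b <= beta
    similar_saturation = abs(s_p - s_b) <= tau_s
    similar_hue = hue_diff <= tau_h
    return darker and similar_saturation and similar_hue
```

The lower bound `alpha` keeps very dark foreground objects from being mistaken for shadows, while the upper bound `beta` rejects pixels that are almost as bright as the background and are more likely noise.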

2001 Conference proceedings paper

Statistic and knowledge-based moving object detection in traffic scenes

Authors: Cucchiara, Rita; Grana, Costantino; Piccardi, M.; Prati, A.

The most common approach used for vision-based traffic surveillance consists of a fast segmentation of moving visual objects (MVOs) in the scene together with an intelligent reasoning module capable of identifying, tracking and classifying the MVOs depending on the system goal. In this paper we describe our approach for MVO segmentation in an unstructured traffic environment. We consider complex situations with moving people, vehicles and infrastructures that have different aspect and motion models. For this case we define a specific approach based on background subtraction with statistic and knowledge-based background update. We show many results of real-time tracking of traffic MVOs in outdoor traffic scenes such as roads, parking area intersections, and entrances with barriers.

2000 Conference proceedings paper
