Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Video Surveillance and Privacy: A Solvable Paradox?

Authors: Cucchiara, Rita; Baraldi, Lorenzo; Cornia, Marcella; Sarto, Sara

Published in: COMPUTER

Video Surveillance started decades ago to remotely monitor specific areas and allow control from human inspectors. Later, Computer Vision gradually replaced human monitoring, first through motion alerts and now with Deep Learning techniques. From the beginning of this journey, people have worried about the risk of privacy violations. This article surveys the main steps of Computer Vision in Video Surveillance, from early approaches for people detection and tracking to action analysis and language description, outlining the most relevant directions on the topic to deal with privacy concerns. We show how the relationship between Video Surveillance and privacy is a biased paradox, since surveillance provides increased safety but does not necessarily require identifying people. Through experiments on action recognition and natural language description, we showcase that the paradox of surveillance and privacy can be solved by Artificial Intelligence and that the respect of human rights is not an impossible chimera.

2024 Journal article

What’s Outside the Intersection? Fine-grained Error Analysis for Semantic Segmentation Beyond IoU

Authors: Bernhard, Maximilian; Amoroso, Roberto; Kindermann, Yannic; Baraldi, Lorenzo; Cucchiara, Rita; Tresp, Volker; Schubert, Matthias

2024 Conference proceedings paper

Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs

Authors: Caffagni, Davide; Cocchi, Federico; Moratelli, Nicholas; Sarto, Sara; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS

Multimodal LLMs are the natural evolution of LLMs and enlarge their capabilities so as to work beyond the pure textual modality. As research is being carried out to design novel architectures and vision-and-language adapters, in this paper we concentrate on endowing such models with the capability of answering questions that require external knowledge. Our approach, termed Wiki-LLaVA, aims at integrating an external knowledge source of multimodal documents, which is accessed through a hierarchical retrieval pipeline. Using this approach, relevant passages are retrieved from the external knowledge source and employed as additional context for the LLM, augmenting the effectiveness and precision of generated dialogues. We conduct extensive experiments on datasets tailored for visual question answering with external data and demonstrate the appropriateness of our approach.

2024 Conference proceedings paper

A Framework to Improve the Comparability and Reproducibility of Morphing Attack Detectors

Authors: Di Domenico, Nicolò; Borghi, Guido; Franco, Annalisa; Ferrara, Matteo; Maltoni, Davide

2023 Conference proceedings paper

Annotating the Inferior Alveolar Canal: the Ultimate Tool

Authors: Lumetti, Luca; Pipoli, Vittorio; Bolelli, Federico; Grana, Costantino

Published in: LECTURE NOTES IN COMPUTER SCIENCE

The Inferior Alveolar Nerve (IAN) is of main interest in the maxillofacial field, as an accurate localization of such nerve reduces the risks of injury during surgical procedures. Although recent literature has focused on developing novel deep learning techniques to produce accurate segmentation masks of the canal containing the IAN, there are still strong limitations due to the scarce amount of publicly available 3D maxillofacial datasets. In this paper, we present an improved version of a previously released tool, IACAT (Inferior Alveolar Canal Annotation Tool), today used by medical experts to produce 3D ground truth annotation. In addition, we release a new dataset, ToothFairy, which is part of the homonymous MICCAI2023 challenge hosted by the Grand-Challenge platform, as an extension of the previously released Maxillo dataset, which was the only one publicly available. With ToothFairy, the number of annotations has been increased, as well as the quality of existing data.

2023 Conference proceedings paper

Artificial intelligence evaluation of confocal microscope prostate images: our preliminary experience

Authors: Bianchi, G.; Puliatti, S.; Rodriguez, N.; Micali, S.; Bertoni, L.; Reggiani Bonetti, L.; Caramaschi, S.; Bolelli, F.; Pinamonti, M.; Rozze, D.; Grana, C.

Published in: MINERVA UROLOGY AND NEPHROLOGY

2023 Journal article

Avoiding the Pitfalls on Stock Market: Challenges and Solutions in Developing Quantitative Strategies

Authors: Bergianti, M.; Cioffo, N.; Del Buono, F.; Paganelli, M.; Porrello, A.

Published in: CEUR WORKSHOP PROCEEDINGS

Quantitative stock trading based on Machine Learning (ML) and Deep Learning (DL) has gained great attention in recent years thanks to the ever-increasing availability of financial data and the ability of this technology to analyze the complex dynamics of the stock market. Despite the plethora of approaches present in literature, a large gap exists between the solutions produced by the scientific community and the practices adopted in real-world systems. Most of these works in fact lack a practical vision of the problem and ignore the main issues afflicting fintech practitioners. To fill such a gap, we provide a systematic review of the main dangers affecting the development of an ML/DL pipeline in the financial domain. They include managing the stochastic and non-stationary characteristics of stock data, various types of bias, overfitting of models and devising impartial valuation methods. Finally, we present possible solutions to these critical issues.

2023 Conference proceedings paper

BERT Classifies SARS-CoV-2 Variants

Authors: Ghione, G.; Lovino, M.; Ficarra, E.; Cirrincione, G.

Published in: SMART INNOVATION, SYSTEMS AND TECHNOLOGIES

Medical diagnostics faced numerous difficulties during the COVID-19 pandemic. One of these has been the need for ongoing monitoring of SARS-CoV-2 mutations. Genomics is the technique most frequently used for precisely identifying variants. The ongoing global gathering of RNA samples of the virus has made such an approach possible. Nevertheless, variant identification techniques are frequently resource-intensive. As a result, the diagnostic capability of small medical laboratories might not be sufficient. In this work, an effective deep learning strategy for identifying SARS-CoV-2 variants is presented. This work makes two contributions: (1) a fine-tuning architecture of Bidirectional Encoder Representations from Transformers (BERT) to identify SARS-CoV-2 variants; (2) providing biological insights by exploiting BERT self-attention. Such an approach enables the analysis of the S gene of the virus to quickly recognize its variant. The selected model, BERT, is a transformer-based neural network first developed for natural language processing. Nonetheless, it has been effectively used in numerous applications, such as genomic sequence analysis. Thus, the fine-tuning of BERT was performed to adapt it to the RNA sequence domain, achieving a 98.59% F1-score on test data: it was successful in identifying variants circulating to date. The interpretability of the model was examined, since BERT utilizes the self-attention mechanism. In fact, it was discovered that by attending to particular areas of the S gene, BERT extracts pertinent biological information on variants. Thus, the presented approach allows obtaining insights into the particular characteristics of SARS-CoV-2 RNA samples.

2023 Book chapter/Essay

Buffer-MIL: Robust Multi-instance Learning with a Buffer-Based Approach

Authors: Bontempo, G.; Lumetti, L.; Porrello, A.; Bolelli, F.; Calderara, S.; Ficarra, E.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Histopathological image analysis is a critical area of research with the potential to aid pathologists in faster and more accurate diagnoses. However, Whole-Slide Images (WSIs) present challenges for deep learning frameworks due to their large size and lack of pixel-level annotations. Multi-Instance Learning (MIL) is a popular approach that can be employed for handling WSIs, treating each slide as a bag composed of multiple patches or instances. In this work, we propose Buffer-MIL, which aims at tackling the covariate shift and class imbalance characterizing most of the existing histopathological datasets. With this goal, a buffer containing the most representative instances of each disease-positive slide of the training set is incorporated into our model. An attention mechanism is then used to compare all the instances against the buffer, to find the most critical ones in a given slide. We evaluate Buffer-MIL on two publicly available WSI datasets, Camelyon16 and TCGA lung cancer, outperforming current state-of-the-art models by 2.2% in accuracy on Camelyon16.

2023 Conference proceedings paper

CarPatch: A Synthetic Benchmark for Radiance Field Evaluation on Vehicle Components

Authors: Di Nucci, D.; Simoni, A.; Tomei, M.; Ciuffreda, L.; Vezzani, R.; Cucchiara, R.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Neural Radiance Fields (NeRFs) have gained widespread recognition as a highly effective technique for representing 3D reconstructions of objects and scenes derived from sets of images. Despite their efficiency, NeRF models can pose challenges in certain scenarios such as vehicle inspection, where the lack of sufficient data or the presence of challenging elements (e.g. reflections) strongly impact the accuracy of the reconstruction. To this aim, we introduce CarPatch, a novel synthetic benchmark of vehicles. In addition to a set of images annotated with their intrinsic and extrinsic camera parameters, the corresponding depth maps and semantic segmentation masks have been generated for each view. Global and part-based metrics have been defined and used to evaluate, compare, and better characterize some state-of-the-art techniques. The dataset is publicly released at https://aimagelab.ing.unimore.it/go/carpatch and can be used as an evaluation guide and as a baseline for future work on this challenging topic.

2023 Conference proceedings paper