Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.


ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval

Authors: Messina, Nicola; Stefanini, Matteo; Cornia, Marcella; Baraldi, Lorenzo; Falchi, Fabrizio; Amato, Giuseppe; Cucchiara, Rita


Image-text matching is gaining a leading role among tasks involving the joint understanding of vision and language. In the literature, this task is often used as a pre-training objective to forge architectures able to jointly deal with images and texts. Nonetheless, it has a direct downstream application: cross-modal retrieval, which consists of finding images related to a given query text, or vice versa. Solving this task is of critical importance in cross-modal search engines. Many recent methods have proposed effective solutions to the image-text matching problem, mostly using large vision-language (VL) Transformer networks. However, these models are often computationally expensive, especially at inference time, which prevents their adoption in large-scale cross-modal retrieval scenarios, where results should be provided to the user almost instantaneously. In this paper, we propose to fill the gap between effectiveness and efficiency with an ALign And DIstill Network (ALADIN). ALADIN first produces highly effective scores by aligning images and texts at a fine-grained level. Then, it learns a shared embedding space, where an efficient kNN search can be performed, by distilling the relevance scores obtained from the fine-grained alignments. We obtained remarkable results on MS-COCO, showing that our method can compete with state-of-the-art VL Transformers while being almost 90 times faster. The code for reproducing our results is available at https://github.com/mesnico/ALADIN.
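The two-stage recipe in the abstract (fine-grained teacher scores distilled into a shared embedding space searchable with kNN) can be sketched as follows. This is an illustrative toy, not the paper's implementation: the listwise soft cross-entropy distillation loss and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_retrieve(query_emb, gallery_embs, k=3):
    """Return indices of the k gallery embeddings most similar to the
    query (cosine similarity on L2-normalized embeddings)."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    scores = g @ q                      # one matrix-vector product
    return np.argsort(-scores)[:k]

def distillation_loss(student_scores, teacher_scores):
    """Listwise distillation: make the student's similarity distribution
    over the gallery match the teacher's (soft cross-entropy)."""
    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()
    p_teacher = softmax(teacher_scores)
    log_p_student = np.log(softmax(student_scores))
    return -(p_teacher * log_p_student).sum()

# Toy example: 5 gallery items with 64-d embeddings in the shared space.
gallery = rng.normal(size=(5, 64))
query = gallery[2] + 0.01 * rng.normal(size=64)  # query close to item 2
top = knn_retrieve(query, gallery, k=1)
```

The efficiency claim comes from the retrieval side: once the shared space is learned, ranking a gallery is a single matrix product instead of running a cross-attention Transformer per image-text pair.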

2022 Relazione in Atti di Convegno

Applications of AI and HPC in the Health Domain

Authors: Oniga, D.; Cantalupo, B.; Tartaglione, E.; Perlo, D.; Grangetto, M.; Aldinucci, M.; Bolelli, F.; Pollastri, F.; Cancilla, M.; Canalini, L.; Grana, C.; Alcalde, C. M.; Cardillo, F. A.; Florea, M.

2022 Capitolo/Saggio

Automated Prediction of Kidney Failure in IgA Nephropathy with Deep Learning from Biopsy Images

Authors: Testa, F.; Fontana, F.; Pollastri, F.; Chester, J.; Leonelli, M.; Giaroni, F.; Gualtieri, F.; Bolelli, F.; Mancini, E.; Nordio, M.; Sacco, P.; Ligabue, G.; Giovanella, S.; Ferri, M.; Alfano, G.; Gesualdo, L.; Cimino, S.; Donati, G.; Grana, C.; Magistroni, R.

Published in: CLINICAL JOURNAL OF THE AMERICAN SOCIETY OF NEPHROLOGY


Background and objectives: Digital pathology and artificial intelligence offer new opportunities for automatic histologic scoring. We applied a deep learning approach to IgA nephropathy biopsy images to develop an automatic histologic prognostic score, assessed against ground truth (kidney failure) among patients with IgA nephropathy who were treated over 39 years. We assessed noninferiority in comparison with the histologic component of currently validated predictive tools. We correlated additional histologic features with our deep learning predictive score to identify potential additional predictive features.

Design, setting, participants, and measurements: Training for deep learning was performed with randomly selected, digitized, cortical Periodic acid-Schiff-stained section images (363 kidney biopsy specimens) to develop our deep learning predictive score. We estimated noninferiority using the area under the receiver operating characteristic curve (AUC) in a randomly selected group (95 biopsy specimens) against the gold-standard Oxford classification (MEST-C) scores used by the International IgA Nephropathy Prediction Tool and the clinical decision support system for estimating the risk of kidney failure in IgA nephropathy. We assessed additional potential predictive histologic features against a subset (20 kidney biopsy specimens) with the strongest and weakest deep learning predictive scores.

Results: We enrolled 442 patients; the 10-year kidney survival was 78%, and the study median follow-up was 6.7 years. Manual MEST-C showed no prognostic relationship for the endocapillary parameter only. The deep learning predictive score was not inferior to MEST-C applied using the International IgA Nephropathy Prediction Tool and the clinical decision support system (AUC of 0.84 versus 0.77 and 0.74, respectively) and showed a good correlation with the tubulointerstitial score (r = 0.41, P < 0.01). We observed no correlations between the deep learning prognostic score and the mesangial, endocapillary, segmental sclerosis, and crescent parameters. Additional potential predictive histopathologic features incorporated by the deep learning predictive score included (1) inflammation within areas of interstitial fibrosis and tubular atrophy and (2) hyaline casts.

Conclusions: The deep learning approach was noninferior to manual histopathologic reporting and considered prognostic features not currently included in the MEST-C assessment.
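The noninferiority comparison above hinges on the AUC of the predictive score. As a reminder of what that statistic measures, here is a minimal sketch using the Mann-Whitney formulation; the study's actual statistical analysis is not reproduced here, and the example scores are invented.

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a randomly chosen positive case
    (e.g., kidney failure) receives a higher score than a randomly
    chosen negative one; ties count as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

An AUC of 0.84, as reported for the deep learning score, means the model ranks a failing kidney above a non-failing one 84% of the time.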

2022 Articolo su rivista

Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions

Authors: Cascianelli, Silvia; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION


Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task that can provide a relevant boost to the digitization of handwritten documents and reuse of their content. The task becomes even more challenging when dealing with historical documents due to the variability of the writing style and degradation of the page quality. State-of-the-art HTR approaches typically couple recurrent structures for sequence modeling with Convolutional Neural Networks for visual feature extraction. Since convolutional kernels are defined on fixed grids and focus on all input pixels independently while moving over the input image, this strategy disregards the fact that handwritten characters can vary in shape, scale, and orientation even within the same document and that the ink pixels are more relevant than the background ones. To cope with these specific HTR difficulties, we propose to adopt deformable convolutions, which can deform depending on the input at hand and better adapt to the geometric variations of the text. We design two deformable architectures and conduct extensive experiments on both modern and historical datasets. Experimental results confirm the suitability of deformable convolutions for the HTR task.
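The core idea of the abstract, kernel taps that sample at a learned offset from the regular grid, can be illustrated with a naive single-channel sketch. This is not the authors' implementation: real deformable layers predict offsets with an auxiliary convolution and use differentiable bilinear sampling, whereas this toy uses fixed offsets and nearest-neighbor rounding.

```python
import numpy as np

def deformable_conv2d_single(x, weight, offsets):
    """Naive 'deformable' 3x3 convolution on a single channel.
    x: (H, W) input; weight: (3, 3) kernel;
    offsets: (H, W, 3, 3, 2) per-output, per-tap (dy, dx) offsets."""
    H, W = x.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for ki in range(3):
                for kj in range(3):
                    dy, dx = offsets[i, j, ki, kj]
                    # regular grid position + learned offset
                    yi = int(round(i + ki - 1 + dy))
                    xj = int(round(j + kj - 1 + dx))
                    if 0 <= yi < H and 0 <= xj < W:  # zero padding outside
                        acc += weight[ki, kj] * x[yi, xj]
            out[i, j] = acc
    return out
```

With all-zero offsets this reduces to a standard 3x3 convolution; non-zero offsets let the receptive field bend toward ink pixels and follow the varying shape, scale, and orientation of handwritten characters.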

2022 Articolo su rivista

CaMEL: Mean Teacher Learning for Image Captioning

Authors: Barraco, Manuele; Stefanini, Matteo; Cornia, Marcella; Cascianelli, Silvia; Baraldi, Lorenzo; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION


Describing images in natural language is a fundamental step towards the automatic modeling of connections between the visual and textual modalities. In this paper we present CaMEL, a novel Transformer-based architecture for image captioning. Our proposed approach leverages the interaction of two interconnected language models that learn from each other during the training phase. The interplay between the two language models follows a mean teacher learning paradigm with knowledge distillation. Experimentally, we assess the effectiveness of the proposed solution on the COCO dataset and in conjunction with different visual feature extractors. When comparing with existing proposals, we demonstrate that our model provides state-of-the-art caption quality with a significantly reduced number of parameters. According to the CIDEr metric, we obtain a new state of the art on COCO when training without using external data. The source code and trained models will be made publicly available at: https://github.com/aimagelab/camel.
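The mean teacher paradigm mentioned above keeps the teacher's weights as an exponential moving average (EMA) of the student's while the student is also trained to match the teacher's predictions. A minimal sketch of the EMA step, with illustrative names and a momentum value not taken from the paper:

```python
def ema_update(teacher_params, student_params, momentum=0.999):
    """One mean-teacher step: teacher <- m * teacher + (1 - m) * student.
    Applied after each student optimization step; the teacher receives
    no gradients of its own."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]
```

The high momentum makes the teacher a slowly moving ensemble of recent student checkpoints, which is what makes its predictions a stable distillation target during training.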

2022 Relazione in Atti di Convegno

Catastrophic Forgetting in Continual Concept Bottleneck Models

Authors: Marconato, E.; Bontempo, G.; Teso, S.; Ficarra, E.; Calderara, S.; Passerini, A.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

2022 Relazione in Atti di Convegno

Connected Components Labeling on Bitonal Images

Authors: Bolelli, Federico; Allegretti, Stefano; Grana, Costantino

Published in: LECTURE NOTES IN COMPUTER SCIENCE

2022 Relazione in Atti di Convegno

Continual Learning in Real-Life Applications

Authors: Graffieti, G; Borghi, G; Maltoni, D

Published in: IEEE ROBOTICS AND AUTOMATION LETTERS


Existing Continual Learning benchmarks only partially address the complexity of real-life applications, limiting the realism of learning agents. In this letter, we propose and focus on benchmarks characterized by common key elements of real-life scenarios, including temporally ordered streams as input data, strong correlation of samples over short time ranges, high data distribution drift over long time frames, and heavy class imbalance. Moreover, we enforce online training constraints, such as the need for frequent model updates without the possibility of storing a large amount of past data or passing the dataset through the model multiple times. In addition, we introduce a novel hybrid approach based on Continual Learning, whose architectural elements and replay memory management proved useful and effective in the considered scenarios. The experimental validation, including comparisons with existing methods and an ablation study, confirms the validity and suitability of the proposed approach.
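The bounded-memory constraint described above is commonly handled with a fixed-capacity replay buffer. The sketch below uses reservoir sampling, one standard policy for online streams; the letter's own replay memory management scheme differs, and this only illustrates the constraint.

```python
import random

class ReplayBuffer:
    """Fixed-capacity replay memory for an online stream: every sample
    seen so far has equal probability of being retained, without storing
    the full stream or passing over it twice (reservoir sampling)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.data = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            # keep the new sample with probability capacity / seen
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = sample
```

Because retention probability decays as the stream grows, the buffer stays an unbiased snapshot of the whole stream even under the temporal correlation and distribution drift the benchmarks impose.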

2022 Articolo su rivista

Continual semi-supervised learning through contrastive interpolation consistency

Authors: Boschini, Matteo; Buzzega, Pietro; Bonicelli, Lorenzo; Porrello, Angelo; Calderara, Simone

Published in: PATTERN RECOGNITION LETTERS


Continual Learning (CL) investigates how to train Deep Networks on a stream of tasks without incurring forgetting. CL settings proposed in the literature assume that every incoming example is paired with ground-truth annotations. However, this clashes with many real-world applications: gathering labeled data, which is in itself tedious and expensive, becomes infeasible when data flow as a stream. This work explores Continual Semi-Supervised Learning (CSSL): here, only a small fraction of labeled input examples are shown to the learner. We assess how current CL methods (e.g., EWC, LwF, iCaRL, ER, GDumb, DER) perform in this novel and challenging scenario, where overfitting entangles forgetting. Subsequently, we design a novel CSSL method that exploits metric learning and consistency regularization to leverage unlabeled examples while learning. We show that our proposal exhibits higher resilience to diminishing supervision and, even more surprisingly, that relying on only a small fraction of supervision suffices to outperform SOTA methods trained under full supervision.
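The consistency regularization mentioned above can be illustrated with the interpolation-consistency idea: the model's prediction on a mixed input should match the mix of its predictions on the original inputs, a loss that needs no labels. This is a simplified assumption-level sketch, not the paper's exact contrastive formulation.

```python
import numpy as np

def interpolation_consistency_loss(model, x1, x2, lam=0.4):
    """Unsupervised consistency term: squared error between
    f(lam*x1 + (1-lam)*x2) and lam*f(x1) + (1-lam)*f(x2).
    Both inputs may come from the unlabeled part of the stream."""
    mixed_pred = model(lam * x1 + (1 - lam) * x2)
    target = lam * model(x1) + (1 - lam) * model(x2)
    return float(np.mean((mixed_pred - target) ** 2))
```

A perfectly linear model incurs zero loss by construction; for a deep network the term pushes the decision function to behave smoothly between unlabeled samples, which is how the unlabeled stream contributes to training.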

2022 Articolo su rivista

Deep Segmentation of the Mandibular Canal: a New 3D Annotated Dataset of CBCT Volumes

Authors: Cipriano, Marco; Allegretti, Stefano; Bolelli, Federico; Di Bartolomeo, Mattia; Pollastri, Federico; Pellacani, Arrigo; Minafra, Paolo; Anesi, Alexandre; Grana, Costantino

Published in: IEEE ACCESS


Inferior Alveolar Nerve (IAN) canal detection has been the focus of multiple recent works in dentistry and maxillofacial imaging. Deep learning-based techniques have achieved promising results in this research field, although the small size of 3D maxillofacial datasets has strongly limited their performance. Researchers have been forced to build their own private datasets, thus precluding any opportunity to reproduce results and fairly compare proposals. This work describes a novel, large, and publicly available mandibular Cone Beam Computed Tomography (CBCT) dataset, with 2D and 3D manual annotations provided by expert clinicians. Leveraging this dataset and employing deep learning techniques, we are able to improve the state of the art on 3D mandibular canal segmentation. The source code that allows exact reproduction of all the reported experiments is released as an open-source project along with this article.
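Segmentation quality on volumes like these is typically measured with the Dice coefficient between predicted and annotated canal voxels. A minimal sketch follows; the paper's exact evaluation protocol may differ, and the function name is illustrative.

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice overlap between two binary volumes (arrays of 0/1):
    2 * |intersection| / (|pred| + |target|), with eps guarding
    against empty volumes."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

Dice is preferred over plain voxel accuracy here because the canal occupies a tiny fraction of a CBCT volume, so a predictor that outputs all background would still score near-perfect accuracy while scoring near zero on Dice.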

2022 Articolo su rivista

Page 26 of 106 • Total publications: 1059