Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Fluent and Accurate Image Captioning with a Self-Trained Reward Model

Authors: Moratelli, Nicholas; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Fine-tuning image captioning models with hand-crafted rewards like the CIDEr metric has been a classical strategy for promoting caption quality at the sequence level. This approach, however, is known to limit descriptiveness and semantic richness, and it tends to drive the model towards the style of ground-truth sentences, thus losing detail and specificity. In contrast, recent attempts to employ image-text models like CLIP as a reward have led to grammatically incorrect and repetitive captions. In this paper, we propose Self-Cap, a captioning approach that relies on a learnable reward model based on self-generated negatives that can discriminate captions based on their consistency with the image. Specifically, our discriminator is a fine-tuned contrastive image-text model trained to promote caption correctness while avoiding the aberrations that typically occur when training with a CLIP-based reward. To this end, our discriminator directly incorporates negative samples from a frozen captioner, which not only improves the quality and richness of the generated captions but also reduces the fine-tuning time in comparison to using the CIDEr score as the sole metric for optimization. Experimental results demonstrate the effectiveness of our training strategy on both standard and zero-shot image captioning datasets.

2024 Relazione in Atti di Convegno

FOSSIL: Free Open-Vocabulary Semantic Segmentation through Synthetic References Retrieval

Authors: Barsellotti, Luca; Amoroso, Roberto; Baraldi, Lorenzo; Cucchiara, Rita

2024 Relazione in Atti di Convegno

FRCSyn Challenge at WACV 2024: Face Recognition Challenge in the Era of Synthetic Data

Authors: Melzi, Pietro; Tolosana, Ruben; Vera-Rodriguez, Ruben; Kim, Minchul; Rathgeb, Christian; Liu, Xiaoming; DeAndres-Tame, Ivan; Morales, Aythami; Fierrez, Julian; Ortega-Garcia, Javier; Zhao, Weisong; Zhu, Xiangyu; Yan, Zheyu; Zhang, Xiao-Yu; Wu, Jinlin; Lei, Zhen; Tripathi, Suvidha; Kothari, Mahak; Haider Zama, Md; Deb, Debayan; Biesseck, Bernardo; Vidal, Pedro; Granada, Roger; Fickel, Guilherme; Führ, Gustavo; Menotti, David; Unnervik, Alexander; George, Anjith; Ecabert, Christophe; Otroshi Shahreza, Hatef; Rahimi, Parsa; Marcel, Sébastien; Sarridis, Ioannis; Koutlis, Christos; Baltsou, Georgia; Papadopoulos, Symeon; Diou, Christos; Di Domenico, Nicolò; Borghi, Guido; Pellegrini, Lorenzo; Mas-Candela, Enrique; Sánchez-Pérez, Ángela; Atzori, Andrea; Boutros, Fadi; Damer, Naser; Fenu, Gianni; Marras, Mirko

Despite the widespread adoption of face recognition technology around the world, and its remarkable performance on current benchmarks, several challenges still need to be addressed. This paper offers an overview of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at WACV 2024. This is the first international challenge aiming to explore the use of synthetic data in face recognition to address existing limitations in the technology. Specifically, the FRCSyn Challenge targets concerns related to data privacy issues, demographic biases, generalization to unseen scenarios, and performance limitations in challenging scenarios, including significant age disparities between enrollment and testing, pose variations, and occlusions. The results achieved in the FRCSyn Challenge, together with the proposed benchmark, contribute significantly to the application of synthetic data to improve face recognition technology.

2024 Relazione in Atti di Convegno

FRCSyn-onGoing: Benchmarking and comprehensive evaluation of real and synthetic data to improve face recognition systems

Authors: Melzi, Pietro; Tolosana, Ruben; Vera-Rodriguez, Ruben; Kim, Minchul; Rathgeb, Christian; Liu, Xiaoming; DeAndres-Tame, Ivan; Morales, Aythami; Fierrez, Julian; Ortega-Garcia, Javier; Zhao, Weisong; Zhu, Xiangyu; Yan, Zheyu; Zhang, Xiao-Yu; Wu, Jinlin; Lei, Zhen; Tripathi, Suvidha; Kothari, Mahak; Zama, Md Haider; Deb, Debayan; Biesseck, Bernardo; Vidal, Pedro; Granada, Roger; Fickel, Guilherme; Führ, Gustavo; Menotti, David; Unnervik, Alexander; George, Anjith; Ecabert, Christophe; Shahreza, Hatef Otroshi; Rahimi, Parsa; Marcel, Sébastien; Sarridis, Ioannis; Koutlis, Christos; Baltsou, Georgia; Papadopoulos, Symeon; Diou, Christos; Di Domenico, Nicolò; Borghi, Guido; Pellegrini, Lorenzo; Mas-Candela, Enrique; Sánchez-Pérez, Ángela; Atzori, Andrea; Boutros, Fadi; Damer, Naser; Fenu, Gianni; Marras, Mirko

Published in: INFORMATION FUSION

This article presents FRCSyn-onGoing, an ongoing challenge for face recognition where researchers can easily benchmark their systems against the state of the art in an open common platform using large-scale public databases and standard experimental protocols. FRCSyn-onGoing is based on the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at WACV 2024. This is the first international face recognition challenge aiming to explore the use of real and synthetic data independently, and also their fusion, in order to address existing limitations in the technology. Specifically, FRCSyn-onGoing targets concerns related to data privacy issues, demographic biases, generalization to unseen scenarios, and performance limitations in challenging scenarios, including significant age disparities between enrollment and testing, pose variations, and occlusions. To enhance face recognition performance, FRCSyn-onGoing strongly advocates for information fusion at various levels, starting from the input data, where a mix of real and synthetic domains is proposed for specific tasks of the challenge. Additionally, participating teams are allowed to fuse diverse networks within their proposed systems to improve performance. In this article, we provide a comprehensive evaluation of the face recognition systems and results achieved so far in FRCSyn-onGoing. The results obtained in FRCSyn-onGoing, together with the proposed public ongoing benchmark, contribute significantly to the application of synthetic data to improve face recognition technology.

2024 Articolo su rivista

From One to Many Lorikeets: Discovering Image Analogies in the CLIP Space

Authors: Xing, S.; Peruzzo, E.; Sangineto, E.; Sebe, N.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Drawing analogies between two pairs of entities in the form A:B::C:D (i.e., A is to B as C is to D) is a hallmark of human intelligence, as evidenced by extensive findings in cognitive science over the last decades. In recent years, this property has been observed far beyond cognitive science. Notable examples are the word2vec and GloVe models in natural language processing. Recent research in computer vision has also found analogies in the feature space of a pretrained ConvNet feature extractor. However, analogy mining in the semantic space of recent strong foundation models such as CLIP is still understudied, despite the fact that they have been successfully applied to a wide range of downstream tasks. In this work, we show that CLIP possesses a similar ability for analogical reasoning in its latent space, and we propose a novel strategy to extract analogies between pairs of images in the CLIP space. We compute the difference vectors between all pairs of images belonging to the same class in the CLIP space, and employ k-means clustering to group the difference vectors into clusters irrespective of their classes. This procedure yields cluster centroids representative of class-agnostic semantic analogies between images. Through extensive analysis, we show that the property of drawing analogies between images also exists in the CLIP space, and that the resulting analogies are interpretable by humans through a visualisation of the learned clusters.
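The clustering procedure described in the abstract can be sketched as follows. This is a minimal illustration with toy vectors standing in for CLIP embeddings; the k-means implementation, function names, and toy data are ours, not the authors':

```python
import numpy as np

def pairwise_difference_vectors(embeddings, labels):
    """For each class, collect e_i - e_j for every ordered pair (i, j), i != j."""
    diffs = []
    labels = np.asarray(labels)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        for i in idx:
            for j in idx:
                if i != j:
                    diffs.append(embeddings[i] - embeddings[j])
    return np.stack(diffs)

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means: assign points to nearest centroid, then recompute means."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        assign = dist.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):
                centroids[c] = X[assign == c].mean(axis=0)
    return centroids, assign

# Toy stand-in for CLIP image embeddings: 4 images, 2 classes, dimension 3.
emb = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [1.0, 1.0, 1.0],
                [0.0, 0.0, 1.0]])
labels = [0, 0, 1, 1]
diffs = pairwise_difference_vectors(emb, labels)   # difference vectors, per class
# Clustering is class-agnostic: all difference vectors are pooled before k-means.
centroids, assign = kmeans(diffs, k=2)
```

The centroids play the role of the class-agnostic analogy directions described in the abstract.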

2024 Relazione in Atti di Convegno

Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets

Authors: Cornia, Marcella; Baraldi, Lorenzo; Fiameni, Giuseppe; Cucchiara, Rita

Published in: INTERNATIONAL JOURNAL OF COMPUTER VISION

This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources, containing both human-annotated and web-collected captions. Large-scale datasets with noisy image-text pairs, indeed, provide a sub-optimal source of supervision because of their low-quality descriptive style, while human-annotated datasets are cleaner but smaller in scale. To get the best of both worlds, we propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component. The proposed model avoids the need for object detectors, is trained with a single objective of prompt language modeling, and can replicate the style of human-collected captions while training on sources with different input styles. Experimentally, the model shows a strong capability of recognizing real-world concepts and producing high-quality captions. Extensive experiments are performed on different image captioning datasets, including CC3M, nocaps, and the competitive COCO dataset, where our model consistently outperforms baselines and state-of-the-art approaches.

2024 Articolo su rivista

High-level Biomedical Data Integration in a Semantic Knowledge Graph with OncodashKB for finding Personalized Actionable Drugs in Ovarian Cancer

Authors: Dreo, Johann; Lobentanzer, Sebastian; Gaydukova, Ekaterina; Baric, Marko; Maarala, Ilari; Muranen, Taru; Oikkonen, Jaana; Bolelli, Federico; Pipoli, Vittorio; Isoviita, Veli-Matti; Hynninen, Johanna; Schwikowski, Benno

Background: The growing amount of biomedical knowledge about cancer in combination with genome-scale patient profiling data offers unprecedented opportunities for personalized oncology. However, the large amounts of knowledge and data require scalable approaches to providing actionable information to support clinicians in decision-making [1].

Objective: To develop software and methods that integrate all relevant clinical and genomic data about patients and that enable the discovery of optimal personalized treatment options, together with the supporting literature knowledge and data.

Methods: We exploit a Semantic Knowledge Graph (SKG), a type of database that represents medical data in the form of objects and relationships, linking previously unconnected information across several cancer databases. To build this SKG (OncodashKB), we use the BioCypher library [2]. We then integrate clinical data from patients with high-grade serous ovarian cancer, including information on genome changes collected as part of the DECIDER project (http://deciderproject.eu). The SKG can then be queried to gather evidence paths linking patient-specific alterations to actionable drugs.

Results: Our approach provides a fully automated, systematic, and reproducible data integration workflow, along with the use of existing expert-made ontologies to provide interoperability and semantic descriptions. The integrated data is assessed by experts on molecular tumor boards and allows for the exploration of relevant clinical and genomic patient data in a visually accessible format, designed for ease of interpretation by clinicians. Importantly, we expect the system to reveal unexpected evidence paths between patient sequencing data and optimal treatment options based on biomedical knowledge described in the literature and confirmed by high-level evidence.

Conclusion: Decision support systems using graph databases emerge as valuable tools by revealing new connections between various patient data and treatment options shown in an easy-to-understand format.

References: [1] Reisle, C., Williamson, L.M., Pleasance, E. et al. A platform for oncogenomic reporting and interpretation. Nat Commun 13, 756 (2022). https://doi.org/10.1038/s41467-022-28348-y [2] Lobentanzer, S., Aloy, P., Baumbach, J. et al. Democratizing knowledge representation with BioCypher. Nat Biotechnol 41, 1056–1059 (2023). https://doi.org/10.1038/s41587-023-01848-y

2024 Abstract in Atti di Convegno

Identifying Impurities in Liquids of Pharmaceutical Vials

Authors: Rosati, Gabriele; Marchesini, Kevin; Lumetti, Luca; Sartori, Federica; Balboni, Beatrice; Begarani, Filippo; Vescovi, Luca; Bolelli, Federico; Grana, Costantino

The presence of visible particles in pharmaceutical products is a critical quality issue that demands strict monitoring. Recently, Convolutional Neural Networks (CNNs) have been widely used in industrial settings to detect defects, but there remains a gap in the literature concerning the detection of particles floating in liquid substances, mainly due to the lack of publicly available datasets. In this study, we focus on the detection of foreign particles in pharmaceutical liquid vials, leveraging two state-of-the-art deep-learning approaches adapted to our specific multiclass problem. The first methodology employs a standard ResNet-18 architecture, while the second exploits a Multi-Instance Learning (MIL) technique to efficiently deal with multiple images (sequences) of the same sample. To address this lack of available data, we built and partially released an annotated dataset consisting of sequences containing 19 images for each sample, captured from rotating vials, both with and without impurities. The dataset comprises 2,426 sequences for a total of 46,094 images labeled at the sequence level and including five distinct classes. The proposed methodologies, trained on this new extensive dataset, represent advancements in the field, offering promising strategies to improve the safety and quality control of pharmaceutical products and setting a benchmark for future comparisons.
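The MIL setting above treats each 19-image sequence as a bag with a single sequence-level label. The abstract does not specify the pooling operator, so the sketch below uses max pooling over per-image logits, one common MIL choice; the names and toy data are illustrative:

```python
import numpy as np

def mil_max_pool(instance_logits):
    """Aggregate per-instance class logits into bag-level logits via max pooling,
    a common MIL scheme: the bag score for a class is its strongest instance score."""
    return instance_logits.max(axis=0)

# One bag: 19 per-image logit vectors (e.g., from a CNN backbone), 5 classes,
# mirroring the 19-image rotating-vial sequences with five label classes.
rng = np.random.default_rng(0)
bag = rng.normal(size=(19, 5))          # stand-in for real per-image logits
bag_logits = mil_max_pool(bag)          # one logit vector for the whole sequence
pred_class = int(bag_logits.argmax())   # sequence-level prediction
```

Max pooling lets a single image showing an impurity dominate the sequence prediction, which matches the sequence-level labeling described in the abstract.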

2024 Relazione in Atti di Convegno

Integrated microRNA and proteome analysis of cancer datasets with MoPC

Authors: Lovino, M.; Ficarra, E.; Martignetti, L.

Published in: PLOS ONE

MicroRNAs (miRNAs) are small molecules that play an essential role in regulating gene expression through post-transcriptional gene silencing. Their study is crucial in revealing the fundamental processes underlying pathologies and, in particular, cancer. To date, most studies on miRNA regulation consider the effect of specific miRNAs on specific target mRNAs, providing wet-lab validation. However, few tools have been developed to explain miRNA-mediated regulation at the protein level. This paper presents MoPC, a computational tool that relies on the partial correlation between mRNAs and proteins, conditioned on miRNA expression, to predict miRNA-target interactions in multi-omic datasets. MoPC returns the list of significant miRNA-target interactions and plots the significant correlations on a heatmap in which the miRNAs and targets are ordered by chromosomal location. The software was applied to three TCGA/CPTAC datasets (breast, glioblastoma, and lung cancer), returning results enriched in three independent target databases.
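The abstract does not give MoPC's exact statistic, but the standard linear partial correlation it alludes to can be computed by regressing both variables on the conditioning variable and correlating the residuals. The sketch below is ours, with illustrative variable names and synthetic data:

```python
import numpy as np

def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z:
    regress x and y on z (with intercept), then correlate the residuals."""
    Z = np.column_stack([np.ones_like(z), z])          # design matrix with intercept
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # residuals of x given z
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]  # residuals of y given z
    return float(np.corrcoef(rx, ry)[0, 1])

# Synthetic example: mRNA and protein levels both suppressed by the same miRNA.
rng = np.random.default_rng(42)
mirna = rng.normal(size=200)
mrna = -0.8 * mirna + rng.normal(scale=0.1, size=200)
protein = -0.8 * mirna + rng.normal(scale=0.1, size=200)

raw_corr = float(np.corrcoef(mrna, protein)[0, 1])  # high: shared miRNA driver
pc = partial_correlation(mrna, protein, mirna)      # near zero once miRNA is conditioned out
```

A strong raw correlation that vanishes after conditioning on a miRNA is the kind of signature such a test flags as a candidate miRNA-target interaction.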

2024 Articolo su rivista

Total publications: 1059