Publications - AImageLab

Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets

Authors: Cornia, Marcella; Baraldi, Lorenzo; Fiameni, Giuseppe; Cucchiara, Rita

Published in: INTERNATIONAL JOURNAL OF COMPUTER VISION

This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources, containing both … (Read full abstract)

This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources, containing both human-annotated and web-collected captions. Large-scale datasets with noisy image-text pairs, indeed, provide a sub-optimal source of supervision because of their low-quality descriptive style, while human-annotated datasets are cleaner but smaller in scale. To get the best of both worlds, we propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component. The proposed model avoids the need of object detectors, is trained with a single objective of prompt language modeling, and can replicate the style of human-collected captions while training on sources with different input styles. Experimentally, the model shows a strong capability of recognizing real-world concepts and producing high-quality captions. Extensive experiments are performed on different image captioning datasets, including CC3M, nocaps, and the competitive COCO dataset, where our model consistently outperforms baselines and state-of-the-art approaches.

2024 Articolo su rivista

DOI IRIS

High-level Biomedical Data Integration in a Semantic Knowledge Graph with OncodashKB for finding Personalized Actionable Drugs in Ovarian Cancer

Authors: Dreo, Johann; Lobentanzer, Sebastian; Gaydukova, Ekaterina; Baric, Marko; Maarala, Ilari; Muranen, Taru; Oikkonen, Jaana; Bolelli, Federico; Pipoli, Vittorio; Isoviita, Veli-Matti; Hynninen, Johanna; Schwikowski, Benno

Background: The growing amount of biomedical knowledge about cancer in combination with genome-scale patient profiling data offers unprecedented opportunities for … (Read full abstract)

Background: The growing amount of biomedical knowledge about cancer in combination with genome-scale patient profiling data offers unprecedented opportunities for personalized oncology. However, the large amounts of knowledge and data require scalable approaches to providing actionable information to support clinicians in decision-making [1]. Objective: To develop software and methods that integrate all relevant clinical and genomic data about patients and that enable the discovery of optimal personalized treatment options, together with the supporting literature knowledge and data. Methods: We exploit a Semantic Knowledge Graph (SKG), a type of database that represents medical data in the form of objects and relationships, linking previously unconnected information across several cancer databases. To build up this SKG (OncodashKB), we use the BioCypher library [2]. We then integrate clinical data from patients with high-grade serous ovarian cancer, including information on genome changes collected as part of the DECIDER project (http://deciderproject.eu). The SKG can then be queried to gather evidence paths linking patient-specific alterations to actionable drugs. Results: Our approach provides a fully automated, systematic, and reproducible data integration workflow, along with the use of existing expert-made ontologies to provide interoperability and semantic descriptions. The integrated data is assessed by experts on molecular tumor boards and allows for the exploration of relevant clinical and genomic patient data in a visually accessible format, designed for ease of interpretation by clinicians. Importantly, we expect the system to reveal unexpected evidence paths between patient sequencing data and optimal treatment options based on biomedical knowledge described in the literature and confirmed by high-level evidence. Conclusion: Decision support systems using graph databases emerge as valuable tools by revealing new connections between various patient data and treatment options shown in an easy-to-understand format. References: [1] Reisle, C., Williamson, L.M., Pleasance, E. et al. A platform for oncogenomic reporting and interpretation. Nat Commun 13, 756 (2022). https://doi.org/10.1038/s41467-022-28348-y [2] Lobentanzer, S., Aloy, P., Baumbach, J. et al. Democratizing knowledge representation with BioCypher. Nat Biotechnol 41, 1056–1059 (2023). https://doi.org/10.1038/s41587-023-01848-y.

2024 Abstract in Atti di Convegno

IRIS

Identifying Impurities in Liquids of Pharmaceutical Vials

Authors: Rosati, Gabriele; Marchesini, Kevin; Lumetti, Luca; Sartori, Federica; Balboni, Beatrice; Begarani, Filippo; Vescovi, Luca; Bolelli, Federico; Grana, Costantino

The presence of visible particles in pharmaceutical products is a critical quality issue that demands strict monitoring. Recently, Convolutional Neural … (Read full abstract)

The presence of visible particles in pharmaceutical products is a critical quality issue that demands strict monitoring. Recently, Convolutional Neural Networks (CNNs) have been widely used in industrial settings to detect defects, but there remains a gap in the literature concerning the detection of particles floating in liquid substances, mainly due to the lack of publicly available datasets. In this study, we focus on the detection of foreign particles in pharmaceutical liquid vials, leveraging two state-of-the-art deep-learning approaches adapted to our specific multiclass problem. The first methodology employs a standard ResNet-18 architecture, while the second exploits a Multi-Instance Learning (MIL) technique to efficiently deal with multiple images (sequences) of the same sample. To address the issue of no data availability, we devised and partially released an annotated dataset consisting of sequences containing 19 images for each sample, captured from rotating vials, both with and without impurities. The dataset comprises 2,426 sequences for a total of 46,094 images labeled at the sequence level and including five distinct classes. The proposed methodologies, trained on this new extensive dataset, represent advancements in the field, offering promising strategies to improve the safety and quality control of pharmaceutical products and setting a benchmark for future comparisons.

2024 Relazione in Atti di Convegno

IRIS

Integrated microRNA and proteome analysis of cancer datasets with MoPC

Authors: Lovino, M.; Ficarra, E.; Martignetti, L.

Published in: PLOS ONE

MicroRNAs (miRNAs) are small molecules that play an essential role in regulating gene expression by post-transcriptional gene silencing. Their study … (Read full abstract)

MicroRNAs (miRNAs) are small molecules that play an essential role in regulating gene expression by post-transcriptional gene silencing. Their study is crucial in revealing the fundamental processes underlying pathologies and, in particular, cancer. To date, most studies on miRNA regulation consider the effect of specific miRNAs on specific target mRNAs, providing wet-lab validation. However, few tools have been developed to explain the miRNAmediated regulation at the protein level. In this paper, the MoPC computational tool is presented, that relies on the partial correlation between mRNAs and proteins conditioned on the miRNA expression to predict miRNA-target interactions in multi-omic datasets. MoPC returns the list of significant miRNA-target interactions and plot the significant correlations on the heatmap in which the miRNAs and targets are ordered by the chromosomal location. The software was applied on three TCGA/CPTAC datasets (breast, glioblastoma, and lung cancer), returning enriched results in three independent targets databases.

2024 Articolo su rivista

DOI IRIS

Intelligent Multimodal Artificial Agents that Talk and Express Emotions

Authors: Rawal, Niyati; Maharjan, Rahul Singh; Romeo, Marta; Bigazzi, Roberto; Baraldi, Lorenzo; Cucchiara, Rita; Cangelosi, Angelo

2024 Relazione in Atti di Convegno

IRIS

Is Multiple Object Tracking a Matter of Specialization?

Authors: Mancusi, Gianluca; Bernardi, Mattia; Panariello, Aniello; Porrello, Angelo; Cucchiara, Rita; Calderara, Simone

Published in: ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS

End-to-end transformer-based trackers have achieved remarkable performance on most human-related datasets. However, training these trackers in heterogeneous scenarios poses significant … (Read full abstract)

End-to-end transformer-based trackers have achieved remarkable performance on most human-related datasets. However, training these trackers in heterogeneous scenarios poses significant challenges, including negative interference - where the model learns conflicting scene-specific parameters - and limited domain generalization, which often necessitates expensive fine-tuning to adapt the models to new domains. In response to these challenges, we introduce Parameter-efficient Scenario-specific Tracking Architecture (PASTA), a novel framework that combines Parameter-Efficient Fine-Tuning (PEFT) and Modular Deep Learning (MDL). Specifically, we define key scenario attributes (e.g, camera-viewpoint, lighting condition) and train specialized PEFT modules for each attribute. These expert modules are combined in parameter space, enabling systematic generalization to new domains without increasing inference time. Extensive experiments on MOTSynth, along with zero-shot evaluations on MOT17 and PersonPath22 demonstrate that a neural tracker built from carefully selected modules surpasses its monolithic counterpart. We release models and code.

2024 Relazione in Atti di Convegno

IRIS

KRONC: Keypoint-based Robust Camera Optimization for 3D Car Reconstruction

Authors: Di Nucci, Davide; Simoni, Alessandro; Tomei, Matteo; Ciuffreda, Luca; Vezzani, Roberto; Cucchiara, Rita

2024 Relazione in Atti di Convegno

IRIS

Large-Scale Transformer models for Transactional Data

Authors: Garuti, F.; Luetto, S.; Sangineto, E.; Cucchiara, R.

Published in: CEUR WORKSHOP PROCEEDINGS

Following the spread of digital channels for everyday activities and electronic payments, huge collections of online transactions are available from … (Read full abstract)

Following the spread of digital channels for everyday activities and electronic payments, huge collections of online transactions are available from financial institutions. These transactions are usually organized as time series, i.e., a time-dependent sequence of tabular data, where each element of the series is a collection of heterogeneous fields (e.g., dates, amounts, categories, etc.). Transactions are usually evaluated by automated or semi-automated procedures to address financial tasks and gain insights into customers’ behavior. In the last years, many Trees-based Machine Learning methods (e.g., RandomForest, XGBoost) have been proposed for financial tasks, but they do not fully exploit in an end-to-end pipeline all the information richness of individual transactions, neither they fully model the underling temporal patterns. Instead, Deep Learning approaches have proven to be very effective in modeling complex data by representing them in a semantic latent space. In this paper, inspired by the multi-modal Deep Learning approaches used in Computer Vision and NLP, we propose UniTTab, an end-to-end Deep Learning Transformer model for transactional time series which can uniformly represent heterogeneous time-dependent data in a single embedding. Given the availability of large sets of tabular transactions, UniTTab defines a pre-training self-supervised phase to learn useful representations which can be employed to solve financial tasks such as churn prediction and loan default prediction. A strength of UniTTab is its flexibility since it can be adopted to represent time series of arbitrary length and composed of different data types in the fields. The flexibility of our model in solving different types of tasks (e.g., detection, classification, regression) and the possibility of varying the length of the input time series, from a few to hundreds of transactions, makes UniTTab a general-purpose Transformer architecture for bank transactions.

2024 Relazione in Atti di Convegno

IRIS

Latent spectral regularization for continual learning

Authors: Frascaroli, Emanuele; Benaglia, Riccardo; Boschini, Matteo; Moschella, Luca; Fiorini, Cosimo; Rodolà, Emanuele; Calderara, Simone

Published in: PATTERN RECOGNITION LETTERS

While biological intelligence grows organically as new knowledge is gathered throughout life, Artificial Neural Networks forget catastrophically whenever they face … (Read full abstract)

While biological intelligence grows organically as new knowledge is gathered throughout life, Artificial Neural Networks forget catastrophically whenever they face a changing training data distribution. Rehearsal-based Continual Learning (CL) approaches have been established as a versatile and reliable solution to overcome this limitation; however, sudden input disruptions and memory constraints are known to alter the consistency of their predictions. We study this phenomenon by investigating the geometric characteristics of the learner’s latent space and find that replayed data points of different classes increasingly mix up, interfering with classification. Hence, we propose a geometric regularizer that enforces weak requirements on the Laplacian spectrum of the latent space, promoting a partitioning behavior. Our proposal, called Continual Spectral Regularizer for Incremental Learning (CaSpeR-IL), can be easily combined with any rehearsal-based CL approach and improves the performance of SOTA methods on standard benchmarks.

2024 Articolo su rivista

DOI IRIS

Mapping High-level Semantic Regions in Indoor Environments without Object Recognition

Authors: Bigazzi, Roberto; Baraldi, Lorenzo; Kousik, Shreyas; Cucchiara, Rita; Pavone, Marco

Published in: IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION

2024 Relazione in Atti di Convegno

DOI IRIS