Publications
Explore our research publications: papers, articles, and conference proceedings from AImageLab.
Decoding Facial Expressions in Video: A Multiple Instance Learning Perspective on Action Units
Authors: Del Gaudio, Livia; Cuculo, Vittorio; Cucchiara, Rita
Facial expression recognition (FER) in video sequences is a longstanding challenge in affective computing and computer vision, particularly due to the temporal complexity and subtlety of emotional expressions. In this paper, we propose a novel pipeline that leverages facial Action Units (AUs) as structured time series descriptors of facial muscle activity, enabling emotion classification in videos through a Multiple Instance Learning (MIL) framework. Our approach models each video as a bag of AU-based instances, capturing localized temporal patterns, and allows for robust learning even when only coarse video-level emotion labels are available. Crucially, the approach incorporates interpretability mechanisms that highlight the temporal segments most influential to the final prediction, providing informed decision-making and facilitating downstream analysis. Experimental results on benchmark FER video datasets demonstrate that our method achieves competitive performance using only visual data, without requiring multimodal signals or frame-level supervision. This highlights its potential as an interpretable and efficient solution for weakly supervised emotion recognition in real-world scenarios.
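As a rough illustration of the bag-of-instances idea described in this abstract, the sketch below pools a bag of per-window AU feature vectors with softmax attention weights, so that the weights can be read as a measure of which temporal segment influenced the bag-level representation most. This is a generic attention-pooling sketch, not the authors' pipeline: the function names, the scoring vector `w`, and the toy AU values are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(bag, w):
    """Pool a bag of instance vectors into one vector.

    bag: list of instance feature vectors (one per temporal window).
    w:   hypothetical scoring vector; in a trained model it would be learned.
    Returns the pooled vector and the per-instance attention weights.
    """
    # One scalar relevance score per instance (dot product with w).
    scores = [sum(wi * xi for wi, xi in zip(w, inst)) for inst in bag]
    weights = softmax(scores)
    dim = len(bag[0])
    # Weighted sum of the instances, dimension by dimension.
    pooled = [sum(a * inst[d] for a, inst in zip(weights, bag)) for d in range(dim)]
    return pooled, weights


# Toy bag: three temporal windows, each summarized by two AU intensities
# (made-up numbers, not taken from any dataset).
bag = [[0.1, 0.0], [0.9, 0.8], [0.2, 0.1]]
w = [1.0, 1.0]

pooled, weights = attention_pool(bag, w)
# Index of the window with the largest attention weight.
top_segment = max(range(len(weights)), key=weights.__getitem__)
```

Only a video-level label would be needed to train such a model, since the loss is computed on the pooled vector; the attention weights come out as a by-product and can be inspected per window.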
Deep Learning for Classifying Anti-Shigella Opsono-Phagocytosis-Promoting Monoclonal Antibodies
Authors: Pianfetti, Elena; Cardamone, Dario; Roscioli, Emanuele; Ciano, Giorgio; Maccari, Giuseppe; Sala, Claudia; Micoli, Francesca; Rappuoli, Rino; Medini, Duccio; Ficarra, Elisa
Published in: LECTURE NOTES IN COMPUTER SCIENCE
Shigellosis is an acute small intestine infection caused by different species of Shigella. Worldwide, the emergence of antibiotic-resistant strains aggravates the impact of Shigella infections. In this context, human monoclonal antibodies (mAbs) offer an alternative to traditional antimicrobials. However, identifying a potent candidate mAb requires intense and meticulous efforts. Here, we show the potential of Deep Learning to screen mAbs rapidly. We measured the phagocytosis-promoting activity of mAbs by analyzing images collected with a high-throughput and high-content confocal fluorescence microscope. We acquired images of S. sonnei and S. flexneri infecting THP-1-derived macrophages and evaluated the effect of different mAbs and of a wide selection of Deep Learning tools. We found that our model can generalize on strains and mAbs not encountered in training. Importantly, our approach enables the screening and characterization of multiple anti-Shigella mAbs at the same time, facilitating the identification of potent antibacterial candidates. Our code is available on the GitHub repository vOPA_Shigella.
Depth-Based Privileged Information for Boosting 3D Human Pose Estimation on RGB
Authors: Simoni, A.; Marchetti, F.; Borghi, G.; Becattini, F.; Davoli, D.; Garattoni, L.; Francesca, G.; Seidenari, L.; Vezzani, R.
Published in: LECTURE NOTES IN COMPUTER SCIENCE
Diffusion Transformers for Tabular Data Time Series Generation
Authors: Garuti, Fabrizio; Sangineto, Enver; Luetto, Simone; Forni, Lorenzo; Cucchiara, Rita
DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection
Authors: Cappellino, Chiara; Mancusi, Gianluca; Mosconi, Matteo; Porrello, Angelo; Calderara, Simone; Cucchiara, Rita
DualPose: Dual-Block Transformer Decoder with Contrastive Denoising for Multi-Person Pose Estimation
Authors: Fincato, M.; Vezzani, R.
Published in: SENSORS
Multi-person pose estimation is the task of detecting and regressing the keypoint coordinates of multiple people in a single image. Significant progress has been achieved in recent years, especially with the introduction of transformer-based end-to-end methods. In this paper, we present DualPose, a novel framework that enhances multi-person pose estimation by leveraging a dual-block transformer decoding architecture. Class prediction and keypoint estimation are split into parallel blocks so each sub-task can be separately improved and the risk of interference is reduced. This architecture improves the precision of keypoint localization and the model's capacity to accurately classify individuals. To improve model performance, the Keypoint-Block uses parallel processing of self-attentions, providing a novel strategy that improves keypoint localization accuracy and precision. Additionally, DualPose incorporates a contrastive denoising (CDN) mechanism, leveraging positive and negative samples to stabilize training and improve robustness. Thanks to CDN, a variety of training samples are created by introducing controlled noise into the ground truth, improving the model's ability to discern between valid and incorrect keypoints. DualPose achieves state-of-the-art results outperforming recent end-to-end methods, as shown by extensive experiments on the MS COCO and CrowdPose datasets. The code and pretrained models are publicly available.
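The contrastive denoising step mentioned in this abstract can be pictured with a small sketch: ground-truth keypoints are perturbed twice, once with small noise (positive samples, which a model would be trained to map back to the true keypoint) and once with larger noise (negative samples, which it would be trained to reject). The sketch follows the general contrastive-denoising recipe used in detection transformers and is not taken from the DualPose code; the thresholds `lam_pos` and `lam_neg`, the function names, and the toy coordinates are assumptions.

```python
import random


def signed_shift(rng, lo, hi):
    """Draw a shift with magnitude in [lo, hi] and a random sign."""
    mag = rng.uniform(lo, hi)
    return mag if rng.random() < 0.5 else -mag


def make_cdn_samples(keypoints, lam_pos, lam_neg, rng):
    """Build positive and negative noised copies of ground-truth keypoints.

    keypoints: list of (x, y) ground-truth coordinates.
    lam_pos:   hypothetical upper bound on positive-sample noise.
    lam_neg:   hypothetical upper bound on negative-sample noise (> lam_pos).
    """
    positive, negative = [], []
    for x, y in keypoints:
        # Positive: stays close to the ground truth.
        positive.append((x + signed_shift(rng, 0.0, lam_pos),
                         y + signed_shift(rng, 0.0, lam_pos)))
        # Negative: pushed further away, between the two thresholds.
        negative.append((x + signed_shift(rng, lam_pos, lam_neg),
                         y + signed_shift(rng, lam_pos, lam_neg)))
    return positive, negative


# Toy normalized keypoints (made-up values).
gt = [(0.30, 0.40), (0.55, 0.62)]
rng = random.Random(0)
pos, neg = make_cdn_samples(gt, lam_pos=0.02, lam_neg=0.05, rng=rng)
```

Keeping the two noise ranges disjoint is what makes the task contrastive: the same ground-truth point yields one query that should be recovered and one that should be classified as invalid.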
ECoGNet: an EEG-based Effective Connectivity Graph Neural Network for Brain Disorder Detection
Authors: Burger, Jacopo; Cuculo, Vittorio; D'Amelio, Alessandro; Grossi, Giuliano; Lanzarotti, Raffaella
Alzheimer’s Disease (AD) and Frontotemporal Dementia (FTD), among the most prevalent neurodegenerative disorders, disrupt brain activity and connectivity, highlighting the need for tools that can effectively capture these alterations. Effective Connectivity Networks (ECNs), which model causal interactions between brain regions, offer a promising approach to characterizing AD and FTD related neural changes. In this study, we estimate ECNs from EEG traces using a state-of-the-art causal discovery method specifically designed for time-series data, to recover the causal structure of the interactions between brain areas. The recovered ECNs are integrated into a novel Graph Neural Network architecture (ECoGNet), where nodes represent brain regions and edge features encode causal relationships. Our method combines ECNs with features summarizing local brain dynamics to improve AD and FTD detection. Evaluated on a publicly available EEG dataset, the proposed approach demonstrates superior performance compared to models that either use non-causal connectivity networks or omit connectivity information entirely.
Empowering the Operator: Fault Diagnosis and Identification in an Industrial Environment Through a User-Friendly IoT Architecture
Authors: Bertoli, Annalisa; Fantuzzi, Cesare
Published in: COMPUTERS
In recent years, the increasing complexity of production systems driven by technological development has created new opportunities in the industrial world but has also brought challenges in the practical use of these systems by operators. One of the biggest changes is data existence and its accessibility. This work proposes an IoT architecture specifically designed for real-world industrial environments. The goal is to present a system that can be effectively implemented to monitor operations and production processes in real time. This solution improves fault detection and identification, giving the operators the critical information needed to make informed decisions. The IoT architecture is implemented in two different industrial applications, demonstrating the flexibility of the architecture across various industrial contexts. It highlights how the system is monitored to reduce downtime when a fault occurs, making clear the loss in performance and the fault that causes this loss. Additionally, this approach supports human operators in a deeper understanding of their working environment, enabling them to make decisions based on real-time data.