Generative AI-Based Multimodal Sentiment Analysis of Low-Resource Languages

Authors

  • Mohammad Usman Zafar, Vision, Linguistics and Machine Intelligence Research Lab, Multan, Pakistan
  • Hamid Ghous, Australian Scientific and Engineering Solutions, Australia
  • Sana Jamshaid, Vision, Linguistics and Machine Intelligence Research Lab, Multan, Pakistan
  • Mubasher H. Malik, Vision, Linguistics and Machine Intelligence Research Lab, Multan, Pakistan

Keywords:

Machine learning, Deep learning, Punjabi, Saraiki, Generative AI

Abstract

Multimodal sentiment analysis (MSA), which integrates affective computing approaches across textual, visual, and audio modalities, has become a crucial tool for understanding human behavior. Most MSA research, however, targets high-resource languages, leaving a significant gap in the study of sentiment among communities that speak low-resource languages such as Saraiki and Punjabi, which frequently lack sufficient annotated datasets and robust tools. This study introduces a Generative AI-based multimodal framework for sentiment analysis of low-resource languages, with a focus on Punjabi and Saraiki, combining synthetic data generation, transfer learning, and multimodal fusion techniques to overcome these limitations. Two video datasets were created manually, and both audio and representative frames were extracted from the videos. To address the data shortage, Generative Adversarial Networks (GANs) were applied to generate synthetic audio and frames, improving data diversity and class balance. Features were extracted using Wav2Vec2 for audio and ResNet-50 for frames, and the resulting embeddings were classified with both traditional machine learning methods and deep learning approaches. Overall, the experimental findings indicate that deep learning classifiers outperformed traditional machine learning models on both the Punjabi and Saraiki datasets: CNN and BiLSTM achieved the best F1-scores of 0.66 and 0.47, with AUCs of 0.83 and 0.67, respectively, demonstrating strong performance in multimodal sentiment analysis. 
The research emphasizes the efficiency of multimodal and generative approaches for sentiment analysis in underrepresented languages and suggests directions for future research.
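The pipeline described above (Wav2Vec2 audio embeddings and ResNet-50 frame embeddings fused and fed to a deep classifier) can be sketched in PyTorch. This is a minimal illustration, not the authors' implementation: the fusion strategy (simple concatenation), hidden size, and number of sentiment classes are assumptions; only the embedding dimensions are standard (768 for wav2vec2-base, 2048 for ResNet-50's penultimate layer).

```python
import torch
import torch.nn as nn

class LateFusionBiLSTM(nn.Module):
    """Concatenates per-segment audio and frame embeddings, then classifies
    the fused sequence with a BiLSTM head (one of the two best-performing
    classifier families reported in the abstract)."""

    def __init__(self, audio_dim=768, frame_dim=2048, hidden=128, n_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(audio_dim + frame_dim, hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, audio_seq, frame_seq):
        # audio_seq: (B, T, 768)  Wav2Vec2 embeddings, one per audio segment
        # frame_seq: (B, T, 2048) ResNet-50 embeddings, one per sampled frame
        fused = torch.cat([audio_seq, frame_seq], dim=-1)  # (B, T, 2816)
        out, _ = self.lstm(fused)
        return self.head(out[:, -1])  # logits from last time step: (B, n_classes)

model = LateFusionBiLSTM()
audio = torch.randn(2, 5, 768)    # 2 clips, 5 segments each
frames = torch.randn(2, 5, 2048)  # 2 clips, 5 frames each
logits = model(audio, frames)     # shape (2, 3)
```

In practice the embeddings would come from pretrained `Wav2Vec2Model` (Hugging Face Transformers) and `torchvision.models.resnet50` with its classification head removed; random tensors stand in here to keep the sketch self-contained.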

Published

2026-05-05