2025, Vol. 41, No. 06, pp. 47-57
Emotion Recognition and Reasoning Based on EMLLaMA
Foundation: Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machines (Hefei University of Technology); the Fundamental Research Funds for the Central Universities (PA2023GDSK0061); Key Laboratory Independent Innovation Project "Research and Implementation of Multimodal Emotion Recognition for Educational Scenarios"
Abstract:

LLaMA plays a significant role in the field of large models and is one of the most popular foundation models of the AGI era. However, it relies heavily on community-driven manual fine-tuning or retrieval-augmented generation (RAG) to incorporate new data, and its mathematical and logical reasoning capabilities are relatively weak; these shortcomings limit its application in specialized fields. In sentiment analysis in particular, most existing methods rely on traditional unimodal recognition approaches that do not incorporate sentiment reasoning, mirroring the limitations earlier large language models faced in handling sentiment. Some intuitive multimodal LLaMA (MLLaMA) sentiment frameworks have been proposed, but constraints on their data sources leave their analytical capabilities limited. To address these issues, this paper proposes EMLLaMA, a large-model architecture for sentiment analysis trained on extended data, which effectively tackles the data scarcity prevalent in current multimodal sentiment recognition, and designs a corresponding EMLLaMA training framework. Experimental results show that the model achieves the best performance in sentiment analysis among mainstream large models, in particular overcoming model hallucination during inference to deliver precise emotion reasoning. This provides a new benchmark dataset for future multimodal sentiment analysis and demonstrates the model's effectiveness in sentiment analysis.
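The abstract describes instruction-tuning a LLaMA backbone on extended emotion data. As a rough, hypothetical sketch of what such a setup can look like (not the paper's actual EMLLaMA configuration), the snippet below fine-tunes a placeholder LLaMA checkpoint with LoRA adapters on a single emotion-reasoning instruction sample; the model name, prompt format, and hyperparameters are all illustrative assumptions.

# Hypothetical sketch: LoRA instruction tuning of a LLaMA backbone on
# emotion-reasoning data. Names and hyperparameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "meta-llama/Llama-2-7b-hf"  # placeholder backbone, not the paper's

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# Freeze the backbone; only the small low-rank adapters are trained,
# which is how scarce emotion data can be absorbed without full retraining.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# One assumed emotion-reasoning instruction sample (format is illustrative):
sample = ("Instruction: Given the utterance and the described expression, "
          "infer the speaker's emotion and explain the evidence.\n"
          "Input: \"I can't believe you remembered!\" (wide eyes, broad smile)\n"
          "Response: Surprise mixed with joy; the exclamation signals an "
          "unexpected event and the smile marks a positive appraisal.")
batch = tokenizer(sample, return_tensors="pt")

# Standard causal-LM loss over the whole sequence; a real pipeline would
# mask the prompt tokens and batch many samples.
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()

A multimodal variant would additionally prepend projected visual and audio embeddings to the token sequence, in the style of frameworks such as Video-LLaVA; that wiring is omitted here.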

Basic Information:

CLC Number: TP391.1

Citation:

[1] YE Haiyan. Emotion Recognition and Reasoning Based on EMLLaMA[J]. Journal of Pu'er University, 2025, 41(06): 47-57.

Release Date: 2026-01-13
Publication Date: 2026-01-13
Online Publication Date: 2026-01-13
