IST researchers well-represented at international AI conference

Empirical Methods in Natural Language Processing is a top-tier conference on natural language understanding, AI text generation and multimodal natural language processing

UNIVERSITY PARK, Pa. — The Penn State College of Information Sciences and Technology (IST) was well-represented at a November gathering of scholars and researchers discussing natural language processing and computational linguistics. The Conference on Empirical Methods in Natural Language Processing (EMNLP 2025) accepted more than a dozen papers from IST researchers.

From understanding the bias a simple photograph can introduce to exploring how artificial intelligence (AI) can provide mental health support to people with post-traumatic stress disorder (PTSD), here is a sampling of the work accepted to EMNLP 2025:

Title: Beyond Checkmate: Exploring the Creative Chokepoints in AI Text

Penn State authors: Nafis Irtiza Tripto and Mahjabin Nahar, graduate students pursuing doctoral degrees in informatics; Saranya Venkatraman, who earned a doctorate from IST in 2024; and Dongwon Lee, professor in the Department of Privacy and Cybersecurity Informatics

Summary: This work shows how AI writing and human writing differ. By comparing the parts of a text to the stages of a chess game (opening, middlegame and endgame), the study finds that the middle segment, the body of the text, is where AI and human styles differ most. This insight improves AI-text detection and deepens the understanding of how people use language creatively.

Title: Phi: Preference Hijacking in Multi-modal Large Language Models at Inference Time

Penn State authors: Yifan Lan and Yuanpu Cao, graduate students pursuing doctoral degrees in informatics, and Jinghui Chen, assistant professor in the Department of Informatics and Intelligent Systems

Summary: Multi-modal large language models (MLLMs) — AI systems that understand both images and text — are becoming more common, but they also raise new safety concerns. This work shows that carefully designed images can quietly influence MLLMs to give biased or slanted answers without being obviously harmful. This paper introduces a technique called Preference Hijacking (Phi), which uses a single “hijacked” image to steer MLLM responses at inference time without changing the model itself.

Title: The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support

Penn State authors: Suhas BN, who earned a doctorate from IST earlier this year; Yash Mahajan, graduate student pursuing a master’s degree in informatics; Dominik Mattioli, postdoctoral researcher; and Saeed Abdullah, associate professor in the Department of Human-Centered Computing and Social Informatics

Summary: This paper looks at whether smaller AI language models can give caring, supportive responses to people with PTSD. The researchers introduced a new dataset of 10,000 short conversations based on realistic PTSD experiences and used it to test several small models. After training, these models became more empathetic, and some came close to matching human responses in rated empathy, even though larger models still performed best. The results suggest that smaller, more efficient AI systems could still provide helpful emotional support in mental health settings.

Title: SimpleDoc: Multi-Modal Document Understanding With Dual-Cue Page Retrieval and Iterative Refinement

Penn State authors: Yiran Wu and Jiale Liu, graduate students pursuing doctoral degrees in informatics; and Qingyun Wu, assistant professor in the Department of Informatics and Intelligent Systems

Summary: This work introduces SimpleDoc, an easy-to-use framework that helps answer questions about documents with many pages, images and text. It finds the most relevant pages by looking at both overall meaning and key content, then reasons through them step by step to produce accurate answers. Compared with earlier approaches, SimpleDoc gives better answers while needing to review fewer pages.

Title: From Noise to Nuance: Enriching Subjective Data Annotation Through Qualitative Analysis

Penn State authors: Ruyuan Wan, graduate student pursuing a doctoral degree in informatics, and Ting-Hao "Kenneth" Huang, associate professor in the Department of Human-Centered Computing and Social Informatics

Summary: Subjective data annotation is a key part of many natural language processing (NLP) tasks, and high-quality annotation is vital to prevent harmful downstream effects. However, differences in how annotators label data are often dismissed as mistakes (noise) rather than treated as valuable information (nuance). This work compares subjective data annotation with qualitative data analysis in terms of task nature and practice and offers five actionable recommendations.

Title: LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles

Penn State authors: Ho Yin “Sam” Ng and Aashish Anantha Ramakrishnan, graduate students pursuing doctoral degrees in informatics; Ting-Yao Hsu, who earned a doctorate from the College of Engineering earlier this year; Dongwon Lee; and Ting-Hao “Kenneth” Huang

Summary: While AI often generates generic figure captions, this paper introduces LaMP-Cap, a dataset designed to personalize captions by analyzing the surrounding text and images in a scholarly paper. The research demonstrates that using this multimodal context helps AI better mimic an author’s unique professional writing style.

Title: Reinforcement Learning for Large Language Models via Group Preference Reward Shaping

Penn State authors: Huaisheng Zhu, Zhimeng Guo and Hangfan Zhang, graduate students pursuing doctoral degrees in the College of IST; Teng Xiao and Siyuan Xu, graduate students pursuing doctoral degrees in the College of Engineering; and Vasant Honavar, professor in IST’s Department of Informatics and Intelligent Systems

Summary: Group Preference Reward Shaping (GPRS) is a lightweight reinforcement learning method that aligns large language models with human preferences using preference-based reward shaping. The paper offers a theoretical analysis showing that GPRS steadily improves model performance over time and does not require extra evaluation models. GPRS achieves top results on benchmarks that measure learning from human feedback and reasoning ability.
