Mahesh Ramesh - Research

Publications

Playing with Fire: What Transfers When RL Trains a Language Agent?

Mahesh Ramesh^*, Kaousheik Jayakumar^*, Hemanth Ram, Pavan Thodima, Ramani Duraiswami, Dinesh Manocha, Aniket Rege, Emmanouil-Vasileios Vlatakis-Gkaragkounis (* equal contribution)

ICML 2026 Workshop · RLxF: Reinforcement Learning from World Feedback

Studies what transfers when models are trained with RL and which factors control that transfer. The paper shows that RL generalization depends on pre-training knowledge, but that RL without strong prior knowledge is still useful: models RL-trained on hard tasks outperform base models when enough information is provided in context, improve on domains where the model is already competent, and provide a better initialization for further staged RL training.

[ICML]

Sparks of Cooperative Reasoning: LLMs as Strategic Hanabi Agents

Mahesh Ramesh, Kaousheik Jayakumar, Aswinkumar Ramkumar, Pavan Thodima, Aniket Rege, Emmanouil-Vasileios Vlatakis-Gkaragkounis

ICML 2026

Introduced a multi-turn benchmark to evaluate state tracking and cooperation in frontier models. RL-trained a 4B model on curated data, outperforming all non-reasoning baselines. Released 1,500+ game trajectories (~90K data points) for SFT and move-level ratings for RLVR.

[arXiv] [code]

CuRe: Cultural Gaps in the Long Tail of Text-to-Image Models

Aniket Rege, Zinnia Nie, Mahesh Ramesh, Unmesh Raskar, Zhuoran Yu, Aditya Kusupati, Yong Jae Lee, Ramya Korlakai Vinayak

ICCV 2025 · CVPR DemoDiv Workshop (Oral)

Curated 300 artifacts across 6 cultural axes and 64 countries. Developed Marginal Information Attribution metrics that achieve a 2x improvement in Spearman correlation over prior scorers. Conducted 2,700 artifact-level surveys with culturally aligned annotators.

[arXiv] [project page]

MABViT: Modified Attention Block Enhances Vision Transformers

Mahesh Ramesh, Aswinkumar Ramkumar

Deployable AI Workshop, AAAI 2024 (Oral)

Integrated Gated Linear Units into the attention module for parallel MLP-attention computation. Achieved a 0.6% accuracy gain over ViT-S/16, surpassed ViT-B/16 with half the parameters, and converged 17% faster during training.

[arXiv]

Thesis

Multivariate Time-Series Data Augmentation for Anomaly Detection

Mahesh Ramesh

B.Tech Thesis, IIT Madras · Dassault Aviation Collaboration · Presented at RBCDSAI 2023

Designed a deep neural network to augment scarce flight telemetry for improved anomaly-detection recall. Segmented mixed discrete/continuous sensor streams into state clusters. Implemented a VAE with an attention mechanism and soft constraints, achieving a 15% reduction in discrimination score.