Simon Yu
You can also call me U Chi Lok (余知樂) or Simão (in Portuguese)
I am a second-year PhD student at Northeastern University, advised by Weiyan Shi. In summer 2025, I interned at Orby AI, mentored by Peng Qi. I am a main organizer of the NeurIPS 2025 Workshop on Multi-Turn Interactions in LLMs. I was an M.Res. and BSc student at the University of Edinburgh, where I was part of EdinburghNLP. I am also part of the Cohere For AI community, working with Marzieh Fadaee. My past research covers alignment, instruction tuning, and safety.
My research interests lie in three main directions:
- Self-Play Learning: Self-improvement methods that enhance LLM capabilities without human supervision, including SPIRAL, which shows that self-play on zero-sum games improves reasoning.
- Environment Scaling: Scaling the environments that models interact with, including TextArena for multi-agent environments and evaluation, and GEM for unified, scalable environment generation.
- Continual Learning: Building agents that learn and adapt continuously across tasks, including PolySkill for compositional skill learning in agents.
One of the most influential lessons for me comes from The Bitter Lesson by Richard Sutton and The Era of Experience by David Silver and Richard Sutton. The idea is not limited to AI; it applies to any choice in life: always choose the path that pays off in the long run, rather than the one that is easier in the short run.
news
| Date | News |
|---|---|
| Dec 06, 2025 | Hosting the Multi-Turn Interaction workshop @ NeurIPS 2025, see you in San Diego! |
| Oct 06, 2025 | Our GEM paper received an Oral at the SEA workshop @ NeurIPS 2025 and a Spotlight at the MTI-LLM workshop @ NeurIPS 2025! |
| Oct 01, 2025 | New preprint out on GEM: A Gym for Generalist LLMs. |
| Oct 01, 2025 | New preprint out on Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity. |
| Jun 24, 2025 | New preprint out on SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning. |
selected publications
- arXiv preprint
- arXiv preprint
- DataWorld Workshop @ ICML 2025
- In Proceedings of the Conference on Language Modeling (COLM), 2024
Acknowledgements
Since I began doing research, I have met many intelligent, disciplined, and wonderful peers, including (but not limited to) Andrej Jovanovic@Cambridge, Hanxu Hu@UZH, Chenmien Tan@Edinburgh, Pinzhen Chen@Edinburgh, Yijun Yang@Edinburgh, and Liangyu Chen@Stanford. I have truly learned a lot from them, and I enjoyed all the discussions we had.