Simon Yu

You can also call me U Chi Lok (余知樂) or Simão (in Portuguese)


I am a 2nd-year PhD student at Northeastern University, advised by Weiyan Shi. In summer 2025, I interned at Orby AI, mentored by Peng Qi. I am a main organizer of the NeurIPS 2025 Workshop on Multi-Turn Interactions in LLMs. I was an M.Res. and BSc student at the University of Edinburgh, where I was also part of EdinburghNLP. I am part of the Cohere For AI community, working with Marzieh Fadaee. I have done research on alignment, instruction tuning, and safety.

My research interests lie in three main directions:

  1. Self-Play Learning: Self-improvement methods that strengthen LLM capabilities without human supervision, including SPIRAL, which shows that self-play on zero-sum games improves reasoning.
  2. Environment Scaling: Scaling the environments that models interact with, including TextArena for multi-agent environments and evaluation, and GEM for unified, scalable environment generation.
  3. Continual Learning: Building agents that learn and adapt continuously across tasks, including PolySkill for compositional skill learning in agents.

One of the lessons that has influenced me most comes from The Bitter Lesson by Richard Sutton and The Era of Experience by David Silver and Richard Sutton. The idea is not limited to AI; it can be applied to any choice in life. Always choose the path that pays off in the long run over the path that is easier in the short run.

news

Dec 06, 2025 New! Hosting the Multi-Turn Interaction workshop @ NeurIPS 2025, see you in San Diego!
Oct 06, 2025 New! Our GEM paper received an Oral at the SEA workshop @ NeurIPS 2025 and a Spotlight at the MTI-LLM workshop @ NeurIPS 2025!
Oct 01, 2025 New! New preprint out on GEM: A Gym for Generalist LLMs.
Oct 01, 2025 New! New preprint out on Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity.
Jun 24, 2025 New! New preprint out on SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning.

selected publications

Acknowledgement

Since I began my research, I have met many intelligent, disciplined, and wonderful peers to work with, including (but not limited to) Andrej Jovanovic@Cambridge, Hanxu Hu@UZH, Chenmien Tan@Edinburgh, Pinzhen Chen@Edinburgh, Yijun Yang@Edinburgh, and Liangyu Chen@Stanford. I have learned a great deal from them, and I have enjoyed all the discussions we had.