Simon Yu
You can also call me U Chi Lok (余知樂) or Simão (in Portuguese)
I am a second-year PhD student at Northeastern University, advised by Weiyan Shi. In summer 2025, I interned at Orby AI, mentored by Peng Qi. I am a main organizer of the NeurIPS 2025 Workshop on Multi-Turn Interactions in LLMs. I was an M.Res. and BSc student at the University of Edinburgh, where I was part of EdinburghNLP. I am also part of the Cohere For AI community, working with Marzieh Fadaee. My past research covers alignment, instruction tuning, and safety.
My research interests lie in three main directions:
- Self-Play Learning: Self-improvement methods that enhance LLM capabilities without human supervision, including SPIRAL, which shows that self-play on zero-sum games improves reasoning.
- Environment Scaling: Scaling the environments that models interact with, including TextArena for multi-agent environments and evaluation, and GEM for unified, scalable environment generation.
- Continual Learning: Building agents that learn and adapt continuously across tasks, including PolySkill for compositional skill learning in agents.
One of the most influential lessons for me comes from The Bitter Lesson by Richard Sutton and The Era of Experience by David Silver and Richard Sutton. The idea is not limited to AI; it applies to any choice in life: always choose the path that pays off in the long run, rather than the one that is easier in the short run.
news
| Date | News |
|---|---|
| Dec 06, 2025 | Hosting the Multi-Turn Interaction workshop @ NeurIPS 2025, see you in San Diego! |
| Oct 06, 2025 | Our GEM paper received an Oral at the SEA workshop @ NeurIPS 2025 and a Spotlight at the MTI-LLM workshop @ NeurIPS 2025! |
| Oct 01, 2025 | New preprint out on GEM: A Gym for Generalist LLMs. |
| Oct 01, 2025 | New preprint out on Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity. |
| Jun 24, 2025 | New preprint out on SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning. |
selected publications
- arXiv preprint
- arXiv preprint
- DataWorld Workshop @ ICML 2025
- In Proceedings of the Conference on Language Modeling (COLM), 2024
Acknowledgements
Since I began doing research, I have met many intelligent, disciplined, and wonderful peers, including (but not limited to) Andrej Jovanovic@Cambridge, Hanxu Hu@UZH, Chenmien Tan@Edinburgh, Pinzhen Chen@Edinburgh, Yijun Yang@Edinburgh, and Liangyu Chen@Stanford. I have truly learned a lot from them, and I enjoyed all the discussions we had.