cv
Basics
Name | Simon Yu |
Label | PhD Candidate |
Email | simon011130@gmail.com |
Summary | PhD Candidate specializing in Conversational Agents, Alignment, and Safety in ML, with a strong background in AI and Computer Science. |
Work
-
2023.10 - Present Research Associate
Cohere for AI
Research on data selection and clustering for instruction fine-tuning.
- Collaborating with Liangyu Chen, Marzieh Fadaee, and Sara Ahmadian
-
2023.06 - 2023.09 Machine Learning Engineer Intern
Expedia Group, Inc.
Worked on more efficient deployment of AI models for business applications across the Expedia For Business (E4B) teams.
- Designed and assisted in the deployment of Machine Learning models through a streamlined pipeline using ONNX and CI/CD tools
- Migrated deployment pipeline from Jenkins to GitHub Actions
- Employed CI/CD tools (Spinnaker, GitHub Actions) and Cloud Services (AWS S3 Bucket, Lambda)
-
2022.10 - 2023.09 Research Assistant
Institute for Language, Cognition and Computation (ILCC), University of Edinburgh
Research on supervised contrastive learning in NLP, few-shot learning, and commonsense question answering.
- One paper accepted at ACL 2023 Main (Poster)
- Supervised by Dr Jeff Z. Pan
-
2022.03 - 2022.09 Research Intern
Huawei Noah's Lab R&D
Specialized in E-commerce Knowledge Graphs, Entity Linking, and Disambiguation from unstructured data.
- Enhanced the performance of the entity extraction module
- Contributed to benchmarking work
- Assisted in the design and development of the product's Knowledge Graph
- Leveraged data processing tools such as Apache Spark and Hadoop
- Employed CI/CD tools (Nexus Repository, Jenkins)
Education
-
2024.09 - Present Boston, MA
-
2023.10 - 2024.08 Edinburgh, UK
-
2019.09 - 2023.07 Edinburgh, UK
BSc
The University of Edinburgh
Artificial Intelligence and Computer Science
- Natural Language Understanding, Generation, and Machine Translation
- Text Technologies for Data Science
- Software Engineering and Professional Practice
Awards
- 2023
Outstanding Dissertation Award
The University of Edinburgh
Achieved the top mark in the class (90%) for the final-year dissertation
Publications
-
2024 Evaluating the Adversarial Robustness of Retrieval-Based In-Context Learning for Large Language Models
Conference on Language Modeling 2024 (COLM-2024)
Co-authored with Jie He, Pasquale Minervini, and Jeff Pan
-
2024 Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?
Under review
Co-authored with Pinzhen Chen, Zhicheng Guo, and Barry Haddow
-
2024 Fine-Tuning Large Language Models with Sequential Instructions
Under review
Co-authored with Hanxu Hu and Pinzhen Chen
-
2023 BUCA: A Binary Classification Approach to Unsupervised Commonsense Question Answering
The 2023 Annual Meeting of the Association for Computational Linguistics (Main, ACL-2023)
Co-authored with Jie He, Victor Gutierrez-Basulto, and Jeff Pan
-
2023 Instances and Labels: Hierarchy-aware Joint Supervised Contrastive Learning for Hierarchical Multi-Label Text Classification
The 2023 Conference on Empirical Methods in Natural Language Processing (Findings, EMNLP-2023)
Co-authored with Jie He, Victor Gutierrez-Basulto, and Jeff Pan
Skills
Machine Learning | |
Natural Language Processing | |
Conversational Agents | |
AI Safety | |
Instruction Tuning |
Programming | |
Python | |
ONNX | |
CI/CD | |
GitHub Actions | |
Apache Spark | |
Hadoop |
Cloud Services | |
AWS S3 Bucket | |
AWS Lambda |
Languages
English | |
Fluent |
Interests
Artificial Intelligence | |
Conversational Agents | |
AI Safety | |
Natural Language Processing | |
Machine Learning |