Published: 27 Aug 2025
This project aims to develop new RLHF algorithms to effectively solve complex RL tasks without access to a predefined reward function. The primary goal of this project will be the development of a novel RLHF framework that can learn more complex behaviours while requiring significantly less interactive human feedback than current RLHF methods. To achieve this, the project will focus on extracting more information from uncertain, incorrect, and inconsistent human feedback than is possible with current methods.
The direction of this project is highly flexible, and the student will have the opportunity to explore related directions that match their research interests. We intend for this project to explore applications of the new RLHF framework, such as fine-tuning and aligning large language models (LLMs), and the use of human feedback in robotics. The project may also explore the use of LLMs as part of the RLHF framework itself, to generate and/or interpret natural language feedback. The specific applications and research directions will depend on the student's own interests.
The preferred start date for this position is February 2026, but this is very flexible.
To apply for a PhD studentship, submit your application directly to the University of Sheffield using the Postgraduate Online Application Form. Make sure you name Dr. Bei Peng and Dr. Robert Loftin as your proposed supervisors.
Information on the required documents and a link to the application form can be found here: https://www.sheffield.ac.uk/postgraduate/phd/apply/applying
The form has comprehensive instructions for you to follow, and pop-up help is available.
Your research proposal should:
For more details, visit the University of Sheffield webpage.