Generative AI Advance Fine-Tuning for LLMs
This course is part of multiple programs.
Instructors: Joseph Santarcangelo +3 more
There are 2 modules in this course
In this course, you’ll explore different approaches to fine-tuning causal LLMs with human feedback and direct preference optimization. You’ll look at LLMs as policies, that is, probability distributions over generated responses, and at the concepts of instruction-tuning with Hugging Face. You’ll learn to calculate rewards from human feedback and to build reward models with Hugging Face. You’ll also explore reinforcement learning from human feedback (RLHF), proximal policy optimization (PPO) and the PPO Trainer, and optimal solutions for direct preference optimization (DPO) problems. As you learn, you’ll get hands-on experience in online labs where you’ll work on reward modeling, PPO, and DPO. If you want to add in-demand LLM fine-tuning skills to your resume, enroll today and build the job-ready skills employers are looking for in just two weeks.
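To give a feel for the direct preference optimization idea mentioned above, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It is an illustration under stated assumptions, not material taken from the course labs: the function name `dpo_loss`, its parameter names, and the default `beta=0.1` are hypothetical, and the inputs are assumed to be sequence log-probabilities already computed by the policy being trained and by a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) response pair.

    All inputs are log-probabilities of a full response given the prompt.
    `beta` (hypothetical default) scales how strongly the policy is pushed
    away from the reference model.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model, in log space.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # Positive when the policy favors the chosen response more than
    # the reference model does.
    logits = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(logits)), computed in a numerically stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# Policy identical to the reference: the loss is log(2).
print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
# Policy shifted toward the chosen response: the loss drops below log(2).
print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

The loss falls as the policy assigns relatively more probability to the preferred response than the reference model does, which is how DPO learns from preference pairs without training a separate reward model.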
Fine-Tuning Causal LLMs with Human Feedback and Direct Preference
©2025 ementorhub.com. All rights reserved