Ementorhub

What you'll learn

In-demand gen AI engineering skills for fine-tuning LLMs that employers are actively looking for, in just 2 weeks

Instruction-tuning and reward modeling with Hugging Face, plus LLMs as policies and RLHF

Direct preference optimization (DPO) with Hugging Face, the partition function, and how to create an optimal solution to a DPO problem

How to use proximal policy optimization (PPO) with Hugging Face to create a scoring function and perform dataset tokenization

Skills you'll gain

Prompt Engineering

Large Language Modeling

Performance Tuning

Reinforcement Learning

Generative AI

Natural Language Processing

There are 2 modules in this course

During this course, you’ll explore different approaches to fine-tuning causal LLMs with human feedback and direct preference. You’ll look at LLMs as policies, that is, probability distributions for generating responses, and at the concepts of instruction-tuning with Hugging Face. You’ll learn to calculate rewards using human feedback and reward modeling with Hugging Face. Plus, you’ll explore reinforcement learning from human feedback (RLHF), proximal policy optimization (PPO) and the PPO Trainer, and optimal solutions for direct preference optimization (DPO) problems. As you learn, you’ll get valuable hands-on experience in online labs where you’ll work on reward modeling, PPO, and DPO. If you’re looking to add in-demand capabilities in fine-tuning LLMs to your resume, ENROLL TODAY and build the job-ready skills employers are looking for in just two weeks!

Fine-Tuning Causal LLMs with Human Feedback and Direct Preference