Efficiently Serving LLMs

Instructor: Travis Addair

What you'll learn

  •   Learn how Large Language Models (LLMs) repeatedly predict the next token, and how techniques like KV caching can greatly speed up text generation.
  •   Write code for efficient LLM serving, balancing the speed of model completions against serving many users at once.
  •   Explore the fundamentals of Low-Rank Adapters (LoRA) and see how Predibase builds its inference server to serve many fine-tuned models at once.
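To make the KV-caching idea from the first bullet concrete, here is a minimal sketch in plain numpy. It compares naive autoregressive generation, which re-projects keys and values for the entire prefix at every step, against a version that caches the K/V rows and only projects the newest token. The single-head attention, random projection matrices, and "next token = attention output" rule are all toy assumptions for illustration, not the course's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden dimension
# Hypothetical random projection matrices standing in for trained weights
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    # Scaled dot-product attention for a single query vector
    scores = q @ K.T / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

def generate_no_cache(prompt, steps):
    # Naive decoding: recompute K and V for the whole prefix at every
    # step, so per-step work grows with the sequence length.
    xs = list(prompt)
    for _ in range(steps):
        X = np.stack(xs)
        K, V = X @ Wk, X @ Wv          # full recomputation each step
        q = xs[-1] @ Wq
        xs.append(attend(q, K, V))     # toy "next token": the attention output
    return xs

def generate_with_cache(prompt, steps):
    # KV caching: keep the K/V rows computed so far and only project
    # the newest token each step.
    xs = list(prompt)
    K = np.stack([x @ Wk for x in xs])
    V = np.stack([x @ Wv for x in xs])
    for _ in range(steps):
        q = xs[-1] @ Wq
        new = attend(q, K, V)
        xs.append(new)
        K = np.vstack([K, new @ Wk])   # append one row instead of rebuilding
        V = np.vstack([V, new @ Wv])
    return xs

prompt = [rng.standard_normal(d) for _ in range(4)]
slow = generate_no_cache(prompt, 5)
fast = generate_with_cache(prompt, 5)
# Both paths produce identical sequences; only the amount of work differs.
assert all(np.allclose(a, b) for a, b in zip(slow, fast))
```

The two functions return the same tokens; caching changes the cost of each step from re-projecting the whole prefix to projecting a single vector, which is why KV caching dominates real-world decode speed.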
Skills you'll practice

  •   Performance Tuning
  •   Generative AI
  •   Large Language Modeling
  •   PyTorch (Machine Learning Library)
  •   OpenAI