What you'll learn
Learn how Large Language Models (LLMs) generate text by repeatedly predicting the next token, and how techniques like KV caching can greatly speed up that generation loop (a sketch follows this list).
Write code to serve LLM applications efficiently, balancing the trade-off between a single user's output speed and total throughput when many users are served at once (see the batching sketch below).
Explore the fundamentals of Low Rank Adapters (LoRA) and see how Predibase builds its LoRAX framework inference server to serve many fine-tuned models at once (see the LoRA sketch below).
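
To make the first point concrete, here is a minimal sketch of next-token generation with a KV cache, written against Hugging Face transformers; GPT-2, the prompt, and the 20-token budget are placeholders, not the course's own code.

    # Greedy next-token generation with a KV cache (illustrative, not course code).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    input_ids = tokenizer("The quick brown fox", return_tensors="pt").input_ids
    past_key_values = None  # the KV cache: keys/values of every token seen so far

    with torch.no_grad():
        for _ in range(20):
            # With a cache, only the newest token is fed through the model;
            # attention over earlier tokens reuses the cached keys and values.
            step_input = input_ids if past_key_values is None else input_ids[:, -1:]
            out = model(step_input, past_key_values=past_key_values, use_cache=True)
            past_key_values = out.past_key_values
            next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
            input_ids = torch.cat([input_ids, next_token], dim=-1)

    print(tokenizer.decode(input_ids[0]))

Without the cache, every step would re-run attention over the full prefix, which is where most of the speed-up comes from.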
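
For the serving trade-off, one simple lever is batching: running several users' prompts in a single forward pass raises total tokens per second, but each user now waits on the whole batch. A minimal sketch, again with GPT-2 and made-up prompts:

    # Batched generation for several users at once (illustrative prompts and model).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    prompts = ["The capital of France is", "My favorite color is", "Once upon a time"]
    batch = tokenizer(prompts, return_tensors="pt", padding=True)

    with torch.no_grad():
        out = model.generate(**batch, max_new_tokens=10,
                             pad_token_id=tokenizer.eos_token_id)

    for ids in out:
        print(tokenizer.decode(ids, skip_special_tokens=True))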
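
Finally, a Low Rank Adapter adds a small trainable update on top of a frozen pretrained weight. The sketch below shows the idea for a single linear layer; the class name and the rank/scaling values are illustrative, not Predibase's implementation. Because the large base weights stay frozen and shared, a server can hold many tiny adapters side by side, which is what makes serving many fine-tuned models at once feasible.

    # A LoRA-style linear layer: frozen base weight plus a trainable low-rank update.
    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad_(False)          # freeze the pretrained weights
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts as a no-op
            self.scale = alpha / r

        def forward(self, x):
            # y = base(x) + scale * x A^T B^T -- only A and B (r*(d_in+d_out) params) train
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    layer = LoRALinear(nn.Linear(768, 768))
    y = layer(torch.randn(2, 768))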