Reinforcement learning is a type of machine learning where an agent learns by interacting with an environment, receiving rewards for good actions and penalties for bad ones. It's how AlphaGo beat the world Go champion and how ChatGPT was aligned with human preferences (RLHF).

How Reinforcement Learning Works

Unlike supervised learning (learning from labeled examples) or unsupervised learning (finding patterns in unlabeled data), RL learns through trial and error. An agent takes actions in an environment, observes the outcomes, and adjusts its strategy to maximize cumulative reward over time.
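The trial-and-error loop can be made concrete with a minimal sketch: an epsilon-greedy agent on a toy three-armed bandit. The arm payouts and hyperparameters below are hypothetical, chosen purely for illustration.

```python
import random

# Toy 3-armed bandit (hypothetical reward values, for illustration only).
# The agent does not know which arm pays best; it must discover this
# by trial and error while maximizing cumulative reward.
TRUE_MEANS = [0.2, 0.5, 0.8]

def pull(arm):
    """Environment: return a noisy reward around the arm's true mean."""
    return TRUE_MEANS[arm] + random.gauss(0, 0.1)

def train(steps=2000, epsilon=0.1):
    estimates = [0.0] * len(TRUE_MEANS)  # agent's learned value per arm
    counts = [0] * len(TRUE_MEANS)
    for _ in range(steps):
        # Epsilon-greedy: usually exploit the best-known arm, sometimes explore.
        if random.random() < epsilon:
            arm = random.randrange(len(TRUE_MEANS))
        else:
            arm = max(range(len(TRUE_MEANS)), key=lambda a: estimates[a])
        reward = pull(arm)
        counts[arm] += 1
        # Incremental average: nudge the estimate toward observed outcomes.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates

random.seed(0)
print(train())
```

After enough pulls, the agent's estimate for the best arm dominates, and it spends almost all of its remaining steps exploiting it.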

RLHF (Reinforcement Learning from Human Feedback) is how models such as GPT-4 and Claude are fine-tuned to be helpful, harmless, and honest. Human raters rank model outputs, a reward model is trained to predict those rankings, and RL then optimizes the language model to produce responses the reward model scores highly.

Key Concepts

  • Agent — The learner/decision-maker that takes actions in the environment
  • Reward Signal — Feedback from the environment — positive for good actions, negative for bad ones
  • Policy — The strategy the agent uses to decide actions — learned and improved through training
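The three concepts above come together in tabular Q-learning. Here is a minimal sketch on a hypothetical five-state corridor, where the agent (a value table plus an epsilon-greedy policy) learns from a sparse reward signal:

```python
import random

# A tiny 5-state corridor (hypothetical environment for illustration).
# The agent starts at state 0; reaching state 4 yields reward +1 and
# ends the episode. Actions: 0 = left, 1 = right.
N_STATES, GOAL = 5, 4

def step(state, action):
    """Environment transition: next state, reward signal, done flag."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def choose(Q, s, epsilon):
    """Policy: epsilon-greedy over the agent's value table."""
    if random.random() < epsilon or Q[s][0] == Q[s][1]:
        return random.randrange(2)
    return 0 if Q[s][0] > Q[s][1] else 1

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = choose(Q, s, epsilon)
            s2, r, done = step(s, a)
            # The reward signal adjusts the value table (and hence the
            # policy) toward higher cumulative discounted reward.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

random.seed(1)
Q = q_learning()
print(["right" if q[1] > q[0] else "left" for q in Q[:GOAL]])
```

After training, the greedy policy moves right from every non-terminal state, even though the reward only arrives at the goal: the discounted update propagates it backward through the value table.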

Frequently Asked Questions

What is RLHF?

Reinforcement Learning from Human Feedback. Humans rank AI outputs, and RL trains the model to produce preferred responses. It's how ChatGPT and Claude learn to be helpful and safe.
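The ranking step can be sketched as a tiny reward model. This is a simplified stand-in, not any lab's actual pipeline: responses are reduced to two hand-picked numeric features, and a linear model is fit with the Bradley-Terry (logistic) preference loss that underlies most RLHF reward modeling.

```python
import math

# Toy RLHF reward-modeling sketch (hypothetical setup): each response is
# reduced to two features, and humans prefer one response in each pair.
# We fit linear reward weights w so preferred responses score higher.

def reward(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_reward_model(pairs, lr=0.1, epochs=200):
    w = [0.0, 0.0]
    for _ in range(epochs):
        for preferred, rejected in pairs:
            # Bradley-Terry: P(preferred beats rejected) = sigmoid(r_pref - r_rej)
            margin = reward(w, preferred) - reward(w, rejected)
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient ascent on log P; the gradient scale is (1 - p).
            for i in range(len(w)):
                w[i] += lr * (1.0 - p) * (preferred[i] - rejected[i])
    return w

# Hypothetical data: feature 0 = helpfulness, feature 1 = verbosity.
# Raters prefer helpful, concise answers.
pairs = [
    ([1.0, 0.2], [0.3, 0.9]),
    ([0.8, 0.1], [0.2, 0.8]),
    ([0.9, 0.3], [0.4, 1.0]),
]
print(train_reward_model(pairs))
```

The learned weights end up positive on helpfulness and negative on verbosity; in a real pipeline the features come from a large neural network, and the fitted reward model then supplies the reward signal for the RL step.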

Where is reinforcement learning used?

Game AI (AlphaGo, OpenAI Five), robotics (learning to walk/grasp), recommendation systems, autonomous driving, and LLM alignment (RLHF).