The 2-Minute Primer

This is a quick primer. If you'd rather get straight to building, skip to Act 1. None of this is required, and it will all make more sense in context anyway.

But if you want the quick version of what neural networks and reinforcement learning are before we start, keep going. Two minutes, tops.

· · ·

Machine learning

Instead of writing rules for a program to follow, you give it data and let it work out the rules on its own. The program looks at thousands of examples, finds patterns, and gets better at the task over time. Nobody tells it how. It learns from the examples.

Neural network

A neural network is the thing that does the learning. It takes in numbers, passes them through layers of simple math, and produces an output. During training, it slightly adjusts the numbers inside those layers (its weights) after every mistake, so the next output is a little less wrong. Do this enough times and the outputs start being useful.
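To make "layers of simple math" concrete, here is a minimal sketch of a network with one hidden layer, in plain Python. The function name `forward`, the layer sizes, and the weight values are all made up for illustration. A real project would normally use a library for this.

```python
import math

def forward(inputs, hidden_weights, output_weights):
    """Pass numbers through one hidden layer, then one output neuron."""
    # Hidden layer: each neuron takes a weighted sum of the inputs,
    # then tanh squashes the result into the range -1..1.
    hidden = [
        math.tanh(sum(w * x for w, x in zip(neuron, inputs)))
        for neuron in hidden_weights
    ]
    # Output: a weighted sum of the hidden layer's values.
    return sum(w * h for w, h in zip(output_weights, hidden))

# Made-up weights (assumption): two hidden neurons, two inputs each.
hidden_weights = [[0.5, -0.2], [0.1, 0.8]]
output_weights = [1.0, -1.0]

print(forward([1.0, 2.0], hidden_weights, output_weights))
```

The weights are the only thing that changes during training. The structure of the math stays the same.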

Training

Training is just the process of feeding the network examples and letting it make mistakes. Each mistake tells it which direction to adjust. Run this loop thousands of times and the network gradually gets better at the task.
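Here is roughly what that loop looks like at its smallest: a "network" with a single weight `w` learning the pattern y = 2x. The example data, the learning rate, and the step count are assumptions picked for illustration, and the update rule is plain gradient descent on squared error.

```python
# Examples of the pattern we want the model to find: y = 2 * x.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def train(examples, steps=1000, learning_rate=0.01):
    w = 0.0  # start out knowing nothing
    for _ in range(steps):
        for x, target in examples:
            error = w * x - target          # how wrong was the guess?
            w -= learning_rate * error * x  # nudge w to shrink that error
    return w

w = train(examples)
print(w)  # should land very close to 2.0
```

The sign of `error` is the "direction" each mistake provides: guess too high and `w` goes down, guess too low and it goes up.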

Reinforcement learning

The most common kind of machine learning, supervised learning, learns from labeled data. "This image is a cat. This one is a dog." Reinforcement learning is different. There are no labels. Instead, an agent takes actions in an environment, gets rewards or penalties based on what happens, and learns which actions lead to better outcomes. Think of it like training a dog. You don't explain the rules. You just reward the behavior you want.

Reward function

The reward function is how you tell the agent what "good" and "bad" mean. Ate food? Here's a reward. Died? That's a penalty. Moved closer to food? Small reward. It's the only feedback the agent gets, which means the reward function shapes everything. Get it wrong and the agent will learn exactly what you told it to learn, which might not be what you actually wanted.
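As a sketch, a reward function for those three cases could look like this. The function signature and the specific numbers (10, -10, 0.1) are assumptions for illustration, not values taken from the project.

```python
def reward(ate_food, died, old_distance, new_distance):
    """Score one move. All values here are illustrative assumptions."""
    if died:
        return -10.0  # penalty
    if ate_food:
        return 10.0   # reward
    if new_distance < old_distance:
        return 0.1    # small reward for moving closer to the food
    return 0.0        # nothing notable happened

print(reward(ate_food=False, died=False, old_distance=5, new_distance=4))
```

Even in a function this small, the choices matter. If the "moved closer" reward were too large relative to the food reward, an agent could learn to hover near the food instead of eating it.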

The training loop

The agent plays the game. It makes a move, sees what happens, gets a reward, and stores that experience. Then it looks back at a bunch of past experiences and adjusts itself to make better decisions next time. Play, remember, learn, repeat. That's the whole cycle.
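The cycle can be sketched end to end on a toy problem. This is not the game or the network from the later acts. It is a hypothetical five-cell corridor with food at the far end, and it uses a lookup table (tabular Q-learning with a replay memory) in place of a neural network. Every name and number below is an assumption chosen to keep the example short.

```python
import random

N_CELLS = 5         # toy corridor: cells 0..4, food in the last cell
ACTIONS = (-1, +1)  # step left or step right

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_CELLS - 1)
    done = next_state == N_CELLS - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=200, epsilon=0.2, alpha=0.5, gamma=0.9,
          batch_size=16, seed=0):
    rng = random.Random(seed)
    # q[(state, action)] is the agent's estimate of how good that move is.
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    memory = []
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Play: usually take the best-known move, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                # Ties between equal estimates are broken at random.
                action = max(ACTIONS,
                             key=lambda a: (q[(state, a)], rng.random()))
            next_state, r, done = step(state, action)
            # Remember: store the experience.
            memory.append((state, action, r, next_state, done))
            # Learn: revisit a random batch of past experiences.
            batch = rng.sample(memory, min(batch_size, len(memory)))
            for s, a, rew, s2, d in batch:
                best_next = 0.0 if d else max(q[(s2, b)] for b in ACTIONS)
                q[(s, a)] += alpha * (rew + gamma * best_next - q[(s, a)])
            # Repeat from wherever the move ended up.
            state = next_state
    return q

q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_CELLS - 1)}
print(policy)  # the preferred move in each non-final cell
```

Swapping the table for a neural network changes the "learn" step but not the shape of the loop.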

· · ·

That's the vocabulary. You don't need to fully understand any of it right now. It will make more sense once you see it in action.

Read Act 1