Teaching Series #5 - Adversarial Search Algorithm
"...The one who searches, is the one who finds..."
Imagine you are trying to find your favourite book in a huge library without any external help. You could just randomly start looking for books on shelves, right? Or, you could have a system – maybe you start from the top left and systematically go through each shelf. That "system" of yours is kind of like a search algorithm!
In computer science, a search algorithm is a systematic process for finding a solution to a problem or specific information within a defined dataset or "space." Think of it as a set of instructions a computer follows to explore possibilities until it reaches a goal.
Every search problem has a search space, which is basically all the possible states or configurations. For example, in a game like chess, the search space includes every possible board arrangement. In a game like Tic Tac Toe, which we are going to explore in this lesson, each arrangement of Xs and Os on the grid is a state, and every move you play takes the game from one state to another. Remember, a state is a specific configuration within that space at a given time.
Throughout history, many different kinds of search algorithms have emerged. Perhaps the best known are Uninformed Search Algorithms. They are called "uninformed" because they don't have any extra information about the goal. They just explore the search space systematically until they find what they are looking for. They can do Breadth-First Search (BFS), which means they check all the nearby possibilities first before going deeper (e.g., you peek down every hallway you can take in the library, but you don’t explore beyond the corners). If you instead pick one hallway and follow it all the way to the end, including turning the corners, you would be doing Depth-First Search (DFS). At their core, these algorithms are simple, and not of interest to us here; we want to focus on Informed Search Algorithms.
Informed search algorithms are different because they use extra knowledge or "hints" to guide their way. This extra knowledge comes in the form of a heuristic function.
A heuristic function (often just called a "heuristic") is essentially an educated guess that estimates how close you are (at a given state) to your aim (the goal state). It doesn't tell us the exact cost or path, but it gives us a good idea of which direction to go.
Here is an example. You are playing Tic Tac Toe. What is the goal state? To win, right? What is a given state? The board after a move you make, right? X or O 🙂 Now, you would like to have some function that gives you an estimate of how far you are from your goal if you play that specific move, X or O 🙂 The algorithm can then prioritise and help you decide which move to make next, searching much faster, especially in large search spaces of moves and possibilities. This is often called an evaluation function because it evaluates the "promisingness" (if I wrote it right 🙂 ) of each state (your move).
So, bottom line: for a heuristic to be useful, it needs to be:
- Estimable: we need to be able to calculate it fairly easily.
- Informative: it should give us a reasonably accurate idea of how close we are to the goal.
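To make this concrete, here is a minimal sketch of an evaluation function for Tic Tac Toe. Everything below is my own illustration, not code from the repo: I assume the board is a flat list of 9 cells, each holding 'X', 'O', or None, and I only score finished positions. A game as small as Tic Tac Toe lets us get away with scoring just the end states; in a bigger game, a heuristic would also estimate unfinished positions.

```python
# A minimal sketch, assuming the board is a flat list of 9 cells,
# each holding 'X', 'O', or None. All names here are illustrative.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in WIN_LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def evaluate(board):
    """Score a board from X's point of view: +1 win, -1 loss, 0 otherwise."""
    w = winner(board)
    if w == 'X':
        return 1
    if w == 'O':
        return -1
    return 0
```

Both requirements above are met: the function is cheap to compute (eight line checks), and for finished games it is perfectly informative.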
In a game like Tic Tac Toe, or let’s put it in the other way, in the type of problem that Tic Tac Toe is, the "search space" becomes a game tree. Each node in the game tree represents a possible state of the game (e.g., the configuration of X and O on the grid). The branches represent possible moves that players can make.
So the goal isn't just to find the best move (node); it's to find the best move (node) while assuming your opponent is trying to stop you or beat you. So while playing the game - stepping from one node (move) to another - your goal is to maximise the score assigned to that node (move), while your opponent wants to maximise theirs (which means minimising yours).
Minimax Principle
A rational player will always choose the move that maximises their possible gain. In other words:
As the maximising player (you), you want to choose a move that leads to the highest possible score for yourself.
As the minimising player (your opponent), they will choose a move that leads to the lowest possible score for you (which, think about it, is the highest score for them!).
The Adversarial Search Algorithm (or minimax) recursively explores the game tree, assigning values (scores) to the leaves (game-ending states). Then, it propagates these values up the tree:
At your turn (the "max" node), you pick the state with the highest value.
At your opponent's turn (the "min" node), you pick the state with the lowest value for you (assuming that the opponent is highly competent and will play the worst case scenario for you 🙂 )
It's like looking several moves ahead, assuming your opponent will always play perfectly to make your life as difficult as possible. So in that case, you are choosing the move that gives you the best outcome even under that worst-case assumption (imagine knowing that if you select one move, you are going straight into a loss, even though you are not aware of that yet).
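The recursive exploration described above could be sketched like this in Python. The flat 9-cell board representation ('X', 'O', or None per cell) and all the names are my own assumptions, not the repo's code:

```python
# A minimal sketch of the minimax recursion for Tic Tac Toe, assuming a
# flat list of 9 cells, each 'X', 'O', or None. Names are illustrative.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in WIN_LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, is_max_turn):
    """Minimax value of `board` from X's perspective: +1, -1, or 0."""
    w = winner(board)
    if w == 'X':
        return 1                        # leaf: X already won
    if w == 'O':
        return -1                       # leaf: O already won
    if all(board):
        return 0                        # leaf: board full, it's a draw
    player = 'X' if is_max_turn else 'O'
    scores = []
    for i, cell in enumerate(board):
        if cell is None:
            board[i] = player                        # try the move...
            scores.append(minimax(board, not is_max_turn))
            board[i] = None                          # ...then undo it
    # X (a "max" node) takes the highest child value, O (a "min" node) the lowest.
    return max(scores) if is_max_turn else min(scores)
```

Note that an internal node's score is never computed directly: it is whatever the game-ending leaves propagate back up through the alternating max/min choices.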
When we talk about choosing the moves, we need to rely on the outcome of the particular state. In the example of Tic Tac Toe, each move we consider will have a score assigned to it. Imagine you make a move: we need to ask whether you won, lost, or it's a draw. If you win, we assign a high score (one that we want to maximise), and if you lose, we assign a low score (one that we want to minimise).
As you cannot win after the first move, we explore how the game evolves after the played move. Once we reach the “end states”, we evaluate each one and backpropagate its score.
In simple words: whenever you choose a move, you want the state with the highest score. Each possible move in front of you leads the game in a different direction, so you want the move with the highest score for you. How is this score calculated? By letting the player after you (your opponent) play, letting them explore the resulting states the same way you did, and letting them assign scores to the possible game states on their turn. Once the opponent has explored all the states, they propagate back to you the minimum score, as that is the score of their perfect reply: your minimum is their maximum 🙂 Once all your potential moves (states) have scores assigned, you pick the one with the highest value, the best among all the opponent's minimums. This means you have selected the state pointing the game in the best direction for you, even though it represents the opponent's minimum over the moves that follow. If you chose any state other than your maximum, you would be heading in a direction where the opponent has more opportunity to win; you play against their perfect plays 🙂
The Minimax algorithm's job is to look at all these possibilities, figure out what the final score would be for each of your initial moves (assuming perfect play from both sides), and then recommend the initial move that gives you the highest final score.
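Putting it all together, a top-level function can try each of your available moves, ask minimax what happens under perfect play from then on, and recommend the move with the highest value. This is a self-contained sketch; the flat 9-cell board of 'X'/'O'/None and every name in it are my own assumptions, not code from the repo:

```python
# Illustrative sketch: recommend X's best move via exhaustive minimax.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in WIN_LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, is_max_turn):
    """Minimax value of `board` from X's perspective: +1, -1, or 0."""
    w = winner(board)
    if w:
        return 1 if w == 'X' else -1    # leaf: someone has won
    if all(board):
        return 0                        # leaf: full board, a draw
    player = 'X' if is_max_turn else 'O'
    scores = []
    for i, cell in enumerate(board):
        if cell is None:
            board[i] = player
            scores.append(minimax(board, not is_max_turn))
            board[i] = None
    return max(scores) if is_max_turn else min(scores)

def best_move(board):
    """Return the cell index with the highest minimax value for X."""
    best_score, best_index = -2, None   # any real score beats -2
    for i, cell in enumerate(board):
        if cell is None:
            board[i] = 'X'                     # tentatively play here...
            score = minimax(board, False)      # ...then the opponent moves
            board[i] = None
            if score > best_score:
                best_score, best_index = score, i
    return best_index
```

On an empty board this explores the full game tree with no pruning, which is fine for Tic Tac Toe; larger games need refinements such as alpha-beta pruning or depth limits with a heuristic.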
Let’s dive into the notes. A picture says more than a thousand words, would you not agree? 🙂

For better comprehension and exploration, play the game, check the code and generate the tree to see what the decision tree graph looks like. Here is a small glimpse of what you will be able to generate, explore and find in the repo 🙂

Be curious, always learn.
Sincerely, MO