Introduction
Large Language Models have demonstrated remarkable capabilities, but standard prompting often limits their potential on complex reasoning tasks. Chain-of-Thought (CoT) prompting showed that letting models “think step by step” improves results. But what if a single reasoning path isn’t enough?
Tree of Thoughts (ToT) generalizes CoT by maintaining a tree of reasoning possibilities, allowing models to explore multiple paths, evaluate their promise, and backtrack when needed. This approach has achieved state-of-the-art results on tasks requiring planning, search, and strategic reasoning.
In 2026, ToT and related algorithms are essential tools for building sophisticated AI systems. This comprehensive guide explores the algorithms, implementations, and practical applications of tree-based reasoning.
From Chain to Tree: The Evolution
Chain-of-Thought (CoT)
```python
class ChainOfThought:
    """Chain-of-Thought: linear step-by-step reasoning."""

    def __init__(self, llm):
        self.llm = llm

    def solve(self, problem: str) -> str:
        """Solve a problem with a single linear reasoning pass."""
        prompt = f"""Solve this problem step by step:

Problem: {problem}

Let's think step by step:"""
        return self.llm.generate(prompt)
```
Tree of Thoughts
```python
class Thought:
    """A single thought (node) in the reasoning tree."""

    def __init__(self, text: str, value: float = 0.0, parent: 'Thought' = None):
        self.text = text
        self.value = value  # Quality/score of this thought
        self.parent = parent
        self.children = []
        self.depth = 0 if parent is None else parent.depth + 1

    def add_child(self, child: 'Thought'):
        """Add a child thought."""
        child.parent = self
        child.depth = self.depth + 1
        self.children.append(child)

    def get_path(self) -> list:
        """Get the full reasoning path from the root to this thought."""
        path = [self]
        current = self
        while current.parent:
            path.append(current.parent)
            current = current.parent
        return list(reversed(path))
```
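A quick sanity check of the path and depth bookkeeping. A condensed copy of the `Thought` class is repeated inside the snippet so it runs standalone:

```python
# Condensed copy of the Thought class above, so this check runs standalone
class Thought:
    def __init__(self, text, value=0.0, parent=None):
        self.text, self.value, self.parent = text, value, parent
        self.children = []
        self.depth = 0 if parent is None else parent.depth + 1

    def add_child(self, child):
        child.parent = self
        child.depth = self.depth + 1
        self.children.append(child)

    def get_path(self):
        path, current = [self], self
        while current.parent:
            path.append(current.parent)
            current = current.parent
        return list(reversed(path))

root = Thought("problem")
step = Thought("step 1")
root.add_child(step)
leaf = Thought("step 2")
step.add_child(leaf)

print([t.text for t in leaf.get_path()])  # ['problem', 'step 1', 'step 2']
print(leaf.depth)                         # 2
```

Every node carries its full lineage, so any leaf can reconstruct the reasoning chain that produced it without a separate traversal.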
```python
class TreeOfThoughts:
    """Tree of Thoughts: explore multiple reasoning paths."""

    def __init__(self, llm, max_depth=5, num_branches=3):
        self.llm = llm
        self.max_depth = max_depth
        self.num_branches = num_branches
        self.root = None

    def generate_thoughts(self, prompt: str, context: str = "") -> list:
        """Generate multiple candidate thoughts from the current state."""
        full_prompt = f"""Given the current reasoning state:
{context}

Generate {self.num_branches} different possible next steps or conclusions.
For each, provide a brief explanation.

Current state: {prompt}

Generate {self.num_branches} options:"""
        response = self.llm.generate(full_prompt)
        return self._parse_thoughts(response)

    def _parse_thoughts(self, response: str) -> list:
        """Parse the LLM response into Thought objects (one per non-empty line)."""
        thoughts = [Thought(text=line.strip())
                    for line in response.strip().split('\n') if line.strip()]
        return thoughts[:self.num_branches]

    def evaluate_thought(self, thought: Thought, problem: str) -> float:
        """Ask the LLM to score a thought; fall back to 0.5 on unparseable output."""
        prompt = f"""Evaluate how promising this reasoning step is for solving the problem.

Problem: {problem}
Reasoning step: {thought.text}

Rate from 0-10 how promising this is. Give only a number:"""
        response = self.llm.generate(prompt).strip()
        try:
            thought.value = float(response) / 10.0  # Normalize to 0-1
        except ValueError:
            thought.value = 0.5
        return thought.value

    def solve(self, problem: str) -> str:
        """Solve a problem with breadth-first tree search."""
        self.root = Thought(text=problem)
        frontier = [self.root]
        for depth in range(self.max_depth):
            next_frontier = []
            for thought in frontier:
                # Generate and evaluate candidate thoughts
                context = self._get_context(thought)
                candidates = self.generate_thoughts(problem, context)
                for candidate in candidates:
                    self.evaluate_thought(candidate, problem)
                    thought.add_child(candidate)
                    next_frontier.append(candidate)
            # Keep only the most promising candidates for the next depth
            frontier = sorted(next_frontier, key=lambda t: t.value,
                              reverse=True)[:self.num_branches]
        return self._select_best_solution(self.root)

    def _get_context(self, thought: Thought) -> str:
        """Render the reasoning path from the root to this thought."""
        return " -> ".join(t.text for t in thought.get_path())

    def _select_best_solution(self, root: Thought) -> str:
        """Return the text of the highest-valued leaf in the tree."""
        best_leaf = None
        best_value = -float('inf')

        def find_leaves(node):
            nonlocal best_leaf, best_value
            if not node.children:
                if node.value > best_value:
                    best_value = node.value
                    best_leaf = node
            else:
                for child in node.children:
                    find_leaves(child)

        find_leaves(root)
        return best_leaf.text if best_leaf else root.text
```
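Note that the solver assumes nothing about `llm` beyond a `generate(prompt) -> str` method. A deterministic fake (both `FakeLLM` and `parse_score` are illustrative stand-ins here, not part of any library) makes that contract concrete and exercises the defensive score parsing:

```python
class FakeLLM:
    """Deterministic stand-in for a real model client."""
    def __init__(self, replies):
        self.replies = list(replies)

    def generate(self, prompt: str) -> str:
        # Return canned replies in order, regardless of the prompt
        return self.replies.pop(0)

def parse_score(raw: str, default: float = 0.5) -> float:
    """The same 0-10 -> 0-1 normalization used by evaluate_thought."""
    try:
        return float(raw.strip()) / 10.0
    except ValueError:
        return default

llm = FakeLLM(["7", "very promising!"])
print(parse_score(llm.generate("rate this step")))  # 0.7
print(parse_score(llm.generate("rate this step")))  # 0.5 (unparseable reply)
```

A fake like this is also handy for unit-testing search behavior without paying for model calls.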
ToT Algorithms
1. BFS (Breadth-First Search)
```python
class ToT_BFS:
    """Tree of Thoughts with breadth-first search."""

    def __init__(self, llm, num_branches=5, evaluation_limit=20):
        self.llm = llm
        self.num_branches = num_branches
        self.evaluation_limit = evaluation_limit

    def solve(self, problem: str) -> dict:
        """Solve using BFS exploration with a budget on evaluated nodes."""
        root = Thought(problem)
        frontier = [root]
        all_nodes = [root]
        while frontier and len(all_nodes) < self.evaluation_limit:
            current = frontier.pop(0)
            # Stop early if this node already reads like a final answer
            if self._is_terminal(current):
                return {
                    'solution': current.text,
                    'value': current.value,
                    'nodes_explored': len(all_nodes)
                }
            # Generate, evaluate, and enqueue children
            for child in self._generate_children(current, problem):
                child.value = self._evaluate(child, problem)
                current.add_child(child)
                frontier.append(child)
                all_nodes.append(child)
        return self._get_best(all_nodes)

    def _generate_children(self, node: Thought, problem: str) -> list:
        """Generate candidate next steps."""
        prompt = f"""Given the current reasoning:
{node.text}

Generate {self.num_branches} different possible next steps:"""
        response = self.llm.generate(prompt)
        thoughts = [Thought(line.strip())
                    for line in response.split('\n') if line.strip()]
        return thoughts[:self.num_branches]

    def _evaluate(self, thought: Thought, problem: str) -> float:
        """Score a thought 0-1; default to 0.5 on parse failure."""
        prompt = f"""Rate (0-10) how good this reasoning step is for:
Problem: {problem}
Step: {thought.text}
Number:"""
        try:
            return float(self.llm.generate(prompt).strip()) / 10.0
        except ValueError:
            return 0.5

    def _is_terminal(self, node: Thought) -> bool:
        """Heuristic check for a concluding node."""
        text = node.text.lower()
        return "therefore" in text or "final answer" in text

    def _get_best(self, nodes: list) -> dict:
        """Return the highest-valued node explored so far."""
        best = max(nodes, key=lambda n: n.value)
        return {
            'solution': best.text,
            'value': best.value,
            'nodes_explored': len(nodes)
        }
```
2. DFS (Depth-First Search)
```python
class ToT_DFS:
    """Tree of Thoughts with depth-first search."""

    def __init__(self, llm, max_depth=5):
        self.llm = llm
        self.max_depth = max_depth

    def solve(self, problem: str) -> dict:
        """Solve using DFS to the configured depth limit."""
        self.best_solution = None
        self.best_value = -float('inf')
        root = Thought(problem)
        self._dfs(root, problem, 0)
        return {'solution': self.best_solution, 'value': self.best_value}

    def _dfs(self, node: Thought, problem: str, depth: int):
        """Recursive DFS; nodes at the depth limit compete for the best solution."""
        if depth >= self.max_depth:
            value = self._evaluate(node, problem)
            if value > self.best_value:
                self.best_value = value
                self.best_solution = node.text
            return
        for child in self._generate_children(node, problem):
            child.value = self._evaluate(child, problem)
            node.add_child(child)
            self._dfs(child, problem, depth + 1)

    def _generate_children(self, node: Thought, problem: str) -> list:
        """Generate possible next steps."""
        prompt = f"""Given:
{node.text}

What are 2-3 possible next steps?"""
        response = self.llm.generate(prompt)
        return [Thought(line.strip())
                for line in response.split('\n') if line.strip()][:3]

    def _evaluate(self, thought: Thought, problem: str) -> float:
        """Score a thought 0-1; default to 0.5 on parse failure."""
        prompt = f"Rate 0-10: {thought.text}"
        try:
            return float(self.llm.generate(prompt).strip()) / 10.0
        except ValueError:
            return 0.5
```
3. Best-First Search
```python
import heapq


class ToT_BestFirst:
    """
    Tree of Thoughts with best-first search.
    Always expands the most promising node in the frontier.
    """

    def __init__(self, llm, max_nodes=30):
        self.llm = llm
        self.max_nodes = max_nodes

    def solve(self, problem: str) -> dict:
        """Solve using best-first search over a priority queue."""
        root = Thought(problem)
        root.value = 1.0
        # Max-heap via negated values; id() breaks ties so Thought
        # objects never have to be compared directly
        frontier = [(-root.value, id(root), root)]
        visited = {id(root)}
        while frontier and len(visited) < self.max_nodes:
            # Pop the most promising node
            _, _, current = heapq.heappop(frontier)
            if self._is_terminal(current):
                return {
                    'solution': current.text,
                    'value': current.value,
                    'nodes': len(visited)
                }
            # Generate, evaluate, and push children
            for child in self._generate_children(current, problem):
                child.value = self._evaluate(child, problem)
                current.add_child(child)
                heapq.heappush(frontier, (-child.value, id(child), child))
                visited.add(id(child))
        return self._get_best_solution(frontier)

    def _generate_children(self, node: Thought, problem: str) -> list:
        """Generate candidate next steps."""
        prompt = f"Given {node.text}, what are 3 options?"
        response = self.llm.generate(prompt)
        return [Thought(line.strip())
                for line in response.split('\n') if line.strip()][:3]

    def _evaluate(self, thought: Thought, problem: str) -> float:
        """Score a thought 0-1; default to 0.5 on parse failure."""
        prompt = f"Score 0-10: {thought.text}"
        try:
            return float(self.llm.generate(prompt).strip()) / 10.0
        except ValueError:
            return 0.5

    def _is_terminal(self, node: Thought) -> bool:
        """Heuristic terminal check."""
        return "answer" in node.text.lower() or node.depth > 5

    def _get_best_solution(self, frontier: list) -> dict:
        """Return the best node remaining in the frontier."""
        if not frontier:
            return {'solution': '', 'value': 0}
        neg_value, _, best = min(frontier)  # Smallest negated value = highest value
        return {
            'solution': best.text,
            'value': -neg_value,
            'nodes': len(frontier)
        }
```
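The `(negated value, id(node), node)` heap entries deserve a note: `heapq` compares tuples element by element, so without the `id()` tie-breaker two equally scored nodes would force Python to compare the node objects themselves, which raises a `TypeError` for classes without ordering. A small standalone illustration (the `Node` class here is a toy stand-in):

```python
import heapq

class Node:
    def __init__(self, name, value):
        self.name, self.value = name, value

a, b = Node("a", 0.8), Node("b", 0.8)  # Equal scores force a tie

# Without a tie-breaker, equal first elements fall through to the Node objects
try:
    h = []
    heapq.heappush(h, (-a.value, a))
    heapq.heappush(h, (-b.value, b))  # Compares the tuples -> compares the Nodes
except TypeError:
    print("TypeError: Node objects are not orderable")

# id() values never tie, so the heap never compares the Nodes themselves
frontier = [(-a.value, id(a), a), (-b.value, id(b), b)]
heapq.heapify(frontier)
best = heapq.heappop(frontier)[2]
print(best.value)  # 0.8
```

An alternative is to give the node class an `__lt__` method, but the tuple trick keeps ordering concerns out of the data structure.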
4. Beam Search
```python
class ToT_BeamSearch:
    """
    Tree of Thoughts with beam search.
    Maintains the top-k candidates at each level.
    """

    def __init__(self, llm, beam_width=3, depth=4):
        self.llm = llm
        self.beam_width = beam_width
        self.depth = depth

    def solve(self, problem: str) -> dict:
        """Solve using beam search."""
        beam = [Thought(problem)]
        for _ in range(self.depth):
            candidates = []
            # Expand every beam element
            for node in beam:
                for child in self._generate_children(node, problem):
                    child.value = self._evaluate(child, problem)
                    node.add_child(child)
                    candidates.append(child)
            # Keep only the top-k candidates
            beam = sorted(candidates, key=lambda n: n.value,
                          reverse=True)[:self.beam_width]
        best = max(beam, key=lambda n: n.value)
        return {
            'solution': best.text,
            'value': best.value,
            'beam_width': self.beam_width,
            'depth': self.depth
        }

    def _generate_children(self, node: Thought, problem: str) -> list:
        """Generate candidate next steps."""
        prompt = f"Given {node.text}, what are 2 options?"
        response = self.llm.generate(prompt)
        return [Thought(line.strip())
                for line in response.split('\n') if line.strip()][:2]

    def _evaluate(self, thought: Thought, problem: str) -> float:
        """Score a thought 0-1; default to 0.5 on parse failure."""
        prompt = f"Score 0-10: {thought.text}"
        try:
            return float(self.llm.generate(prompt).strip()) / 10.0
        except ValueError:
            return 0.5
```
Extensions and Variants
1. Graph of Thoughts (GoT)
```python
import uuid


class GraphOfThoughts:
    """
    Graph of Thoughts: allow arbitrary connections between thoughts.
    The generation and checking helpers below are minimal sketches.
    """

    def __init__(self, llm):
        self.llm = llm
        self.thoughts = {}  # thought_id -> Thought
        self.edges = []     # (from_id, to_id) pairs

    def solve(self, problem: str) -> str:
        """Solve with iterative graph expansion."""
        root = self._create_thought(problem)
        active = {root.id}
        for _ in range(10):
            # Generate candidates from the active thoughts, remembering
            # which parent produced each one
            candidates = []
            for thought_id in active:
                parent = self.thoughts[thought_id]
                for new_thought in self._generate_variations(parent):
                    candidates.append((new_thought, parent))
            # Keep only high-scoring candidates and make them the new frontier
            active = set()
            for candidate, parent in candidates:
                if self._evaluate(candidate) > 0.7:
                    self._add_thought(candidate, parent=parent)
                    active.add(candidate.id)
            solution = self._check_solution()
            if solution:
                return solution
            if not active:
                break
        return self._get_best()

    def _create_thought(self, text: str) -> Thought:
        """Create a new thought with a unique ID."""
        thought = Thought(text)
        thought.id = str(uuid.uuid4())
        self.thoughts[thought.id] = thought
        return thought

    def _add_thought(self, thought: Thought, parent: Thought):
        """Register a thought and its edge in the graph."""
        thought.id = str(uuid.uuid4())
        self.thoughts[thought.id] = thought
        self.edges.append((parent.id, thought.id))

    def _generate_variations(self, thought: Thought) -> list:
        """Ask the LLM for refinements of a thought (minimal sketch)."""
        response = self.llm.generate(
            f"Suggest 2 refinements or next steps for: {thought.text}")
        return [Thought(line.strip())
                for line in response.split('\n') if line.strip()][:2]

    def _evaluate(self, thought: Thought) -> float:
        """Score a thought 0-1; default to 0.5 on parse failure."""
        try:
            return float(self.llm.generate(
                f"Score 0-10: {thought.text}").strip()) / 10.0
        except ValueError:
            return 0.5

    def _check_solution(self) -> str:
        """Return the text of any thought that states a final answer."""
        for thought in self.thoughts.values():
            if "final answer" in thought.text.lower():
                return thought.text
        return ""

    def _get_best(self) -> str:
        """Fall back to the highest-valued thought in the graph."""
        best = max(self.thoughts.values(), key=lambda t: t.value)
        return best.text
```
2. Self-Consistency with ToT
```python
class SelfConsistentToT:
    """
    Combine ToT with self-consistency: sample multiple paths, vote on the answer.
    """

    def __init__(self, llm, num_samples=5):
        self.llm = llm
        self.num_samples = num_samples

    def solve(self, problem: str) -> dict:
        """Run ToT several times and vote across the sampled solutions."""
        solutions = []
        values = []
        for _ in range(self.num_samples):
            tot = ToT_BFS(self.llm, num_branches=3, evaluation_limit=10)
            result = tot.solve(problem)
            solutions.append(result['solution'])
            values.append(result['value'])
        # Vote on the answer
        answer = self._majority_vote(solutions)
        return {
            'solution': answer,
            'confidence': max(values),
            'paths': len(solutions)
        }

    def _majority_vote(self, solutions: list) -> str:
        """Select the most common answer."""
        from collections import Counter
        return Counter(solutions).most_common(1)[0][0]
```
3. Reflexion (Learning from Feedback)
```python
class ReflexionToT:
    """
    ToT with reflection: learn from failed attempts.
    """

    def __init__(self, llm, max_attempts=3):
        self.llm = llm
        self.max_attempts = max_attempts
        self.memory = []  # Failed attempts

    def solve(self, problem: str) -> dict:
        """Solve, reflecting on each failed attempt before retrying."""
        result = {'solution': '', 'value': 0}
        for attempt in range(self.max_attempts):
            tot = ToT_BFS(self.llm, num_branches=3)
            result = tot.solve(problem)
            if self._verify(result['solution'], problem):
                return result
            # Store the failure and reflect on it
            self.memory.append({
                'attempt': attempt,
                'solution': result['solution'],
                'problem': problem
            })
            reflection = self._reflect(attempt, result, problem)
            # Fold the reflection into the next attempt's problem statement
            problem = f"{problem}\n\nPrevious attempts failed because: {reflection}"
        return result

    def _verify(self, solution: str, problem: str) -> bool:
        """Ask the LLM whether the solution is correct."""
        prompt = f"""Is this solution correct?
Problem: {problem}
Solution: {solution}
Answer yes or no:"""
        return "yes" in self.llm.generate(prompt).lower()

    def _reflect(self, attempt: int, result: dict, problem: str) -> str:
        """Generate a reflection on the failure."""
        prompt = f"""Why might this solution be wrong?
Problem: {problem}
Solution: {result['solution']}
Explain the flaw:"""
        return self.llm.generate(prompt)
```
Complete Implementation
```python
class ToTSolver:
    """
    Complete Tree of Thoughts solver with multiple strategies.
    """

    def __init__(self, llm, strategy='beam'):
        self.llm = llm
        self.strategy = strategy
        self.strategies = {
            'bfs': ToT_BFS,
            'dfs': ToT_DFS,
            'best_first': ToT_BestFirst,
            'beam': ToT_BeamSearch
        }

    def solve(self, problem: str, **kwargs) -> dict:
        """Solve a problem using the selected strategy."""
        if self.strategy == 'beam':
            solver = ToT_BeamSearch(self.llm,
                                    beam_width=kwargs.get('beam_width', 3),
                                    depth=kwargs.get('depth', 4))
        elif self.strategy == 'bfs':
            solver = ToT_BFS(self.llm,
                             num_branches=kwargs.get('num_branches', 5),
                             evaluation_limit=kwargs.get('evaluation_limit', 20))
        elif self.strategy == 'dfs':
            solver = ToT_DFS(self.llm, max_depth=kwargs.get('max_depth', 5))
        elif self.strategy == 'best_first':
            solver = ToT_BestFirst(self.llm, max_nodes=kwargs.get('max_nodes', 30))
        else:
            solver = ToT_BFS(self.llm)  # Default fallback
        return solver.solve(problem)

    def solve_with_fallback(self, problem: str) -> dict:
        """Try multiple strategies and return the best result."""
        results = []
        for strategy in ['beam', 'bfs', 'best_first']:
            self.strategy = strategy
            try:
                results.append(self.solve(problem))
            except Exception:
                continue
        if results:
            return max(results, key=lambda r: r.get('value', 0))
        return {'solution': 'Failed', 'value': 0}
```
Practical Applications
1. Mathematical Reasoning
```python
class MathToT:
    """
    ToT for math problem solving.
    """

    def __init__(self, llm):
        self.llm = llm

    def solve(self, problem: str) -> dict:
        """Solve a math problem with beam-search ToT plus verification."""
        # Beam search suits mathematical reasoning
        solver = ToT_BeamSearch(self.llm, beam_width=5, depth=6)
        # Math-specific prompting
        problem = f"""Solve this math problem step by step.
Show your work and verify each step.

{problem}"""
        result = solver.solve(problem)
        # Attach a verification flag rather than silently discarding the check
        result['verified'] = self._verify_solution(result['solution'], problem)
        return result

    def _verify_solution(self, solution: str, problem: str) -> bool:
        """Ask the LLM to check the solution."""
        prompt = f"""Check if this solution is mathematically correct.
Problem: {problem}
Solution: {solution}
Is the solution correct? Answer yes or no:"""
        return "yes" in self.llm.generate(prompt).lower()
```
2. Strategic Planning
```python
class PlanningToT:
    """
    ToT for complex planning tasks.
    """

    def __init__(self, llm):
        self.llm = llm

    def solve(self, goal: str, constraints: list) -> dict:
        """Solve a planning problem with best-first ToT."""
        problem = f"""Goal: {goal}
Constraints: {', '.join(constraints)}
Create a detailed plan:"""
        # Best-first search explores the most promising options first
        solver = ToT_BestFirst(self.llm, max_nodes=50)
        return solver.solve(problem)
```
3. Creative Writing
```python
class CreativeToT:
    """
    ToT for creative content generation.
    """

    def __init__(self, llm):
        self.llm = llm

    def generate(self, prompt: str, style: str = "default") -> str:
        """Explore creative directions broadly, then develop the best one in depth."""
        # BFS to survey different approaches
        solver = ToT_BFS(self.llm, num_branches=3, evaluation_limit=15)
        approaches = solver.solve(
            f"Creative directions (style: {style}) for: {prompt}")
        # DFS to expand the best approach in depth
        dfs = ToT_DFS(self.llm, max_depth=5)
        expanded = dfs.solve(approaches['solution'])
        return expanded['solution']
```
Comparison of Strategies
| Strategy | Best For | Pros | Cons |
|---|---|---|---|
| BFS | Solution diversity | Explores broadly | May miss deep solutions |
| DFS | Complex goals | Goes deep | May miss better paths |
| Best-First | Quality focus | Always expands best | Memory intensive |
| Beam | Efficiency | Memory efficient | May miss alternatives |
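To make the trade-offs concrete, here is a toy experiment on a synthetic tree (no LLM involved; the branching factor and score function are arbitrary stand-ins) counting how many nodes exhaustive BFS evaluates versus beam search:

```python
BRANCH, DEPTH = 3, 4

def children(path):
    """Synthetic tree: a node is a tuple of branch indices."""
    return [path + (i,) for i in range(BRANCH)]

def score(path):
    """Toy value function rewarding zero-branch choices."""
    return sum(1 for step in path if step == 0) / max(len(path), 1)

def bfs_count():
    """Exhaustive BFS evaluates every node down to DEPTH."""
    frontier, evaluated = [()], 0
    for _ in range(DEPTH):
        nxt = []
        for node in frontier:
            for c in children(node):
                evaluated += 1
                nxt.append(c)
        frontier = nxt
    return evaluated

def beam_count(width=2):
    """Beam search evaluates every child of the beam, keeps the top-k."""
    beam, evaluated = [()], 0
    for _ in range(DEPTH):
        cands = []
        for node in beam:
            for c in children(node):
                evaluated += 1
                cands.append(c)
        beam = sorted(cands, key=score, reverse=True)[:width]
    return evaluated

print(bfs_count())   # 3 + 9 + 27 + 81 = 120 evaluations
print(beam_count())  # 3 + 6 + 6 + 6  = 21 evaluations
```

With an LLM in the loop, each evaluation is a model call, so the gap between 120 and 21 translates directly into cost and latency; the price is that beam search can prune away a branch that would have paid off deeper in the tree.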
Best Practices
1. When to Use ToT
```python
def should_use_tot(problem: str) -> bool:
    """
    Determine whether ToT is appropriate for a problem.

    ToT pays off for complex multi-step problems, planning tasks,
    problems with multiple solution paths, and cases where backtracking helps.
    """
    complexity_indicators = [
        'plan', 'strategy', 'design', 'optimize',
        'multiple', 'options', 'consider', 'alternatives'
    ]
    return any(ind in problem.lower() for ind in complexity_indicators)
```
2. Optimizing Parameters
```python
def tune_tot_parameters(task_type: str) -> dict:
    """Get reasonable starting parameters for a task type."""
    configs = {
        'math': {'strategy': 'beam', 'beam_width': 5, 'depth': 6},
        'planning': {'strategy': 'best_first', 'max_nodes': 50},
        'creative': {'strategy': 'dfs', 'max_depth': 5},
        'general': {'strategy': 'beam', 'beam_width': 3, 'depth': 4}
    }
    return configs.get(task_type, configs['general'])
```
3. Evaluation
```python
def evaluate_tot(llm, problems: list, expected: list) -> dict:
    """Evaluate ToT accuracy via substring match against expected answers."""
    solver = ToT_BeamSearch(llm, beam_width=3, depth=4)
    correct = 0
    for problem, exp in zip(problems, expected):
        result = solver.solve(problem)
        # Simple containment check
        if exp.lower() in result['solution'].lower():
            correct += 1
    return {
        'accuracy': correct / len(problems),
        'total': len(problems)
    }
```
Future Directions in 2026
Emerging Innovations
- Monte Carlo ToT: Random sampling for exploration
- Learning to Search: Train search policies for ToT
- Multi-Modal ToT: Reasoning across modalities
- Collaborative ToT: Multiple LLMs reasoning together
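As a flavor of the first direction, here is a minimal Monte-Carlo-style sketch: instead of scoring every child, sample random root-to-leaf rollouts and keep the best-scoring path. The tree shape and value function are toy stand-ins (no LLM calls), so this only illustrates the sampling pattern, not a full MCTS with selection and backpropagation:

```python
import random

BRANCH, DEPTH = 3, 4

def rollout(rng):
    """Sample one random root-to-leaf path of branch choices."""
    return tuple(rng.randrange(BRANCH) for _ in range(DEPTH))

def score(path):
    """Toy value function rewarding zero-branch choices."""
    return sum(1.0 for step in path if step == 0) / len(path)

def monte_carlo_search(num_rollouts=200, seed=0):
    """Keep the best of num_rollouts random paths."""
    rng = random.Random(seed)
    best_path, best_score = None, -1.0
    for _ in range(num_rollouts):
        path = rollout(rng)
        s = score(path)
        if s > best_score:
            best_path, best_score = path, s
    return best_path, best_score

path, value = monte_carlo_search()
print(path, value)  # Best of 200 random rollouts under the toy scorer
```

Random rollouts need no evaluator calls at interior nodes, which is attractive when evaluation is the dominant cost; full Monte Carlo Tree Search improves on this by biasing later rollouts toward branches that scored well earlier.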
Conclusion
Tree of Thoughts represents a fundamental advance in AI reasoning. By moving beyond linear chain-of-thought to tree and graph structures, we enable AI systems to explore multiple reasoning paths, backtrack from dead ends, and find better solutions to complex problems.
The key insight is that complex reasoning isn’t always a single path: sometimes we need to explore alternatives, evaluate their promise, and adapt our strategy. ToT provides a framework for doing exactly that.
Whether you’re solving mathematical problems, creating strategic plans, or generating creative content, ToT and its variants offer a powerful approach to leveraging AI reasoning capabilities. As these techniques mature, expect to see them integrated into more sophisticated AI systems.
The future of AI reasoning lies not in finding a single answer, but in intelligently exploring the space of possibilities, and Tree of Thoughts is leading the way.