The Traveling Salesman Problem: From 19th Century Puzzles to Genetic Algorithms

A salesman must visit a set of cities exactly once and return home, minimizing total travel distance. This deceptively simple puzzle—the Traveling Salesman Problem (TSP)—has haunted mathematicians, computer scientists, and logistics planners for nearly two centuries. It remains one of the most studied problems in computational optimization.

Historical Origins

The TSP’s roots trace to the 1830s, when Irish mathematician William Rowan Hamilton and British mathematician Thomas Kirkman studied related mathematical problems involving traversing graph vertices. Hamilton created the “Icosian Game” in 1857, a puzzle requiring players to find paths visiting each vertex of a dodecahedron exactly once.

The problem as we know it crystallized in the 1930s. Karl Menger studied the “messenger problem” in Vienna, and Hassler Whitney at Princeton coined the name “Traveling Salesman Problem.” The 1940s brought the first serious computational attacks when mathematicians at RAND Corporation, including George Dantzig and Delbert Ray Fulkerson, began developing solution methods.

In 1954, Dantzig, Fulkerson, and Selmer Johnson solved a 49-city instance optimally—a remarkable achievement for the era. Their cutting-plane method laid groundwork for integer programming. By the 1970s, the TSP had become the canonical example of NP-hard problems, meaning no known algorithm can solve all instances efficiently.

Why It Matters

The TSP’s significance extends far beyond academic curiosity:

Logistics and Transportation: Delivery companies, from postal services to Amazon, solve TSP variants daily. FedEx and UPS route thousands of vehicles through millions of potential paths. Even small improvements translate to massive fuel savings.

Manufacturing: Circuit board drilling, where a machine must visit thousands of holes in optimal order, is a direct TSP application. Chip fabrication uses similar routing for wire placement.

DNA Sequencing: Genome assembly involves finding optimal orderings of DNA fragments—structurally identical to TSP.

Telescope Scheduling: Astronomical surveys must observe targets across the sky efficiently, minimizing slew time between observations.

The TSP belongs to the NP-hard complexity class. For n cities, there are (n-1)!/2 possible routes. Ten cities yield 181,440 routes—tractable by brute force. Twenty cities produce over 60 quadrillion routes. Fifty cities exceed the number of atoms in the observable universe.

This exponential explosion means exact solutions become impractical beyond a few dozen cities. Real-world applications with thousands of locations require approximation methods.

Genetic Algorithms: Evolution as Optimizer

Genetic algorithms (GAs) apply biological evolution principles to optimization. A population of candidate solutions evolves through selection, crossover, and mutation, gradually improving over generations.

For TSP:

Individual: A sequence of cities representing a tour
Fitness: Total tour distance (lower is better)
Selection: Better solutions reproduce more frequently
Crossover: Combine routes from two parent solutions
Mutation: Randomly swap cities within a route

GAs don’t guarantee optimal solutions but often find excellent approximations quickly—making them practical for large instances where exact methods fail.

Solving TSP with Python and DEAP

DEAP (Distributed Evolutionary Algorithms in Python) provides a flexible framework for implementing genetic algorithms. The library includes an official TSP example that demonstrates the core concepts. Here’s an adapted version solving a 20-city TSP:

import random
import numpy as np
from deap import algorithms, base, creator, tools

# Generate 20 random cities as (x, y) coordinates
NUM_CITIES = 20
random.seed(42)
CITIES = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(NUM_CITIES)]

def distance(city1, city2):
    """Euclidean distance between two cities."""
    return np.sqrt((city1[0] - city2[0])**2 + (city1[1] - city2[1])**2)

# Precompute distance matrix
DISTANCE_MATRIX = [[distance(CITIES[i], CITIES[j])
                    for j in range(NUM_CITIES)]
                   for i in range(NUM_CITIES)]

def evaluate(individual):
    """Calculate total tour distance."""
    total = DISTANCE_MATRIX[individual[-1]][individual[0]]  # Return to start
    for i in range(len(individual) - 1):
        total += DISTANCE_MATRIX[individual[i]][individual[i + 1]]
    return (total,)  # Must return tuple

# DEAP setup
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))  # Minimize
creator.create("Individual", list, fitness=creator.FitnessMin)

toolbox = base.Toolbox()

# Individual: random permutation of city indices
toolbox.register("indices", random.sample, range(NUM_CITIES), NUM_CITIES)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.indices)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

# Genetic operators
toolbox.register("evaluate", evaluate)
toolbox.register("mate", tools.cxOrdered)  # Order crossover preserves city visits
toolbox.register("mutate", tools.mutShuffleIndexes, indpb=0.05)
toolbox.register("select", tools.selTournament, tournsize=3)

def main():
    random.seed(42)

    # Create population
    pop = toolbox.population(n=300)

    # Track best solution
    hof = tools.HallOfFame(1)

    # Statistics
    stats = tools.Statistics(lambda ind: ind.fitness.values)
    stats.register("min", np.min)
    stats.register("avg", np.mean)

    # Run evolution
    pop, log = algorithms.eaSimple(
        pop, toolbox,
        cxpb=0.7,      # Crossover probability
        mutpb=0.2,     # Mutation probability
        ngen=100,      # Generations
        stats=stats,
        halloffame=hof,
        verbose=True
    )

    best = hof[0]
    print(f"\nBest tour distance: {evaluate(best)[0]:.2f}")
    print(f"Best route: {best}")

    return pop, log, hof

if __name__ == "__main__":
    main()

Key Implementation Details

Order Crossover (cxOrdered): Standard crossover would produce invalid tours with duplicate or missing cities. Order crossover preserves visit sequences from parents while maintaining valid permutations.

Shuffle Mutation (mutShuffleIndexes): Swaps cities within a tour with probability 0.05 per position. This local modification explores neighboring solutions.

Tournament Selection: Randomly samples three individuals and picks the fittest. Balances selection pressure with diversity.

Running this code typically converges to within 5-10% of optimal for 20 cities within seconds. Larger instances may require parameter tuning or advanced techniques like island models or local search hybridization.

Beyond Basic GAs

Modern TSP solvers combine multiple techniques:

Lin-Kernighan heuristic: Local search that swaps edge sequences
Ant Colony Optimization: Probabilistic technique inspired by ant pheromone trails
Simulated Annealing: Accepts worse solutions with decreasing probability
Branch and Bound: Exact method pruning infeasible regions

The current record for proven optimal TSP solution stands at 85,900 cities, achieved in 2006 after years of computation. Practical applications typically use heuristics that find solutions within 1-2% of optimal.

Conclusion

The Traveling Salesman Problem bridges pure mathematics and practical engineering. Its simple statement belies profound computational complexity. While we can’t efficiently solve it exactly for large instances, approximation methods like genetic algorithms provide useful solutions for real-world applications.

The TSP reminds us that some problems resist elegant solutions. Progress comes not from breakthroughs but from accumulating clever techniques—each shaving percentages off tour lengths, each saving fuel, time, and resources at scale.

For logistics companies routing millions of packages, those percentages translate to billions of dollars. The salesman’s ancient puzzle remains urgently relevant.

Historical Origins#

Why It Matters#

Genetic Algorithms: Evolution as Optimizer#

Solving TSP with Python and DEAP#

Key Implementation Details#

Beyond Basic GAs#

Conclusion#