Nature magazine is carrying a story reporting on the recent work of Gary McGuire of University College Dublin. McGuire's team have been studying the question of the minimal number of clues that must be present in a valid Sudoku puzzle, and recently announced their findings.
What's particularly interesting about their technique is that it combines Mathematics and Computer Science, using techniques from both disciplines:
A potential way to demonstrate that could be to check all possible completed grids for every 16-clue puzzle, but that would take too much computing time. So McGuire simplified the problem by designing a 'hitting-set algorithm'. The idea behind this was to search for what he calls unavoidable sets, or arrangements of numbers within the completed puzzle that are interchangeable and so could result in multiple solutions. To prevent the unavoidable sets from causing multiple solutions, the clues must overlap, or 'hit', the unavoidable sets. Once the unavoidable sets are found, it is a much smaller—although still non-trivial—computing task to show that no 16-clue puzzle can hit them all.
Having spent two years testing the algorithm, McGuire and his team used about 7 million CPU hours at the Irish Centre for High-End Computing in Dublin, searching through possible grids with the hitting-set algorithm. “The only realistic way to do it was the brute force approach,” says Gordon Royle, a mathematician at the University of Western Australian in Perth who had been working on the problem of counting 17 clue puzzles using different algorithms. “It’s a challenging problem that inspires people to push computing and mathematical techniques to the limit. It’s like climbing the highest mountain.”
I've been reading through the paper and enjoying it, even if parts of it are beyond my skill (hey, Sudoku plus Mathematics plus Computer Science -- this is nerd heaven in my view!).
The authors note that their "hitting set" approach could have a number of applications beyond this Sudoku question:
This new algorithm has many other potential applications than just sudoku. To begin with, it is applicable to any instance of the hitting set problem, and such situations frequently occur in Bioinformatics (e.g., gene expression analysis), Computer Networks and Software Testing . We note that the vertex cover problem from graph theory can be reduced to the hitting set problem, and so can the set cover problem, so both of these are actually equivalent to the hitting set problem. The set cover problem has applications to interference in cellular networks.
In case you find that the paper is a bit much to digest, let me also recommend this wonderful video, produced by Brady Haran, in which James Grime walks us through the solution, explaining the overall technique, and using some nifty illustrations to make it clear.
I wish they had had videos like this when I was studying Maths in school!
Of course, if they had, I might never have switched to Computer Science.
While wandering around reading about all of this, I came across this book: Taking Sudoku Seriously: The Math Behind the World's Most Popular Pencil Puzzle. It looks pretty interesting; I might give it a try (once I finish some of my back-logged books. And the Kindle edition is actually reasonably priced!
Oh, and lastly: thanks Maggie Koerth-Baker for throwing a pointer to this work up on Boing Boing; you are quickly becoming one of my favorite writers!