July 6, 2021

How I Used Alteryx to Solve Sudoku Puzzles 4,000 Times Faster

By Michael Green

Anyone who knows me well knows that I love board games and puzzles. I also love to develop algorithms that play board games or solve puzzles for me. In this article, I’ll show you how I used an analytics software called Alteryx to create a robot that solves Sudoku puzzles for me – thousands of times faster than I can.

First – just in case you’re not familiar with Sudoku – let’s review the rules:

  1. For each of the cells in a 9×9 grid, a “digit” between 1 and 9 must be selected
  2. No digit may be repeated along any row, column, or “box” (3×3 square)
  3. Have fun!
A picture of an unsolved and solved Sudoku puzzle

Left: An unsolved Sudoku puzzle.

Right: Solution to this Sudoku puzzle.

It seems simple, right? Wrong! It can be incredibly challenging to solve a Sudoku puzzle. That’s what makes it fun. It is believed that the average Sudoku player spends approximately 20 minutes solving a typical Sudoku puzzle. For very hard puzzles, that number can be much higher.

While the game itself can be challenging and complex, several strategies can be used successfully to find the solution. Sometimes, these strategies can be very repetitive and require a lot of manual effort to execute (the game is typically played on paper with a pencil). This tedious manual effort can cause an infraction to Rule 3: “Have fun!”

 

After playing Sudoku many times myself, I realized that I was using the same patterns, again and again, to find the solutions to these puzzles. My brain was having fun, but my writing hand was getting sore. So, I decided to make a robot that can follow the same thought process that I was using but hopefully solve the puzzles a lot faster and without manual effort. A software tool that I enjoy using – Alteryx – is really useful for automating “analytic processes” such as this.  

But wait! How can a Sudoku puzzle be solved with an “analytic process”?

*SPOILER ALERT*

This article takes a deep dive into the inner workings of data, algorithms, Sudoku, Alteryx, and macros. While it will help to have some familiarity with these and related concepts, prior experience is not required to enjoy this article. I will do my best to explain each concept as it becomes relevant. That said, this is a very technical article, so I won’t be offended if you “skim” until the final section, which shows the results/analysis of the project. 

If you’re feeling adventurous and want to dive into the details, read on!

The first step in this “analytic process” represents a Sudoku grid as data rather than an image. For example, the above Sudoku puzzle can be translated into 81 consecutive digits in one long string of text:

				
					007491605200060309000007010058600004003000090006200187904070002670830000810045000
				
			

Here, each digit represents a single cell of the Sudoku grid, from left to right and top to bottom. Zeros represent an empty cell on the grid. Digits 1-9 represents a cell whose value is already known.

We can create a dataset that includes many of these text strings, stacked on top of each other, where each row represents a unique Sudoku puzzle. For example, here is how we list 4 Sudoku puzzles as data:

If we can find a way to process this puzzle data using Alteryx and solve for the unknown cells (zeros), we can automate the problem-solving approach and save a bunch of time!

Here’s how Alteryx works:

  1. Each data-processing step is broken into a small chunk of code.
  2. The chunk of code is packaged inside a brightly colored icon called a “tool.”
  3. Each tool can be dragged and dropped onto a blank white “canvas” to form a sequence of data-processing steps.
  4. The output of a tool can be “plugged in” to the input of another tool, and in this way, you can form a sequence of data-processing steps that are performed in a certain order.
  5. When several tools are combined, they form a workflow (“robot” or “algorithm”) that can run automatically.

For example, here’s the workflow that I designed to solve Sudoku puzzles:

Screenshot of a Sudoku Puzzle solver

This workflow has four main parts, each consisting of multiple tools:

  1. Input data.  This includes an Input tool that reads a CSV file containing 1,000 randomly selected Sudoku puzzles and a Select tool that allows me to clean my data before using it in the rest of the workflow.
  2. Run macro. This part looks deceptively simple. The “macro” is actually an entire workflow hidden inside one tool. This macro is where I keep the “secret sauce” – the algorithm that actually solves Sudoku puzzles. I will dive deeper into this section in a few moments.
  3. Verify solutions.  Once the macro has generated solutions for all 1,000 puzzles, I want to ensure that these solutions are correct. I cross-reference the solutions against a list of known solutions that have been published online. This section includes Join, Filter, and Union tools – which each play their part in confirming that the solutions generated by the algorithm are correct.
  4. Export results.  I would like to sort the solved Sudoku puzzles by their relative difficulty level and save a CSV file that contains the solved puzzle data.

Whenever I “run” this workflow, Alteryx executes the above data-processing steps automatically. If I decide to upload a different dataset of random Sudoku puzzles and run the same workflow on the new data, the same process will be applied.

The result is that I start with a list of 1,000 unsolved Sudoku puzzles, and the workflow creates a list of verified solutions automatically.

So that’s it! By dragging and dropping a few “tools” onto the Alteryx white space “canvas” and connecting them together, I have designed a fully automated data-processing workflow that can solve Sudoku puzzles.

“But wait! That can’t really be all there is to this story.”

Correct! There’s a whole lot more to this story. We haven’t seen any Sudoku-solving method yet, just the framework for transforming puzzle data into solutions IF we know how to solve the puzzles. The plot thickens when we look a little closer at Part 2 of 4 within my workflow: the macro that I mentioned earlier. Let’s dive in a little deeper.

Remember that I told you that an entire workflow could be “hiding” inside an icon called a “macro”? In this example, I have created a “batch macro” that cycles through each Sudoku puzzle in my list, and finds the solution for each individually. Let’s see what is inside the batch macro:

Screenshot of Batch Macro

This macro contains an entirely new workflow, consisting of 3 main parts:

  1. Input 1 puzzle.  While my original workflow processes the data for 1,000 puzzles in a single dataset, my macro uses a special tool called a Control Parameter to ensure that only one record from the original workflow is processed at a time. The Control Parameter is taking data from my original workflow (through the Input node of the macro) but “throttling” it so that only one record can get through at a time.
Macro containing algorithm

2.  Solve one puzzle.  Ok, you might notice that there is only one tool in this section. Bear with me for another minute! The “secret sauce” of the Sudoku-solving algorithm isn’t visible yet. In fact, there is ANOTHER macro here, which has yet ANOTHER workflow inside it! We’ll dive into this in a moment. However, for now, assume that this new macro can find a solution for a single Sudoku puzzle.

3.  Output 1 solution.  If we assume that we are able to generate a solution for a single Sudoku puzzle, we now need a way to send the solved puzzle back into the original workflow. The Macro Output tool enables us to do this. After the macro has run 1,000 times (once for each puzzle), then all 1,000 solutions will be combined together and sent back to the original workflow (through the Output node of the macro).

Macro output

So this is great and all, but we still haven’t solved any Sudoku puzzles yet. We have loaded data into Alteryx, found a way to separate the data into 1,000 individual runs of a batch macro (which itself executes another mysterious macro containing the algorithm), combined and sent all 1,000 results back into the original workflow, then validated that the proposed solutions are actually correct, and then finally exported the data into a spreadsheet.

But HOW do we actually solve the Sudoku puzzle?

The secret lies within the final macro I mentioned. You know, “the macro within a macro.” Like the Inception movie. In the above screenshot, the final macro is placed in Section 2 (“Solve 1 Puzzle”) of the batch macro.

Let’s open up this final macro and peek inside:

Ahh, now we’re getting somewhere. This may look a little complicated, but we can finally begin to see the components of a Sudoku puzzle solver. This macro is called an “Iterative Crosshatching Macro.”

  • Iterative.   An Iterative macro will apply a given method, again and again, until certain criteria are met. For example, we can design an iterative macro that will not stop running until every cell of a Sudoku grid has been solved.
  • Crosshatching. The Sudoku-solving method we use in this macro is called Crosshatching.

To understand why we need an iterative Alteryx macro at this point, let’s take a step back and discuss the Crosshatching algorithm in more detail.

Crosshatching is a process of elimination that may be used to solve Sudoku puzzles. Recall Rule #2 of Sudoku: “No digit may be repeated along any row, column or box.” If we pick any digit (say, 1) and try to narrow down where that digit may be found, we may be able to find one or more places where we can legally place that digit.  

Here’s an example where we seek to find the digit “1” in the lower-right box of a Sudoku grid:

Crosshatching Example

 Left: Possible cells for the digit “1”.

Right: Crosshatching helps us narrow down the correct placement. 

Here, we note that Column 7, Column 8, and Row 9 already have the digit “1” in them. There is only one empty cell remaining in this box, which does not violate the rules of Sudoku: Row 8, Column 9.

Note, in this case, we were able to safely place a “1” in the cell, as shown. However, this method does not always result in finding a definitive placement of a digit. Only if we already have enough evidence to reject every other location in a box can we safely place a digit. In many cases, we may SUSPECT that a digit might be placed in a certain empty cell, but we cannot be CERTAIN until we have narrowed down all other cells within the box.

So let’s say that we have checked all possible “1” locations. Maybe we found a couple of cells where a “1” can be confidently placed, and maybe we found a few other cells which are a “maybe.” Let’s just keep the “1” in the cells where we are CERTAIN it belongs there.

Now what? Our next action could be to search for any cells where we can place “2”, “3”, and so on. We can search digits 1-9 and, if we are confident that any of these digits belong in any cells, we can add them to our grid.

Realistically, after we have done this, we will probably only have made a handful of changes to our Sudoku grid. There are many remaining cells to be solved because we could not identify the proper digit for every cell with certainty (at least not yet).  

We may want to search all digits 1-9 again. Since we have presumably added numbers to our grid during the first iteration, we are possibly better equipped to narrow down additional digits that were not obvious during our first iteration.

And we can repeat this process, again and again, until we have filled every cell with the proper digit. Once we have done this, we will have solved the entire puzzle.

This is where the concept of an iterative macro in Alteryx comes into play. We want to apply the Crosshatching method, again and again, for all digits – until we have found all correct placements of digits into all empty cells.

So with this in mind, let’s take another look at my Iterative Crosshatching Macro, which I introduced a few moments ago – and dig into the details of what it’s doing in 3 main parts.

Iterative Crosshatching Macro

1. Input & Reshape Puzzle.  In this first section, we can transform the “long string of text” format of the puzzle into something more useful for Crosshatching. For example, Row Number, Column Number, Box Number, and Given Digit, which are contained in that particular cell. (Note, I count Rows and Columns from the top-left corner of the grid.) I used several tools, including RegEx and Transpose, to split each digit into its own row, and then I used a Join tool to combine the digit with its corresponding Row, Column, and Box identifier. Here is what the data looks like, before and after this transformation:

2. Execute One Iteration of the Crosshatching Algorithm. This section performs a Crosshatching method for all digits 1-9 on the given puzzle. Tools in this section include Summarize, Join, Formula, Append, and others.  If any digits may be legally placed into the Sudoku grid after the Crosshatching iteration is applied, they are added to the grid.  In the following example, one iteration of the Crosshatching method in this macro enables us to solve for the digit “1” in the cell corresponding to Row 2, Column 3, and Box 1 of our Sudoku puzzle.  

Execute 1 iteration of the Crosshatching Algorithm

Left: Puzzle data before the Crosshatching method is applied. 

Right: Puzzle data after one iteration.

Because placing a “1” anywhere else in Box 1 would create a conflict with other 1’s in the grid (which are currently positioned in Row 1, Row 3, and Column 2), there is only one cell in this box where we can confidently place a “1” (Row 2, Column 3).  The puzzle data is updated to reflect this newfound cell value of 1.

1. Reshape & Output Puzzle.  The puzzle is transformed back into a long string of numbers, with the new digits added.

Digit "1" has been added

Left: Puzzle data, with results of Iteration #1 added.    

Right: Reshaped data, ready for Iteration #2.

Now, here’s the critical part. Each iteration only solves a handful of digits at a time. For this iterative macro to work, it must keep running until all cells in the Sudoku puzzle have been solved.

How does Alteryx know how many times to run the iterative macro?

You may have noticed that Part 3 (Reshape & Output Puzzle) included TWO macro outputs. Let’s zoom in to Part 3 and take a look.

Reshape and output puzzle

Within the blue box, I have highlighted the two Macro Output tools. The top output is only for fully solved puzzles, and the bottom output is only for partially solved puzzles. I have included a Filter tool that tests whether the current iteration has produced a fully solved puzzle or a partially solved puzzle.

If there are no remaining empty cells, then the puzzle is SOLVED, and the puzzle data is passed to the Macro Output tool on the top. If there are any remaining empty cells, the puzzle is UNSOLVED, and the puzzle data is passed to the Macro Output tool on the bottom.

Now here’s the beauty of how an iterative macro works: As long as the puzzle data continues to be passed to the bottom Macro Output tool (which is called an “iteration output”), the macro will keep running. I have programmed Alteryx to stop running the iterative macro as soon as there is no data left in the bottom macro output tool.  

As soon as the puzzle is fully solved, the filter will ensure that the puzzle data is sent only to the top output tool. Because there is no longer any data sent to the bottom “iterative” output, the iterative macro will stop running.

And, voila!  

Once the iterative macro has stopped running, the current puzzle has been solved.

Once the current puzzle has been solved, the batch macro can move on to a new puzzle.

Once all 1,000 puzzles have been solved, the solved puzzle data is sent back to the workflow, verified, and then exported into a CSV file.

So, let’s recap:

To solve 1,000 Sudoku puzzles with Alteryx:

  1. Create the main workflow, which inputs the 1,000 puzzles data, runs a batch macro, verifies that the solutions from the batch macro are correct, and then exports the results.
  2. Design the batch macro to take one puzzle at a time and run an iterative macro to solve the puzzle.
  3. Design the iterative macro to apply the Crosshatching algorithm to a single puzzle, scanning for each digit 1-9 and finding legal placements of digits within empty cells. The macro will continue running and solving a few cells per iteration until all cells have been solved.

Phew! It seems like a lot of work to solve Sudoku puzzles. Let’s see if it pays off.

I ran this workflow for 1,000 randomly selected puzzles. I also added a calculated field in Alteryx called “count of iterations” to count the number of times that the iterative macro needed to run for each puzzle in order to find a complete solution. In this way, I captured a relative difficulty level of each puzzle, indicating how many times Crosshatching needed to be applied to solve the puzzle.  

The entire workflow ran in 5 minutes. That’s a quarter of the average time it takes a person to solve a Sudoku puzzle. And it solved approximately 1,000 puzzles within that time.  Yay!

And now, the bad news. I said “approximately” 1,000 puzzles were solved. In fact, not every single puzzle was able to be solved with the Crosshatching algorithm. Some of the most challenging Sudoku puzzles require advanced techniques beyond the scope of Crosshatching. In the case of my 1,000 randomly selected puzzles, 14 puzzles (1.4%) were not able to be solved with Crosshatching. On the plus side, this means 98.6% of the puzzles were solved!

I thought it would be interesting to take a look at the distribution of the 1,000 puzzles in this study, along with their levels of difficulty (as measured by the number of iterations required by Crosshatching to fully solve them). Here is a chart of the results:

Chart of Results

As shown here, the majority of Sudoku puzzles were solved within 5-7 iterations of the Crosshatching algorithm. The easiest puzzles could be solved in as few as three iterations, and the hardest puzzles required 15 iterations. As mentioned earlier, 1.4% of the puzzles could not be solved at all with Crosshatching. In this case, the Iterative macro continued to run until the maximum number of iterations (which was set to 20) and then moved on to the next puzzle.

I hope you’ve enjoyed this article about how to design a Sudoku-solving robot with Alteryx!

Together, we have explored:

  • How Sudoku works, and how to apply the Crosshatching algorithm to solve Sudoku puzzles
  • How Alteryx works, and how to design Workflows, Batch Macros, and Iterative Macros
  • How to build a Sudoku-solving algorithm that solves nearly 1,000 puzzles in 5 minutes – for a total speed boost of about 4,000 times vs. an average player

 

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit