January 1, 2022

Using the Optimization Tool in Alteryx

By John Emery

As any user can attest, Alteryx is a fantastic and powerful tool. With well over 100 pre-built tools you can quickly and easily clean messy data, perform calculations, or analyze spatial information. Because Alteryx offers so many tools, it is reasonable that some of them will serve a specific niche and are little-known and little-understood.

Introducing: the Optimization tool, perhaps the most intimidating tool in Alteryx Designer.

The Optimization tool is used to, well, solve optimization problems. These are a class of problems where the goal is to maximize or minimize some value subject to one or more constraints. Microsoft Excel’s Solver add-in is very similar to the Optimization tool, so if you know one then learning the other should be a breeze.

In general, the most difficult aspect to using the Optimization tool is not understanding what optimization is nor understanding what data to connect to what anchors. Rather, the difficulty lies in understanding and formulating your optimization problem clearly from the onset. With a clear direction in mind, the rest of the pieces should fall in place smoothly.

Through the rest of this article we will:

  • Take a quick look at a simple linear optimization problem to gain an understanding of how they work
  • Walk through the various configuration options and input/output anchors for the Optimization tool
  • Step through an example using Alteryx

Optimization Problems

The simplest type of optimization problem is known as linear optimization. These are problems where the objective function—that is, the function to be maximized or minimized—is a linear function of the form:

The c values above are known as coefficients and represent values associated with their respective decision variables, noted as x.

Our goal is to maximize or minimize the objective function according to one or more constraints that form a feasible region of possible solutions.

Example Problem

To make some extra money, you do lawn care around your neighborhood. You offer two services: lawn mowing and hedge trimming. You charge a flat fee for each service—$40 for lawn mowing and $25 for trimming—so your goal is to optimize your work to maximize profit. On average, it takes 60 minutes to mow a lawn and 30 minutes to trim hedges. You do not want to work more than 6 hours per day. Furthermore, each lawn uses one gallon of fuel while trimming hedges uses up one gallon. You can carry 9 gallons of fuel for the day's work.

Although this certainly sounds like a textbook problem, this is exactly how optimization problems are posed in the “real world.” Our first goal is to model the problem by listing each of the key pieces of information in the correct format.

Decision Variables

These are the variables that are used to obtain the maximum or minimum that we wish to find.

  • L = Number of lawns to mow per day
  • H = Number of hedge trimming jobs per day

Pay close attention when determining your problem’s decision variables. A poor or incorrect choice here can make your life very difficult, while a correct choice will let the rest of the problem flow smoothly.

Coefficients

Since our goal is to maximize profit, the coefficients for our decision variables are the dollar values for mowing a lawn ($40) and trimming a hedge ($25).

Objective Function

Our goal is to maximize profit, therefore:

  • Max Profit → Profit = 40L + 25H

Explicit Constraints

Explicit constraints are those that are mentioned directly in the problem statement. Here, they are:

  • Hours Constraint:      1L + 0.5H ≤ 6
  • Fuel Use Constraint:  1L + 1H ≤ 9

Implicit Constraints

Implicit constraints are those that are not written directly in the problem statement. Rather, they are often common-sense constraints. Here, we have:

  • L ≥ 0
  • H ≥ 0

For this problem, the number of lawns, L, and the number of hedges, H, must be non-negative integers.

In this simple example, our linear optimization problem is two-dimensional, thus we can draw a diagram to more clearly see the interplay between variables and constraints.

The feasible region of possible solutions is bounded by our explicit and implicit constraints. In the image above, it can be seen below the red and blue lines for non-negative values of L and H.

The optimal solution will lie on one of the vertices of the feasible region. For this problem, that is either:

  • L = 0, H = 9
  • L = 3, H = 6
  • L = 6, H = 0

Plugging these values into our objective function, we find the following profit values:

  • 40*0 + 25*9 = 225
  • 40*3 + 25*6 = 270
  • 40*6 + 25*0 = 240

From these values, we can conclude that the optimal solution to this problem is when L = 3 and H = 6, with a value of $270. In general, the optimal solution to a linear programming problem such as this will be the point of the feasible region in the farthest top-right corner.

How to Use the Optimization Tool in Alteryx

Alteryx’s Optimization Tool, located in the Prescriptive tool palette, allows users to solve linear programming, mixed integer linear programming, and quadratic programming optimization problems. If you have a problem similar to the example we discussed above, it is likely that the Optimization tool is worth a try—it makes solving these types of problems much easier than developing a complicated macro to iterate through the problem.

Configuring the Tool

In the example above, determining the decision variables was the most challenging step. In my mind, the most difficult step in using the Optimization tool is understanding what data needs to flow into each anchor and how that data should be structured.

Input Anchors

The Optimization tool includes 4 input anchors, all of which are optional (only matrix-based models use input anchors):

  • O: Contains a list of variables, their coefficients, lower bounds, upper bounds, and data types.
  • A: A matrix of variables, their “costs,” and variable constraints. The way the data is structured here is dictated by your choice of matrix from above: constraints in rows, variables in rows, or SLAM.
  • B: A matrix of model constraints by variable and overall.
  • Q: Used for quadratic programming problems to specify the quadratic portion of the problem.

The Configuration Menu

There are combinations of configurations, depending on your specific needs. Your selection in the “Select input mode” drop-down will dictate the appearance of the configuration menu.

Select Input Mode – Specify the model as matrices

This is the default setting, and most likely the one you will use. It requires users to connect to the O, A, and B (and optionally, Q) anchors.

When this input mode is selected, you then have to select options for the following:

  • Select problem type:
    • Linear program
    • Mixed Integer program
    • Quadratic program
  • Select solver:
    • Glpk (for linear and mixed integer)
    • Symphony (for linear and mixed integer)
    • Quadprog (for quadratic only)
  • Maximize Objective:
    • Defaulted to no (i.e. a minimization problem)
  • Select constraint mode for input Anchor A
    • Dense matrix, constraints in rows
    • Dense matrix, variables in rows
    • Sparse (SLAM) matrix
  • Display field mapping for input Anchor O
    • A yes/no toggle, defaulted to no
    • When set to no, you must use specific field names for the mapping to occur correctly
    • When set to yes, you can name your fields anything you like and map them manually

Select Input Mode – Enter the model manually

This option allows you to enter in all variables, coefficients, and constraints as a comma-separated list. Using this setting avoids the need to create data sets, as none of the input anchors are used.

Select Input Mode – Specify the model from a file

This option allows you to connect to a special optimization file that contains your data. Again, no input anchors are needed when using this option.

The only new option here is:

Solving the Problem

We have now worked through an example manually and discussed the configuration options of Alteryx’s Optimization tool. Now we will finally work through the same example problem in Alteryx.

For this example, I will be using the matrix input method, as that is the most common way to use the Optimization tool.

To recap, our problem is the following:

  • Maximize profit from mowing lawns and trimming hedges
  • Limit time spent to no more than 6 hours per day
    • Each lawn takes 1 hour to complete
    • Each hedge takes 0.5 hours to complete
  • Limit fuel use to no more than 9 gallons per day
    • Lawn and hedge jobs both use 1 gallon of fuel

The Starting Data Set

To begin, we will use a Text Input tool to create our data set. I like to put each variable—the two different job types, in this case—on the rows and list the value (Price) and costs (Hours and Fuel) as separate columns. This offers a simple and easy-to-read view of the data that we are working with.

For the next step we will use a Formula tool to identify each variable’s lower and upper bounds (where applicable) and data type. For this problem, we know that all values must be non-negative, thus our lower bound is 0. There is no specified upper bound, so for this problem we can ignore it. Finally, we know that jobs come in integer units (1, 2, 3, …) so the data type is set to “I” for integer.

The O Input

Now that the data is ready, we must split it into two streams: one for the O input and one for the A.

The O input should contain our variables, coefficient, lower and upper bounds, and data type columns. In older versions of Alteryx these were required to have specific names (“variable” or “ub” for example), but in newer versions you can manually map any field.

Using a Select tool, deselect any constraints columns—Fuel and Hours, here. Connect the Select tool to the Optimization tool’s O input anchor.

The A Input

The A input contains information on each variable’s constraints. To set up this input, all we need to do is connect a Select tool and select only the variable and constraint fields. Thus, for this problem, we will only have three columns (Variable, Hours, Fuel).

The B Input

We have saved the B input for last because it does not come directly from our initial data set. Instead, we will create a new data set using another Text Input tool. This new input has three columns: a string constraint label, the direction of the constraint, and the right-hand side (value) of the constraint.

In this problem, we have two sets of constraints:

  • Total hours must be less than or equal to 6
  • Total fuel must be less than or equal to 9

Thus, our B input data set looks like this:

Connecting it to the Optimization Tool

Now that we have created the O, A, and B data sets (we don’t need a Q data set because this is a linear optimization problem), we can configure the Optimization tool itself.

Connect each of the three data sets as in the image above. Next, click on the Optimization tool to begin configuring it.

The default settings are:

  • Select input mode: Specify the model as matrices
  • Select problem type: Linear Program
  • Solver Solver: Glpk
  • Maximize Objective: Off
  • Select constraint mode for Input Anchor A: Dense matrix, constraints in rows
  • Display field mapping for Input Anchor O: Off

The first three settings are exactly what we need for this basic linear programming problem, so leave them as-is. Since our goal is to maximize profit, make sure to turn that setting on. The data we constructed for input A has each variable in its own row, so change this setting to “dense matrix, variables in rows.”

Finally, turn on field mapping for input A and select the appropriate field for each option.

Reading the Output

We have now done all of the hard work. We constructed the data, added necessary calculated fields, and split the data into our O, A, and B input streams. We finally configured the Optimization tool according to our model’s needs.

Connect a Browse tool to each of the three output anchors: S, D, and I.

  • S: Displays the “objective value,” which is the value we were attempting to optimize; displays the value for each variable required to reach that objective value.
  • D: Shows a description of the model as a whole. Includes information such as the problem type, objective value, number of constraints, and more.
  • I: An interactive report that combines information from the previous two outputs. This is a really interesting view worth checking out—see the image below.

After all that work, we finally have an answer: mowing 3 lawns and trimming 6 hedges gives us a total profit of $270. The Slack column in the bottom of the output tells us how much of either constraint went unused. In this case, we used up exactly our 6-hour and 9-gallon limits.

Closing Words

This blog post ran a bit long. You can literally take graduate-level courses on this subject (speaking from experience, they are not easy), so there is a great deal of information that I had to leave on the cutting room floor.

If you need additional help, check out Alteryx’s built-in prescriptive analytics samples. There are some excellent problems available for you to really get your hands dirty with the optimization tool. Or, if you have more questions about Alteryx, you can talk to our expert consultants today and have all your questions answered!

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit