366 capsules, each filled with a single day of the year, held the futures of over 850,000 young men within their shells on the first of December, 1969. The first draft lottery for the Vietnam War was held for American men born between 1944 and 1950. It was the first draft lottery in the United States since 1942, during World War II.
Once filled, the capsules were poured into a tumbler and drawn, one by one, until all 366 dates were assigned a number. September 14th, April 24th, December 30th, and so on, until the last date, June 8th, was called.
Eventually, the first 195 dates were called to serve, but before the first boots hit the ground mathematicians were criticizing the results of the lottery, claiming that it was unfair to those with birthdays later in the year.
The 1969 Lottery Results
We’ll begin this analysis by discussing the results of the draft as it happened in 1969. If the draft lottery was truly fair, we would expect to see similar positions assigned on average for each month of the year.
In a theoretically fair draft, we would expect to see an average draft position of 183.5 for each month of the year. Obviously, that wasn’t what happened in real life. From the chart above, it is immediately apparent that months earlier in the year were given higher draft numbers on average than later months.
As we mentioned earlier, the first 195 dates drawn were called to service, so let’s also look at how many days from each month were called.
With the exception of October, each month from the second half of the year saw over half of its dates called within the first 195 numbers — December had a striking 26 out of 31 days called while March skated by with only one-third of its days being called.
Finally, we’ll look at the deviation from the expected average draft position by month. Calculating this value quantifies just how far from the norm each month was.
Recall that the expected average monthly draft position is 183.5. If we assign each monthly average position a value m_1, m_2, …, m_12, we calculate the deviation D_i of the ith month as:
$$D_i = m_i - 183.5$$
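Applying the above formula, we built the chart seen above. A commonly used statistical test is to compare the total absolute deviation from the actual lottery against a random set of trials. We calculate the total absolute deviation D as the sum, over the twelve months, of the absolute difference between each monthly average m_i and 183.5:
$$D = \sum_{i=1}^{12} \left| m_i - 183.5 \right|$$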
Our goal in the next section is to apply a Monte Carlo simulation to estimate the probability of a random trial exceeding the deviation seen in the 1969 draft. This will give us a good understanding of the fairness of the draft. We will denote the total absolute deviation of a simulated trial as X and that of the 1969 draft as D, and the probability is defined as:
$$p = P(X > D)$$
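As a rough illustration of the total absolute deviation described above, here is a minimal Python sketch. The function names (`monthly_means`, `total_absolute_deviation`) and the toy input are ours and purely illustrative; the actual 1969 draft positions are not reproduced here.

```python
# Days per month in a leap year (366 dates, as in the 1969 lottery).
MONTH_LENGTHS = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
EXPECTED_MEAN = (1 + 366) / 2  # 183.5


def monthly_means(positions):
    """Average draft position per month.

    `positions` is a list of 366 draft positions in calendar order
    (index 0 is January 1, index 365 is December 31).
    """
    means = []
    start = 0
    for length in MONTH_LENGTHS:
        month = positions[start:start + length]
        means.append(sum(month) / length)
        start += length
    return means


def total_absolute_deviation(means, expected=EXPECTED_MEAN):
    """Sum over the months of |monthly mean - expected mean|."""
    return sum(abs(m - expected) for m in means)


# Toy input (not real data): positions handed out in calendar order,
# so January 1 gets position 1 and December 31 gets position 366.
toy_positions = list(range(1, 367))
toy_means = monthly_means(toy_positions)
toy_deviation = total_absolute_deviation(toy_means)
```

Feeding the real 1969 positions into `monthly_means` in calendar order should give the monthly averages behind the charts, and `total_absolute_deviation` would then give the figure that the simulation is compared against.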
Monte Carlo Simulation
Before we get started on our problem, we’ll spend a brief amount of time explaining what a Monte Carlo simulation is.
Simply put, a Monte Carlo simulation is one of a family of probabilistic methods that uses a series of random trials to estimate the probability of a random event happening. Basically, you perform a trial over and over and calculate the percentage of trials in which the event of interest occurred.
In our case, we will run 100,000 trials where we randomly assign draft positions to each day of the year. Once each trial has been conducted, we will aggregate to find the average draft position for each month and calculate the total absolute deviation from expectation. Finally, we will count the number of trials in which that deviation exceeds 272.3, the total absolute deviation of the 1969 lottery.
To perform this analysis we used Alteryx, but a similar analysis can easily be programmed in any computational programming language such as R or Python. You could even open up Microsoft Excel and code it in VBA if you’re so inclined.
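For readers who would rather follow along in Python, the sketch below shows one way the simulation could look. It is not the Alteryx workflow used for this article: the function names are ours, a shuffle stands in for the random assignment, and the trial count is kept small so the example runs quickly in pure Python.

```python
import random

# Days per month in a leap year (366 dates).
MONTH_LENGTHS = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
EXPECTED_MEAN = 183.5
OBSERVED_DEVIATION = 272.3  # total absolute deviation of the 1969 lottery


def simulate_deviation(rng):
    """Run one fair lottery and return its total absolute deviation."""
    positions = list(range(1, 367))
    rng.shuffle(positions)  # every ordering of the 366 dates is equally likely
    total = 0.0
    start = 0
    for length in MONTH_LENGTHS:
        month_mean = sum(positions[start:start + length]) / length
        total += abs(month_mean - EXPECTED_MEAN)
        start += length
    return total


def estimate_exceedance(trials, seed=None):
    """Share of simulated fair lotteries whose deviation exceeds 1969's."""
    rng = random.Random(seed)
    exceed = sum(
        simulate_deviation(rng) > OBSERVED_DEVIATION for _ in range(trials)
    )
    return exceed / trials


# 2,000 trials keeps this quick; the article's analysis uses 100,000.
estimate = estimate_exceedance(trials=2_000, seed=1969)
```

With so few trials the estimate is noisy; raising `trials` toward 100,000 trades run time for a steadier figure.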
Preparing the Simulation
As with all things in Alteryx, there are multiple ways to go about solving a problem. Running a Monte Carlo simulation is a perfect candidate for a thing called a “batch macro,” but those are more complicated to build, so we kept things as simple as possible by building the simulation in a standard workflow.
To begin, we start with a data set that contains the universe of possible values: each of the 366 days in the year.
We then create a data set that contains one record per iteration that our simulation will run. This is accomplished by using the Generate Rows tool and setting the limit of the condition expression to the number of iterations desired.
Once these two data sets have been cross joined (via an Append Fields tool), you will have a very large data set. 10,000 iterations only takes a few seconds; 100,000 iterations takes a couple of minutes the first time it is run.
For each day within each iteration, a random uniformly distributed position is assigned. It is extremely important that these random assignments come from a uniform distribution. Since we are modeling what a fair lottery should look like, all possibilities must be equally likely.
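The exact expression used in the workflow is not shown here, so the following is an assumption about one common way to do it: give every day an independent uniform random key, then rank the keys to turn them into draft positions 1 through 366. The helper name `random_key_positions` is hypothetical.

```python
import random


def random_key_positions(n_days=366, rng=random):
    """Assign draft positions by ranking independent uniform random keys."""
    keys = [rng.random() for _ in range(n_days)]
    # Day indices sorted by key: the day with the smallest key is drawn first.
    order = sorted(range(n_days), key=keys.__getitem__)
    positions = [0] * n_days
    for position, day in enumerate(order, start=1):
        positions[day] = position
    return positions


# One simulated lottery: positions[0] is January 1's draft position.
positions = random_key_positions(rng=random.Random(42))
```

Because the keys are independent and identically distributed, every ordering of the days is equally likely, which is the property a fair lottery needs. Calling `random.shuffle` on the list of positions would achieve the same thing more directly.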
To verify this, we’ll plot the average draft position by month across all iterations.
The chart above may be one of the most boring bar charts ever published, but it is exactly what we want to see — the difference between the highest and lowest average values is only about 0.2. If you’re satisfied that the results are uniform, you may continue.
With 100,000 iterations, we should have a pretty good idea of what a “normal” amount of deviation is, and where along the spectrum the 1969 draft lottery falls. The chart below plots the total absolute deviations from each of the iterations.
We can immediately see that very few records fall into the red “Exceeded Actual Deviation” group. In addition, the shape of the histogram almost perfectly models a bell curve, which is to be expected thanks to the central limit theorem.
Of the 100,000 iterations, a mere 1,177 exceeded the total deviation from the 1969 draft lottery. That works out to about a 1.2% chance that a truly fair lottery would produce a deviation this large.
In other words, this analysis points out that the draft lottery was very likely conducted unfairly, causing birthdays later in the year to have a higher likelihood of being assigned a low draft number.
An Alternate Monte Carlo Simulation
The simulation strategy above required the generation of tens of millions of data points: 100,000 iterations × 366 days = 36,600,000 points to randomly generate. An alternate method that requires less computing power is as follows: look at the permutation of monthly position ranks. Another benefit of this method is that we avoid the danger of outliers skewing data by transforming raw observations into ranks.
This method is from a family of statistics known as nonparametric statistics.
The table below lists each month of the year along with its rank based on average draft position. The highest average (i.e. the “safest” month) is assigned rank 1 and so on down the line.
Our hypothesis, like above, is that the draft lottery was fair. To test the hypothesis, we need to estimate how likely it is that a permutation like this one would occur at random.
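We’ll denote the monthly rank set z = (z_1, z_2, …, z_12) = (5, 4, 1, …, 12) and define the total absolute distance D as the sum of the absolute differences between each month's rank and its position in the calendar:
$$D = \sum_{i=1}^{12} \left| z_i - i \right|$$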
Using Alteryx (or your preferred software package), we quickly calculate that the 1969 draft had a rank distance of 18.
Is a distance of 18 large or small? To determine this, and to make a judgment on the fairness of the draft, we will rank the months for each iteration created in the previous section. From there we can calculate the proportion of iterations in which the distance is less than or equal to 18.
The total distance must be a number between 0 and 72. The chart below depicts the distribution of simulated distance values.
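To make the rank distance concrete, here is a small Python sketch. It assumes the distance is the sum, over the twelve months, of the absolute difference between a month's rank and its calendar index, which is consistent with the 0 to 72 range. As a shortcut it also treats the twelve ranks under a fair lottery as a uniformly random permutation, ignoring the unequal month lengths, rather than re-ranking full simulated lotteries. All function names are ours.

```python
import random


def rank_distance(ranks):
    """Sum of |rank - calendar index| over the months (1-based)."""
    return sum(abs(rank - month) for month, rank in enumerate(ranks, start=1))


def ranks_from_means(means):
    """Rank months by average draft position; the highest average gets rank 1."""
    order = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    ranks = [0] * len(means)
    for rank, month in enumerate(order, start=1):
        ranks[month] = rank
    return ranks


def share_at_most(threshold, trials, seed=None):
    """Share of random rank permutations with distance <= threshold.

    Assumption: the twelve ranks are shuffled directly, treating the
    months as exchangeable under a fair lottery.
    """
    rng = random.Random(seed)
    ranks = list(range(1, 13))
    hits = 0
    for _ in range(trials):
        rng.shuffle(ranks)
        hits += rank_distance(ranks) <= threshold
    return hits / trials


identity = rank_distance(range(1, 13))      # ranks in calendar order -> 0
reversal = rank_distance(range(12, 0, -1))  # ranks fully reversed -> 72
share = share_at_most(18, trials=20_000, seed=1969)
```

The two boundary cases reproduce the stated range: ranks in calendar order give a distance of 0, and fully reversed ranks give 72.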
Over 100,000 iterations, only 108 yielded a distance less than or equal to 18; that comes out to a proportion of about 0.1% of all iterations. This tells us that it would be extremely unlikely to have a lottery that randomly ranked the months as was witnessed by the 1969 draft lottery.
In other words: there is quite strong evidence that the 1969 Vietnam War draft lottery was unfair to those born later in the year.
This is an example where a seemingly simple thing — drawing capsules from a tumbler — can have profound impacts upon the lives of millions of people. As a November 2 baby (draft number 34 in the 1969 lottery), I’m glad that I wasn’t born between 1944 and 1950.
What Went Wrong?
How could such a simple procedure go so wrong? In theory, pulling capsules from a well-mixed tumbler should provide random results and more-or-less uniform results across the months.
The key word in the previous sentence is “well-mixed.” In fact, the tumbler was anything but well-mixed. When the tumbler was filled, January capsules were placed in first, followed by February, and so on. Evidently, the mixing process did a poor job of shuffling the capsules around, leaving the capsules in the tumbler striated like a layer cake.
Later drafts were conducted slightly differently. One tumbler was filled with numbers 1-366 and a second tumbler had each day of the year. The draft official pulled out a number and a birthday, so if November 2 was pulled with 205, all men born on November 2 were assigned position 205.
Can you model this lottery? How does its fairness compare with that of the first lottery?
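As a starting point, here is a minimal Python sketch of the two-tumbler mechanics only. The function name, the `mix_dates` switch, and the seed are ours; the fairness analysis is left to you.

```python
import random

# Days per month in a leap year (366 dates).
MONTH_LENGTHS = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def two_tumbler_lottery(dates, rng, mix_dates=True):
    """Pair each date with a number drawn from a separate tumbler.

    `mix_dates=False` mimics a badly mixed date tumbler by drawing the
    dates in the order given; the number tumbler is always shuffled.
    """
    numbers = list(range(1, len(dates) + 1))
    rng.shuffle(numbers)
    drawn_dates = list(dates)
    if mix_dates:
        rng.shuffle(drawn_dates)
    return dict(zip(drawn_dates, numbers))


# Every (month, day) pair in calendar order, e.g. (11, 2) for November 2.
dates = [
    (month, day)
    for month, length in enumerate(MONTH_LENGTHS, start=1)
    for day in range(1, length + 1)
]
assignment = two_tumbler_lottery(dates, random.Random(1970), mix_dates=False)
```

One thing worth noticing while experimenting: as long as the number tumbler is well mixed, the pairing is uniformly random even when the dates come out in calendar order, because a uniformly shuffled list of numbers matched against any fixed order of dates is still a uniformly random assignment.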