1  What is Probability?

We hear the word “probability” often. Here are just a few quotes from recent online articles which mention probability.

You have some familiarity with the words “probability”, “chance”, “odds”, or “likelihood” from everyday life. But what do we really mean when we talk about “probability”?

This chapter provides a brief, non-technical introduction to randomness and probability. Many of the topics introduced in this chapter will be covered in much more detail in later chapters.

1.1 Randomness

A wide variety of situations involve probability. Consider just a few examples.

  • The probability that you roll doubles in a turn of a board game.
  • The probability you win the next Powerball lottery if you purchase a single ticket, 4-8-15-16-42, plus the Powerball number, 23.
  • The probability that a randomly selected Cal Poly student is a California resident.
  • The probability that the high temperature in San Luis Obispo, CA tomorrow is above 90 degrees F.
  • The probability that Hurricane Martin makes landfall in the U.S. in 2028.
  • The probability that the Philadelphia Eagles win the next Superbowl.
  • The probability that the Republican candidate wins the 2032 U.S. Presidential Election.
  • The probability that extraterrestrial life currently exists somewhere in the universe.
  • The probability that Alexander Hamilton actually wrote 51 of The Federalist Papers. (The papers were published under a common pseudonym and authorship of some of the papers is disputed.)
  • The probability that you ate an apple on April 17, 2019.

Example 1.1 What is one feature that all of the situations have in common? How are the situations above similar, and how are they different? Is the interpretation of “probability” the same in all situations? The goal here is to just think about these situations, and not to compute any probabilities (or to even think about how you would).

Solution 1.1. This example is intended to motivate discussion, so you might have thought of some other ideas we don’t address here. That’s good! And some of the things you considered might come up later in the book. Here are a few observations.

The one feature that all of the situations have in common is uncertainty. Sometimes the uncertainty arises from a physical phenomenon that can result in multiple potential outcomes, like rolling dice or drawing the winning Powerball number. In other cases, there is uncertainty because there will be only one outcome but it is in the future, like tomorrow’s high temperature or the result of the next Superbowl. But there can also be uncertainty about the past: there are some Federalist papers for which the author is unknown, and you probably don’t know for sure whether or not you ate an apple on April 17, 2019.

Whenever there is uncertainty, it is reasonable to consider the relative likelihood or plausibility of possibilities. For example, even though you don’t know for certain whether you ate an apple on April 17, 2019, you might think the probability is high if you’re usually an apple-a-day person (or you were in 2019). We don’t know for sure what team will win the next Superbowl, but we might think that the Eagles are more likely than the Cleveland Browns to be the winner.

While all of the situations in this example involve uncertainty, it seems that there are different “types” of uncertainty. Even though we don’t know which side a die will land on, the notion of “fairness” implies that the sides are “equally likely”. Likewise, there are some rules to how the Powerball drawing works, and it seems like these rules should determine the probability of drawing that particular winning number.

However, there aren’t any specific “rules of uncertainty” that govern whether or not you ate an apple on April 17, 2019. You either did or you didn’t, but that doesn’t mean these two possibilities are necessarily equally likely or plausible. Regarding the Superbowl, of course there are rules that govern the NFL season and playoffs, but there are no “rules of uncertainty” that tell us precisely how likely any particular team is to win any particular game, let alone how likely a team is to advance to and win the Superbowl.

It also seems that there are different interpretations of probability. Given that a six-sided die is fair, we might all agree that the probability that it lands on any particular side is 1/6. Similarly, given the rules of the Powerball lottery, we might all agree on the probability that a drawing results in a particular winning number. However, there isn’t necessarily consensus about what the high temperature will be in San Luis Obispo tomorrow; different weather prediction models, forecasters, or websites might provide different values for the probability that the high temperature will be above 90 degrees Fahrenheit. Similarly, Superbowl odds might vary by source. Situations like tomorrow’s weather or the Superbowl where there is no consensus about the “rules of uncertainty” require some subjectivity in determining probabilities.

Finally, some of these situations are naturally repeatable. We could (in principle) roll a pair of dice many times and see how often we get doubles, or repeat the Powerball drawing over and over to see how the winning numbers behave. However, many of these situations involve something that only happens once, like the next Superbowl, tomorrow, or April 17, 2019. Even when the phenomenon happens only once in reality, we can still develop models of what might happen if we were to hypothetically repeat the phenomenon many times. For example, meteorologists use historical data and meteorological models to forecast many potential paths of a hurricane.

The subject of probability concerns random phenomena.

Definition 1.1 A phenomenon is random if there are multiple potential possibilities, and there is uncertainty about which possibility is realized. Uncertainty is understood in broad terms, and in particular does not only concern future occurrences.

Some phenomena involve physical randomness, like flipping coins, rolling dice, drawing Powerballs at random from a bin, or random digit dialing. In many other situations randomness just vaguely reflects uncertainty. We will refer to as “random” any scenario that involves a reasonable degree of uncertainty.

In this book, “random” and “uncertain” are synonyms. Unfortunately, some of the everyday meanings of “random”, like “haphazard” or “unexpected”, are contrary to what we mean by “random” in this book. For example, we would consider Steph Curry attempting a free throw to be a random phenomenon because we’re not certain if he’ll make it or miss it; but we would not consider this process to be haphazard or unexpected.

Random does not necessarily mean equally likely. In a random phenomenon, certain outcomes or events might be more or less likely than others. For example,

  • About 84% of students at Cal Poly are California residents, so it’s more likely than not that a randomly selected Cal Poly student is a California resident.
  • Not all NFL teams are equally likely to win the next Superbowl.

Uncertainty is not something to be feared, and randomness is often desirable. In particular, many statistical applications employ the planned use of randomness with the goal of collecting “good” data. For example,

  • Random selection involves selecting a sample of individuals at random from a population (e.g., via random digit dialing), with the goal of selecting a representative sample.
  • Random assignment involves assigning individuals at random to groups (e.g., in a randomized experiment), with the goal of constructing groups that are similar in all aspects so that the effect of a treatment (like a new vaccine) can be isolated.

1.1.1 Exercises

Exercise 1.1 For each of the following, provide an example of a random phenomenon that fits the description. Try to think of examples that are interesting to you personally!

  1. Just two possible outcomes, but they are not equally likely.
  2. Physically repeatable (at least in principle).
  3. Well defined “rules of randomness”.
  4. Involves subjectivity in determining probabilities.
  5. Involves uncertainty about the future.
  6. Involves uncertainty about the present or past.
  7. Associated with the planned use of randomness in a particular statistical study.

1.2 Interpretations of probability

The probability of an event associated with a random phenomenon is a number in the interval \([0, 1]\) measuring the event’s likelihood, degree of uncertainty, or relative plausibility. A probability can take any value in the continuous scale from 0 to 1, and can be reported either as a decimal (e.g., 0.305) or as a percent (e.g., 30.5%).

A few examples of probabilities:

  • The probability that a fair coin lands on heads 5 times in 10 flips is 0.246.
  • A group of people all put their names in a hat for a Secret Santa gift exchange; the probability that at least one person in the group draws their own name is 0.632.
  • The probability that a randomly selected full term baby weighs more than 4000 grams at birth is 0.09.
  • The probability that a magnitude 5+ earthquake occurs somewhere in the world within the next 48 hours is 0.96.
  • According to FiveThirtyEight as of Nov 8, 2016, the probability that Donald Trump would win the 2016 U.S. Presidential Election was 0.286.
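As a rough illustration of where values like these can come from, here is a minimal Python sketch that checks the first two examples. It rests on assumptions that are not in the list above: the coin value is computed by counting sequences of 10 flips, treating all \(2^{10}\) sequences as equally likely, and the Secret Santa value is approximated by simulation for a hypothetical group of 10 people (the group size is not specified above). The names `someone_draws_own_name`, `n_people`, and `n_reps` are ours.

```python
import random
from math import comb

# Exactly 5 heads in 10 flips: favorable sequences out of 2**10,
# assuming every sequence of heads and tails is equally likely.
p_five_heads = comb(10, 5) / 2 ** 10
print(f"5 heads in 10 flips: {p_five_heads:.3f}")


def someone_draws_own_name(n_people, rng):
    """Shuffle the names once; report whether anyone draws their own name."""
    names = list(range(n_people))
    rng.shuffle(names)
    return any(person == drawn for person, drawn in enumerate(names))


rng = random.Random(1)   # fixed seed so the run is reproducible
n_people = 10            # assumption: the group size is not given in the text
n_reps = 50_000          # number of simulated gift exchanges

hits = sum(someone_draws_own_name(n_people, rng) for _ in range(n_reps))
p_own_name = hits / n_reps
print(f"at least one person draws own name: about {p_own_name:.3f}")
```

The simulated value will vary a little from run to run (and with the seed), so it should only be expected to land near 0.632, not to match it exactly.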

Throughout this book we will see many methods for computing and approximating probabilities such as these. But given the value of a probability, what does it mean? For example, what does it mean for there to be a “30% chance of rain tomorrow”? Just as there are various types of randomness, there are a few ways of interpreting probability, most notably, long run relative frequency and subjective probability.

1.2.1 Long run relative frequency

One of the oldest documented problems in probability is the following: If three fair six-sided dice are rolled, which is more likely, a sum of 9 or a sum of 10? Let’s try to answer this question by simply rolling dice and seeing what happens. Roll three fair six-sided dice, find the sum, and repeat; then see how often we get a sum of 9 versus a sum of 10. Table 1.1 displays the results of a few repetitions. We encourage you to try this out on your own now; of course, your results will naturally be different from ours.

Table 1.1: Results of 10 sets of three rolls of a fair six-sided die.
Repetition  First roll  Second roll  Third roll  Sum
1           3           6            3           12
2           1           2            4           7
3           4           2            4           10
4           2           2            1           5
5           4           6            1           11
6           5           1            2           8
7           3           1            3           7
8           5           5            6           16
9           5           6            3           14
10          4           5            2           11

A sum of 9 occurred in 0 repetitions and a sum of 10 occurred in 1 repetition. We see that a sum of 10 occurred more frequently than a sum of 9, but our results should not be very convincing. After all, we only performed 10 repetitions and your results are probably different than ours. We can get a much better picture by performing many, many repetitions. This would be a time consuming process by hand, but it’s quick and easy on a computer. We can have a computer simulate, say, one million repetitions—each repetition resulting in the sum of three rolls—to produce a table like Table 1.1 but with one million rows instead of 10, and then count how many repetitions result in each possible value of the sum. Figure 1.1 displays the results of such a computer simulation. We’ll see throughout the book how to conduct and analyze computer simulations like this; just focus on the process and results for now. A sum of 9 occurred in 115392 or 11.5% of the one million repetitions, and a sum of 10 occurred in 125026 or 12.5% of repetitions. The simulation results suggest that a sum of 10 is more likely to occur than a sum of 9, because a sum of 10 did occur more often than a sum of 9 when we rolled the dice many times. It seems reasonable to conclude that when rolling three fair six-sided dice the probability that the sum is 10 is greater than the probability that the sum is 9.

Figure 1.1: Results of one million sets of three rolls of fair six-sided dice. Sets in which the sum of the dice is 9 (10) are represented by the orange (blue) spike.
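A simulation of this kind can be sketched in a few lines of Python. This is only one possible way to do it, not necessarily how the results in Figure 1.1 were produced; the function name `simulate_sums`, the seed, and the choice of 100,000 repetitions (fewer than the one million described above, to keep the run short) are our assumptions.

```python
import random
from collections import Counter


def simulate_sums(n_reps, seed=None):
    """Roll three fair six-sided dice n_reps times and tally the sums."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_reps):
        total = rng.randint(1, 6) + rng.randint(1, 6) + rng.randint(1, 6)
        counts[total] += 1
    return counts


n_reps = 100_000  # assumption: smaller than one million, for speed
counts = simulate_sums(n_reps, seed=12345)

# Report how often each sum of interest occurred.
for total in (9, 10):
    freq = counts[total] / n_reps
    print(f"sum of {total}: {counts[total]} repetitions, relative frequency {freq:.3f}")
```

Changing the seed or the number of repetitions changes the exact counts, but with this many repetitions the relative frequencies should stay close to the values reported in the text.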

In the dice rolling problem we assessed relative likelihoods of a sum of 9 or 10 by repeating the phenomenon many times. The sum of any single set of three rolls is uncertain, but over many sets of three rolls a clear pattern of which sums occur more frequently than others emerges in Figure 1.1. This is the idea behind the relative frequency interpretation of probability. We’ll investigate this idea further in the context of the most iconic random phenomenon: coin flipping.

We might all agree that the probability that a single flip of a fair coin lands on heads is 1/2, a.k.a. 0.5, a.k.a. 50%. There are only two outcomes, heads (H) and tails (T), and the notion of “fairness” implies that they should be equally likely, so we have a “50/50 chance” of heads. But how else can we interpret this 50%? As in the dice rolling problem, we can consider what would happen if we flipped the coin many times. Now, if we flipped the coin twice, we wouldn’t expect to necessarily see one head and one tail. But in many flips, we might expect to see heads on something close to 50% of flips.

Let’s try this out. Table 1.2 displays the results of 10 flips of a fair coin. The first column is the flip number (first flip, second flip, and so on) and the second column is the result of the flip (H or T). The third column displays the running number of flips that result in H and the fourth column displays the running proportion of flips that result in H. For example, the first flip results in T so the running proportion of H after 1 flip is 0/1 = 0; the first two flips result in (T, H) so the running proportion of H after 2 flips is 1/2 = 0.5; the first three flips result in (T, H, H) so the running proportion of H after 3 flips is 2/3 = 0.667; and so on. Figure 1.2 plots the running proportion of H by the number of flips. We see that with just a small number of flips, the proportion of H fluctuates considerably and is not guaranteed to be close to 0.5. Of course, the results depend on the particular sequence of coin flips. We encourage you to flip a coin 10 times and compare your results.

Table 1.2: Results and running proportion of H for 10 flips of a fair coin.
Flip  Result  Running count of H  Running proportion of H
1     T       0                   0.000
2     H       1                   0.500
3     H       2                   0.667
4     H       3                   0.750
5     T       3                   0.600
6     T       3                   0.500
7     T       3                   0.429
8     T       3                   0.375
9     T       3                   0.333
10    H       4                   0.400


Figure 1.2: Proportion of flips resulting in H versus number of flips for the 10 coin flips in Table 1.2.
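The running count and running proportion in Table 1.2 can be recomputed from the sequence of results alone. Here is a minimal Python sketch; the function name `running_proportions` and the string encoding of the flips are our own choices for illustration.

```python
flips = "THHHTTTTTH"  # the 10 results listed in Table 1.2, in order


def running_proportions(results, target="H"):
    """Return rows of (flip number, result, running count, running proportion)."""
    rows = []
    count = 0
    for i, result in enumerate(results, start=1):
        if result == target:
            count += 1
        rows.append((i, result, count, count / i))
    return rows


table = running_proportions(flips)
for flip, result, count, prop in table:
    print(f"{flip:>2}  {result}  {count}  {prop:.3f}")
```

To try this with your own 10 flips, replace the string assigned to `flips` with your results.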

As in the dice rolling example, we shouldn’t be satisfied with the results of just 10 repetitions. Now we’ll flip the coin 90 more times for a total of 100 flips. Figure 1.3 (a) displays the results, while Figure 1.3 (b) also displays the results for 3 additional sets of 100 flips. The running proportion of H fluctuates considerably in the early stages, but settles down and tends to get closer to 0.5 as the number of flips increases. However, each of the four sets results in a different proportion of heads after 100 flips: 0.41 (gray), 0.49 (orange), 0.52 (blue), 0.53 (green). Even after 100 flips the proportion of flips that result in H isn’t guaranteed to be very close to 0.5.

(a) One set of 100 flips.
(b) Four sets of 100 flips.
Figure 1.3: Running proportion of H versus number of flips for four sets of 100 coin flips.

Now we’ll flip the coin 900 more times for a total of 1000 flips in each of the four sets. Figure 1.4 (a) displays the results, while Figure 1.4 (b) also displays the results for 3 additional sets of 1000 flips. Again, the running proportion fluctuates considerably in the early stages, but settles down and tends to get closer to 0.5 as the number of flips increases. Compared to the results after 100 flips, there is less variability between sets in the proportion of H after 1000 flips: 0.498 (gray), 0.485 (orange), 0.506 (blue), 0.462 (green). Now, even after 1000 flips the proportion of flips that result in H isn’t guaranteed to be exactly 0.5, but we see a tendency for the proportion to get closer to 0.5 as the number of flips increases.

(a) One set of 1000 flips.
(b) Four sets of 1000 flips.
Figure 1.4: Running proportion of H versus number of flips for four sets of 1000 coin flips.

In a large number of flips of a fair coin we expect the proportion of flips which result in H to be close to 0.5, and the more flips there are the closer to 0.5 we expect the proportion to be. That is, the probability that a flip of a fair coin results in H, 0.5, can be interpreted as the long run proportion of flips that result in H, or in other words, the long run relative frequency of H.

Definition 1.2 The probability of an event associated with a random phenomenon can be interpreted as a long run proportion or long run relative frequency: the probability of the event is the proportion of repetitions on which the event would occur in a very large number of hypothetical repetitions of the random phenomenon.

The concept of long run relative frequency quantifies how often we would expect an event associated with a random phenomenon to occur if the phenomenon were repeated many, many times. The closer the probability is to 1, the more often we would expect the event to occur; the closer the probability is to 0, the less often we would expect the event to occur. Roughly, we would expect an event that has probability 0.9 to occur “90% of the time” in the long run.

Returning to rolling three fair six-sided dice, we’ll see later that the probability of a sum of 9 is 0.116 (rounded to three decimal places) and the probability of a sum of 10 is 0.125. Even without knowing how to compute these values we can interpret them as long run relative frequencies. The probability of 0.116 means that in 11.6% of sets of three rolls of fair six-sided dice the sum is 9. The random phenomenon involves a set of 3 rolls, so we consider many sets of 3 rolls, each set resulting in a sum which is either 9 or not. If we roll three fair six-sided dice and find the sum then repeat to get many sets of 3 rolls, we would expect the proportion of sets for which the sum is 9 to be close to 0.116. Indeed this is what we observe in the simulation summarized by Figure 1.1. Likewise, in 12.5% of sets of three rolls of fair six-sided dice the sum is 10. In this sense, a sum of 10 is more likely than a sum of 9; in the long run, a greater proportion of sets of three rolls result in a sum of 10 than a sum of 9.
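For readers curious where 0.116 and 0.125 might come from, here is one possible check in Python. It is a sketch under an assumption we have not yet justified formally: that fairness makes all \(6 \times 6 \times 6 = 216\) ordered results of three rolls equally likely, so a probability is the fraction of those results with the sum of interest. The helper name `prob_sum` is ours.

```python
from fractions import Fraction
from itertools import product

# All ordered results of three rolls, e.g. (3, 6, 3); assumed equally likely.
outcomes = list(product(range(1, 7), repeat=3))


def prob_sum(target):
    """Fraction of the equally likely outcomes whose rolls add up to target."""
    favorable = sum(1 for rolls in outcomes if sum(rolls) == target)
    return Fraction(favorable, len(outcomes))


p9 = prob_sum(9)
p10 = prob_sum(10)
print(f"sum of 9:  {p9} = {float(p9):.3f}")
print(f"sum of 10: {p10} = {float(p10):.3f}")
```

Using `Fraction` keeps the values exact, so the comparison between the two probabilities does not depend on rounding.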

Example 1.2 In each of the following, write a clearly worded sentence interpreting the numerical value of the probability as a long run relative frequency in context. (Just take the numerical values—0.1, 0.078, 0.25, and 0.73—as given. We’ll see how to compute probabilities like these later.)

  1. The probability that a roll of a fair ten-sided die lands on 1 is 0.1.
  2. The probability that the largest of 5 rolls of a fair ten-sided die is at most 6 is 0.078.
  3. The probability that two flips of a fair coin both land on H is 0.25.
  4. The probability that in 100 flips of a fair coin the proportion of flips that land on H is between 0.45 and 0.55 is 0.73.

Solution 1.2. Solution to Example 1.2

  1. About 10% of rolls of a fair ten-sided die result in a roll of 1. The phenomenon is a roll of a fair ten-sided die and the event of interest is whether the die lands on 1. If we roll a fair ten-sided die many, many times, we would expect the proportion of rolls that land on 1 to be close to 0.1.
  2. In about 7.8% of sets of 5 rolls of a fair ten-sided die, the largest roll is at most 6. In other words, about 7.8% of sets of 5 rolls of a fair ten-sided die contain no rolls greater than 6. The phenomenon involves a set of 5 rolls of a ten-sided die, so we consider many sets of 5 rolls, each set resulting in a largest roll which is either at most 6 or not. (For example, if the 5 rolls are (4, 1, 3, 4, 2) then the largest roll is 4.) If we roll a fair ten-sided die 5 times and find the largest roll then repeat to get many sets of 5 rolls, we would expect the proportion of sets for which the largest roll is at most 6 to be close to 0.078.
  3. In about 25% of sets of two fair coin flips, both flips in the set land on H. The phenomenon involves two flips of a coin, so we consider what would happen over many sets of two flips each.
  4. In about 73% of sets of 100 fair coin flips, the proportion of H for the set is between 0.45 and 0.55. The phenomenon involves 100 coin flips, so we consider many sets of 100 coin flips, each set resulting in a proportion of H that is either between 0.45 and 0.55 or not. Imagine adding many more paths to Figure 1.3 (b), each corresponding to a set of 100 flips; we would expect 73% of paths to end in a value between 0.45 and 0.55 after 100 flips.
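The four values in Example 1.2 were taken as given, but they can be sanity-checked with a short Python sketch. The calculations below rely on assumptions that later chapters develop properly: each face or side is equally likely, separate rolls or flips do not influence each other (so their probabilities multiply), and “between 0.45 and 0.55” includes both endpoints, that is, 45 through 55 heads.

```python
from math import comb

# 1. One of ten equally likely faces.
p1 = 1 / 10

# 2. The largest of 5 rolls is at most 6 exactly when all 5 rolls are at most 6.
p2 = (6 / 10) ** 5

# 3. Both of two flips land on H.
p3 = (1 / 2) ** 2

# 4. Between 45 and 55 heads (inclusive) among the 2**100 equally likely
#    sequences of 100 flips.
p4 = sum(comb(100, k) for k in range(45, 56)) / 2 ** 100

for label, p in (("1", p1), ("2", p2), ("3", p3), ("4", p4)):
    print(f"part {label}: {p:.3f}")
```

If the endpoints 45 and 55 were excluded in part 4, the value would be noticeably smaller, so the inclusive reading appears to be the one that matches 0.73.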

The relative frequency interpretation of probability is most natural in situations like coin flipping or dice rolling which we can actually physically repeat. In many contexts, the long run relative frequency interpretation, while still valid, is more conceptual and requires us to imagine many hypothetical repetitions of the random phenomenon.

Example 1.3 The weather forecast calls for a 30% chance of rain in your city tomorrow. You ask Donny Don’t to interpret the 30% as a long run relative frequency. Donny says: “it will rain in 30% of the city tomorrow”. You ask him to elaborate; he says: “Well, there are many different locations in the city. In some of the locations it will rain, in some it won’t. It will rain in 30% of the locations, and not in the other 70%. That is, rain will cover 30% of the area of the city, and the other 70% won’t have rain.” Do you agree? If not, how would you interpret the 30% as a long run relative frequency?

Solution 1.3. Solution to Example 1.3

One key to correctly interpreting probabilities is to consider the appropriate random phenomenon. Donny seems to think the random phenomenon involves selecting locations in the city. However, there is a 30% chance of rain in your city tomorrow, so the random phenomenon involves days. Yes, there is only one tomorrow, but there are, at least hypothetically, many days which have weather conditions similar to those forecast for tomorrow. On each of those days it either rains or not. What counts as rain? If we follow the U.S. National Weather Service, on any given day it rains in the city if there is accumulation over the day of at least 0.0254 cm (0.01 in) of rain at any point in the city. If we imagine many days with weather conditions similar to those forecast for tomorrow, it will rain on 30% of these days, and on 70% of these days it won’t rain.

Our interpretation also sheds light on another common misinterpretation. Namely, a 30% chance of rain does not mean “it will rain for 30% of the day tomorrow; that is, it will rain for 7.2 hours tomorrow”.

A simulation involves an artificial recreation of the random phenomenon, usually using a computer. One implication of the relative frequency interpretation is that the probability of an event can be approximated by simulating the random phenomenon a large number of times and determining the proportion of simulated repetitions on which the event occurred. After many repetitions the relative frequency of the event will settle down to a single constant value, and that value is approximately the probability of the event.

Of course, the accuracy of simulation-based approximations of probabilities depends on how well the simulation represents the actual random phenomenon. Conducting a simulation can involve many assumptions which impact the results. Simulating many flips of a fair coin is one thing; simulating the evolution of meteorological conditions over time is an entirely different story.

Example 1.4 In the first 7 games of his NBA career, Paolo Banchero attempted 60 free throws and successfully made 44. Donny Don’t says “the probability that Paolo Banchero successfully makes a free throw attempt is 44/60 = 0.733.” Do you agree? Explain.

Solution 1.4. Donny is correctly computing a relative frequency, but he is confusing the short run with the long run. A probability is a long run relative frequency. The probability that Paolo Banchero successfully makes a free throw can be interpreted as the proportion of free throw attempts that he successfully makes over many attempts. The observed relative frequency of 0.733 is only an approximation of the long run probability. And with only 60 attempts, it’s not necessarily a good approximation of the long run (even if we ignore that players can get better or worse over time).

The same considerations apply to players with many more attempts. For example, Giannis Antetokounmpo has attempted over 5000 free throws in his career and successfully made about 70%. But 0.70 is still just an approximation of Antetokounmpo’s true free throw probability since the long run includes all future attempts as well (ignoring that players can get better or worse over time). The difference is that 0.70 is likely a much better estimate of Antetokounmpo’s true free throw probability than 0.733 is of Banchero’s.
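To get a feel for why 5000 attempts is more informative than 60, we can simulate a hypothetical player whose long run free throw probability is known. The Python sketch below is only an illustration under assumptions of ours: a made-up true probability of 0.70 that never changes, attempts that do not influence each other, and 200 simulated sets of each size. It does not use data from any real player.

```python
import random


def observed_proportions(p_true, n_attempts, n_sets, rng):
    """Simulate n_sets sets of attempts; return the proportion made in each set."""
    props = []
    for _ in range(n_sets):
        made = sum(1 for _ in range(n_attempts) if rng.random() < p_true)
        props.append(made / n_attempts)
    return props


rng = random.Random(2024)  # fixed seed so the run is reproducible
p_true = 0.70              # hypothetical long run probability (an assumption)

spread = {}
for n_attempts in (60, 5000):
    props = observed_proportions(p_true, n_attempts, n_sets=200, rng=rng)
    spread[n_attempts] = max(props) - min(props)
    print(f"{n_attempts} attempts: observed proportions range "
          f"from {min(props):.3f} to {max(props):.3f}")
```

In runs like this, the observed proportions based on 60 attempts are typically scattered much more widely around 0.70 than those based on 5000 attempts.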

Be careful to distinguish between the short run and the long run. Observed relative frequencies based on past data (sometimes called “empirical probabilities”) are only short run approximations to theoretical probabilities which represent long run relative frequencies. The quality of the approximations depends on the extent to which what has happened is representative of all the possibilities that might happen.

A simulation models the long run. A natural question is: “how many simulated repetitions are required to represent the long run?” We’ll investigate further later. For now we’ll just provide a very rough benchmark: we can generally expect the relative frequency based on 10000 independent repetitions to be within 0.01 of the corresponding probability.
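The rough benchmark above can itself be explored by simulation. The following Python sketch, under assumptions of ours (a fair coin as the phenomenon, 500 simulated sets, a fixed seed), repeats a set of 10000 flips many times and counts how often the relative frequency of H lands within 0.01 of 0.5.

```python
import random

rng = random.Random(7)            # fixed seed so the run is reproducible
n_flips = 10_000                  # repetitions per set, as in the benchmark
n_sets = 500                      # assumption: number of sets to try
tolerance_flips = n_flips // 100  # within 0.01 of 0.5 means within 100 heads of 5000

within = 0
for _ in range(n_sets):
    # Each of the n_flips random bits plays the role of one fair coin flip (1 = H).
    heads = bin(rng.getrandbits(n_flips)).count("1")
    if abs(heads - n_flips // 2) <= tolerance_flips:
        within += 1

fraction_within = within / n_sets
print(f"{fraction_within:.1%} of sets had a relative frequency within 0.01 of 0.5")
```

Most sets, though not necessarily all, should fall within the 0.01 margin, which is the sense in which the benchmark is “very rough”.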

Finally, recall that contrary to colloquial uses of the word, random does not mean haphazard. Individual outcomes of a random phenomenon are uncertain, but the long run relative frequency interpretation implies a predictable pattern over a large number of (usually hypothetical) repetitions. For example, Figure 1.1 displays a clear distribution of the sum of the rolls of three fair six-sided dice after one million repetitions of the phenomenon. We don’t know what the sum will be when we roll the dice, but we can say that it’s equally likely to be 10 or 11, more likely to be 10 than 9, more likely to be 9 than 8, and so on. Also, we know that if we roll the dice many times, close to 12.5% of sets of three rolls will result in a sum of 10.

1.2.2 Subjective probability

The long run relative frequency interpretation is most natural in repeatable situations like flipping coins, rolling dice, drawing Powerballs, or randomly selecting U.S. adults (e.g., via random digit dialing). In many other situations, it is difficult to conceptualize the long run. The next Superbowl will only be played once, the 2032 U.S. Presidential Election will only be conducted once (we hope), and there was only one April 17, 2019 on which you either did or did not eat an apple. While these situations are not naturally repeatable they still involve randomness (uncertainty) and it is still reasonable to assign probabilities. At this point in time we might think that the Philadelphia Eagles are more likely than the Cleveland Browns to win the next Superbowl and that a current U.S. Senator is more likely than Dwayne Johnson to win the 2032 U.S. Presidential Election. If you’ve always been an apple-a-day person, you might think there’s a good chance you ate one on April 17, 2019; if you’re allergic to apples, your probability might be close to 0. Even when an uncertain phenomenon is not naturally repeatable, it is still reasonable to quantify the relative degree of likelihood or plausibility of related events.

However, the meaning of probability does seem different in physically repeatable situations like coin flips than in single occurrences like the next Superbowl. Let’s switch sports and consider the World Series of Major League Baseball. Consider the 2022 World Series, which the Houston Astros won. As of June 17, 2022,

  • According to FiveThirtyEight, the Los Angeles Dodgers had a 20% chance of winning the 2022 World Series, and the San Diego Padres had an 8% chance.
  • According to FanGraphs, the Dodgers had a 12.4% chance of winning the 2022 World Series, and the Padres had a 9.9% chance.
  • According to gambling site Odds Shark, the Dodgers had a 20% chance of winning the 2022 World Series, and the Padres had a 7.7% chance.

Each source, as well as many others, assigned different probabilities to the Dodgers or Padres winning. Which source, if any, was “correct”?

For a fair coin flip, we could perform a simulation to verify that the probability that it lands on H is 0.5. Our simulation results would vary, but with enough repetitions we could all agree that the proportion of flips that land on H seems to be converging to 0.5. We could also agree on how to conduct the simulation: each repetition involves a fair coin flip. If we’re concerned about a particular coin being weighted or biased we can simulate the fair coin flip in some other way, such as writing H and T on two cards, shuffling well, and drawing a card. There is no ambiguity about the assumptions—two equally likely outcomes—and in the long run we would reach the same conclusion. That is, we can agree on the “rules” of a fair coin flip, and these rules determine a single value, 0.5, for the probability that it lands on H.

Now consider a future World Series, say 2030. Even though the actual 2030 World Series will only happen once, we could still perform a simulation involving hypothetical repetitions. However, simulating the World Series involves first simulating the 2030 season to determine the playoff match ups, then simulating the playoffs to see which teams make the World Series, then simulating the World Series match up itself. And simulating the 2030 season involves simulating all the individual games. Even just simulating a single game involves many assumptions; differences in opinion with regard to these assumptions can lead to different probabilities. For example, on June 17, 2022, according to FiveThirtyEight the Dodgers had a 68% chance of beating the Cleveland Guardians in their game that day, but according to FanGraphs it was 66%. Even if the differences in probabilities between sources are small, many small differences over the course of the season could result in large differences in predictions for the World Series champion. (We’re not even considering uncertainty due to any changes in the rules of baseball, MLB, or the world between now and 2030.)

Unlike physically repeatable situations such as flipping a coin, there is no single set of “rules” for conducting a simulation of a single baseball game between two teams, let alone a whole season of games or the World Series champion. Therefore, there is no single long run relative frequency that determines the probability that a certain team wins the World Series. Instead we consider subjective probability.

Definition 1.3 A subjective probability (a.k.a. personal probability) of an event associated with a random phenomenon is a number in \([0, 1]\) representing the degree of likelihood, certainty, or plausibility a given individual assigns to the event.

As the name suggests, different individuals (or probabilistic models) might have different subjective probabilities for the same event. In contrast, in the long run relative frequency interpretation the probability of an event is agreed to be the single number that its long run relative frequency converges to.

Think of subjective probabilities as measuring relative degrees of likelihood, uncertainty, or plausibility rather than long run relative frequencies. For example, in the FiveThirtyEight forecast, the Dodgers were about 2.5 times more likely to win the 2022 World Series than the Padres (\(2.5 = 0.20 / 0.08\)). Relative likelihoods can also be compared across different forecasts or scenarios. For example, FiveThirtyEight assessed that the Dodgers were about 1.6 (\(1.6 = 0.20 / 0.124\)) times more likely to win the World Series than FanGraphs did. Also, FiveThirtyEight believed that the likelihood that a fair coin lands on H is about 2.5 (\(2.5 = 0.5 / 0.2\)) times larger than the likelihood that the Dodgers would win the 2022 World Series.

Example 1.5 Your favorite local weatherperson forecasts a 30% chance of rain tomorrow and a 60% chance of rain the next day in your city.

  1. Explain how these probabilities are subjective.
  2. You ask Donny Don’t to interpret these values as relative degrees of likelihood. Donny says: “Well, 30% is not that big, so it’s not going to rain that hard tomorrow. Also, 60% is twice as big as 30%, so it’s going to rain twice as hard two days from now as it will tomorrow.” Do you agree? Explain.
  3. Donny says: “Can’t we just look at the data from all the days with weather conditions similar to the ones forecast for tomorrow, and see how often it rained on those days to find the probability of rain tomorrow? No subjectivity about that!” How would you respond?

Solution 1.5.

  1. Weather is complicated and depends on many factors. Different models for how meteorological conditions evolve over time based on different assumptions or data can provide different forecasts. One model might predict a 30% chance of rain tomorrow; another might say 25%. There is no one single agreed upon set of “rules”—model, assumptions, data—for forecasting the weather, and therefore there is not a single agreed upon value for the probability of rain tomorrow.
  2. Probabilities measure the degree of uncertainty about whether or not it will rain, not the severity of any rain. On any single day it either rains or doesn’t; we discussed what counts as rain in Example 1.3. It is two times more likely to rain two days from now than it is tomorrow. It doesn’t matter how hard it rains, if at all, either day. Looking at it another way: if there is a 30% chance of rain tomorrow then there is a 70% chance that it does not rain tomorrow. Therefore, the weatherperson is more certain (in fact, \(0.7/0.6 \approx 1.167\) times more certain) that it will not rain tomorrow than they are that it will rain two days from now.
  3. Donny’s idea is not terrible, but there are still a few issues. First it’s much easier said than done to identify all the days with weather conditions similar to those forecast for tomorrow. There are many variables involved in the weather; which variables do we use and what counts as “similar”? There would still be subjective choices to be made in determining an appropriate reference group of days. Second, even if we identify an appropriate reference group, the relative frequency of rain still only provides an approximation of the probability of rain. The probability measures the degree of likelihood of rain tomorrow, which is a conceptually different quantity from the past frequency of rain on similar days. (We discussed related ideas in Example 1.4.) Finally, Donny has suggested one way of approximating the probability, but there are still many other reasonable approaches which might result in different probabilities.

The chance of rain in your city tomorrow and FiveThirtyEight’s MLB predictions are outputs of probabilistic forecasts. A probabilistic forecast combines observed data and statistical or mathematical models to make predictions. Rather than providing a single prediction such as “it will rain tomorrow” or “the Los Angeles Dodgers will win the 2022 World Series”, probabilistic forecasts provide a range of scenarios and their relative likelihoods. Such forecasts are subjective in nature, relying upon the data used and assumptions of the model. Changing the data or assumptions can result in different forecasts and probabilities. In particular, probabilistic forecasts are usually revised over time as more information becomes available.

Subjective probabilities can be calibrated by weighing the relative favorability of different bets3, as in the following example.

Example 1.6 What is your subjective probability that Professor Ross (the author) has a TikTok account? Consider the following two bets, and suppose you must choose only one.

  A. You win $100 if Professor Ross has a TikTok account, and you win nothing otherwise.
  B. A box contains 40 green and 60 gold marbles that are otherwise identical. The marbles are thoroughly mixed and one marble is selected at random. You win $100 if the selected marble is green, and you win nothing otherwise.

  1. Which of the above bets would you prefer? Or are you completely indifferent? What does this say about your subjective probability that Professor Ross has a TikTok account?
  2. If you preferred bet B to bet A, consider bet C which has a similar setup to B but now there are 20 green and 80 gold marbles. Do you prefer bet A or bet C? What does this say about your subjective probability that Professor Ross has a TikTok account?
  3. If you preferred bet A to bet B, consider bet D which has a similar setup to B but now there are 60 green and 40 gold marbles. Do you prefer bet A or bet D? What does this say about your subjective probability that Professor Ross has a TikTok account?
  4. Continue to consider different numbers of green and gold marbles. Can you zero in on your subjective probability?

Solution 1.6. Since the bets all have the same payouts, you should prefer the one that gives you the greatest probability of winning!

  1. If you choose bet B, the probability of winning is 0.4 (which we could verify with a simulation).
    • If you prefer bet B to bet A, then your subjective probability that Professor Ross has a TikTok account is less than 0.4.
    • If you prefer bet A to bet B, then your subjective probability that Professor Ross has a TikTok account is greater than 0.4.
    • If you’re indifferent between bets A and B, then your subjective probability that Professor Ross has a TikTok account is equal to 0.4.
  2. If you choose bet C, the probability of winning is 0.2.
    • If you prefer bet C to bet A, then your subjective probability that Professor Ross has a TikTok account is less than 0.2.
    • If you prefer bet A to bet C, then your subjective probability that Professor Ross has a TikTok account is greater than 0.2.
    • If you’re indifferent between bets A and C, then your subjective probability that Professor Ross has a TikTok account is equal to 0.2.
  3. If you choose bet D, the probability of winning is 0.6.
    • If you prefer bet D to bet A, then your subjective probability that Professor Ross has a TikTok account is less than 0.6.
    • If you prefer bet A to bet D, then your subjective probability that Professor Ross has a TikTok account is greater than 0.6.
    • If you’re indifferent between bets A and D, then your subjective probability that Professor Ross has a TikTok account is equal to 0.6.
  4. Continuing in this way you can narrow down your subjective probability. For example, if you prefer bet B to bet A and bet A to bet C, your subjective probability is between 0.2 and 0.4. Then you might consider bet E corresponding to 30 green marbles and 70 gold to determine if your subjective probability is greater than or less than 0.3. At some point it will be hard to choose, and you will be in the ballpark of your subjective probability. (Think of it like going to the eye doctor: “which is better: 1 or 2?” At some point you can’t really see a difference.)
(a) Bet B with a 0.4 probability of selecting green
(b) Bet C with a 0.2 probability of selecting green
(c) Bet D with a 0.6 probability of selecting green
Figure 1.5: The three marble bins in Example 1.6.
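Two ideas from Example 1.6 can be sketched in code: checking the 0.4 win probability of bet B by simulation, and narrowing down a subjective probability through repeated comparisons. The Python below is only an illustration under stated assumptions. The function names are ours, the "person" is a hypothetical stand-in whose subjective probability is 0.27, and the search uses bisection (halving the interval each round) rather than the particular sequence of marble counts used in the example.

```python
import random


def simulate_marble_bet(n_green, n_gold, n_reps=100_000, seed=0):
    """Estimate the probability of selecting a green marble by
    repeatedly drawing one marble at random from the box."""
    rng = random.Random(seed)
    box = ["green"] * n_green + ["gold"] * n_gold
    wins = sum(rng.choice(box) == "green" for _ in range(n_reps))
    return wins / n_reps


def calibrate(prefers_bet_a, n_rounds=7):
    """Narrow down a subjective probability by bisection.

    prefers_bet_a(p) should return True when the person would rather
    take bet A (on the event of interest) than a marble bet that wins
    with probability p.
    """
    low, high = 0.0, 1.0
    for _ in range(n_rounds):
        p = (low + high) / 2
        if prefers_bet_a(p):
            low = p    # subjective probability is greater than p
        else:
            high = p   # subjective probability is at most p
    return low, high


# Bet B: 40 green and 60 gold marbles.
estimate_b = simulate_marble_bet(40, 60)

# Hypothetical person whose subjective probability happens to be 0.27.
low, high = calibrate(lambda p: 0.27 > p)

print(estimate_b)  # close to 0.4
print(low, high)   # a narrow interval containing 0.27
```

In practice a real person's answers become unreliable once the two bets are nearly equivalent, so the interval stops shrinking in any meaningful way well before the arithmetic does.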

Of course, the strategy in the above example isn’t an exact science, and there is a lot of behavioral psychology behind how people make choices in situations like this4, especially when betting with real money. But the example provides a very rough idea of how you might discern a subjective probability of an event. The example also illustrates that probabilities can be “personal”; your information or assumptions will influence your assessment of the likelihood.

We close this section with some brief comments about subjectivity. Subjectivity is not bad; “subjective” is not a “dirty” word. Any probability model involves some subjectivity, even when probabilities can be interpreted naturally as long run relative frequencies. For example, assuming a die is fair does not codify an objective truth about the die. Instead, “fairness” reflects a reasonable and tractable mathematical model. In the real world, any “fair” six-sided die has small physical imperfections that cause the six faces to have different probabilities. However, the differences are usually small enough to be ignored for most practical purposes. Assuming that the probability that the die lands on each side is 1/6 is much more tractable than assuming the probability of a 1 is 0.1666666668, the probability of a 2 is 0.1666666665, etc. (Furthermore, measuring the probability of each side so precisely would be extremely difficult.) But assuming that the probability that the die lands on each side is 1/6 is also subjective. We might agree more easily on the probability that a six-sided die lands on 1 than on the probability that the Philadelphia Phillies win the 2030 World Series. But the fact that there can be many reasonable probability models for a situation like the 2030 World Series does not make the corresponding subjective probabilities any less valid than long run relative frequencies.

1.2.3 Which interpretation to use?

In short, both! Fortunately, the mathematics of probability works the same way regardless of the interpretation. We can—and will—use the long run relative frequency and subjective interpretations interchangeably. When we introduce a new concept or problem we will employ whichever interpretation we think helps us best understand the concept or solve the problem.

Long run relative frequency and subjective are not the only interpretations of probability. However, we will not delve further into the philosophy of probability5. You might still have questions such as:

  • If we flip a coin and cover it before observing what side it lands on, is the flip still random?
  • If we measure precisely all the features that determine the coin’s trajectory and what side it will land on (initial velocity, air resistance, etc.), is the flip still random?
  • Is the probability of heads in the previous two cases 0.5? Or 0 or 1? Does it even make sense to talk about the probability of heads if the outcome is determined?
  • What really is “true randomness”?

You can debate questions like these with your friends, but our position is that probability is applicable in any situation involving a reasonable degree of uncertainty. If you flip the coin but cover it, the uncertainty of the flip is not resolved so it still makes sense (to us) to say the probability that heads is facing up is 0.5. In practical situations you’re rarely, if ever, going to measure precisely all the features that determine the outcome, so you can still use probability to assess the degree of plausibility of various events. We are certainly ignoring some philosophical issues or questions6, but our brief introduction to instances of randomness and interpretations of probability provides sufficient background for discussing many interesting practical problems in a wide variety of applications.

1.2.4 Exercises

Exercise 1.2 In each of the following, write a clearly worded sentence interpreting the numerical value of the probability as a long run relative frequency in context. (Just take the numerical values as given for now. We’ll see how to compute probabilities like these later.)

  1. The probability of rolling doubles when you roll two fair six-sided dice is 1/6.
  2. The probability of rolling doubles on three consecutive rolls of two fair six-sided dice is 0.00463.
  3. The probability that the sum of 100 rolls of a fair six-sided die is less than 370 is 0.12.
  4. Roll a fair six-sided die until you roll a 6 three times and then stop. The probability that you roll the die at least 10 times is 0.822.
  5. The probability that a randomly selected U.S. adult uses TikTok is 0.2.

Exercise 1.3 Various sources posted odds for who would win the 2024 U.S. Presidential Election. As of June 30, 2023, the website bonus.com listed the following probabilities. (They also assigned probabilities to other candidates that aren’t included here.)

| Potential candidate | Probability of winning 2024 election |
|---|---|
| Joe Biden | 44% |
| Donald Trump | 29% |
| Ron DeSantis | 19% |
| Gavin Newsom | 11% |
| Kamala Harris | 5% |

  1. According to bonus.com as of June 30, 2023, how many times more likely was Joe Biden to win than Gavin Newsom?
  2. How many times more likely was Ron DeSantis to not win than to win?
  3. Another source listed the probability for Kamala Harris as 10%. How many times more likely was Kamala Harris to win according to this source relative to bonus.com?
  4. Say it’s October 2032 and we’re trying to predict the outcome of the 2032 U.S. Presidential election. How would a table of probabilities one month before the election compare to a table one year before the election? We obviously can’t predict the future, but in general terms what would you expect? (Hint: the bonus.com predictions were made over a year in advance of the 2024 election, and we see five candidates with not-so-small probabilities. Would you expect that to be true a month before the election?)

Exercise 1.4 Identify your subjective probability of each of the following (to the nearest 0.05 or 0.1 is fine). Explain how you arrived at your value by considering bets like those in Example 1.6.

  1. The probability that your favorite sports team will win a championship in the next ten years.
  2. The probability that you will eventually visit all 50 current U.S. states at some time in your life.
  3. The probability that there will be more than 50 U.S. states in 2050.
  4. The probability that we live in a multiverse.
  5. Choose a situation of interest to you and identify your subjective probability!

1.3 Working with probabilities

In the previous section we encountered two interpretations of probability: long run relative frequency and subjective. Fortunately, the mathematics of probability works the same way regardless of the interpretation.

1.3.1 Consistency requirements

Any probability assessment must satisfy some basic logical consistency requirements. Roughly, probabilities cannot be negative and the sum of probabilities over all possibilities must be 1 (or 100%). We will formalize these requirements in mathematical formulas later. For now, we just proceed using intuition.

Example 1.7 As of Jun 21, 2023, FiveThirtyEight listed the following probabilities for who would win the 2023 World Series.

| Team | Probability |
|---|---|
| Atlanta Braves | 19% |
| Tampa Bay Rays | 16% |
| Los Angeles Dodgers | 10% |
| Houston Astros | 7% |
| New York Yankees | 7% |
| Other | |

According to FiveThirtyEight (as of Jun 21, 2023):

  1. Are these probabilities most naturally interpreted as long run relative frequencies or subjective probabilities? Explain.
  2. What must be the probability that the Braves do not win the 2023 World Series?
  3. What must be the probability that either the Braves or the Rays win?
  4. What must be the probability that one of the above five teams is the World Series champion?
  5. What must be the probability that a team other than the above five teams is the World Series champion? That is, what value goes in the “Other” row in the table?
  6. Donny Don’t says, “These are subjective probabilities, so I can’t use them to perform a simulation.” Explain to Donny how you could conduct a simulation that reflects these probabilities, say using a spinner (like from a kids game).
  7. What would you expect the results of 10000 repetitions of a simulation of the World Series champion to look like? Construct a table summarizing what you expect. Is this necessarily what would happen?

Solution 1.7.

  1. These probabilities are most naturally interpreted as subjective probabilities, because there is no single set of “rules”—model, assumptions, data—that determines long run relative frequencies.

  2. 81%. Either the Braves win or they don’t; if there’s a 19% chance that the Braves win, there must be an 81% chance that they do not win in order to account for 100% of the possibilities. If we think of this as a simulation with 10000 repetitions (see the last part), each repetition results in either the Braves winning or not, so if they win in 1900 of the repetitions then they must not win in the other 8100.

  3. 35%. There is only one World Series champion, so if say the Braves win then no other team can win. Because “Braves win” and “Rays win” are distinct events we can add their probabilities to find the probability of the event “Braves or Rays win”. Thinking again of the simulation, the repetitions in which the Braves win are distinct from those in which the Rays win. If the Braves win in 1900 repetitions and the Rays win in 1600 repetitions, then in a total of 3500 repetitions either the Braves or Rays win.

  4. 59%. As in the previous part, we can add the five probabilities to see that the probability that one of the five teams above wins must be 59%.

  5. 41%. Either one of the five teams above wins, or some other team wins. If one of the five teams above wins in 5900 repetitions, then in 4100 repetitions the winner is not one of these five teams.

  6. These particular values are subjective, but we can still treat them as given and use them to conduct a simulation. Imagine that we construct a spinner7 like in Figure 1.6. Spinning this spinner once simulates a World Series winner according to the given probabilities. We could conduct many repetitions by spinning the spinner many times.

  7. Each repetition results in a World Series champion and in the long run we would expect the Braves would be the champion in 19%, or 1900, of the 10000 repetitions. We would expect the simulation results to look like

    | Team | Repetitions |
    |---|---|
    | Atlanta Braves | 1900 |
    | Tampa Bay Rays | 1600 |
    | Los Angeles Dodgers | 1000 |
    | Houston Astros | 700 |
    | New York Yankees | 700 |
    | Other | 4100 |

    Of course, there would be some natural variability from simulation to simulation, just like in the sets of 1000 coin flips in Figure 1.4. But the above counts represent about what we would expect.

Figure 1.6: Spinner representation of the subjective probabilities in Example 1.7.
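The spinner simulation described in Solution 1.7 can be mimicked on a computer. The following Python sketch is one possible way to do it, not the method used to produce the figure: it treats the probabilities in Example 1.7 as given, fills in "Other" so that the probabilities sum to 1, and uses the standard library's `random.choices` as a stand-in for the physical spinner. The seed and variable names are arbitrary choices of ours.

```python
import random
from collections import Counter

# Probabilities from Example 1.7 (FiveThirtyEight, as of Jun 21, 2023).
probs = {
    "Atlanta Braves": 0.19,
    "Tampa Bay Rays": 0.16,
    "Los Angeles Dodgers": 0.10,
    "Houston Astros": 0.07,
    "New York Yankees": 0.07,
}
# Consistency: "Other" must account for the remaining probability.
probs["Other"] = round(1 - sum(probs.values()), 2)

n_reps = 10_000
rng = random.Random(2023)  # fixed seed so the run is reproducible

# Each "spin" selects one champion according to the given probabilities.
winners = rng.choices(list(probs), weights=list(probs.values()), k=n_reps)
counts = Counter(winners)

for team, p in probs.items():
    print(f"{team:20s} expected {p * n_reps:6.0f}   simulated {counts[team]:6d}")
```

The simulated counts will not match the expected counts exactly, and they will change from run to run if the seed is changed; that is the natural variability mentioned in the solution.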

Consistency does not tell us how to assign probabilities to events. Rather, consistency requires that however we assign probabilities they must fit together in a logically coherent way. In the previous example, there is no rule that says the probability that the Braves win must be 0.19; this value is subjective. However, once we have specified that the probability is 0.19, consistency requires that the probability that the Braves do not win must be 0.81.

Example 1.8 Suppose your subjective probabilities for the 2023 World Series champion satisfy the following conditions.

  • The Astros and Dodgers are equally likely to win
  • The Rays are 1.5 times more likely than the Astros to win
  • The Braves are 2 times more likely than the Rays to win
  • The winner is as likely to be among these four teams (Braves, Rays, Dodgers, Astros) as not

Construct a table of your subjective probabilities like the one in Example 1.7.

Solution 1.8. Here, probabilities are specified indirectly via relative likelihoods. We need to find probabilities that are in the given ratios and add up to 1 (or 100%). It helps to designate one outcome as the “baseline”. It doesn’t matter which one, though it’s convenient to choose the one with the smallest probability; we’ll choose the Astros.

  • Suppose the Astros account for 1 “unit”. It doesn’t really matter what a unit8 is, but let’s say it corresponds to 1000 repetitions of a simulation. That is, the Astros win in 1000 repetitions. Careful: we haven’t yet specified how many total repetitions there are, or how many units the entire simulation accounts for. We’re just starting with a baseline of what happens for the Astros.
  • The Astros and Dodgers are equally likely to win, so the Dodgers also account for 1 unit.
  • The Rays are 1.5 times more likely than the Astros to win, so the Rays account for 1.5 units. If 1 unit is 1000 repetitions, then the Rays win in 1500 repetitions, 1.5 times more often than the Astros.
  • The Braves are 2 times more likely than the Rays to win, so the Braves account for \(2\times 1.5=3\) units. If 1 unit is 1000 repetitions, then the Braves win in 3000 repetitions.
  • The four teams account for a total of \(1+1+1.5+3 = 6.5\) units. Since the winner is as likely to be among these four teams as not, “Other” also accounts for 6.5 units.
  • In total, there are 13 units which account for 100% of the probability. The Astros account for 1 unit, so their probability of winning is \(1/13=0.077\) or about 7.7%. Likewise, the probability that the Braves win is \(3/13=0.231\) or about 23.1%.

Table 1.3: Table solution for Example 1.8

| Team | Units | Hypothetical repetitions | Probability (as fraction, decimal) | Probability (as percent) |
|---|---|---|---|---|
| Atlanta Braves | 3.0 | 3000 | 3/13 = 0.231 | 23.1% |
| Tampa Bay Rays | 1.5 | 1500 | 1.5/13 = 0.115 | 11.5% |
| Los Angeles Dodgers | 1.0 | 1000 | 1/13 = 0.077 | 7.7% |
| Houston Astros | 1.0 | 1000 | 1/13 = 0.077 | 7.7% |
| Other | 6.5 | 6500 | 6.5/13 = 0.500 | 50.0% |
| Total | 13.0 | 13000 | 1 | 100.0% |

You should verify that all of the probabilities are in the specified ratios. For example, the Braves are 2 times more likely \((2 \approx 0.231 / 0.115)\) than the Rays to win, and the Rays are 1.5 times more likely \((1.5 \approx 0.115 / 0.077)\) than the Astros to win.

Example 1.8 illustrates one way of formulating probabilities. We start by specifying probabilities in relative terms, and then “normalize” these probabilities so that they add up to 1 (or 100%) while maintaining the ratios. As in the example, it helps to consider one outcome as a “baseline” and to specify all likelihoods relative to the baseline.
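The normalization step can be written out as a short computation. The Python sketch below is one way to do it, with the Astros as the 1-unit baseline as in Solution 1.8; the dictionary and variable names are ours. It uses exact fractions so that the ratios are preserved without rounding.

```python
from fractions import Fraction

# Relative likelihoods ("units"), with the Astros as the baseline.
units = {"Houston Astros": Fraction(1)}
units["Los Angeles Dodgers"] = units["Houston Astros"]           # equally likely
units["Tampa Bay Rays"] = Fraction(3, 2) * units["Houston Astros"]  # 1.5 times
units["Atlanta Braves"] = 2 * units["Tampa Bay Rays"]            # 2 times the Rays
# "Other" is as likely as the four named teams combined.
units["Other"] = sum(units.values())

# Normalize: divide each outcome's units by the total number of units.
total_units = sum(units.values())
probs = {team: u / total_units for team, u in units.items()}

for team, p in probs.items():
    print(f"{team:20s} {str(p):>5s} = {float(p):.3f}")
```

Dividing every entry by the same total changes the scale but not the ratios, so the resulting probabilities still satisfy all of the conditions in the example. (The exact fraction for the Rays prints as 3/26, which equals 1.5/13.)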

Figure 1.7 provides a visual representation of Example 1.8. The ratios provided in the problem setup are enough to draw the shape of the plot, represented by Figure 1.7 (a) without a scale on the vertical axis. The heights are equal for the Astros and the Dodgers, the height for the Rays is 1.5 times the height for the Astros, the height for the Braves is 2 times the height for the Rays, and if we stacked the bars for the Braves, Rays, Astros, and Dodgers on top of one another they would be as tall as the Other bar. Figure 1.7 (b) simply adds a (vertical) probability axis to ensure the heights of the bars sum to 1. Figure 1.7 (b) represents the “normalization” step, but it does not affect the shape of the plot or the relative heights of the bars.

(a) Relative heights without absolute scale.
(b) Heights scaled to sum to 1 to represent probabilities.
Figure 1.7: Bar chart representation of the subjective probabilities in Example 1.8.

The fact that the probabilities must sum to 1 over all possibilities might seem obvious. Other consistency requirements are more subtle.

Example 1.9 Consider a Cal Poly student who frequently has blurry, bloodshot eyes, generally exhibits slow reaction time, always seems to have the munchies, and disappears at 4:20 each day. Which of the following, A or B, has a higher probability?9 (Assume the two probabilities are not equal.)

  • A: The student has a GPA above 3.0.
  • B: The student has a GPA above 3.0 and smokes marijuana regularly.

Solution 1.9. A has the higher probability. Many people say B, associating the description of the student with the “smokes marijuana regularly” part of event B. But every student who satisfies event B also satisfies event A, so the probability for event A can’t be any smaller than that of event B.

The previous example illustrates that our psychological judgment is often inconsistent with the mathematical logic of probabilities, so be careful when interpreting probabilities!

1.3.2 Odds

The words “probability”, “chance”, “likelihood”, and “odds” are colloquially treated as synonyms. However, in the mathematical language of probability, odds provide a different way of reporting a probability. Rather than reporting probability on a 0 to 1 (or 0% to 100%) scale, odds report probabilities in terms of ratios.

Example 1.10 Continuing Example 1.7, suppose the probability that the Dodgers win the 2023 World Series is 0.10.

  1. What is the probability that the Dodgers do not win the 2023 World Series?
  2. The odds that the Dodgers win the World Series are “9 to 1 against”. What do you think this means?
  3. What are the odds that the Dodgers do not win the World Series?
  4. If the probability that the Rays win is 0.16, what are the odds for the Rays?
  5. What are the odds that either the Dodgers or the Rays win?
  6. Suppose the San Francisco Giants have 19 to 1 odds against winning. What is the probability that the Giants win the World Series?

Solution 1.10.

  1. The probability that the Dodgers win is 0.1, so the probability that they do not win is 0.9.
  2. The values in the previous part are in a 9 to 1 ratio: the probability of not winning (0.9) is 9 times greater than the probability of winning (0.1). So the odds against the Dodgers winning the World Series are 9 to 1; “against” because the Dodgers are less likely to win than to not win.
  3. The probabilities are still in the 9 to 1 ratio, but we can say that the odds are 9 to 1 in favor of the Dodgers not winning. We could also say the odds are 1 to 9 in favor of the Dodgers winning, but odds are typically reported with the larger value first—9 to 1 instead of 1 to 9.
  4. The probability that the Rays win is 0.16 and that they don’t win is 0.84, and \(0.84/0.16 = 5.25\). So the odds are 5.25 to 1 against the Rays winning. Odds are often reported as whole numbers, so we could say the odds are 21 to 4 against the Rays winning (since 21/4 = 5.25).
  5. The probability that either the Dodgers or the Rays win is 0.16 + 0.10 = 0.26, so the probability that neither of these teams wins is 0.74. The ratio of these values is \(0.74/0.26 \approx 2.85\), so there are about 2.85 to 1 odds against the winner being either the Dodgers or the Rays (or exactly 37 to 13 in whole numbers). (Notice that 2.85 is not related in a simple way to the previous values 9 and 5.25.)
  6. The odds tell us that the probability that the Giants do not win is 19 times greater than the probability that they do win. Let the event that the Giants win account for 1 “unit” so that the event that they do not win accounts for 19 units, for a total of 20 units. Then10: the probability that the Giants win is \(1/20 = 0.05\). Note that the probability of not winning, \(19/20 = 0.95\), is 19 times greater than the probability of winning.

Definition 1.4 The odds of an event is a ratio involving the probability that the event occurs and the probability that the event does not occur. Odds can be expressed as either “in favor” of or “against” the event occurring, depending on the order of the ratio.

\[ \begin{aligned} \text{odds in favor of an event} & = \frac{\text{probability that the event occurs}}{\text{probability that the event does not occur}} \\ & \\ \text{odds against an event} & = \frac{\text{probability that the event does not occur}}{\text{probability that the event occurs}}\end{aligned} \]

In many situations odds are reported as odds against. While the odds of an event is just a single number, odds are often reported as a ratio of whole numbers, e.g., 11 to 1, 7 to 2.
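The conversions in Definition 1.4 are easy to express in code. The following Python sketch is illustrative only; the function names are ours. It uses exact fractions so that odds come out as whole-number ratios, and it reproduces a few of the values from Example 1.10.

```python
from fractions import Fraction


def odds_against(p):
    """Odds against an event with probability p: (1 - p) / p."""
    p = Fraction(str(p))  # e.g. 0.16 -> 4/25 exactly
    return (1 - p) / p


def odds_in_favor(p):
    """Odds in favor of an event with probability p: p / (1 - p)."""
    p = Fraction(str(p))
    return p / (1 - p)


def probability_from_odds_against(odds):
    """Probability of an event whose odds against are `odds` to 1."""
    return 1 / (1 + Fraction(odds))


def as_ratio(odds):
    """Format odds as a ratio of whole numbers."""
    return f"{odds.numerator} to {odds.denominator}"


print(as_ratio(odds_against(0.10)))       # 9 to 1   (Dodgers)
print(as_ratio(odds_against(0.16)))       # 21 to 4  (Rays)
print(probability_from_odds_against(19))  # 1/20     (Giants)
```

Note that odds in favor and odds against are reciprocals of each other, so `odds_in_favor(0.10)` is 1/9, matching the "1 to 9 in favor" phrasing in Solution 1.10.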

As discussed at the end of Section 1.2.2, bets can be used to discern probabilities or odds.

Example 1.11 Ron and Leslie agree to the following bet. They’ll ask Professor Ross if he has a TikTok account. If he does, Leslie will pay Ron $200; if not, Ron will pay Leslie $100. (Neither has any direct information about whether or not Professor Ross has a TikTok account.)

  1. Given this setup, which of the following is being judged as more likely: that Professor Ross has a TikTok account, or that he does not? Why?
  2. What are this bet’s odds?
  3. Ron and Leslie agree that this is a fair bet, and neither would accept worse odds. What is their subjective probability that Professor Ross has a TikTok account?
  4. Suppose they were to hypothetically repeat this bet many times, say 3000 times. Given the probability from the previous part, how many times would you expect Leslie to win? To lose? What would you expect Leslie’s net dollar winnings to be? In what sense is this bet “fair”? (Remember: Leslie’s winnings are Ron’s losses and vice versa.)

Solution 1.11.

  1. The larger potential payout corresponds to the less likely event. So they think Professor Ross is more likely to not have a TikTok account than to have one.
  2. The payouts are in a 2 to 1 ratio, so the odds that Professor Ross has a TikTok account are 2 to 1 against.
  3. The odds that Professor Ross has a TikTok account are 2 to 1 against, so Professor Ross is twice as likely to not have a TikTok account as to have one. This corresponds to a subjective probability11 that Professor Ross has a TikTok account of 1/3 (and a probability that he does not have one of 2/3).
  4. The probability that Leslie wins is 2/3, so you would expect her to win in 2000 of the 3000 repetitions. She wins $100 each time she wins, so you would expect her to win a total of $200,000 on the games she wins. The probability that she loses is 1/3, so you would expect her to lose in 1000 of the 3000 repetitions. She loses $200 each time, so you would expect her to lose a total of $200,000 on the games she loses. So you would expect Leslie’s net winnings to be 0, and likewise for Ron. The bet is fair in the sense that neither party is expected to profit or lose in the long run.

| Winner | Expected number of repetitions | Leslie’s winnings per repetition ($) | Leslie’s expected total winnings ($) |
|---|---|---|---|
| Leslie | 2000 | 100 | 200,000 |
| Ron | 1000 | -200 | -200,000 |
| Total | 3000 | NA | 0 |
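The accounting in the last part of Solution 1.11 can be checked with a few lines of code. The Python sketch below first reproduces the expected counts and winnings, then runs one simulated set of 3000 repetitions to show that an actual run need not come out to exactly zero. The variable names and the random seed are our own choices, and the simulation is only an illustration of the idea.

```python
import random
from fractions import Fraction

p_has_account = Fraction(1, 3)  # implied by 2 to 1 odds against
leslie_gain = 100               # Ron pays Leslie if there is no account
leslie_loss = 200               # Leslie pays Ron if there is an account
n_reps = 3000

# Expected results over the hypothetical repetitions.
expected_wins = (1 - p_has_account) * n_reps
expected_losses = p_has_account * n_reps
expected_net = expected_wins * leslie_gain - expected_losses * leslie_loss
print(expected_wins, expected_losses, expected_net)  # 2000 1000 0

# One simulated run of the repeated bet.
rng = random.Random(1)
simulated_net = 0
for _ in range(n_reps):
    if rng.random() < float(p_has_account):
        simulated_net -= leslie_loss  # account exists: Leslie pays Ron
    else:
        simulated_net += leslie_gain  # no account: Ron pays Leslie
print(simulated_net)  # varies with the seed; typically small relative to $200,000
```

"Fair" refers to the expected net winnings being 0; any single run of 3000 bets will usually leave one of the two players somewhat ahead.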

The previous example illustrates that the odds of a fair bet on whether or not an event will occur, determined by the ratio of the payouts, imply a probability for the event.

\[ \begin{aligned} \text{probability that event occurs} & = \frac{\text{odds in favor of the event}}{1+\text{odds in favor of the event}}\\ & \\ & = \frac{1}{1+\text{odds against the event}} \end{aligned} \]

We have defined odds as a ratio of probabilities; these are sometimes called “fractional odds”. But odds can be reported in other ways. In particular, “moneyline odds” (a.k.a., “American odds”) are expressed in terms of the net profit on a 100 dollar bet12. For example, in Example 1.7 the moneyline odds for the Dodgers are +900. This means that someone who bets 100 dollars on the Dodgers to win the World Series would receive 100+900 dollars if the Dodgers actually win, for a net profit of +900 dollars after subtracting the initial stake of 100 dollars. A $100 bet at +900 moneyline odds results in a profit of $900 if the bet is won or a loss of the initial $100 stake otherwise; the amounts 900 and 100 are in a 9 to 1 ratio (against winning), implying a probability of \(1/(1+9) = 0.10\) of winning the bet.
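The moneyline calculation in this paragraph can be sketched as follows. This Python function is a hypothetical helper of ours that covers only the case described in the text, positive moneyline odds read as the net profit on a 100 dollar stake; it converts them to odds against and then to the implied probability.

```python
def implied_probability(moneyline):
    """Implied probability of winning a bet quoted at positive
    moneyline odds (net profit on a 100 dollar stake)."""
    if moneyline <= 0:
        raise ValueError("this sketch only covers positive moneyline odds")
    stake = 100
    odds_against = moneyline / stake  # e.g. 900 / 100 = 9 (to 1 against)
    return 1 / (1 + odds_against)


print(implied_probability(900))  # 0.1
```

For the Dodgers at +900 this recovers the probability of 0.10 from Example 1.7.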

1.3.3 Why do we need consistency?

Regardless of the interpretation, probabilities must follow basic logical consistency requirements. If these requirements are mistakenly not satisfied, bad things can happen.

Example 1.12 Donny Don’t thinks the Dodgers have a pretty good chance to win the World Series. He thinks their only real competition is the Yankees. The following are Donny’s subjective probabilities for which team will win the World Series.

| Team | Probability |
|---|---|
| Los Angeles Dodgers | 0.50 |
| New York Yankees | 0.25 |
| Other | 0.10 |

  1. What is wrong with Donny’s probabilities?
  2. What are Donny’s odds that the Dodgers win? (Consider only Donny’s probability that the Dodgers win13.)
  3. Would Donny agree to a bet where he pays you $100 if the Dodgers win but you pay him $100 if the Dodgers do not win?
  4. What are Donny’s odds that the Yankees win? Would Donny agree to a bet where he pays you $150 if the Yankees win but you pay him $50 if the Yankees do not win?
  5. What are Donny’s odds that a team other than the Dodgers or Yankees wins? Would Donny agree to a bet where he pays you $180 if an other team wins but you pay him $20 if the winner is either the Yankees or Dodgers?
  6. Suppose you and Donny agree to make all of the bets in the three previous parts. Consider your net profit for each of the potential outcomes (Dodgers win, Yankees win, other wins). What do you notice? Who would you rather be in this situation: you or Donny?

Solution 1.12.

  1. Donny’s probabilities do not add up to 1. If we had listed the odds for every outcome (see below) instead of the probabilities, his mistake would have been less obvious.

  2. Donny’s odds that the Dodgers win are \(\frac{0.5}{0.5}=1\), or even odds.

  3. Donny believes that the Dodgers are equally likely to win as to not win so, yes, he would agree to this bet with even payouts.

  4. Donny’s odds that the Yankees do not win are \(\frac{0.75}{0.25}=3\), or 3 to 1 odds against the Yankees winning. Donny believes that the Yankees are 3 times more likely to not win than to win. Since the payouts are in a 3 to 1 ratio with the larger payout corresponding to the Yankees winning (the less likely event), then Donny would agree to this bet.

  5. Donny’s odds that an other team does not win are \(\frac{0.9}{0.1}=9\), or 9 to 1 odds against an other team winning. Donny believes that an other team is 9 times more likely to not win than to win. Since the payouts are in a 9 to 1 ratio with the larger payout corresponding to an other team winning (the less likely event), then Donny would agree to this bet.

  6. Given Donny’s odds for each outcome, he would agree to each of these bets.

    • If the Dodgers win, you win the first bet but lose the other two, so your net profit is 100 - 50 - 20 = 30.
    • If the Yankees win, you win the second bet but lose the other two, so your net profit is 150 - 100 - 20 = 30.
    • If an other team wins, you win the third bet but lose the other two, so your net profit is 180 - 100 - 50 = 30.

    Regardless of the outcome, you are guaranteed to earn a net profit of $30, and Donny is guaranteed to lose a net of $30. That’s free money for you with no risk, and pretty bad business on Donny’s part.

The situation in Example 1.12 is known as a “Dutch book”. A Dutch book14 is a set of probabilities and bets which guarantees a profit, regardless of the outcome of the gamble. Probabilities that fail to satisfy logical consistency requirements allow for the possibility of Dutch books. The fact that no one should ever want to get caught in a Dutch book, like Donny was in the previous example, is one justification for why even subjective probabilities should satisfy logical consistency requirements.
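The bookkeeping in Example 1.12 can be checked with a short Python sketch. The data structure and the function name `net_profit` are hypothetical choices made for illustration; the dollar amounts and probabilities are the ones from the example.

```python
# Donny's subjective probabilities from Example 1.12
donny_probabilities = {"Dodgers": 0.50, "Yankees": 0.25, "Other": 0.10}
total_probability = sum(donny_probabilities.values())

# Each bet: (outcome you win on, amount Donny pays you if it happens,
#            amount you pay Donny if it does not happen)
bets = [
    ("Dodgers", 100, 100),
    ("Yankees", 150, 50),
    ("Other", 180, 20),
]


def net_profit(winner, bets):
    """Your total profit over all bets if `winner` wins the World Series."""
    total = 0
    for outcome, you_receive, you_pay in bets:
        if outcome == winner:
            total += you_receive
        else:
            total -= you_pay
    return total


profits = {outcome: net_profit(outcome, bets) for outcome in donny_probabilities}

print("Sum of Donny's probabilities:", round(total_probability, 2))
for outcome, profit in profits.items():
    print(f"If {outcome} wins, your net profit is {profit}")
```

Your net profit is 30 for every outcome, which is what makes this set of bets a Dutch book.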

1.3.4 Exercises

Exercise 1.5 Various sources posted odds for who would win the 2024 U.S. Presidential Election. As of June 30, 2023, the website bonus.com listed the following probabilities.

| Potential candidate | Probability of winning 2024 election |
|---|---|
| Joe Biden | 44% |
| Donald Trump | 29% |
| Ron DeSantis | 19% |
| Gavin Newsom | 11% |
| Kamala Harris | 5% |
  1. According to bonus.com, what is the probability that either Donald Trump or Ron DeSantis wins the 2024 election?
  2. According to bonus.com, what is the probability that a candidate other than these five wins the 2024 election?
  3. According to bonus.com, is the probability that Joe Biden wins the Democratic nomination greater than, less than, or equal to 44%? Why?
  4. According to bonus.com, what are the odds against Kamala Harris winning the 2024 election?
  5. Suppose that a source gives Dwayne Johnson 500 to 1 odds of winning. What is the probability that Dwayne Johnson wins?

Exercise 1.6 Suppose that at some point your subjective probabilities for who would win the 2024 U.S. Presidential Election satisfied the following.

  • Joe Biden is 5 times more likely to win than Kamala Harris, and no other Democratic candidate has a chance of winning
  • The Democratic candidate and the Republican candidate are equally likely to be the winner
  • Donald Trump is twice as likely to win as any other Republican candidate.

Create a table of your subjective probabilities.

1.4 Probabilities, proportions, and percentages

It is helpful to think of probabilities as proportions—fractions or decimals—or percentages. When dealing with percentages (or proportions or probabilities) be sure to ask “percent of what?” Thinking in fraction terms, be careful to identify the correct reference group which corresponds to the denominator.

Example 1.13 The following two phrases contain exactly the same words, just in different orders. Which is larger, the numerical value of 1 or 2?

  1. The percentage of men that are greater than six feet tall who play in the National Basketball Association (NBA).
  2. The percentage of men that play in the National Basketball Association (NBA) who are greater than six feet tall.

Solution 1.13. The value of the percentage in (2) is much larger.

  1. There are over a billion men in the world who are greater than six feet tall, only a few hundred of whom play in the NBA. The percentage of men greater than six feet tall who play in the NBA is pretty close to 0.
  2. There are only a few hundred men who play in the NBA, almost all of whom are greater than six feet tall. The percentage of men who play in the NBA that are greater than six feet tall is pretty close to 100%.

Think in terms of fractions. The corresponding fractions would have the same numerator—number of men who are both greater than six feet tall and play in the NBA—but vastly different denominators.

\[\begin{align*} (1): & \quad \frac{\text{number of men who are greater than six feet tall and play in the NBA}}{\text{number of men who are greater than six feet tall}}\\ (2): & \quad \frac{\text{number of men who are greater than six feet tall and play in the NBA}}{\text{number of men who play in the NBA}} \end{align*}\]

When working with multiple percentages (or proportions or probabilities), it is helpful to construct hypothetical tables of counts.

Example 1.14 Are Americans in favor of free tuition at public colleges and universities? Suppose that15

  • 83% of Democrats are in favor of free tuition
  • 60% of Independents are in favor of free tuition
  • 39% of Republicans are in favor of free tuition

Also suppose that16

  • 32% of Americans are Democrats
  • 42% of Americans are Independents
  • 26% of Americans are Republicans

We’ll use this information to investigate the following questions, as well as a few others.

  • What percentage of Americans are in favor of free college tuition?
  • What percentage of Americans who are in favor of free college tuition are Democrats?
  1. Donny Don’t answers the second question: “That’s easy. We’re told that 83% of Americans in favor of free college tuition are Democrats.” Is Donny necessarily correct? Explain without doing any calculations.

  2. For the next few parts, consider a hypothetical group of 10000 Americans and assume the percentages provided apply to this group. How many people in the group are Democrats?

  3. How many Americans in the group are Democrats who are in favor of free college tuition?

  4. Fill in the counts in each cell of the following “two-way” table.

    | | Democrat | Independent | Republican | Total |
    |---|---|---|---|---|
    | In favor of free tuition | | | | |
    | Not in favor of free tuition | | | | |
    | Total | | | | 10000 |
  5. What percentage of Americans in this group who are in favor of free college tuition are Democrats? Answer as an unreduced fraction, a decimal (proportion), and a percent.

  6. Suppose we had started with a hypothetical group of 100,000 Americans. How would the table of counts change? Would the answer to the previous part change?

  7. Now answer the original question: What percentage of Americans who are in favor of free college tuition are Democrats? Hint: do we need to know how many Americans there are?

  8. What percentage of Americans who are Democrats are in favor of free college tuition? Answer as an unreduced fraction, a decimal (proportion), and a percent.

  9. What percentage of Americans are Democrats in favor of free college tuition? Answer as an unreduced fraction, a decimal (proportion), and a percent.

  10. Compare the unreduced fractions for the previous three parts. What is the same? What is different?

  11. What percentage of Americans are in favor of free college tuition? Answer as an unreduced fraction, a decimal (proportion), and a percent.

  12. Suppose that we were only told that 61.9% of Americans overall support free tuition, and that we were not given the values 83%, 60%, 39%. Would we be able to complete the two-way table?

Solution 1.14.

  1. Donny is confusing two different percentages, which refer to two different groups.

    • We are given that 83% of Democrats are in favor of free college tuition. This percentage applies to Democrats; among Democrats what percentage are in favor of free college tuition?
    • What we want is the percent of Americans in favor of free tuition who are Democrats. This percentage applies to Americans in favor of free tuition; among Americans in favor of free tuition what percentage are Democrats?
  2. We are given that 32% of Americans are Democrats. Of the 10000 Americans, 32%, that is 3200, are Democrats. (\(10000 \times 0.32 = 3200\))

  3. We are given that 83% of Democrats are in favor of free college tuition. Out of the 3200 Democrats, 83%, that is 2656, are in favor of free tuition. (\(3200 \times 0.83 = 2656\))

  4. We fill in the total count for each party first, then we determine the number who are in favor of free tuition within each party. For example, of the 10000 Americans, 42%, that is 4200, are Independent, and 60% of the 4200 Independents are in favor of free tuition. (\(4200 \times 0.6 = 2520\))

    | | Democrat | Independent | Republican | Total |
    |---|---|---|---|---|
    | In favor of free tuition | 2656 | 2520 | 1014 | 6190 |
    | Not in favor of free tuition | 544 | 1680 | 1586 | 3810 |
    | Total | 3200 | 4200 | 2600 | 10000 |
  5. Look at the “in favor” row of the table. Out of 6190 Americans in this group who are in favor of free college tuition, 2656 are Democrats. Since \(\frac{2656}{6190}\approx 0.429\), about 42.9% of Americans in this group who are in favor of free college tuition are Democrats.

  6. If we had started with a hypothetical group of 100,000 Americans for which the given percentages applied, then the count in every cell in the table would be 10 times greater. However, ratios and percentages would still be the same. The answer to the previous part would not change; it would still be \(\frac{26560}{61900} =\frac{2656}{6190}\approx 0.429\).

  7. Now we are interested in Americans in general rather than the 10000 Americans in our hypothetical group. But as the previous part illustrates, the relative percentages will be the same regardless of the size of the group, assuming that the percentages provided in the setup apply to the group. We don’t need to know how many Americans there are; we can just start with a nice round number like 10000 and construct a hypothetical table of counts representing the proper percentages. Since we’re assuming the percentages provided in the setup apply to Americans, we can say that 42.9% of Americans who are in favor of free college tuition are Democrats.

  8. We were told that the percentage of Americans who are Democrats that are in favor of free college tuition is 83%. But we can also look at the Democrat column in the table: \(\frac{2656}{3200} = 0.83\). Pay careful attention to the difference in wording between this part and the previous one.

  9. Out of 10000 Americans, 2656 are Democrats in favor of free college tuition, so \(\frac{2656}{10000}= 26.56\%\) of Americans are Democrats in favor of free college tuition.

  10. There are subtle but important differences in wording between the percentages of interest in the previous three parts. Note that the numerator is the same in each part: 2656, the number of Americans in the group who are both Democrats and in favor of free tuition. But the denominators are different, each corresponding to a different reference group:

    • the percentage of Americans who are in favor of free tuition… (denominator of 6190)
    • the percentage of Americans who are Democrats… (denominator of 3200)
    • the percentage of Americans… (denominator of 10000)
  11. Out of 10000 Americans, 6190 are in favor of free college tuition, so 61.9% of Americans are in favor of free college tuition.

  12. Even if 61.9% of Americans overall support free tuition, it would not be safe to assume that 61.9% of Democrats, 61.9% of Independents, and 61.9% of Republicans support it. We would expect support to vary by party, but without such information we would not be able to complete the two-way table.

Two-way tables (a.k.a., contingency tables) of counts are a useful tool for probability problems dealing with two “dimensions” (like political party and free tuition support). For the purposes of constructing the table and computing related probabilities, any value can be used for the hypothetical17 total count18.
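As a rough illustration, the hypothetical table of counts from Example 1.14 can be built with a few lines of Python. The function names and dictionary layout below are assumptions made for this sketch, not something defined in the text; the percentages are the ones given in the example.

```python
def two_way_table(group_shares, in_favor_rates, total=10000):
    """Build hypothetical counts for each group from the given percentages."""
    table = {}
    for group, share in group_shares.items():
        group_total = total * share
        in_favor = group_total * in_favor_rates[group]
        table[group] = {
            "in_favor": in_favor,
            "not_in_favor": group_total - in_favor,
            "total": group_total,
        }
    return table


def share_of_in_favor(table, group):
    """Proportion of those in favor who belong to `group`."""
    total_in_favor = sum(column["in_favor"] for column in table.values())
    return table[group]["in_favor"] / total_in_favor


party_shares = {"Democrat": 0.32, "Independent": 0.42, "Republican": 0.26}
support_rates = {"Democrat": 0.83, "Independent": 0.60, "Republican": 0.39}

table = two_way_table(party_shares, support_rates, total=10000)
total_in_favor = sum(column["in_favor"] for column in table.values())
dem_share_of_in_favor = share_of_in_favor(table, "Democrat")

for party, column in table.items():
    print(party, {name: round(count) for name, count in column.items()})
print("Total in favor:", round(total_in_favor))
print("Proportion of those in favor who are Democrats:",
      round(dem_share_of_in_favor, 3))

# A different hypothetical total scales every count but leaves ratios unchanged
bigger_table = two_way_table(party_shares, support_rates, total=100000)
dem_share_bigger = share_of_in_favor(bigger_table, "Democrat")
```

Changing `total` rescales the counts but not the proportions, which is why any convenient total can be used.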

Figure 1.8 provides a visual representation of Example 1.14. The mosaic plot in Figure 1.8 (a) has three bars, each representing a political party. The widths of the bars are scaled based on the proportions of Americans within each party. The breaks within each bar are scaled based on the proportions who do and do not support free tuition within the party. The mosaic plot in Figure 1.8 (b) has the roles of the dimensions reversed, and displays how party affiliation varies based on support for free tuition or not.

(a) Support for free tuition within party.
(b) Political party affiliation based on support for free tuition or not.
Figure 1.8: Mosaic plots for Example 1.14.

In Example 1.14, we needed information about support for free tuition within each party to fill in the table. That is, it was not enough to know that 61.9% of Americans overall support free tuition. In general, knowing probabilities of individual events alone is not enough to determine probabilities of combinations of them.

Example 1.15 Suppose19 that 47% of American adults20 have a pet dog and 25% have a pet cat.

  1. Start constructing a two-way table. What are the two “dimensions”?
  2. Donny Don’t says, “72% (which is 47% + 25%) of American adults have a pet dog or a pet cat.” Is that necessarily true? Under what circumstance (however unrealistic) would this be true? Construct a two-way table for this scenario.
  3. Given only the information provided, what is the smallest possible percentage of American adults who have a pet dog or a pet cat (or both)? Under what circumstance (however unrealistic) would this be true? Construct a two-way table for this scenario.
  4. Donny Don’t says that 11.75% (which is 47% \(\times\) 25%) of Americans have both a pet dog and a pet cat. Explain to Donny why that’s not necessarily true. Without further information, what can you say about the percentage of American adults who have both a pet dog and a pet cat?
  5. For the remaining parts, suppose that 14% of American adults have both a pet dog and a pet cat. Construct a corresponding two-way table.
  6. What is the percentage of American adults who have a pet dog or a pet cat (or both)?
  7. Donny Don’t says, “I thought ‘or’ means ‘add’. In Example 1.7 I added 19% and 16% to get the probability that the Braves or the Rays win. Why can’t I add 47% and 25% to get the percentage who have a pet dog or a pet cat?” Explain the difference between the two examples, and help Donny correct his error.
  8. Donny Don’t says, “Wait, you told me that the percentage who have a pet dog or pet cat includes those who have both, so why are you telling me to subtract 14% after I add 47% and 25%?” Does Donny have a point? Explain.
  9. What percentage of American adults who have a pet cat also have a pet dog? Is it 47%?
  10. What percentage of American adults who do not have a pet cat have a pet dog? Is this the same value as in the previous part?

Solution 1.15.

  1. There is a cat dimension—has cat or not—and a dog dimension—has dog or not. The table has four interior cells.

    | | Has dog | No dog | Total |
    |---|---|---|---|
    | Has cat | | | |
    | No cat | | | |
    | Total | | | |
  2. Donny’s conclusion isn’t necessarily true because some people have both a pet dog and a pet cat. It’s theoretically possible that 72% have a pet dog or a pet cat, but this would only be true if absolutely no Americans have both a pet dog and a pet cat (which is obviously not realistic). The two-way table corresponding to Donny’s claim is

    | | Has dog | No dog | Total |
    |---|---|---|---|
    | Has cat | 0 | 25 | 25 |
    | No cat | 47 | 28 | 75 |
    | Total | 47 | 53 | 100 |
  3. The situation in the previous part corresponds to the largest possible value, 72%, which occurs when the percentage who have both a dog and cat is as small as possible (0%). Now we consider the reverse situation. The percentage who have both a dog and cat can’t be greater than the percentage who have a cat, 25%. It is theoretically possible for the percentage who have both a dog and cat to be 25%, but only if every person who has a cat also has a dog, which isn’t realistic. The two-way table would be

    | | Has dog | No dog | Total |
    |---|---|---|---|
    | Has cat | 25 | 0 | 25 |
    | No cat | 22 | 53 | 75 |
    | Total | 47 | 53 | 100 |

    Thus the smallest possible percentage of American adults who have a pet dog or a pet cat is 47%.

  4. In the previous two parts we have provided two theoretically possible (though unrealistic) scenarios of how Donny’s claim would be false: if no Americans who have a pet cat have a pet dog, and if all Americans who have a pet cat also have a pet dog. Obviously, somewhere between 0% and 100% of Americans who have a pet cat also have a pet dog, but what is this percentage? Donny’s claim would only be true if exactly 47% of American adults who have a pet cat also have a pet dog. (Equivalently, his claim would be true if exactly 25% of American adults who have a pet dog also have a pet cat.) But all we are given is that 47% of American adults in general have a pet dog. The likelihood of having a pet dog could plausibly change based on whether or not the adult has a cat. We need more information about the relationship between pet dog and pet cat ownership before we can determine what percentage of American adults have both. Without further information, all we can say is that between 0% and 25% of American adults have both a pet dog and a pet cat.

  5. If 14% of American adults have both a pet dog and a pet cat the two-way table is

    | | Have dog | No dog | Total |
    |---|---|---|---|
    | Have cat | 14 | 11 | 25 |
    | No cat | 33 | 42 | 75 |
    | Total | 47 | 53 | 100 |
  6. 58% of American adults have a pet dog or a pet cat (58 = 14 + 11 + 33). In other words, 42% of American adults have neither a pet dog nor a pet cat.

  7. The difference between the two examples is that it’s not possible for both the Braves and Rays to win the World Series in the same year, but it is possible for an American adult to have both a pet dog and a pet cat. By adding 47% and 25%, Donny has double-counted the 14% who have both a dog and a cat. Donny can correct for the double-counting by subtracting 14% from his 72%: 47 + 25 - 14 = 58.

  8. Donny is correct that “or” includes both, so we do want to count those who have both a pet dog and a pet cat. The problem with adding 47% and 25% is that it counts those who have both twice. Think of 47% as 14% + 33% and 25% as 14% + 11%; when we add 47% and 25% we add the 14% twice. We only want to count those who have both a pet dog and a pet cat once, which is why we subtract 14%.

  9. Out of the 25 (hypothetical) adults who have a pet cat, 14 also have a pet dog, and \(\frac{14}{25} = 0.56\). So 56% of American adults who have a pet cat also have a pet dog. American adults who have a pet cat are more likely than American adults in general to have a pet dog.

  10. Out of the 75 (hypothetical) American adults who do not have a pet cat, 33 have a pet dog, and \(\frac{33}{75} = 0.44\). So 44% of American adults who do not have a pet cat have a pet dog. American adults with pet cats are more likely than American adults without pet cats to have a pet dog.

(a) Proportion who have a pet dog for those with and without pet cats.
(b) Proportion who have a pet cat for those with and without pet dogs.
Figure 1.9: Mosaic plots for Example 1.15.
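The calculations in Example 1.15 can be sketched in Python. The function name and the layout of the table are hypothetical; the percentages 47, 25, and 14 are the ones assumed in the example. The check on the overlap uses the fact that the percentage who have both can be no more than the smaller of the two individual percentages and no less than their sum minus 100 (or 0, if that is negative).

```python
def pet_two_way_table(pct_dog, pct_cat, pct_both):
    """Fill in a two-way table of percentages (out of 100 hypothetical adults)."""
    smallest_both = max(0, pct_dog + pct_cat - 100)
    largest_both = min(pct_dog, pct_cat)
    if not smallest_both <= pct_both <= largest_both:
        raise ValueError(
            f"pct_both must be between {smallest_both} and {largest_both}"
        )
    return {
        ("cat", "dog"): pct_both,
        ("cat", "no dog"): pct_cat - pct_both,
        ("no cat", "dog"): pct_dog - pct_both,
        ("no cat", "no dog"): 100 - pct_dog - pct_cat + pct_both,
    }


table = pet_two_way_table(pct_dog=47, pct_cat=25, pct_both=14)

# "Or" includes both, but those with both are counted only once
dog_or_cat = 47 + 25 - 14

# Proportion with a dog among those with a cat, and among those without a cat
dog_given_cat = table[("cat", "dog")] / 25
dog_given_no_cat = table[("no cat", "dog")] / 75

print(table)
print("Percent with a dog or a cat:", dog_or_cat)
print("Proportion with a dog, among cat owners:", dog_given_cat)
print("Proportion with a dog, among those without a cat:", dog_given_no_cat)
```

Calling `pet_two_way_table(47, 25, 0)` or `pet_two_way_table(47, 25, 25)` reproduces the two extreme scenarios from parts 2 and 3.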

We can treat probabilities like proportions or percentages, so what’s the difference? Remember that probabilities measure likelihoods of events corresponding to random (uncertain) phenomena. The probability of an event can be interpreted as a long run proportion. For example, if we randomly select an American adult what is the probability that they have a pet dog? We can imagine repeatedly selecting American adults; if 47% of American adults have a pet dog, then the proportion of randomly selected adults that have a pet dog will converge to 0.47 in the long run. A probability represents a theoretical long run value. A proportion or percentage typically represents an observed short run value. In Example 1.15, we assumed 47%, 25%, and 14% applied to all American adults, but the values actually come from a random sample of just hundreds of American adult respondents.
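The long run proportion idea can be illustrated with a small simulation. This is a minimal sketch using Python's standard `random` module; the function name, the seed, and the number of selections are arbitrary choices for illustration, and the value 0.47 is the assumed probability from Example 1.15.

```python
import random


def running_proportions(probability, n_selections, seed=12345):
    """Simulate repeated random selections and track the running proportion
    of selections for which the event (e.g., has a pet dog) occurs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    count = 0
    proportions = []
    for i in range(1, n_selections + 1):
        if rng.random() < probability:
            count += 1
        proportions.append(count / i)
    return proportions


props = running_proportions(0.47, 100000)

# The running proportion fluctuates early on and settles down as n grows
for n in (10, 100, 1000, 10000, 100000):
    print(n, round(props[n - 1], 4))
```

The proportion after a handful of selections can be far from 0.47, while the proportion after many selections should be close to it.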

In Example 1.13, switching the places of the words “greater than six feet tall” and “play in the NBA” resulted in two very different percentages. Without going into a grammar lesson, pay careful attention to how probabilities (or proportions or percentages) are worded. Think carefully about what the ordering of the words represents, and look out for words like “if” or “given” which signify information that influences probabilities.

Example 1.16 A NY Times article titled “When They Warn of Rare Disorders, These Prenatal Tests are Usually Wrong” investigated the efficacy of noninvasive prenatal screenings (NIPS, usually blood tests) for microdeletions (small missing pieces of chromosomes) that cause a wide range of conditions. The results of the screenings are used to provide genetic counseling for pregnant women. The article claims the screenings are in widespread use: “One large test maker, Natera, said that in 2020 it performed more than 400,000 screenings for one microdeletion—the equivalent of testing roughly 10 percent of pregnant women in America.”

We’ll investigate the screening for 22q11.2 deletion syndrome, a.k.a., DiGeorge syndrome, a disorder caused when a small part of chromosome 22 is missing. Medical problems associated with DiGeorge syndrome include heart defects, poor immune system function, a cleft palate, complications related to low levels of calcium in the blood, and delayed development with behavioral and emotional problems.

Suppose that if we randomly select a pregnant woman who is screened21

  • The probability that the baby actually has DiGeorge syndrome is 0.00025.
  • If the baby has DiGeorge syndrome, the probability that the test returns a positive result22 is 0.9.
  • If the baby does not have DiGeorge syndrome, the probability that the test returns a (false) positive is 0.0026.

We’ll investigate a few things, but our main question of interest is: If the screening for a randomly selected pregnant woman returns a positive result23, what is the probability that the baby actually has DiGeorge syndrome?

  1. Before proceeding, make a guess for the probability in question; do you think it is closest to 0.1, 0.3, 0.5, 0.7, or 0.9?
  2. Donny Don’t says, “A baby either has DiGeorge syndrome or not so 0.90 and 0.0026 should add up to 1, and 0.0026 should really be 0.1.” Explain to Donny why 0.9 and 0.0026 don’t need to add to 1, and what 0.1 represents in this context.
  3. Considering a hypothetical population of screenings, interpret the probabilities as percents in context.
  4. Construct a hypothetical two-way table of counts.
  5. Compute and interpret the probability that the test returns a positive result. (For this and the remaining parts, express your answer first as an unreduced fraction based on the table, then as a decimal, and interpret the value in words as a percent.)
  6. Compute and interpret the probability the test result is correct.
  7. The value in the previous part seems pretty high. However, explain why we should not assess the effectiveness of the screening based on the probability that the test is correct alone. Hint: consider a separate test that never returns a positive result; what would be the probability that this test is correct?
  8. Recall the original question: If the screening for a randomly selected pregnant woman returns a positive result, what is the probability that the baby actually has DiGeorge syndrome? Compute and interpret this probability.
  9. In light of your original guess, is the previous answer surprising? Explain why the probability is so low. Hint: consider the hypothetical counts in the table.
  10. If the screening for a randomly selected pregnant woman returns a positive result, how many times more likely is it for the baby to not have DiGeorge syndrome than to have it?
  11. Compare the probability of having DiGeorge syndrome before and after the positive test. How much more likely is it for a baby who tests positive to have DiGeorge syndrome than one for whom the test result is unknown?
  12. If the screening for a randomly selected pregnant woman does not return a positive result, what is the probability that the baby does not have DiGeorge syndrome?
  13. NIPS are in widespread use, but what if the tests were only used for pregnant women with known risk factors? Suppose that among pregnant women who have a known risk factor for DiGeorge syndrome24 the probability that the baby has DiGeorge syndrome is 10 times greater, 0.0025 instead of 0.00025. If the screening for a randomly selected pregnant woman with this risk factor returns a positive result, what is the probability that the baby actually has DiGeorge syndrome? How does this compare to the value in part 8?

Solution 1.16.

  1. We don’t know what you guessed, but many people guess 0.7 or 0.9. After all, it seems like the test is positive for most babies that have DiGeorge syndrome, and positive for only a small percentage of babies that don’t have it. But this argument ignores one important piece of information: most babies do not have DiGeorge syndrome. We’ll see the influence this has below.

  2. The probabilities do not need to add to 1 because they apply to different groups: 0.9 to babies with DiGeorge syndrome, and 0.0026 to babies without DiGeorge syndrome. What Donny really needs to consider is this: Among babies with DiGeorge syndrome, the test result is either positive or not. If 0.9 is the probability that a baby with DiGeorge syndrome tests positive, then 0.1 is the probability that a baby with DiGeorge syndrome does not test positive; both probabilities apply to babies with DiGeorge syndrome. Likewise, if 0.0026 is the probability that a baby without DiGeorge syndrome tests positive, then 0.9974 is the probability that a baby without DiGeorge syndrome does not test positive25.

  3. Considering a hypothetical population of babies (of pregnant women who are screened):

    • 0.025% of babies have DiGeorge syndrome
    • 90% of babies with DiGeorge syndrome test positive
    • 0.26% of babies without DiGeorge syndrome test positive

    Be very careful with 0s. For example, mistaking 0.025 percent for 0.025 can have a huge impact.

  4. There are two dimensions: whether or not the baby has DiGeorge syndrome, and whether or not the test is positive. Assuming 1000000 babies (of pregnant women who are screened), 0.025% or 250 have DiGeorge syndrome and 999750 do not. Of the 250 who have DiGeorge syndrome, 90% or 225 test positive (\(225 = 250 \times 0.9\)). Of the 999750 who do not have DiGeorge syndrome, 0.26% or 2599 test positive (\(2599 = 999750 \times 0.0026\)).

    | | Has DiGeorge | Does not have DiGeorge | Total |
    |---|---|---|---|
    | Positive | 225 | 2599 | 2824 |
    | Not positive | 25 | 997151 | 997176 |
    | Total | 250 | 999750 | 1000000 |
  5. Imagine we have one million cards and we write “positive” on 2824 of them. Shuffle the cards, select one and note if it is positive or not, then replace the card and repeat. If we repeat this process many times, the proportion of selections that return a positive result will converge to \(\frac{2824}{1000000} = 0.0028\). In practice, we randomly select a pregnant woman who is screened and see if the test result is positive. In the long run, 0.28% of pregnant women who are screened test positive for DiGeorge syndrome.

  6. The test is correct for 225 babies who have DiGeorge syndrome and test positive and for 997151 babies who do not have DiGeorge syndrome and do not test positive. \(\frac{225 + 997151}{1000000} = 0.9974\). The test result is correct for 99.74% of pregnant women who are screened for DiGeorge syndrome.

  7. A screening that never returned a positive result would be correct for all the babies without DiGeorge syndrome and incorrect for all the babies with it, so the probability that this screening is correct is just the probability that a baby does not have DiGeorge syndrome, 0.99975, which is even greater than the value in the previous part. Any screening is going to divide the participants into two groups—those who test positive and those who do not—and we want to consider how effective the test is within each of these groups, not just its overall accuracy.

  8. Look at the “positive” row of the table. Among the 2824 babies who test positive, 225 have DiGeorge syndrome, so the probability that a baby who tests positive has DiGeorge syndrome is \(\frac{225}{2824} = 0.0797\). 7.97% of babies who test positive have DiGeorge syndrome.

  9. The value from the previous part seems low to many people. Only 7.97% of babies who test positive actually have DiGeorge syndrome? The counts in the table help us see why this value is so low. It is true that the test is correct for most babies with DiGeorge syndrome (225 out of 250) and incorrect only for a small proportion of babies without DiGeorge Syndrome (2599 out of 999750). But since relatively few babies have DiGeorge syndrome, the sheer number of false positives (2599) swamps the number of true positives (225). A high percentage of the positive tests are due to babies who do not have DiGeorge syndrome; that is, most of the positives are false positives. See Figure 1.10 for an illustration.

  10. The probability that a baby who tests positive does not have DiGeorge syndrome is \(\frac{2599}{2824} = 0.9203\). A baby who tests positive is 11.55 times more likely to not have DiGeorge syndrome than to have it. (\(\frac{0.9203}{0.0797} = 11.55\))

  11. Before observing the test result, the probability that a baby has DiGeorge syndrome is 0.00025. The probability that a baby who tests positive has DiGeorge syndrome is \(0.0797\). A baby who tests positive is about 319 times more likely to have DiGeorge syndrome than a baby for whom the test result is not known (\(\frac{0.0797}{0.00025} = 319\)). So while 0.0797 is still small in absolute terms, the probability of having DiGeorge syndrome given a positive test is much larger relative to the probability of having DiGeorge syndrome before the test result is known.

  12. Look at the “not positive” row of the table. Among the 997176 babies who do not test positive, 997151 do not have DiGeorge syndrome, so the probability that a baby who does not test positive does not have DiGeorge syndrome is \(\frac{997151}{997176} = 0.999975\). In contrast to the positives, almost all of the negative results are true negatives.

  13. Redo the table with 0.00025 replaced by 0.0025.

    | | Has DiGeorge | Does not have DiGeorge | Total |
    |---|---|---|---|
    | Positive | 2250 | 2594 | 4844 |
    | Not positive | 250 | 994906 | 995156 |
    | Total | 2500 | 997500 | 1000000 |

    Among pregnant women with this risk factor, the probability that a baby actually has DiGeorge syndrome given a positive test is \(\frac{2250}{4844} = 0.464\). The probability of having a baby with DiGeorge syndrome given a positive test is 5.83 times greater among those with this risk factor than for pregnant women in general. (\(0.464 / 0.0797 = 5.83\).)

Remember to ask “percentage of what”? For example, the percentage of babies who have DiGeorge syndrome that test positive is a very different quantity than the percentage of babies who test positive that have DiGeorge syndrome.

Likewise, always ask “probability of what”? For example, the probability that a baby who has DiGeorge syndrome tests positive is a very different quantity than the probability that a baby who tests positive has DiGeorge syndrome; in the first probability we are given that the baby has DiGeorge syndrome, in the second that the baby tests positive. Probabilities are often conditional on information; look out for words like “if” or “given” which signify this information. Revising or changing the order of information will usually change probabilities.

“Posterior” conditional probabilities (e.g., probability of DiGeorge syndrome given a positive test) can be highly influenced by the original unconditional “prior” probabilities (e.g. probability of DiGeorge syndrome), sometimes called the base rates. Example 1.16 and Figure 1.10 illustrate that when the base rate for a condition is very low and the test for the condition is less than perfect there can be a relatively high probability that a positive test is a false positive. The last part of Example 1.16 illustrates how changing the prior unconditional probability (base rate) influences the posterior conditional probability.

Figure 1.10: Illustration of Example 1.16. Each dot represents the baby of a randomly selected pregnant woman who is screened; there are 4000 dots. Only one of these 4000 babies has DiGeorge syndrome (4000 * 0.00025 = 1), represented by the dark blue dot in the top left corner. The other 3999 dots represent babies without DiGeorge syndrome. Suppose the baby with DiGeorge syndrome tests positive. Among the 3999 babies without DiGeorge syndrome, 11 test positive (roughly 0.26% of 3999); these false positives are represented by the light blue dots in the first row. There are 12 positive results, only 1 of which is a true positive. That is, the probability that a positive test result is a true positive is 1/12 = 0.083. (The values aren’t quite the same as in Example 1.16 due to some rounding, but the picture conveys the idea.)
(a) Conditioning on DiGeorge Syndrome status.
(b) Conditioning on test result.
Figure 1.11: Mosaic plots for Example 1.16.

People have a tendency to ignore base rates; you probably did if your original guess in Example 1.16 was 0.7 or 0.9. Don’t neglect the base rates when evaluating probabilities! We will discuss the role that base rates play and how to revise probabilities in light of new information in much more detail later.

We close this section with a brief tangent relating to the discussion in Section 1.2.3. In Example 1.16 there is uncertainty due to the random selection, uncertainty about whether the test will be positive or not, and uncertainty that someone who tests positive actually has the condition. You might consider these three different kinds of randomness, with three different interpretations of corresponding probabilities. For example, you might interpret the probability that a randomly selected person has the condition (0.00025) as a long run relative frequency; however, once the person is selected and tests positive they either have the condition or not—we just don’t know for sure—so you might interpret the probability that they have the condition given a positive test (0.0797) differently. The point is: how we interpret the probabilities does not affect how we solve the problem. The probabilities involved in Example 1.16 “fit together” in the same way regardless of the interpretation; given the context and values 0.00025, 0.9, and 0.0026 we must arrive at 0.0797. Furthermore, to make sense of the value 0.0797 we used both long run relative frequency and relative degrees of likelihood interpretations. We will treat many examples like Example 1.16; we will generally not distinguish between different types of randomness, and we will use interpretations of probability interchangeably.

1.4.1 Exercises

Exercise 1.7 In each of the following, which is greater: (a) or (b)? Or are they equal? Or is there not enough information to decide?

  1. Surfing
    1. The probability that a randomly selected Californian likes to surf.
    2. The probability that a randomly selected American is a Californian who likes to surf.
  2. Cal Poly alums
    1. The probability that a California resident is a Cal Poly alum.
    2. The probability that a Cal Poly alum is a California resident.

Exercise 1.8 Continuing Example 1.15.

  1. Is the overall percentage of American adults who have a pet cat closer to the value in part 9 or part 10? Why do you think that is?
  2. What percentage of American adults who have a pet dog also have a pet cat?
  3. What percentage of American adults who do not have a pet dog have a pet cat?

Exercise 1.9 Continuing Example 1.15. Now suppose that 11.75% of American adults have both a pet cat and a pet dog (as Donny claimed was necessarily true). Redo Example 1.15 and Exercise 1.8 under this assumption. What is true in this scenario that wasn’t true in Example 1.15?

Exercise 1.10 Suppose that you have applied to two graduate schools, A and B. Your subjective probability of being accepted is 0.6 for school A and 0.7 for school B.

  1. What is the largest possible probability of being accepted by both schools? Under what scenario (however unrealistic) would this be true? Explain.
  2. What is the smallest possible probability of being accepted by both schools? Under what scenario (however unrealistic) would this be true? Explain.
  3. Explain why the probability of being accepted by both schools is not necessarily 0.42.
  4. For the remaining parts, suppose your subjective probability of being accepted at both schools is 0.55. If you are accepted at school A, what is your probability of also being accepted at school B?
  5. If you are accepted at school A, what is your probability of not being accepted at school B?
  6. If you are not accepted at school A, what is your probability of being accepted at school B?
  7. If you are accepted at school B, what is your probability of also being accepted at school A?
  8. If you are not accepted at school B, what is your probability of being accepted at school A?
  9. How much more likely are you to be accepted at school A if you are accepted at school B than if you are not accepted at school B?
  10. How much more likely are you to be accepted at school A if you are accepted at school B compared to before receiving the decision from school B?

1.5 Conditioning on information

A probability is a measure of the likelihood or degree of uncertainty or plausibility of an event. A “conditional” probability revises this measure to reflect any additional information about the outcome of the underlying random phenomenon. In Example 1.16 the probability that a baby has DiGeorge syndrome is 0.00025, but if the screening returns a positive result then the probability increases to 0.0797. Always look out for words like “if” or “given” which signify information that influences probabilities.

Example 1.17 In each of the following parts, which of the two probabilities, (a) or (b), is greater, or are they equal? You should answer conceptually without attempting any calculations.

  1. Imagine that you randomly select, from birth records, a person who was born in 1950.

    1. The probability that a person born in 1950 lives to age 100.
    2. The probability that a person born in 1950 lives to age 100 given that they are alive in 2045.
  2. Imagine the 2032 U.S. Presidential Election (we’re doing so as of this writing in 2023). Assume (a) and (b) below are not 0.

    1. The probability that Dwayne “The Rock” Johnson wins the 2032 U.S. Presidential Election.
    2. The probability that Dwayne “The Rock” Johnson wins the 2032 U.S. Presidential Election given that he does not win the nomination of the Democratic or the Republican Party.

Solution 1.17.

  1. The probability in (b) is greater. Someone who has already lived to age 95 has a better chance of living at least 5 more years than a person has of living from birth to age 100. Think in fraction terms. The denominator in (a) is all people born in 1950; the denominator in (b) is all people born in 1950 who are alive in 2045. It’s the same numerator in each case—people born in 1950 who live until age 100—but (b) has a much smaller denominator, so the fraction corresponding to (b) is larger.

  2. This is more subjective, but our assessment is that the probability in (a) is greater. As of this writing (2023), the 2032 election is still many years away so there is a great deal of uncertainty about who will even run let alone win (to say nothing about uncertainty regarding changes in the U.S. and the world that might happen before 2032 and affect the election). Dwayne Johnson’s name has been tossed around in the media as a potential presidential candidate, so let’s say he has some non-zero probability of winning, represented by (a) (which could be very small). As of this writing, we would assess the probability of someone other than the Democratic or Republican nominee winning a U.S. presidential election to be very small, and some pretty major changes would need to occur in order for us to change that assessment. So the information that Dwayne Johnson is the nominee of neither party would lead us to decrease his probability of winning the election.

In a sense, all probabilities are conditional upon some information, even if that information is vague (“well, it has to be one of these possibilities”). Be careful to clearly identify what information is reflected in probabilities, and don’t make assumptions. In part 1 of Example 1.17 if we’re randomly selecting a person from 1950 birth records, then we shouldn’t assume that the person is alive today when evaluating the probability in (a); the selected person could have died between 1950 and now. That is, “the probability that a person born in 1950 lives to age 100” is not the same as “the probability that a person born in 1950 who is alive today lives to age 100”. In part 2 of Example 1.17, we shouldn’t assume that Dwayne Johnson actually runs for president in 2032; the value of our probability should reflect that uncertainty. That is, “the probability that Dwayne Johnson wins the 2032 U.S. Presidential Election” is not the same as “the probability that Dwayne Johnson wins the 2032 U.S. Presidential Election given that he declares himself a candidate”.

Example 1.18 Consider a group of 5 people: Harry, Fleur, Viktor, Cedric, Angelina. Suppose each of their names is written on a slip of paper and the 5 slips of paper are placed into a hat. The papers are mixed up and 2 are pulled out, one after the other without replacement. (That is, the first paper is not added back to the hat before selecting the second.)

  1. What is the probability that Harry is the first name selected?
  2. What is the probability that Harry is the second name selected?
  3. If you were asked question (2) before question (1), would your answer change? Should it?
  4. If Fleur is the first name selected, what is the probability that Harry is the second name selected?
  5. If Harry is not the first name selected, what is the probability that Harry is the second name selected?
  6. If Harry is the first name selected, what is the probability that Harry is the second name selected?
  7. Construct a hypothetical table corresponding to the results of the draws. Hint: one dimension represents the result of the first draw which is Harry or not, and the other dimension represents the second draw.
  8. If Fleur is the second name selected, what is the probability that Harry was the first name selected?

Solution 1.18.

  1. The probability that Harry is the first name selected is 1/5, which is an answer we think most people would agree with. There are 5 names which are equally likely to be the first one selected, 1 of which is Harry.

  2. The probability that Harry is the second name selected is also 1/5. Many people might answer this as 1/4, since after selecting the first person there are now 4 names left. But we show and discuss below that the unconditional probability is 1/5.

  3. Your answer to question (2) certainly shouldn’t change depending on whether we ask question (1) first. Perhaps after seeing question (1) you are implicitly assuming that Harry has not been selected first. But nothing in question (2) gives you any additional information about what happened on the first draw.

  4. If Fleur is the first name selected, the probability that Harry is the second name selected is 1/4. We think most people find this intuitive. If Fleur is first, there are 4 slips remaining, each equally likely to be drawn next, of which 1 is Harry.

  5. The probability is 1/4, similar to the previous part. If Harry is not selected first, there are 4 slips remaining, each equally likely to be drawn next, of which 1 is Harry.

  6. If Harry is the first name selected, the probability that Harry is the second name selected is 0 since the slips are drawn without replacement.

  7. Here is a two-way table of 1000 hypothetical repetitions. Harry is selected first in \(1000\times 1/5 = 200\) repetitions, in which case he can’t be selected second. Among the 800 repetitions where Harry is not selected first, he is selected second in \(800\times 1/4 = 200\) repetitions. Harry is selected second in 200 of the 1000 total repetitions, so the probability that Harry is selected second is \(200/1000 = 1/5\).

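    |                  | Harry first | Harry not first | Total |
    |------------------|-------------|-----------------|-------|
    | Harry second     | 0           | 200             | 200   |
    | Harry not second | 200         | 600             | 800   |
    | Total            | 200         | 800             | 1000  |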
  8. If Fleur is the second name selected, the probability that Harry was the first name selected is 1/4. It doesn’t really matter what is “first” and what is “second”, but rather the information conveyed. In part 4, what’s important is that you know that one of the slips selected was Fleur, so the probability that the other slip selected is Harry is 1/4. This part conveys the same information.
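
The answers in this solution can also be checked by listing every equally likely ordered pair of draws and counting. The following is a minimal sketch of that counting argument; the helper names (`prob`, `first_is`, `second_is`, `first_is_not`) are our own and are not part of the example.

```python
from fractions import Fraction
from itertools import permutations

NAMES = ["Harry", "Fleur", "Viktor", "Cedric", "Angelina"]

# All 5 * 4 = 20 equally likely ordered (first, second) draws without replacement.
DRAWS = list(permutations(NAMES, 2))


def prob(event, given=None):
    """Return P(event | given) by counting equally likely ordered draws."""
    possible = [d for d in DRAWS if given is None or given(d)]
    favorable = [d for d in possible if event(d)]
    return Fraction(len(favorable), len(possible))


def first_is(name):
    return lambda draw: draw[0] == name


def first_is_not(name):
    return lambda draw: draw[0] != name


def second_is(name):
    return lambda draw: draw[1] == name


print(prob(first_is("Harry")))                               # part 1: 1/5
print(prob(second_is("Harry")))                              # part 2: 1/5
print(prob(second_is("Harry"), given=first_is("Fleur")))     # part 4: 1/4
print(prob(second_is("Harry"), given=first_is_not("Harry")))  # part 5: 1/4
print(prob(second_is("Harry"), given=first_is("Harry")))     # part 6: 0
print(prob(first_is("Harry"), given=second_is("Fleur")))     # part 8: 1/4
```

The unconditional probability in part 2 counts all 20 ordered pairs, including the 4 in which Harry is drawn first. The conditional probabilities count only the pairs consistent with the given information.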

Be careful to distinguish between conditional and unconditional probabilities. A conditional probability reflects additional information about the outcome of the random phenomenon. In the absence of such information, we must continue to account for all the possibilities. When computing probabilities, be sure to only reflect information that is known. Especially for a phenomenon that happens in stages, don’t assume, when considering what happens second, that you know what happened first.

In Example 1.18, the question “if Harry is not the first name selected, what is the probability that Harry is the second name selected?” involves a conditional probability, since we are given additional information about the outcome; it is no longer possible that Harry was the first name selected. The question “What is the probability that Harry is the second name selected?” involves an unconditional probability. The words “the probability that Harry is the second name selected” alone do not imply that Harry was not selected first; we still need to account for the possibility that Harry was selected first.

Imagine writing the five names on cards, shuffling the cards, and putting two on a table face down. Now point to one of the cards and ask “what is the probability that THIS card is Harry?” Well, all you know is that this card is one of the five cards, each of the 5 cards is equally likely to be the one you’re pointing to, and only one of the cards is Harry. Should it matter whether the face down card you’re pointing to was the first or second card you laid on the table? No, the probability that THIS card is Harry should be 1/5, regardless of whether you put it down first or second.

Now turn over one other card that you’re not pointing to, and see what name is on it. The probability that the card you’re pointing to is Harry has now changed, because you have some information about the outcome of the shuffle. If the card you turned over says Harry, you know the probability that the card you’re pointing to is Harry is 0. If the card you turned over is not Harry, then you know that the probability that the card you’re pointing to is Harry is 1/4. It is not “first” or “second” that matters; it is whether or not you have obtained additional information by revealing one of the cards.

Another way of asking the question is: Shuffle the five cards; what is the probability that Harry is the second card from the top? Without knowing any information about the result of the shuffle, all you know is that Harry should be equally likely to be in any one of the 5 positions, so the probability that he is the second card from the top should be 1/5. It is only after revealing information about the result of the shuffle, say the top card, that the probability that Harry is in the second position changes.

We often start with a probability for an event and then revise it whenever additional information becomes available. The original, unconditional probability is called a “prior probability” or “base rate”; the revised, conditional probability is called a “posterior probability”. We will discuss the role that base rates play and how to revise probabilities in light of additional information in much more detail later. But remember: Don’t neglect the base rates when evaluating probabilities!

Example 1.19 Within both the colleges of Agriculture and Architecture at Cal Poly, about 49% of admitted students are female, about 84% of admitted students went to high school in CA, and the median GPA of admitted students is about 4.1.

An orientation group of 100 newly admitted Cal Poly students includes 75 students in Agriculture and 25 students in Architecture. A student is randomly selected from this group. The selected student is Maddie, who is female, went to high school in CA, and had a high school GPA of 4.1.

  1. If you are trying to decide which college Maddie is in, is the information that she is female, went to high school in CA, and had a high school GPA of 4.1 helpful? Why?
  2. Donny Don’t says, “The information about Maddie applies equally well to Agriculture or Architecture and doesn’t help us decide which college she’s in, so it’s just 50/50. Given the information about Maddie, the conditional probability that she is in Agriculture is 0.5.” Do you agree? If not, what is the conditional probability that Maddie is in the college of Agriculture given the information about her? Hint: what was the last sentence before this example!

Solution 1.19.

  1. The information tells us that Maddie would be pretty typical for either group, so it doesn’t help us decide.
  2. Donny has neglected the base rate. The group has 75 students in Agriculture and 25 students in Architecture. If we randomly select a student from this group, the unconditional probability that the selected student is in Agriculture is 0.75 (the base rate). Yes, the information about Maddie applies equally well to either college, but that means we have no reason to revise our probability from the base rate. The conditional probability that Maddie is in Agriculture given the information about her is still 0.75.
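
A short sketch shows why information that applies equally well to both colleges leaves the base rate unchanged. The function name `prob_agriculture` and the values of `p_profile` are hypothetical. The text does not say what proportion of students in each college match Maddie’s profile, only that the proportion is about the same in both.

```python
from fractions import Fraction


def prob_agriculture(n_agriculture, n_architecture, p_profile_agr, p_profile_arch):
    """P(Agriculture | profile) from group counts and per-college profile rates."""
    # Expected number of students in each college who match the profile.
    match_agr = n_agriculture * p_profile_agr
    match_arch = n_architecture * p_profile_arch
    return match_agr / (match_agr + match_arch)


# Hypothetical values: the same share of each college matches Maddie's profile.
for p_profile in (Fraction(1, 10), Fraction(1, 4), Fraction(1, 2)):
    print(prob_agriculture(75, 25, p_profile, p_profile))  # 3/4 each time
```

Whatever common value is used, it cancels, and the conditional probability stays at the base rate of 0.75. Donny’s 0.5 would be correct only if the group itself were split evenly between the two colleges.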

We use the terminology “unconditional” and “conditional” probability, but any probability is conditional on some information. A better way to think about it might just be “before” and “after”. When new information becomes available we revise our probability. The unconditional (prior) probability is the probability before the revision, reflecting any information that was previously available. The conditional (posterior) probability is the probability after the revision, updated to reflect the newly available information. Probabilities are often updated sequentially as more information becomes available, with the conditional (posterior) probability after one piece of information is received becoming the unconditional (prior) probability before the next.

Do not think of “unconditional” as “based on no information”. Any probability should reflect as much relevant information as possible, even if it plays the role of an unconditional probability.

Example 1.20 Marge takes a home pregnancy test which turns out positive. She decides to perform an analysis like in Example 1.16 to find the conditional probability that she is actually pregnant given the positive test. She knows she’ll need a base rate—that is, an unconditional probability that she is pregnant before the positive test—so she Googles “what percent of women are pregnant?” Information from the CDC suggests that 10.2% of American women aged 15–44 are pregnant at any point in time. Is 0.102 an appropriate value for Marge to use as her base rate? Or is it too high or too low? How will this influence her conditional probability that she is actually pregnant given the positive test? Explain. Hint: did Marge Google the right question?

Solution 1.20. The value 0.102 is too low for Marge to use as a base rate, and so her conditional probability that she is actually pregnant given the positive test will also be too low. The idea is that a woman who takes a pregnancy test is more likely to be pregnant than a woman in general, simply because many women who take pregnancy tests do so because they suspect they might be pregnant[26].

If we randomly select an American woman aged 15-44 to take a pregnancy test, then 0.102 would be an appropriate base rate. But we are not told that Marge is a randomly selected woman, so we should not assume that she is. What we do know is that “Marge took a pregnancy test”. Why? From our perspective, it seems more plausible that she took the test because she suspected she might be pregnant as opposed to just “randomly” deciding to take it. And if there is a reason for her to suspect she might be pregnant, then it’s more likely that she actually is and our base rate should reflect that, resulting in a value greater than 0.102. It should be even clearer from Marge’s perspective; she knows why she took the test (e.g., missed period, morning sickness, etc.) and her personal base rate should reflect her information.

Now we’re not saying it would be easy to determine what an appropriate base rate is, but it should definitely be greater than 0.102. Whatever the prior probability of pregnancy, it will influence the posterior probability of pregnancy given a positive test. Knowing that the test is positive will lead us to revise our probability of pregnancy upward, but if we start with a prior probability that is too low, then our posterior probability will also be too low. (See the last part of Example 1.16 for a related example.)

1.5.1 Exercises

Exercise 1.11 In each of the following, which is greater: (a) or (b)? Or are they equal? Or is there not enough information to decide? Answer without doing any computations.

  1. Shuffle a standard deck of playing cards (52 cards, 4 of which are aces) and deal 5 cards without replacement.
    1. The probability that the first card dealt is an ace.
    2. The probability that the fifth card dealt is an ace.
  2. Shuffle a standard deck of playing cards (52 cards, 4 of which are aces) and deal 5 cards without replacement.
    1. The probability that the first card dealt is an ace.
    2. The probability that the fifth card dealt is an ace if the first card dealt is an ace.
  3. Randomly select a college student.
    1. The probability that the selected student went surfing yesterday.
    2. The probability that the selected student went surfing yesterday if the student attends Cal Poly.
  4. Both ballerinas and football players are graceful and nimble. A group of people contains both some ballerinas and some football players. A person is randomly selected from this group; the person is graceful and nimble.
    1. The probability that the selected person is a ballerina.
    2. The probability that the selected person is a football player.

1.6 Probability of what?

A probability takes a value on a sliding scale from 0 to 1 (or 0% to 100%). Throughout the book we will study how to compute probabilities in many situations. But don’t just focus on computation. Always remember to interpret probabilities properly. This section covers a few ideas to keep in mind when interpreting probabilities.

Example 1.21 In each of the following parts, which of the two probabilities, (a) or (b), is greater, or are they equal? You should answer conceptually without attempting any calculations.

  1. Flip a coin which is known to be fair 10 times.

    1. The probability that the results are, in order, HHHHHHHHHH.
    2. The probability that the results are, in order, HHTHTTTHHT.
  2. Flip a coin which is known to be fair 10 times.

    1. The probability that all 10 flips land on H.
    2. The probability that exactly 5 flips land on H.

Solution 1.21.

  1. Many people would say the probability in (b) is larger, but the probabilities in (a) and (b) are equal[27]. The sequence in (b) seems to look “more random”. However, the probability of seeing that particular sequence (H then H then T then H then T…) is the same as the probability of seeing the sequence H then H then H then H then H… If the coin is fair and the flips are independent, all possible sequences of flips are equally likely. Think of it this way: choose any flip, say the third. Then that flip is equally likely to be H (as in the third flip for (a)) or T (as in the third flip for (b)). No matter which flip it is, or the results of the other flips, any flip is equally likely to be H or T.

    Of course, our response assumes that the coin is fair. If the coin is known to be fair then the sequences in (a) and (b) are equally likely. However, if we actually observed the sequence in (a) we might suspect that the coin is actually not fair. There is an important difference between assumption and observation.

  2. The probability in (b) is larger. Contrast this to the previous part. There is only one sequence which results in 10 heads, HHHHHHHHHH. However, there are many sequences[28] which result in exactly 5 heads (HHHHHTTTTT, HTHTHTHTHT, TTHHTHTHHT, etc.), of which HHTHTTTHHT is just one possibility.
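
Both parts of this solution can be verified by brute force, since 10 flips of a fair coin have only 1024 equally likely sequences. The sketch below simply lists them and counts; the names `SEQUENCES` and `prob` are ours.

```python
from fractions import Fraction
from itertools import product

# All 2**10 = 1024 equally likely sequences of 10 fair coin flips.
SEQUENCES = ["".join(flips) for flips in product("HT", repeat=10)]


def prob(event):
    """Probability of an event, counting equally likely sequences."""
    return Fraction(sum(1 for s in SEQUENCES if event(s)), len(SEQUENCES))


# Part 1: two particular sequences.
p_all_heads = prob(lambda s: s == "HHHHHHHHHH")
p_mixed = prob(lambda s: s == "HHTHTTTHHT")

# Part 2: the more general event "exactly 5 heads".
p_exactly_five = prob(lambda s: s.count("H") == 5)

print(p_all_heads, p_mixed)   # 1/1024 1/1024
print(p_exactly_five)         # 63/256
print(float(p_exactly_five))  # 0.24609375
```

Each particular sequence has probability 1/1024, but 252 different sequences have exactly 5 heads, so that event has probability 252/1024 = 63/256.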

Pay close attention to the differences in the two parts in Example 1.21. The first part involves probabilities of the particular outcome sequence. The second part involves more general “events” that the particular outcome sequence might satisfy. The following provides another example of this “particular” versus “general” dichotomy.

Example 1.22 In each of the following parts, which of the two probabilities, (a) or (b), is greater, or are they equal? You should answer conceptually without attempting any calculations.

  1. In the Powerball lottery there are roughly[29] 300 million possible winning number combinations, all equally likely.

    1. The probability you win the next Powerball lottery if you purchase a single ticket, 4-8-15-16-42, plus the Powerball number, 23.
    2. The probability you win the next Powerball lottery if you purchase a single ticket, 1-2-3-4-5, plus the Powerball number, 6.
  2. Continuing with the Powerball lottery.

    1. The probability that the numbers in the winning number are in a row.
    2. The probability that the numbers in the winning number are not in a row.

Solution 1.22.

  1. Many people would say the probability in (a) is larger, since the sequence in (a) looks “more random”, but the probabilities in (a) and (b) are equal. Since the outcomes are equally likely, the probability that any single sequence is the winning number is (roughly) 1/300,000,000. If you don’t believe this, ask yourself: Why would the Powerball conduct its drawing in such a way that some numbers are more likely to be winners than others? And if some numbers were more likely than others, why wouldn’t people know about this?
  2. The probability in (b) is larger. Contrast this to the previous part. There are only a handful of winning numbers for which the numbers are in a row: 1 through 6, 2 through 7, 3 through 8, etc. However, almost all of the 300 million possibilities do not have numbers in a row.
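
To get a feel for the sizes involved, here is a rough counting sketch. It rests on two assumptions that are not stated in the text. First, it assumes the game format is 5 distinct numbers from 1 to 69 plus a Powerball number from 1 to 26 (our understanding of the current format). Second, it reads “in a row” as the five numbers being consecutive with the Powerball number continuing the run, as in 1-2-3-4-5 plus 6.

```python
from math import comb

# Assumed game format (not stated in the text): 5 distinct white balls from
# 1-69 plus one Powerball from 1-26.
WHITE_MAX, WHITE_PICKS, POWERBALL_MAX = 69, 5, 26

# Number of equally likely winning combinations under the assumed format.
total_combinations = comb(WHITE_MAX, WHITE_PICKS) * POWERBALL_MAX
print(total_combinations)  # 292201338

# Assumed meaning of "in a row": white balls k, ..., k+4 and Powerball k+5.
in_a_row = sum(
    1
    for k in range(1, WHITE_MAX + 1)
    if k + 4 <= WHITE_MAX and k + 5 <= POWERBALL_MAX
)
print(in_a_row)  # 21

p_in_a_row = in_a_row / total_combinations
print(p_in_a_row)      # a very small probability
print(1 - p_in_a_row)  # very close to 1
```

Under these assumptions the total is consistent with the “roughly 300 million” figure in the example, and only 21 of those combinations are in a row.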

When interpreting probabilities, be careful not to confuse “the particular” with “the general”.

“The particular:” A very specific event, surprising or not, often has low probability.

  • For a fair coin, observing the particular sequence HHTHTTTHHT in 10 flips is just as likely as observing HHHHHHHHHH.
  • The probability that the winning Powerball number is 4-8-15-16-42-(23) is exactly the same as the probability that the winning Powerball number is 1-2-3-4-5-(6).
  • The probability that you get a text from your best friend at 7:43pm two weeks from today inviting you to dinner at your favorite pizza place after you’ve just ordered pizza from there is probably pretty small. None of these items — getting a text, having a friend invite you to dinner, ordering pizza from your favorite pizza place — is unusual, but the chances of them all combining in this way at this particular time are fairly small.

“The general:” While a very specific event often has low probability, if there are many like events their combined probability can be high.

  • There are many possible sequences of 10 coin flips which result in 5 heads.
  • For almost all of the possible Powerball combinations the numbers are not in a row.
  • The probability that some time in the next month or so a friend texts a dinner invitation is probably fairly high.

Example 1.23 Which of the following two probabilities is greater, or are they equal? You should answer conceptually without attempting any calculations.

  1. The probability that you win the next Powerball lottery if you purchase a single ticket.
  2. The probability that someone wins the next Powerball lottery. (FYI: especially when the jackpot is large, there are hundreds of millions of tickets sold.)

Solution 1.23. The probability in (2) is much greater. (This is an understatement.)

  • The probability that a specific Powerball ticket is the winning number is about 1 in 300 million. So if you buy a single ticket, it is extremely unlikely that you will win.
  • However, if hundreds of millions of Powerball tickets are sold, the probability that someone somewhere wins is pretty high.

We elaborate on these ideas below.

The probability that you win the next Powerball lottery if you purchase a single ticket is about 1 in 300 million. Let’s put this number in perspective. There are about 260 million adults (over age 18) in the U.S.[30] Suppose that the name of every adult in the U.S. is written on a 3x5 index card. These 260 million cards stacked would stretch about 62 miles high; that’s commonly referenced as the distance from the earth to where space begins. The stack would also weigh about 400 tons, about as much as 4 blue whales. Suppose we shuffle the cards (much easier said than done) and select one. The probability that your name is on the selected card is about 1 in 260 million. The chance that your next Powerball ticket has the winning number is a little smaller than this[31].

However, if hundreds of millions of Powerball tickets are sold, the probability that someone somewhere wins is pretty high. For example, if 500 million tickets are sold then there is a roughly 80% chance that at least one ticket has the winning number (under certain assumptions).

Even if an event has extremely small probability, given enough repetitions of the random phenomenon, the probability that the event occurs on at least one of the repetitions is often high[32].
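
The “roughly 80%” figure can be reproduced with a short calculation. The sketch assumes every ticket’s numbers are chosen independently and uniformly at random, and that each ticket wins with probability 1 in 300 million. These are our guess at the kind of assumptions the text alludes to, and real ticket purchases need not satisfy them. The function name `prob_at_least_one` is ours.

```python
import math


def prob_at_least_one(p_single, n_trials):
    """P(event occurs at least once in n_trials independent trials).

    Equals 1 - (1 - p_single) ** n_trials; log1p and expm1 keep the
    calculation accurate when p_single is extremely small.
    """
    return -math.expm1(n_trials * math.log1p(-p_single))


p_ticket = 1 / 300_000_000  # chance that one given ticket wins (from the text)

for tickets_sold in (1, 100_000_000, 300_000_000, 500_000_000):
    print(tickets_sold, round(prob_at_least_one(p_ticket, tickets_sold), 4))
```

With 500 million tickets the result is about 0.81, even though each individual ticket still has only about a 1 in 300 million chance.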

Consider the headline of this news article from 2010: “Man mauled by bear after lightning strike”. We certainly feel sorry for this poor man, but just how unlikely is such an occurrence? Let’s look a little closer.

The headline seems to imply that the man got struck by lightning and then, while he was trying to reach safety, a bear attacked. But the mauling occurred four years after the lightning strike. Getting mauled by a bear and struck by lightning within one’s lifetime is certainly much more likely than both happening on the same day.

“Getting struck by lightning” is often colloquially used to describe a rare event, but how unlikely is it? One study estimates that about 250,000 people in the world are struck by lightning each year, and the National Weather Service estimates that the probability that you get struck by lightning within your lifetime is 1/15,000. Still not very likely, but maybe not as rare as you might think.

Getting mauled by a bear is much less likely than being struck by lightning. There are only about 40 bear attacks on humans each year. However, if the headline had been “Man bitten by shark after lightning strike” or “Man attacked by mountain lion after lightning strike” or “Man trampled by moose after lightning strike” it probably would have been equally newsworthy. Thus we should account for all similar animal attacks, not just bear attacks, when assessing the likelihood.

The probability that you get struck by lightning and mauled by a bear today is certainly very small. But the probability that someone somewhere within their lifetime gets both struck by lightning and attacked by an animal is orders of magnitude higher. In general, even though the probability that something very specific happens to you today is often extremely small, the probability that something similar happens to someone some time is often quite high.

When something surprising happens, don’t just consider the probability of that particular outcome. Rather, consider all the other possible outcomes that would have been equally surprising if they had occurred, and consider the probability that at least one of them would happen (which often turns out to be not so small). From this perspective, most coincidences turn out to be much more probable than they seem at first.

When assessing a probability, always ask “probability of what?” Does the probability represent “the particular” or “the general”? Is it the probability that the event happens in a single occurrence of the random phenomenon, or the probability that the event happens at least once in many occurrences? Keep these questions in mind when assessing numerical probabilities. Remember that something that has a “one in a million chance” of happening to you today will happen to about 7000 people in the world every day.

1.6.1 How likely is “likely”?

Consider each of the following statements (presented in no particular order). If you were to assign a numerical value to the probability of rain tomorrow in each case, what would it be?

  • It is likely that it will rain tomorrow.
  • It is probable that it will rain tomorrow.
  • There is little chance that it will rain tomorrow.
  • It is highly unlikely that it will rain tomorrow.
  • We doubt that it will rain tomorrow.
  • There is a very good chance that it will rain tomorrow.
  • It is almost certain that it will rain tomorrow.
  • It is improbable that it will rain tomorrow.
  • It will probably not rain tomorrow.
  • It is highly likely that it will rain tomorrow.
  • It is almost certain that it will not rain tomorrow.
  • It will probably rain tomorrow.
  • We believe that it will rain tomorrow.
  • There is a better than even chance that it will rain tomorrow.

In a study conducted in the 1960s (Barclay et al. (1977)), twenty-three military officers were asked to provide numerical probabilities for a similar set of statements (“It is almost certain that the Soviets will invade Czechoslovakia”, “It is highly likely that the Soviets will invade Czechoslovakia”, etc.). For most of the statements there was considerable variability in the responses. For example,

  • Probabilities assigned to “almost certain” ranged from 0.75 to 0.99.
  • Probabilities assigned to “highly likely” ranged from 0.50 to 0.99.
  • Probabilities assigned to “likely” ranged from 0.30 to 0.90.
  • Probabilities assigned to “probable” ranged from 0.25 to 0.90.

Recent similar studies have produced comparable results that exhibit wide variability in the numerical values people associate with words describing probabilities. Studies like these provide evidence of differences in how people perceive probability.

One way to avoid ambiguity is to provide numerical values of probability rather than just vague words like “likely” or “probable”. However, people can still perceive numbers differently. An event that has a probability of 0.4 is four times as likely as an event with a probability of 0.1, but how likely is either event? Depending on their background, people might interpret a probability of 0.4 differently. Someone familiar with baseball knows that 0.4 would be an extremely high value for the probability that a particular batter successfully gets a hit in an at bat, while someone familiar with basketball knows that 0.4 would be an extremely low value for the probability that a particular player successfully scores on a free throw attempt. An audience that routinely encounters probabilities close to 0 will perceive a probability of 0.4 differently than one that commonly deals with probabilities around 0.5. When reporting probabilities, it is helpful to provide some benchmarks from a context more familiar to the audience to provide a sense of scale33.

For example, for people from California you might provide benchmarks based on county populations. If you randomly select a single California resident (about 39 million people) there is, roughly, a

  • 25% chance they are from Los Angeles County (about 9.7 million people)
  • 8% chance they are from Orange County (about 3.2 million people)
  • 0.7% chance they are from San Luis Obispo County (about 300 thousand people)
  • 0.1% chance they are from Calaveras County (about 46 thousand people)

Providing a few values in this manner can help the audience gauge the magnitude of a probability like 0.2 or 0.01 [34].

Be sure to keep in mind “the particular versus the general”. When reporting the value of a probability, provide enough contextual detail so that the audience can distinguish “the particular from the general”. If the probability of interest represents “the particular”, then provide benchmarks in terms of “the particular”; likewise for “the general”. In the California county example, we could use the values provided for a single randomly selected resident to benchmark “particular” probabilities (what is the probability this happens to me?). For “general” probabilities we could restate the benchmarks in terms like “if we randomly select 100 CA residents, there is about a 50% chance that at least one resident is from San Luis Obispo County”.
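The figure for 100 residents can be checked with a short calculation. As a sketch, assume the 100 residents are selected independently and that each has the rounded 0.007 chance of being from San Luis Obispo County given above. Then the probability that none of them is from San Luis Obispo County is \(0.993^{100}\), so \[ P(\text{at least one of 100 residents is from San Luis Obispo County}) = 1 - (1 - 0.007)^{100} \approx 1 - 0.495 = 0.505 \] which is roughly 50%.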

So how likely is “likely”? We hope you see that there is no clear answer to this question. When communicating probabilities, our best advice is to:

  • Report numerical values instead of ambiguous words.
  • Provide enough contextual detail to identify “probability of what?”. In particular, be careful to distinguish “the particular from the general”.
  • Provide the value of a few helpful benchmark probabilities in a familiar context to provide a sense of scale.
  • Remember that despite your best efforts, people might still perceive probabilities differently.

1.6.2 Exercises

Exercise 1.12 Create your own analogy for how unlikely it is that a single ticket wins the Powerball lottery. How would you describe a 1 in 300 million chance?

Exercise 1.13 In each of the following, which is greater: (a) or (b)? Or are they equal? Or is there not enough information to decide?

  1. Election interference
    1. The probability that Russian agents successfully interfere with the 2024 U.S. Presidential election through posts on Facebook with the goal of helping the Republican candidate get elected.
    2. The probability that non-U.S. actors attempt to interfere with the 2024 U.S. Presidential election.
  2. Roll a six-sided die which is known to be fair 10 times.
    1. The probability that the results are, in order, 1223334444.
    2. The probability that the results are, in order, 4614253226.
  3. Roll a six-sided die which is known to be fair 10 times.
    1. The probability that the results are, in order, 1234561234.
    2. The probability that you roll each of the six faces at least once.

Exercise 1.14 Search online to find some benchmark probabilities (e.g., 0.25, 0.1, 0.01, 0.001, 0.0001, etc.) in a context that is interesting and familiar to you.

1.7 “Expected” value

We are often interested in numerical values associated with a random phenomenon. If we flip a coin 100 times we might be interested in the number of flips which land on heads or the longest streak of heads. Forecasting tomorrow’s weather, we might be interested in the high temperature or amount of precipitation. Predicting the next Superbowl, we might be interested in the total number of points scored or the margin of victory.

When dealing with uncertain numerical quantities, we often ask: what value do we expect? In this section we’ll introduce how we might answer this question. We’ll also give a first warning to be careful about what we mean by “expected” values.

Example 1.24 This is a very simplified example illustrating the basic idea of how insurance works. Every year an insurance company sells many thousands of car insurance policies to drivers within a particular risk class. Each policyholder pays a “premium” of $1000 at the start of the year, and the insurance company agrees to pay for the cost of all damages that occur during the year. Suppose that each policy incurs damage of either $0, $5000, $20000, or $50000 with the following probabilities.

Amount of damage ($) | Profit ($) | Probability
0 | 1000 | 0.910
5000 | -4000 | 0.070
20000 | -19000 | 0.019
50000 | -49000 | 0.001

The insurance company’s profit on a policy at the end of the year is the difference between the premium of $1000 and any damage paid out. For example, a policy that incurs no damage results in a profit of $1000; a policy that incurs $5000 in damage results in a profit of -$4000 (that is, a loss of $4000) for the insurance company.

  1. Interpret the probabilities 0.91, 0.07, 0.019, and 0.001 as long run relative frequencies in this context.
  2. Compute the probability that a policy results in a positive profit for the insurance company.
  3. Imagine 100,000 hypothetical policies. How many of these policies would you expect to result in a profit of $1000? -$4000? -$19000? -$49000?
  4. What do you expect the total profit for these 100,000 policies to be?
  5. What do you expect the average profit per policy for these 100,000 policies to be?
  6. Compute the probability that a policy has a profit equal to the value from part 5.
  7. Compute the probability that a policy has a profit greater than the value from part 5.
  8. Is the value from part 5 the most likely value of profit for a single policy?
  9. Is the value from part 5 the profit you would expect for a single policy?
  10. Explain in what sense the value from part 5 is “expected”.

Solution 1.24.

  1. If the insurance company sells many such policies, 91% of policies will incur $0 in damage and result in a profit of $1000, 7% of policies will incur $5000 in damage and result in a profit of -$4000, etc.

  2. In this scenario a policy results in a positive profit for the insurance company only if it incurs no damage, so the probability is 0.91.

  3. Over many policies, we would expect 91% of policies to result in a profit of $1000, so we would expect \(100000\times 0.91 = 91000\) of these policies to result in a profit of $1000. Continue in a similar manner to complete the “expected number of policies” column in the table below.

  4. We expect 91000 policies to each result in a profit of $1000, for a total expected profit from these policies of \(91000 \times 1000 = 91000000\). Continue in a similar manner to complete the “expected total profit” column in the table below. The expected total profit for all 100000 policies is $22,000,000.

    Amount of damage ($) | Net profit ($) | Probability | Expected number of policies | Expected total profit ($)
    0 | 1000 | 0.910 | 91000 | 91,000,000
    5000 | -4000 | 0.070 | 7000 | -28,000,000
    20000 | -19000 | 0.019 | 1900 | -36,100,000
    50000 | -49000 | 0.001 | 100 | -4,900,000
    Total | NA | 1 | 100000 | 22,000,000

  5. The expected total profit for these 100000 policies is $22,000,000, so the expected average profit per policy is $220. (\(\frac{22000000}{100000} = 220\))

  6. The probability that a policy has a profit equal to $220 is 0. In this scenario, the only possible values of profit are 1000, -4000, -19000, and -49000.

  7. The probability that a policy has a profit greater than $220 is 0.91. Over many policies, 91% of policies have a profit greater than the expected average profit per policy.

  8. No! Not only is $220 not the most likely value, it’s not even a possible value of the profit of a policy.

  9. No! It’s not even possible for a single policy to have a profit of $220.

  10. Over many policies, we expect the average profit per policy to be $220. That is, $220 is the long run average profit per policy.

A single policy either results in a positive profit for the insurance company or not. For a group of policies we can compute the relative frequency of a positive profit: count the number of policies with a positive profit and divide by the total number of policies. The probability that a policy results in a positive profit can be interpreted as a long run relative frequency over many policies.

But there is more to the profit on a policy than whether it is positive or not; we are also interested in the amount of profit. For a group of policies we can compute the average profit: add up the values of the profits and divide by the total number of policies. The long run average value over many policies is called the “expected value” of profit.

Be careful: the term “expected value” is somewhat of a misnomer. The expected value is not necessarily the value we expect on a single repetition of the random phenomenon. In Example 1.24 the expected value of profit is $220, but it is not possible for a single policy to have a profit of $220. Rather, $220 is the average profit per policy we expect to see in the long run over many policies. A probability can be interpreted as a long run relative frequency; an expected value can be interpreted as a long run average value.
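The long run average interpretation can be illustrated with a small simulation. This is a minimal Python sketch, assuming policies are independent and follow the profit distribution of Example 1.24; the function name `simulate_average_profit` and the seed are illustrative choices, not part of the example.

```python
import random

# Possible profits per policy and their probabilities (Example 1.24)
profits = [1000, -4000, -19000, -49000]
probabilities = [0.910, 0.070, 0.019, 0.001]

def simulate_average_profit(n_policies: int, seed: int = 0) -> float:
    """Simulate n_policies independent policies and return the average profit."""
    rng = random.Random(seed)
    simulated = rng.choices(profits, weights=probabilities, k=n_policies)
    return sum(simulated) / n_policies

# The average for a small group can be far from 220;
# for larger groups it tends to settle near 220.
for n in (100, 10_000, 1_000_000):
    print(n, round(simulate_average_profit(n), 2))
```

No single simulated policy has a profit of $220, but the average over a large number of simulated policies should be close to it.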

Example 1.25 Continuing Example 1.24. We considered what we would expect for 100000 hypothetical policies, but what about an unspecified large number of policies?

  1. Imagine that we have recorded the profit for each of a large number of policies (not necessarily 100000). Explain in words the process by which you would compute the average profit per policy. (In other, more general, words: how do you compute an average of a list of numbers?)
  2. Given that the profit of any policy is either 1000, -4000, -19000, or -49000, how could we simplify the calculation of the sum in the previous part? Write a general expression for the average profit per policy in this scenario.
  3. What do you think the expression in the previous part converges to in the long run?
  4. Explain how the value in the previous part is a “probability-weighted average value”.
  5. Compute the expected value of damage (not profit) as a probability-weighted average value.
  6. Interpret the value from the previous part as a long run average value in this context.
  7. How is the expected value of profit related to the expected value of damage? Does this make sense? Why?

Solution 1.25.

  1. Compute an average in the usual way: add up all the values and divide by the number of values. If there were 100000 policies, we would add up the 100000 values of profit and divide by 100000; this is basically what we did in part 5 of Example 1.24.
  2. The profit of any policy is either 1000, -4000, -19000, or -49000, so when we add up the profits of many policies we’re adding the same values over and over. If 91000 values are equal to 1000, then we add \(1000 + 1000 + \cdots\), 91000 times; in other words, the contribution to the sum for the policies with a profit of $1000 is \(1000\times 91000\). For a general number of policies, the contribution to the sum of the policies with a profit of 1000 is \(1000\times\) number of policies with a profit of 1000. The average profit per policy can be expressed as \[ {\scriptscriptstyle \frac{1000\times \text{number with profit of 1000} + (-4000)\times \text{number with profit of -4000} + (-19000) \times \text{number with profit of -19000} + (-49000) \times \text{number with profit of -49000} }{\text{total number of policies}} } \]
  3. Divide through by the total number of policies \[ {\scriptstyle 1000\times \frac{\text{number with profit of 1000}}{\text{total number of policies}} + (-4000)\times \frac{\text{number with profit of -4000}}{\text{total number of policies}} + (-19000) \times \frac{\text{number with profit of -19000}}{\text{total number of policies}} + (-49000) \times \frac{\text{number with profit of -49000}}{\text{total number of policies}} } \] The fractions in the expression above are the relative frequencies of each value of profit. In the long run (over many policies), the relative frequencies will converge to the respective probabilities; for example, \(\frac{\text{number with profit of 1000}}{\text{total number of policies}}\) will converge to 0.91. Therefore, the long run average profit per policy is \[ 1000\times 0.910 + (-4000)\times 0.070 + (-19000) \times 0.019 + (-49000) \times 0.001 = 220 \]
  4. The profit for each policy is either 1000, -4000, -19000, or -49000. However, when computing the average profit per policy we can’t simply average these four values since many policies will have a profit of 1000 and very few will have a profit of -49000. In a sense, 1000 will have more “weight” in the average profit per policy than -49000 does. Therefore, we multiply each possible value of profit by its corresponding probability and then add to get a “probability-weighted average value” which reflects how likely each possible value is.
  5. Multiply each value of damage by its corresponding probability and then sum. \[ 0\times 0.910 + 5000\times 0.070 + 20000 \times 0.019 + 50000 \times 0.001 = 780 \]
  6. Over many policies we expect the long run average damage per policy to be $780.
  7. The expected value of profit is $1000 minus the expected value of damage: \(220 = 1000 - 780\). The profit on any policy is the difference between the premium of $1000 and the amount of damage, so it makes sense that the average profit per policy is $1000 minus the average damage per policy.

The previous example illustrates that the long run average value is also the probability-weighted average value. That is, we multiplied each possible value by its corresponding probability and then summed. Interpreting an expected value as a probability-weighted average value might be more natural in situations involving subjective probabilities.
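The probability-weighted average calculations from Example 1.25 can be reproduced in a few lines. The following Python sketch uses exact fractions to avoid rounding; the helper name `expected_value` is an illustrative choice.

```python
from fractions import Fraction

# (damage in dollars, probability) pairs from Example 1.24
damage_distribution = [
    (0, Fraction("0.910")),
    (5000, Fraction("0.070")),
    (20000, Fraction("0.019")),
    (50000, Fraction("0.001")),
]
PREMIUM = 1000

def expected_value(distribution):
    """Probability-weighted average: sum of value * probability."""
    return sum(value * prob for value, prob in distribution)

# Profit on a policy is the premium minus the damage paid out
profit_distribution = [(PREMIUM - damage, prob) for damage, prob in damage_distribution]

expected_damage = expected_value(damage_distribution)
expected_profit = expected_value(profit_distribution)
print(expected_damage, expected_profit)  # 780 220
```

The two results also satisfy the relationship from part 7: the expected profit equals the premium minus the expected damage.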

Example 1.26 As of this writing there are fifty U.S. states, with Hawaii being the last new state admitted (in 1959). How many new states will be admitted in the next twenty years? Sam assesses that 0 new states is most plausible, and 10 times more plausible than 1 new state, which is 10 times more plausible than 2 new states, which is 10 times more plausible than 3 new states, and more than 3 states has negligible plausibility. What is Sam’s expected value of the number of new states? How do you interpret this value?

Solution 1.26. First compute Sam’s subjective probabilities. Let 3 new states represent 1 “unit”; then 2 new states represent 10 units, 1 new state 100 units, and 0 new states 1000 units, for a total of 1111 units. Rescale the values so they sum to 1 to obtain Sam’s subjective probabilities.

Number of new states | Units | Probability (as fraction, rounded decimal)
0 | 1000 | 1000/1111 = 0.9001
1 | 100 | 100/1111 = 0.0900
2 | 10 | 10/1111 = 0.0090
3 | 1 | 1/1111 = 0.0009
Total | 1111 | 1

Now compute Sam’s expected value as a probability-weighted average value \[ 0\left(\frac{1000}{1111}\right) + 1\left(\frac{100}{1111}\right) + 2\left(\frac{10}{1111}\right) + 3 \left(\frac{1}{1111}\right) = \frac{123}{1111} = 0.1107 \]

This does not mean that Sam expects 0.11 new states; it’s not possible to have 0.11 new states. Rather, 0.11 represents an average of the possible values of the number of new states, weighted to reflect the relative plausibilities of the possible values.
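The rescaling and weighted average in Solution 1.26 can be written out as a short Python sketch. The variable names are illustrative; the units are Sam’s relative plausibilities from the solution.

```python
from fractions import Fraction

# Sam's relative plausibilities ("units") for each number of new states
units = {0: 1000, 1: 100, 2: 10, 3: 1}
total_units = sum(units.values())  # 1111

# Rescale the units so that the probabilities sum to 1
probabilities = {k: Fraction(u, total_units) for k, u in units.items()}

# Probability-weighted average of the possible values
expected_new_states = sum(k * p for k, p in probabilities.items())
print(expected_new_states, round(float(expected_new_states), 4))  # 123/1111 0.1107
```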

We will see other interpretations of expected values later. In particular, we will see in what sense an expected value can be interpreted as a “best guess” of an uncertain random quantity.

Returning to Example 1.24, the insurance company’s profit is the policyholder’s loss. Most policyholders pay the $1000 premium and incur no damage. Furthermore, the expected value of the loss for a policyholder is $220. Why are people willing to buy insurance despite this? Individuals live in the short run; any individual is either going to incur damage or not. Insurance is protection against the risk of a large loss. Even though the probability of occurrence is small, incurring a large amount of damage like $50000 would have serious financial consequences for most individuals. Many people are willing to trade a sure but relatively small monetary loss like $1000 to protect against an unlikely but serious loss like $50000.

On the other hand, insurance companies operate in the long run. Over many policies, an insurance company is virtually guaranteed an average profit of $220 per policy. The insurance company will lose, and lose big, on some policies. But these losses are more than offset in the long run by the relatively small profits on the large number of policies that incur no damage.

1.7.1 Exercises

Exercise 1.15 A roulette wheel has 18 black spaces, 18 red spaces, and 2 green spaces, all the same size and each with a different number on it. Suppose you bet $1 on black. If the wheel lands on black, you win your initial bet back plus an additional $1; otherwise you lose the money you bet. That is, your net winnings are either +1 or -1 dollar.

  1. Compute the probability-weighted average value of your net winnings.
  2. Is the value in the previous part the net winnings you would expect on a single bet?
  3. Explain in what sense the value from the first part is “expected”.

Exercise 1.16 A roulette wheel has 18 black spaces, 18 red spaces, and 2 green spaces, all the same size and each with a different number on it. Suppose you bet $1 on 7. If the wheel lands on 7, you win your initial bet back plus an additional $35; otherwise you lose the money you bet. That is, your net winnings are either +35 or -1 dollar.

  1. Compute the probability-weighted average value of your net winnings.
  2. Is the value in the previous part the net winnings you would expect on a single bet?
  3. Explain in what sense the value from the first part is “expected”.

Exercise 1.17 Compare Exercise 1.15 and Exercise 1.16. Are the two $1 bets (bet on black versus bet on 7) identical? In what ways are these bets the same? In what ways are they different?

1.8 A brief introduction to simulation

Here’s a seemingly simple problem. Flip a fair coin four times and record the results in order. For the recorded sequence, compute the proportion of the flips which immediately follow a H that result in H. What value do you expect for this proportion? (If there are no flips which immediately follow a H, i.e. the outcome is either TTTT or TTTH, discard the sequence and try again with four more flips.)

For example, the sequence HHTT means the first and second flips are heads and the third and fourth flips are tails. For this sequence there are two flips which immediately followed heads, the second and the third, of which one (the second) was heads. So the proportion in question for this sequence is 1/2.

So what value do you expect for this proportion? We think it’s safe to say that most people would answer 1/2. After all, it shouldn’t matter if a flip follows heads or not, right? We would expect half of the flips to land on heads regardless of whether the flip follows H, right? We’ll see there are some subtleties lurking behind these questions.

To get an idea of what we would expect for this proportion, we could conduct a simulation: flip a coin 4 times and see what happens. Table 1.4 displays the results of a few repetitions; each repetition consists of an ordered sequence of 4 coin flips for which the proportion in question is measured. (Flips which immediately follow H are in bold.)

Table 1.4: Simulated outcomes for 10 sets of four flips of a fair coin, each set with at least one flip following a flip of H.
Repetition | Outcome | Flips that follow H | H that follow H | Proportion of H followed by H
1 | HHTT | 2 | 1 | 0.5
2 | HTTH | 1 | 0 | 0
discarded | TTTH | 0 | NA | try again
3 | HTHT | 2 | 0 | 0
4 | THHH | 2 | 2 | 1
5 | HHTT | 2 | 1 | 0.5
6 | HHHT | 3 | 2 | 0.667
7 | HTTH | 1 | 0 | 0
8 | THHT | 2 | 1 | 0.5
9 | THTT | 1 | 0 | 0
10 | HHHH | 3 | 3 | 1

Table 1.5 and Figure 1.12 summarize the results of these 10 repetitions of the simulation.

Table 1.5: Table of observed values of the proportion of H followed by H and their frequencies for the ten sets of coin flips in Table 1.4.
Proportion of H following H | Frequency | Relative frequency
0.0000 | 4 | 0.4
0.5000 | 3 | 0.3
0.6667 | 1 | 0.1
1.0000 | 2 | 0.2

Figure 1.12: Dot plot: Each dot represents the proportion of H followed by H for a set of four coin flips in Table 1.4

We can keep repeating the above process to investigate what happens in the long run. Rather than actually flipping coins, we use a computer to run a simulation. Figure 1.13 summarizes the results of 1,000,000 successful repetitions of the simulation, after discarding the sequences with no flips following H. (We will see how to program, run, and summarize simulations like this in later chapters.) While you can’t see the individual “dots” like in Figure 1.12, each dot would represent a sequence of 4 coin flips (with at least one flip following a H) and the value being plotted is the proportion of H followed by H for that sequence. The results would look like those in Table 1.4, albeit a table with 1,000,000 rows (after discarding rows with no flips immediately following H).

(a) Table of observed values and frequencies.
(b) Spike plot: heights of the spikes represent the simulated relative frequencies of each possible value of proportion of H followed by H for a set of four coin flips
Figure 1.13: Proportion of flips immediately following H that result in H for 1,000,000 sets of 4 coin flips, each set having at least one flip immediately following H. For example, the proportion of H followed by H is 0 in 429,123 of the sets.
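As a preview of the kind of program covered in later chapters, here is a minimal Python sketch of this simulation. It is an illustration under stated assumptions, not the code behind Figure 1.13: it uses 100,000 successful repetitions rather than 1,000,000, an arbitrary seed, and hypothetical function names, so the exact counts will differ from those in the figure.

```python
import random
from collections import Counter
from fractions import Fraction

def proportion_h_after_h(flips):
    """Proportion of flips immediately following an H that are H.
    Returns None if no flip follows an H (e.g. TTTT or TTTH)."""
    followers = [flips[i + 1] for i in range(len(flips) - 1) if flips[i] == "H"]
    if not followers:
        return None
    return Fraction(followers.count("H"), len(followers))

def simulate(n_reps, n_flips=4, seed=0):
    """Repeat sets of fair coin flips until n_reps sets have a flip following H."""
    rng = random.Random(seed)
    results = []
    while len(results) < n_reps:
        flips = rng.choices("HT", k=n_flips)  # one set of fair coin flips
        value = proportion_h_after_h(flips)
        if value is not None:  # discard sets with no flip following H
            results.append(value)
    return results

results = simulate(100_000)
counts = Counter(results)
for value in sorted(counts):
    print(value, counts[value] / len(results))  # simulated relative frequencies
print("average:", float(sum(results) / len(results)))
```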

We asked the question: what would you expect for the proportion of the flips which immediately follow a H that result in H? That depends on how we define what’s “expected”. If we are interested in the value that is most likely to occur when we flip a coin four times, then the answer is 0: we see that in the long run a little over 40% of the sets resulted in a proportion of 0, while only about 30% of sets resulted in a value of 1/2. We see that Figure 1.13 (b) is not centered at 1/2; a higher percentage of repetitions resulted in a proportion below 1/2 than above 1/2. We think that most people would find this surprising.

Another way to interpret “expected” is as “average”; in particular, expected value can be interpreted as the long run average value. After 1,000,000 repetitions, each involving a set of four fair coin flips, we have 1,000,000 simulated values of the proportion of H following H. We could then average these values: add up all the values and divide by 1,000,000.

\[ {\scriptscriptstyle \frac{0\times 429123 + (1/2)\times 285906 + (2/3) \times 71414 + 1 \times 213557}{1000000} = 0.404 } \]

It turns out that the long run average value is about 0.405 (the simulated average of 0.404 is close to this value), which is not 1/2. Again, we think most people find this surprising.

A reminder: the term “expected value” is somewhat of a misnomer. We are not saying that if we flip a coin four times we would expect the proportion of H following H for that set of flips to be 0.405. In fact, in any single set of four fair coin flips the only possible values for the proportion of H followed by H are 0, 1/2, 2/3, and 1. So in a set of four coin flips it’s not possible to see a proportion of 0.405. Rather, 0.405 is the average value of the proportion of H followed by H that we would expect to see in the long run over many sets of four fair coin flips.
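Because there are only 16 possible sequences of four flips, the long run average can also be checked by direct enumeration. The sketch below assumes a fair coin, so the 14 sequences that are kept (all but TTTT and TTTH) are equally likely; the function name is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

def proportion_h_after_h(flips):
    """Proportion of flips immediately following an H that are H (None if there are none)."""
    followers = [b for a, b in zip(flips, flips[1:]) if a == "H"]
    return Fraction(followers.count("H"), len(followers)) if followers else None

# All 16 equally likely sequences of four fair coin flips
values = [proportion_h_after_h(seq) for seq in product("HT", repeat=4)]

# Keep the sequences with at least one flip following an H
kept = [v for v in values if v is not None]

# Equally weighted average over the kept sequences
expected = sum(kept) / len(kept)
print(len(kept), expected)  # 14 17/42
```

The exact value 17/42 is about 0.405, and none of the 14 kept sequences has a proportion equal to it.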

The simulation provides evidence that, counter to our intuition, we would expect the proportion of H followed by H to be less than 0.5. So what is happening here? We will return to this example several times to investigate these results more closely. We’ll leave it as a mystery for now, but observe that:

  • The study of probability can involve some subtleties and our intuition isn’t always right.
  • Simulation is an effective way of investigating probability problems, and can reveal interesting and surprising patterns.
  • There is a difference between (1) the probability that a flip following H lands on H and (2) the proportion of flips following H which result in H in a fixed sequence of fair coin flips35.
  • In a fixed number of fair coin flips, the proportion of flips following H which result in H is “expected” to be less than the true probability of H, even though the trials are independent.

1.8.1 Exercises

Exercise 1.18 In a group of \(n\) people, what is the probability that at least two people in the group have the same birthday?

  1. Consider \(n=30\): what do you think the probability that at least two people in a group of 30 people share a birthday is: 0-20%, 20-40%, 40-60%, 60-80%, 80-100%?
  2. How large do you think \(n\) needs to be in order for the probability that at least two people share a birthday to be larger than 0.5?
  3. We’ll save the answer to these questions for later, but they turn out to be unintuitive to many people, and simulation can shed some light. Explain how, in principle, you might perform a simulation using cards or slips of paper to estimate the probability that at least two people have the same birthday when \(n=30\). You can make some simplifying assumptions: Ignore multiple births and February 29 and assume that the other 365 days are all equally likely36.

1.9 Why study coins, dice, cards, and spinners?

Many probability problems involve “toy” situations like flipping coins, rolling dice, shuffling cards, or spinning spinners. These situations might seem unexciting, or at least not very practically meaningful. However, coins and spinners and the like provide familiar, concrete situations which facilitate understanding of probability concepts. Furthermore, simple situations often provide insight into real and complex problems. The following is just one illustration.

Many basketball players and fans alike believe in the “hot hand” phenomenon: the idea that making several shots in a row increases a player’s chances of making the next shot. However, the consensus conclusion of thirty years of studies on the hot hand, beginning with the seminal study Gilovich, Vallone, and Tversky (1985), had been that there is no statistical evidence that the hot hand in basketball is real. As a result, many statisticians regularly caution against the “hot hand fallacy”: the belief that the hot hand exists when, in reality, the degree of streaky behavior typically observed in sequential data is consistent with what would be expected simply by chance in independent trials.

The idea behind studies like Gilovich, Vallone, and Tversky (1985) is essentially the following. Consider a player who attempts 100 shots and makes 50%. If there is no hot hand, then we might expect the player to make 50% of shots both on attempts that follow hit streaks (usually considered three or more made attempts in a row) and on other attempts. Therefore, a success rate of 50% on both sets of attempts provides no evidence of the hot hand.

However, recent research of Miller and Sanjurjo (2018a), Miller and Sanjurjo (2018c), Miller and Sanjurjo (2018b) concludes that previous studies on the hot hand in basketball, starting with Gilovich, Vallone, and Tversky (1985), have been subject to a bias. After correcting for the bias, the authors find evidence in favor of the hot hand effect in basketball shooting, suggesting the hot hand fallacy is not a fallacy after all. One interesting aspect of these studies is that Miller and Sanjurjo’s methods are simulation-based.

Miller and Sanjurjo (2018a) introduced the coin flipping problem in Section 1.8 to illustrate the idea behind their research and the bias in previous studies. Consider again a player who attempts 100 shots and makes 50%. Even if there is no hot hand, Miller and Sanjurjo show that we would actually expect the player to have a shooting percentage of strictly less than 50% on the attempts which followed streaks, and strictly greater than 50% on the other attempts. The reason is similar to what we observed in the coin flipping problem in Section 1.8: in a fixed number of trials, the proportion of H on trials following H is expected to be less than the true probability of H, even though the trials are independent. Therefore, for the example player a success rate of 50% on both sets of attempts actually provides directional evidence in favor of the hot hand. Properly accounting for this bias leads to substantially different statistical analyses (i.e., p-values) and conclusions.
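The direction of this bias can be checked with a short simulation. The following is a minimal Python sketch and not Miller and Sanjurjo’s actual analysis. It assumes each shot is made independently with probability 0.5 (so there is no hot hand), defines a streak as three or more makes in a row, and averages each simulated player’s proportion of makes on attempts that follow a streak. The function names, the number of repetitions, and the seed are illustrative choices.

```python
import random

def proportion_after_streaks(shots, streak_len=3):
    """Proportion of makes among attempts that immediately follow at least
    `streak_len` consecutive makes; None if there are no such attempts."""
    attempts = 0
    makes = 0
    run = 0  # number of consecutive makes just before the current attempt
    for made in shots:
        if run >= streak_len:
            attempts += 1
            makes += made  # True counts as 1, False as 0
        run = run + 1 if made else 0
    return makes / attempts if attempts > 0 else None

def simulate_streak_bias(n_shots=100, p=0.5, streak_len=3, reps=10000, seed=1):
    """Average, over simulated players, of each player's proportion of makes
    on attempts following a streak. Shots are independent (no hot hand)."""
    rng = random.Random(seed)  # seed is an arbitrary choice for reproducibility
    proportions = []
    for _ in range(reps):
        shots = [rng.random() < p for _ in range(n_shots)]
        prop = proportion_after_streaks(shots, streak_len)
        if prop is not None:  # skip players with no attempts after a streak
            proportions.append(prop)
    return sum(proportions) / len(proportions)

average = simulate_streak_bias()
print(f"Average proportion of makes after a streak: {average:.3f}")
```

Under these assumptions the printed average should fall below 0.5, even though every simulated shot has probability 0.5 of going in. Changing `n_shots` or `streak_len` shows how the size of the gap depends on the number of attempts and the streak definition.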

1.9.1 Exercises

Exercise 1.19 Find the value of some probability in a real world situation of interest to you. Then describe how this situation could be modeled with coins, dice, cards, or spinners. How would you use your “toys” to simulate the situation and approximate the probability of interest?

1.10 Chapter exercises

Exercise 1.20 True or false.

  1. Probability can be used to assess the likelihood or plausibility of an event associated with a phenomenon that only happens once.
  2. Probability can be used to assess the likelihood of an event associated with a phenomenon that happened in the past.
  3. The subjective and long run relative frequency interpretations of probability can be used interchangeably.
  4. Suppose the probability that a randomly selected CP student has an internship this summer is 0.2, and the probability that a randomly selected CP student is taking at least one course this summer is 0.4. True or false: the probability that a randomly selected CP student either has an internship or is taking at least one course (or both) must be 0.6.
  5. Suppose the probability that a randomly selected CP student has an internship this summer is 0.2, and the probability that a randomly selected CP student is taking at least one course this summer is 0.4. True or false: the probability that a randomly selected CP student both has an internship and is taking at least one course must be 0.08.

Exercise 1.21 Short answer.

  1. Your subjective probability that the price of regular unleaded gasoline at your favorite gas station in SLO stays above $5 per gallon throughout the next month is 0.95. How many times more likely than not is it for the price to stay above $5 per gallon through the next month?
  2. It is 3 times more likely than not that the high temperature tomorrow will be greater than 80 degrees F. What is the probability that the high temperature tomorrow will be greater than 80 degrees F?
  3. Laszlo, Nadja, and Nandor are having a tournament. Guillermo thinks that Nandor is 3.5 times more likely to win than Nadja, and Nadja is 2 times more likely to win than Laszlo. Find Guillermo’s probability that Nandor wins.

Exercise 1.22 In each of the following, which of a or b is strictly greater? Or are they equal? Or is there not enough information to decide? Explain your reasoning.

  1. Surfing Californians

    1. The probability that a randomly selected Californian likes to surf
    2. The probability that a randomly selected American is a Californian who likes to surf
    3. a and b are necessarily equal
    4. There is not enough information provided to decide whether a or b is strictly greater
  2. Cal Poly graduates

    1. The probability that a randomly selected Cal Poly graduate is a California resident
    2. The probability that a randomly selected California resident is a Cal Poly graduate
    3. a and b are necessarily equal
    4. There is not enough information provided to decide whether a or b is strictly greater
  3. Flip a coin which is known to be fair six times and record the results in sequence

    1. The probability that the first five flips are, in order, HHHHH
    2. The probability that the first six flips are, in order, HTTHTH
    3. a and b are necessarily equal
    4. There is not enough information provided to decide whether a or b is strictly greater
  4. I catch my child hiding in the bathroom washing chocolate off her face. I ask her what she’s doing, and she says she just had to go to the bathroom.

    1. The probability that my child just went to the bathroom.
    2. The probability that my child just went to the bathroom and has secretly been eating chocolate cake.
    3. a and b are necessarily equal
    4. There is not enough information provided to decide whether a or b is strictly greater
  5. Among American workers, 40% have a college degree and 10% belong to a labor union.

    1. 4%
    2. The percentage of American workers who have a college degree and belong to a labor union.
    3. a and b are necessarily equal
    4. There is not enough information provided to decide whether a or b is strictly greater

Exercise 1.23 Suppose

  • 85% of people have a dominant right hand.
  • 75% of people with a dominant right hand have a dominant right eye.
  • 55% of people who do not have a dominant right hand have a dominant right eye.
  1. What percent of people with a dominant right eye have a dominant right hand?
  2. What percent of people with a dominant right eye do not have a dominant right hand?
  3. What percent of people without a dominant right eye have a dominant right hand?
  4. What percent of people with a dominant right hand have a dominant right eye?
  5. What percent of people have a dominant right eye and a dominant right hand?

  1. The Grand Duke of Tuscany posed this problem to Galileo, who published his solution in 1620. However, unbeknownst to Galileo, the same problem had been solved almost 100 years earlier by Gerolamo Cardano, one of the first mathematicians to study probability (David 1955).↩︎

  2. Or maybe not. We are considering heads and tails as the only outcomes, but what about a coin landing on its edge? It can happen, but the probability is very small; Murray and Teare (1993) estimates the probability that a U.S. nickel lands on its edge to be about 0.000167. Furthermore, there is evidence (Diaconis, Holmes, and Montgomery (2007), Aldous (2023), Bartoš et al. (2024)) that a coin is slightly more likely to land the same way it started. That is, if the coin starts facing heads up, the probability that it lands facing heads up is slightly greater than 0.5, but the difference is small. Quoting Section 7 of Diaconis, Holmes, and Montgomery (2007): “The classical assumptions of independence with probability 1/2 are pretty solid.” We hope you agree that assuming two equally likely outcomes is reasonable for practical purposes.↩︎

  3. We do not advocate gambling. We merely use gambling contexts to motivate probability concepts.↩︎

  4. There is a vast literature on how people make decisions when faced with uncertainty. Kahneman (2012) provides an excellent introduction.↩︎

  5. Diaconis and Skyrms (2018) provides a nice introduction.↩︎

  6. For example, we are not distinguishing between “aleatoric variability” and “epistemic uncertainty”.↩︎

  7. This is the first of many spinners in this book. Our purpose in using spinners is not to advocate pie charts for summarizing data. Rather, we think spinners provide a concrete representation of probability distributions that helps facilitate understanding of difficult concepts.↩︎

  8. We can also solve this problem using algebra. Let \(x\) be the probability, as a decimal, that the Astros are the winner. (Again, it doesn’t matter which team is the baseline.) Then \(x\) is also the probability that the Dodgers are the winner, \(1.5x\) for the Rays, and \(3x\) for the Braves. The probability that one of the four teams wins is \(x + x + 1.5x + 3x = 6.5x\), so the probability of Other is also \(6.5x\). The probabilities in decimal form must sum to 1, so \(1 = x + x + 1.5x + 3x + 6.5x = 13x\). Solve for \(x=1/13\) and then plug in \(x=1/13\) to find the other probabilities; e.g., \(3x = 3(1/13) = 0.231\) for the Braves.↩︎

  9. This example is inspired by the famous “Linda problem” of Tversky and Kahneman (1982) which they used to illustrate the “conjunction fallacy”.↩︎

  10. You could also solve this with algebra. Let \(x\) be the probability that the Giants win, so \(19x\) is the probability that they don’t win. The probabilities must sum to 1, so set \(x + 19x = 1\) and solve for \(x\).↩︎

  11. Technically, Ron and Leslie could still have different subjective probabilities. Leslie would not agree to worse odds, but she would accept better if Ron offered them. For example, given a potential loss of $200, Leslie would also agree to a potential payout from Ron of $125 rather than $100. That is, Leslie would accept odds of 1.6 to 1 against (\(200/125 = 1.6\)), corresponding to a subjective probability of \(1/(1 + 1.6) = 0.385\). So Leslie’s subjective probability that Professor Ross has a TikTok account is at least 1/3. Similarly, Ron’s subjective probability that Professor Ross has a TikTok account is at most 1/3.↩︎

  12. Technically this is only true if the moneyline odds are positive, which is the case when the probability of winning the bet is less than 0.5. Negative moneyline odds, which occur when the probability of winning the bet is greater than 0.5, represent how much money must be wagered in order to receive a net profit of $100. For example, moneyline odds of -900 indicate that you must bet $900 to receive $1000, for a net profit of 1000-900 = 100. A bet at -900 moneyline odds results in a profit of $100 if the bet is won or a loss of the initial $900 stake otherwise; the amounts 900 and 100 are in a 9 to 1 ratio (in favor of winning), implying a probability of \(9/(1+9) = 0.90\) of winning the bet.↩︎

  13. We’re assuming Donny’s probability that the Dodgers don’t win is 50%. But if Donny’s probabilities don’t add to 100% why would we expect him to obey other consistency requirements? A fair question, but the point is that bad things can happen even if just one of the consistency requirements is violated. If we assume instead 35% as Donny’s probability that the Dodgers don’t win, we can construct a scenario similar to the one in this solution which guarantees us a sure profit with no risk.↩︎

  14. “Book” in the sense of a bookie taking bets, as opposed to a Dutch-language novel like De ontdekking van de hemel.↩︎

  15. These values are based on a study by the Pew Research Center conducted in January 2020.↩︎

  16. These values are based on surveys by Gallup, but the values change somewhat over time.↩︎

  17. Careful: we are only claiming that the total does not matter when constructing hypothetical tables. When collecting real data, the sample size matters a great deal. For example, a random sample of 1000 Americans provides a more precise estimate of the population proportion of all Americans who support free tuition than a sample of 100 Americans does. The Pew Research study was based on a random sample of over 12000 Americans.↩︎

  18. You can only run into problems if you round. Suppose we had started with a group size of 100. Then the top left cell in the table would have been 26.56. If we had rounded this to 27, our answers would change. So when dealing with a hypothetical table of counts, don’t round. If you are uncomfortable with decimal counts, just add a few zeros to your total count and try again.↩︎

  19. This example is adapted from an article by Allan Rossman.↩︎

  20. The values in this example are based on a Washington Post article which uses data from the 2018 General Social Survey. An earlier Washington Post article discusses discrepancies in estimates of pet ownership.↩︎

  21. These values come from the NY Times article and a related study.↩︎

  22. This is called the sensitivity of the test.↩︎

  23. We are treating “not positive” as “negative” but in practice there could also be inconclusive tests.↩︎

  24. DiGeorge syndrome isn’t the greatest example since most cases result from a purely random deletion on chromosome 22. However, DiGeorge syndrome can be inherited from a parent who has it.↩︎

  25. This is called the specificity of the test.↩︎

  26. Pregnancy tests are included as part of health screenings for other reasons, but most women who take home pregnancy tests do so for pregnancy related reasons.↩︎

  27. And both equal to \(\left(\frac{1}{2}\right)^{10} =\frac{1}{1024}\).↩︎

  28. 252 out of 1024 possibilities, in fact.↩︎

  29. The exact count is 292,201,338. We will see how to compute this number later.↩︎

  30. Source: U.S. Census Bureau.↩︎

  31. The statistician Ron Wasserstein has provided several fanciful perspectives on the likelihood of winning the Powerball lottery.↩︎

  32. For an interesting investigation of this idea check out the Infinite Monkey Theorem Experiment at the site The Pudding.↩︎

  33. This idea was inspired by Randall Munroe.↩︎

  34. For probabilities closer to 1, we could report the chances of not being from these counties.↩︎

  35. The probability that a flip following H lands on H is 0.5. This example shows that the proportion of flips following H which result in H in a fixed sequence of coin flips is a “biased estimator” of the probability that a flip following H lands on H.↩︎

  36. Which isn’t quite true.↩︎