Statistics #04 — Introduction to Probability

Basic concepts such as Union, Intersection, and Conditional Probabilities, as well as Visualization with Venn Diagrams and Probability Trees

Rafael Bastos
Towards Data Science

Photo by Simon Berger on Unsplash

Table of Contents

  1. What is Probability
  2. How to calculate and display probabilities
  3. Visualizing probabilities with a Venn Diagram
  4. Intersection and Union
  5. Conditional Probabilities
  6. Visualizing probabilities with a Probability Tree
  7. Conclusion

Today, we’ll talk about probability. If you are starting your studies in statistics, or if you just want to recall some basic concepts, this series is for you!

What is Probability?

In one sentence, probability is how likely something is to happen.

For instance, suppose you check a weather app in the morning before leaving home, and it predicts a sunny day with only a 5% chance of rain. You probably won’t bring an umbrella or a raincoat, because rain is unlikely. It’s not impossible, though: forecasts aren’t always exact, and even when the app shows a 0% chance of rain, it can still rain. Forecasts are predictions made by collecting large amounts of data about temperature, humidity, wind, etc., to determine how atmospheric conditions might evolve.

So, probability can help people and companies make smart decisions considering the predicted outcomes, based on data collected and past experiences.

How to calculate and display probabilities

First of all, let’s check one of the simplest examples where we can think probabilistically: flipping a coin.

When you flip a coin, there are only two possible outcomes (events): heads or tails. With a standard coin, these two events are equally likely to occur, that is, we have a 50% chance of getting heads and a 50% chance of getting tails.

In this example, these two possible events can be represented as:

  • P(heads) — Probability of the coin landing on heads
  • P(tails) — Probability of the coin landing on tails

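If we want to find P(heads), for instance, we need to divide the number of ways heads can occur by the total number of outcomes, as follows:

$$P(\text{heads}) = \frac{\text{number of ways heads can occur}}{\text{total number of outcomes}} = \frac{1}{2}$$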

Probabilities can be represented as fractions, decimals, or percentages, as long as the total of all probabilities equals 1 or 100%. In this case, we can say that P(heads) = 1/2 or 0.5 or 50%.

To generalize, consider the following equation:

$$P(A) = \frac{n(A)}{n(S)}$$

Where

  • P(A) is the probability of event A occurring;
  • n(A) is the number of ways A can occur;
  • n(S) is the total number of outcomes.

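The set of all possible outcomes is called the sample space, represented here by the letter S, so n(S) is the number of outcomes it contains. This formula assumes that all outcomes are equally likely, as with a standard coin.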
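
The formula above can be sketched in a few lines of Python. This is only an illustration: the helper name `probability` and the use of sets to represent events are assumptions made for this example, and the calculation is valid only when every outcome in the sample space is equally likely.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Return P(event) = n(event) / n(S), assuming equally likely outcomes."""
    event = set(event) & set(sample_space)  # ignore outcomes outside S
    return Fraction(len(event), len(sample_space))

coin = {"heads", "tails"}                # sample space S for one coin flip
p_heads = probability({"heads"}, coin)

# The same probability as a fraction, a decimal, and a percentage.
print(p_heads, float(p_heads), f"{float(p_heads):.0%}")  # 1/2 0.5 50%
```

Using `Fraction` keeps the result exact, which makes it easy to check that probabilities add up to 1.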

Another common notation found in most statistics books is the complementary event (A′).

P(A′) is the probability that event A doesn’t happen and can be displayed as:

$$P(A') = 1 - P(A)$$

In our case, the probability of heads not happening is 50% as well.
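
As a quick check of the complement rule, the sketch below builds the complement of heads by removing it from the sample space and confirms that its probability equals 1 − P(heads). The variable names are illustrative, and equally likely outcomes are assumed.

```python
from fractions import Fraction

sample_space = {"heads", "tails"}
heads = {"heads"}
not_heads = sample_space - heads         # the complementary event

p_heads = Fraction(len(heads), len(sample_space))
p_not_heads = Fraction(len(not_heads), len(sample_space))

# Counting the complement directly agrees with 1 - P(heads).
print(p_not_heads, p_not_heads == 1 - p_heads)  # 1/2 True
```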

Visualizing probabilities with a Venn Diagram

A simple way to visualize probabilities is with a Venn diagram. It shows the probability of each event, as well as the intersections between events and the sample space.

Image by author

Notice that the Venn diagram above visually provides us with valuable pieces of information about our probabilities. It shows two circles, representing the probabilities of getting heads or tails after flipping a coin. Observe that the two circles are separated from each other, because there is no intersection between these two events, that is, a coin can’t land with both heads and tails upwards.

It means that these two events (heads and tails) are mutually exclusive.

Observe that the sample space is also represented in our diagram. Here, heads and tails cover the entire sample space: there is no other possible outcome, so the probability of anything else is zero, as the diagram shows. It means that heads and tails are collectively exhaustive events.
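
Both properties can be checked with Python sets. The sketch below is a minimal illustration; the function names `mutually_exclusive` and `collectively_exhaustive` are made up for this example.

```python
def mutually_exclusive(a, b):
    """True when events a and b share no outcome."""
    return a.isdisjoint(b)

def collectively_exhaustive(events, sample_space):
    """True when the events together cover every outcome in the sample space."""
    return set().union(*events) == sample_space

sample_space = {"heads", "tails"}
heads, tails = {"heads"}, {"tails"}

print(mutually_exclusive(heads, tails))                       # True
print(collectively_exhaustive([heads, tails], sample_space))  # True
```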

Since we talked about mutually exclusive events, let’s now show a diagram where the events intersect.

Suppose we are picking a random card from a deck of cards. What’s the probability that the card picked is a King of a red suit? Let’s consider a deck containing 54 cards divided into 4 suits — Spades and Clubs (Black), Hearts and Diamonds (Red). Each suit contains cards numbered from 2 to 10, a Jack, a Queen, a King, and an Ace. In addition, we have two uncolored jokers.

So, how can we draw a Venn diagram to help us visualize the probability of getting a King of a red suit (Hearts or Diamonds)?

Image by author

This case is a bit more complex than the coin example. Four of the 54 cards are Kings, and 26 of the 54 are from red suits. However, these events can occur simultaneously: 2 cards (the King of Hearts and the King of Diamonds) are both a King and from a red suit, which is 2/54, or roughly 4% of the possibilities.

Notice that this time we have a value outside the circles (0.481). It represents the probability of the card picked being neither a King nor from a red suit: 26 of the 54 cards (the 24 black cards that aren’t Kings, plus the 2 jokers), or 26/54 ≈ 0.481.
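
To double-check these counts, here is a small sketch that builds the 54-card deck described above and counts the cards in each region of a two-circle Venn diagram. The card representation (a `(rank, suit)` pair, with the jokers numbered 1 and 2) is an assumption made for this example.

```python
from fractions import Fraction

RANKS = [str(n) for n in range(2, 11)] + ["Jack", "Queen", "King", "Ace"]
SUIT_COLORS = {"Spades": "Black", "Clubs": "Black", "Hearts": "Red", "Diamonds": "Red"}

# Cards are (rank, suit) pairs; the two jokers get a number instead of a suit.
deck = {(rank, suit) for suit in SUIT_COLORS for rank in RANKS}
deck |= {("Joker", 1), ("Joker", 2)}

kings = {card for card in deck if card[0] == "King"}
red = {card for card in deck if SUIT_COLORS.get(card[1]) == "Red"}

# The four regions of the Venn diagram.
regions = {
    "King only": kings - red,
    "King and Red": kings & red,
    "Red only": red - kings,
    "Neither": deck - (kings | red),
}

for name, cards in regions.items():
    p = Fraction(len(cards), len(deck))
    print(f"{name}: {len(cards)}/{len(deck)} = {float(p):.3f}")
# King only: 2/54 = 0.037
# King and Red: 2/54 = 0.037
# Red only: 24/54 = 0.444
# Neither: 26/54 = 0.481
```

The four regions do not overlap and together contain all 54 cards, so their probabilities add up to 1.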

Intersection and Union

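To understand the concepts of intersection and union, let’s keep working with the last example. We already know P(King) and P(Red), as shown below:

$$P(\text{King}) = \frac{4}{54} \approx 0.074, \qquad P(\text{Red}) = \frac{26}{54} \approx 0.481$$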

We can write the probability of the card being a King and Red (the intersection between King and Red) as follows:

$$P(\text{King} \cap \text{Red}) = \frac{2}{54} \approx 0.037$$

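With this information in hand, we can determine another value: the probability of the card being a King or Red (the union of King and Red). We subtract the intersection so that the two red Kings aren’t counted twice:

$$P(\text{King} \cup \text{Red}) = P(\text{King}) + P(\text{Red}) - P(\text{King} \cap \text{Red}) = \frac{4}{54} + \frac{26}{54} - \frac{2}{54} = \frac{28}{54} \approx 0.519$$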
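
The union calculation can be reproduced with exact fractions. This sketch uses only the counts given in the text (54 cards, 4 Kings, 26 red cards, 2 red Kings); the variable names are illustrative.

```python
from fractions import Fraction

N_TOTAL, N_KING, N_RED, N_KING_AND_RED = 54, 4, 26, 2

p_king = Fraction(N_KING, N_TOTAL)
p_red = Fraction(N_RED, N_TOTAL)
p_king_and_red = Fraction(N_KING_AND_RED, N_TOTAL)

# Union: add both probabilities, then subtract the overlap counted twice.
p_king_or_red = p_king + p_red - p_king_and_red

print(p_king_or_red, round(float(p_king_or_red), 3))  # 14/27 0.519
```

The complement of this union, 1 − 0.519 ≈ 0.481, matches the value outside the circles in the card example.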

Conditional Probabilities

Here’s where things start to get more interesting! When we apply a condition to some probability, we are trying to determine the probability of an event A occurring, given that another event B has already occurred.

A conditional probability can be displayed as P(A|B), and, as long as P(B) is greater than zero, we can find it using the equation below:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

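Going further, we now have another way to find the intersection between two events:

$$P(A \cap B) = P(A \mid B)\,P(B)$$

or

$$P(B \cap A) = P(B \mid A)\,P(A)$$

Since P(A∩B) = P(B∩A), both expressions give the same value.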

Visualizing probabilities with a Probability Tree

When we are dealing with conditional probabilities, it can get tricky to visualize them with a Venn diagram. That’s when a probability tree comes in handy. The basic structure of a probability tree is as follows:

Image by author

Now, let’s think about the card deck exercise we did earlier, but this time we are changing it a bit. Suppose we already know we’re holding a Red card. What’s the probability that this card is a King? And how can a probability tree help us visualize it?

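First of all, let’s apply the equation for conditional probabilities. Notice that we are trying to measure P(King) given that the card is from a red suit.

$$P(\text{King} \mid \text{Red}) = \frac{P(\text{King} \cap \text{Red})}{P(\text{Red})} = \frac{2/54}{26/54} = \frac{2}{26} \approx 0.077$$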
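
The same calculation can be sketched in Python. The helper `conditional` is a hypothetical name used only for this example; the probabilities come from the card counts given earlier. The last two lines check that multiplying a conditional probability by the probability of its condition recovers the intersection, whichever event is used as the condition.

```python
from fractions import Fraction

def conditional(p_a_and_b, p_b):
    """Return P(A|B) = P(A ∩ B) / P(B); undefined when P(B) is zero."""
    if p_b == 0:
        raise ValueError("P(A|B) is undefined when P(B) = 0")
    return p_a_and_b / p_b

p_king = Fraction(4, 54)
p_red = Fraction(26, 54)
p_king_and_red = Fraction(2, 54)

p_king_given_red = conditional(p_king_and_red, p_red)
p_red_given_king = conditional(p_king_and_red, p_king)

print(p_king_given_red, round(float(p_king_given_red), 3))  # 1/13 0.077
print(p_red_given_king)                                     # 1/2

# Both orderings recover the same intersection.
print(p_king_given_red * p_red == p_king_and_red)   # True
print(p_red_given_king * p_king == p_king_and_red)  # True
```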

Now let’s check the probability tree:

Image by author

Notice how the probability tree can help us understand the probabilities we’re dealing with. Bear in mind that this was a really simple example, and the values are rounded for readability. Depending on the events you are trying to predict, the tree might have many more branches.
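
A probability tree can also be written down as a small data structure. The sketch below is one possible layout, not necessarily the one drawn in the figure: it assumes the first split is on color (Red or Not Red) and the second on rank (King or Not King), with the branch probabilities derived from the card counts given earlier. Multiplying along each path gives the probability of that leaf.

```python
from fractions import Fraction

# Each first-level branch holds its probability and its conditional sub-branches.
tree = {
    "Red": (Fraction(26, 54), {"King": Fraction(2, 26), "Not King": Fraction(24, 26)}),
    "Not Red": (Fraction(28, 54), {"King": Fraction(2, 28), "Not King": Fraction(26, 28)}),
}

def leaf_probabilities(tree):
    """Multiply the probabilities along each path from the root to a leaf."""
    leaves = {}
    for first, (p_first, branches) in tree.items():
        for second, p_second_given_first in branches.items():
            leaves[(first, second)] = p_first * p_second_given_first
    return leaves

leaves = leaf_probabilities(tree)
for path, p in leaves.items():
    print(" -> ".join(path), p, round(float(p), 3))
# Red -> King 1/27 0.037
# Red -> Not King 4/9 0.444
# Not Red -> King 1/27 0.037
# Not Red -> Not King 13/27 0.481

print(sum(leaves.values()))  # 1
```

The leaf probabilities add up to 1, and the two paths ending in King add up to 4/54, which is P(King).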

Conclusion

I hope this article helped you get a quick grasp of probabilities, how to calculate them, and how you can take advantage of Venn diagrams and probability trees to visualize the probabilities you are working with.

This is not an exhaustive treatment of probability theory. If you want to go deeper into the subject, a good next step is to study other key results, such as the law of total probability and Bayes’ theorem.

To summarize, let’s review what we saw today.

Calculating probability

  • The probability of an event A happening is the number of ways in which A can occur, divided by the total number of possible outcomes (assuming all outcomes are equally likely).
  • Mutually exclusive events cannot occur at the same time (like heads and tails).
  • The complement of an event A is the event that A does not happen. The probability of A not occurring is denoted by P(A′) and equals 1 − P(A).

Intersection

  • The probability of events A and B both occurring is the probability of the intersection of A and B.
  • It is represented by P(A∩B).
  • If A and B are mutually exclusive events, P(A∩B)=0.

Union

  • The probability of events A or B occurring is the probability of the union of A and B.
  • It is represented by P(A∪B).

Conditional Probability

  • It’s the probability that event A occurs, given that event B has already occurred.
  • It is represented by P(A|B).

Visualizing Probabilities

  • Venn Diagrams are great for visualizing probabilities. They can display the probabilities of all events in the sample space, as well as their unions and intersections.
  • Probability Trees are more useful for visualizing complex probabilities, for instance, when conditions apply.

Reference

[1] Griffiths, D. Head First Statistics: A Brain-Friendly Guide. O’Reilly, 2008.
