By the end of this lesson, students will be able to:
Identify and define discrete and continuous random variables given a real-world application.
Define and identify properties of discrete and continuous probability distributions.
Create a discrete probability distribution and graph it.
Calculate the area of continuous distributions utilizing geometric area formulas with a given graph. (Calculus approach could be done depending on the math prerequisite of the course.)
Compute and label incomplete axes with values utilizing their understanding that the area of all probability distributions is equal to one.
Getting Started:
Duration: 75 minutes
Materials:
Handouts with exercises and problems
Computer, projector, and screen
Whiteboard and markers
Introduction:
Begin class with the Notice and Wonder activity. State the objectives of the lesson. Then, give a brief historical connection.
Notice and Wonder:
Have students look at two probability distributions and ask them what they notice about the graphs and what they wonder. If students have never done this type of activity, you will need to provide a longer wait time for the students to respond. Be accepting of all responses.
Sample Student Response:
Both graphs have the year on the x-axis. Both graphs have the same scale on the y-axis. The right graph is a darker color. Both graphs have the same number of bars. The graph on the left has the bars touching. The left graph is a histogram, and the right is a bar graph. I wonder if the right graph is a bar graph because it looks like both axes are quantitative. Maybe someone will think that the left graph can graph continuous data and the right graph is discrete data. I wonder what this graph is counting in each of the years. There is no title, and the y-axis is not labeled.
Historical Context:
Girolamo Cardano
Girolamo Cardano was a 16th-century Italian professor of mathematics and medicine, but he was also a gambler. He could have been the father of probability if his work had been published when he first recorded his ideas. Instead, his work was not published until 1663, roughly a century later. He was perhaps the first to think about assigning a number from 0 to 1 to the probability of an outcome.
Jakob Bernoulli
This was lucky for Jakob Bernoulli, a Swiss mathematician, who then received the credit for introducing the idea of representing complete certainty as one and probability as a number between zero and one with his publication of Ars Conjectandi (posthumous, 1713).
Main Content:
A probability distribution is a statistical function describing the probability (likelihood) of obtaining each possible value a random variable can take. Typically, we use graphs or tables as visual representations of probability distributions.
There are two types of probability distributions based on the two types of random variables: discrete and continuous.
Figure 5 Overview of Probability Distributions
Discrete Random Variable
A discrete random variable, X, has possible values that can be given in an ordered list. The probability mass function, which is the probability distribution of X, lists the values x_i and their probabilities p_i.
The probabilities p_i must satisfy two requirements:
1. Every probability p_i is a number on the interval [0,1].
2. p_1 + p_2 + p_3 + … = 1
Value of X:    x_1    x_2    x_3    …
Probability:   p_1    p_2    p_3    …
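The two requirements above can be checked mechanically. The following is a minimal sketch, not part of the original lesson; the function name is_valid_pmf and the fair-die example are hypothetical illustrations.

```python
import math

def is_valid_pmf(probabilities, tol=1e-9):
    """Return True if the values form a valid discrete probability distribution."""
    # Requirement 1: every probability lies on the interval [0, 1].
    in_range = all(0.0 <= p <= 1.0 for p in probabilities)
    # Requirement 2: the probabilities sum to 1 (within floating-point tolerance).
    sums_to_one = math.isclose(sum(probabilities), 1.0, abs_tol=tol)
    return in_range and sums_to_one

# Hypothetical example: a fair six-sided die, each face with probability 1/6.
fair_die = [1 / 6] * 6
print(is_valid_pmf(fair_die))           # valid distribution
print(is_valid_pmf([0.5, 0.6, -0.1]))   # invalid: -0.1 is outside [0, 1]
```

The tolerance is there because sums of decimal fractions are rarely exactly 1 in floating-point arithmetic.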
Discrete Random Variable: Example 1:
A discrete probability distribution for countries with significant volcanic eruptions can be represented with a bar graph and/or a table.
Figure 6 Volcanic Eruptions Worldwide 2000 - 2023
Continuous Random Variable
A continuous random variable X takes all values in an interval of numbers. A density curve describes the probability distribution of X. The probability of any event is the area under the density curve and above the values of X that make up the event.
The total area of a continuous probability distribution is equal to 1. Because the probability is equal to the area under the curve, all continuous probability distributions assign a probability of 0 to every individual outcome. Only intervals of values have a positive probability.
Continuous Random Variable: Normal Distribution
Gauss (Wittmann & Oreshina, 2009)
A continuous distribution you may be familiar with from your previous introductory statistics course is the Normal distribution. Its probability density function is:
\(f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\)
where the parameter \(\sigma\) is the standard deviation and \(\mu\) is the mean. This function is used when graphing a Normal distribution.
Carl Friedrich Gauss
Sometimes the Normal distribution is called a Gaussian distribution, after Carl Friedrich Gauss, who used it to model measurement errors. (Abraham de Moivre had described the curve earlier.)
Calculation and Practice:
Normal Distribution Problem
Figure 8
Question 1: Based on the graph in Figure 8, what is the probability of having a value between 1 and 2 standard deviations above the mean? Find it by looking at the area under the curve for this region.
Solution: In this case, the probability is approximately .1359, or a 13.59% chance.
Question 2: What is the probability of having a value between two standard deviations below the mean and one standard deviation above the mean?
Calculation and Practice: Question 2 Solution
Figure 9
Solution: .1359 + .3413 + .3413 = .8185
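The areas read from the graph can be cross-checked numerically. This is a sketch under the assumption that the figures show a standard Normal curve; the function names are hypothetical, and the cumulative probability is computed from Python's built-in error function rather than from a table.

```python
import math

def standard_normal_cdf(z):
    """P(Z <= z) for a standard Normal variable, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_between(z_low, z_high):
    """P(z_low < Z < z_high), with bounds given in standard deviations from the mean."""
    return standard_normal_cdf(z_high) - standard_normal_cdf(z_low)

q1 = prob_between(1, 2)    # Question 1: between 1 and 2 SDs above the mean
q2 = prob_between(-2, 1)   # Question 2: from 2 SDs below to 1 SD above the mean
print(round(q1, 4))
print(round(q2, 4))
```

The first value agrees with .1359. The second comes out near .8186 rather than .8185, because each of the three areas added in the solution was already rounded to four decimal places.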
Continuous Random Variable: Uniform Distribution
Figure 10
Another common continuous distribution is the Uniform distribution on the interval (a,b).
The area under the density curve between a and b is equal to 1.
Calculation and Practice: Question 3
Question 3: What is the area of the Uniform(0,1) distribution pictured below?
Figure 11: Uniform(0,1)
Calculation and Practice: Question 3 Solution
Solution: Area of Rectangle = base × height = 1. (Remember the area is defined to be 1. Even though you do not know the height of the rectangle, you still know its area. In a few problems we will use this idea to solve for the missing dimension.)
Calculation and Practice: Question 4 & Solutions
Question 4: Using a Uniform(0,1) distribution, what is the probability that the random number is:
Figure 12: Uniform(0,1)
Part a) between .2 and .6?
Solution: Area of Rectangle = base × height = (.6-.2)(1-0) = .4 × 1 = .4 is the probability.
Part b) greater than .75?
Solution: Area of Rectangle = base × height = (1-.75)(1-0) = .25 × 1 = .25 is the probability.
Part c) less than .3 or greater than .9?
Solution: Area of Rectangle = base × height. Here we have two regions, so we need to sum the areas of both. Sum of the Areas of Two Rectangles = (.3-0)(1-0) + (1-.9)(1-0) = .3 + .1 = .4 is the probability.
Normalizing Constant
Normalizing Constant: a value that ensures that the total area under a probability density function is equal to 1. This constant could be a scalar value, an expression, or a function. Any nonnegative function whose total area (or sum) is not 1 must be scaled by a normalizing constant before it can serve as a probability distribution.
Sometimes the normalizing constant is easy to compute. Below, we will look at this idea utilizing the Uniform distribution. We know that the area of the rectangle created by the Uniform distribution equals 1.
Calculation and Practice: Question 5 & Solution
Question 5 Compute the normalizing constant when we have U(a,b).
Solution: We know that Area of Rectangle = base × height
Area of Rectangle = 1
base × height = 1
base = (b-a), which is substituted into the equation
(b-a) × height = 1
Solve this equation for the height of the rectangle. (In this case the height is the normalizing constant.)
height = 1/(b-a)
Normalizing constant = 1/(b-a)
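For courses with a calculus prerequisite, the same result can be sketched with an integral. This assumes the density is a constant c on [a,b] and zero elsewhere; the symbol c is introduced here only for illustration.

```latex
\int_a^b c \, dx = c\,(b-a) = 1
\quad\Longrightarrow\quad
c = \frac{1}{b-a}
```

The integral is the area of the same rectangle, so the geometric and calculus approaches give the same normalizing constant.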
Figure 13: Uniform(0,1)
Calculation and Practice: Question 6 part a & Solution
Question 6: A random number generator will generate numbers on the interval [3,8], as shown in the graph below.
Part a) Construct this Uniform distribution and scale both axes to ensure the area =1.
Figure 14: Uniform(3,8)
Solution: Note that the y-axis has been scaled. The base is 8-3 = 5, so the height must be 1/5 = .2 for the area to equal 1.
Calculation and Practice: Question 6 part b & Solution
Part b) What is P(X<5 or X>7) for the Uniform[3,8]? Show a graph and explain your answer.
Solution
We can calculate the area of the blue sections:
Area = (5-3)(.2) + (8-7)(.2)
Area = .4 + .2
Area =.6
Figure 16: P(X<5 or X>7) on a Uniform(3,8)
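The rectangle-area reasoning used for the Uniform questions can be written as a small helper. This is a minimal sketch; the names uniform_height and uniform_prob are hypothetical, and the height comes from the normalizing constant 1/(b-a).

```python
def uniform_height(a, b):
    """Height of the Uniform(a, b) density: the normalizing constant 1/(b - a)."""
    return 1.0 / (b - a)

def uniform_prob(lo, hi, a, b):
    """P(lo < X < hi) for X ~ Uniform(a, b), as the area of a rectangle."""
    # Clip the requested interval to [a, b]; the density is 0 outside it.
    lo = max(lo, a)
    hi = min(hi, b)
    if hi <= lo:
        return 0.0
    return (hi - lo) * uniform_height(a, b)

# Question 6 part b: P(X < 5 or X > 7) on Uniform(3, 8) is the sum of two areas.
p_q6b = uniform_prob(3, 5, 3, 8) + uniform_prob(7, 8, 3, 8)
print(round(p_q6b, 4))

# Question 4 part c: P(X < .3 or X > .9) on Uniform(0, 1).
p_q4c = uniform_prob(0, 0.3, 0, 1) + uniform_prob(0.9, 1, 0, 1)
print(round(p_q4c, 4))
```

Because the two regions in each "or" question do not overlap, their areas can simply be added.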
Example:
Consider the density curve for the sum X of two random numbers, each generated on the interval [0,1].
a) What interval would be used for the density curve of X?
Solution
[0,2]. The smallest possible sum is 0+0=0, and it is possible that the two numbers you are summing are both 1, so 1+1=2 is the maximum of the interval.
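Students can also explore this interval by simulation. The sketch below is an illustration, not part of the original lesson: it assumes Python's random module as the random number generator, and the function name and fixed seed are hypothetical choices made so the run is repeatable.

```python
import random

def simulate_sums(n, seed=0):
    """Generate n sums of two random numbers, each drawn from the interval [0, 1)."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    return [rng.random() + rng.random() for _ in range(n)]

sums = simulate_sums(100_000)

# Every simulated sum should fall inside the interval [0, 2].
print(min(sums) >= 0.0 and max(sums) <= 2.0)

# Share of sums below 1, the midpoint of the interval.
share_below_one = sum(s < 1.0 for s in sums) / len(sums)
print(round(share_below_one, 2))
```

The share of sums below 1 should come out close to one half, which suggests the density is balanced around the middle of the interval.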
Example:
Figure 17: triangle (a,b)
The graph shows the density curve of X, a triangle distribution that is symmetric about one and has a height of one. Calculate the area. Have students draw a picture.
Solution
Figure 18: triangle (0,2)
Area = .5(base)(height)
Area = .5(2 units)(1 unit)
Area = 1 unit²
Question 7
Given the triangular distribution below, where the height of the triangle is 2/3 units, what is P(X<1)? ______ Explain your answer.
Figure 19: triangle (0,3)
Solution
Area of the blue triangle = .5(base)(height) = .5(1)(2/3) = 1/3 unit²
Area of the white triangle = .5(base)(height) = .5(2)(2/3) = 2/3 unit²
Total area of triangle = Area of blue + white triangle = (1/3) + (2/3) = 1 unit²
P(X<1) is the area of the blue triangle, so P(X<1) = 1/3.
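The triangle-area approach generalizes to any cutoff value. The following sketch assumes a triangular density on [a, b] with its peak at c, where a < c < b. The function name is hypothetical, and placing the peak at x = 1 for Figure 19 is an assumption inferred from the base of 1 used in the solution.

```python
def triangular_cdf(x, a, c, b):
    """P(X < x) for a triangular density on [a, b] with peak at c (a < c < b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    # The total area must be 1, so .5 * (b - a) * peak = 1.
    peak = 2.0 / (b - a)
    if x <= c:
        # Left of the peak: a small triangle with base (x - a).
        height_at_x = peak * (x - a) / (c - a)
        return 0.5 * (x - a) * height_at_x
    # Right of the peak: 1 minus the small triangle remaining on the right.
    height_at_x = peak * (b - x) / (b - c)
    return 1.0 - 0.5 * (b - x) * height_at_x

# Question 7: triangle on [0, 3], assumed peak at 1, height 2/3.
print(triangular_cdf(1, 0, 1, 3))

# Triangle on [0, 2] symmetric about 1: half the area lies below 1.
print(triangular_cdf(1, 0, 1, 2))
```

The peak height is not passed in. It is solved for from the requirement that the total area equals 1, the same idea used to label an incomplete axis.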
Continuous vs. Discrete in Practice
Some variables are continuous but are sometimes treated in practice as though they were discrete. One example is the age of students in a class. Age is often reported as a discrete value when surveyed, even though we know that age is a measure of time, which is continuous.
Graphs of the distribution of ages of statistics students are shown below. Which graph shows age as a discrete random variable, and which shows it as a continuous random variable?
Figure 20:
Figure 21:
Age Problem
Let's look at the data table for the previous graphs. Adding the counts gives us the total number of students in the class, which is 195 students.
To calculate the probability of each age, divide the count of each age by the total number of students.
Graph the following distributions to see how a change in the standard deviation affects the shape of the distribution: N(0, 1), N(0, 2), N(0, 10). What do you notice happens to the shape of the distribution as the standard deviation increases?
Solution: When the mean stays constant and the standard deviation increases, the graphs stay centered at the same value, which is 0 in this problem. Increasing the standard deviation causes the graph to flatten and spread out.
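The flattening can be quantified by evaluating the density at the mean. This is a minimal sketch that assumes the second parameter in N(0, 2) is the standard deviation, as the question implies; the function name normal_pdf is hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma, evaluated at x."""
    if sigma <= 0:
        raise ValueError("the standard deviation must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Height of the curve at its center (x = mean = 0) for each standard deviation.
for sigma in (1, 2, 10):
    print(sigma, round(normal_pdf(0, 0, sigma), 4))
```

The peak height shrinks in proportion to 1/sigma. Since the total area must stay equal to 1, a lower peak means the curve has to spread out.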
Can you have a negative standard deviation? Try it out to check your answer.
Graph the following distributions to see how a change in the mean affects the distribution: N(0, 1), N(2, 1), N(10, 1). What do you notice happens to the distribution's shape as the mean increases? Can you have a negative mean? Try it to check your answer.
Facilitate a class discussion to review the example problems, reinforce key concepts, and answer any questions the students have.
Homework:
Have students work on the assessment questions in this plan.
Formative Assessment:
Evaluate students based on their participation in discussions, their ability to solve example problems, and their performance on the assigned homework.
Assessment:
In forensic accounting, faked numbers in tax returns, invoices, expense account claims, and other financial records display patterns that aren’t present in legitimate records. Some of these “fakes” are easy to spot, for instance, if there are many rounded numbers. But Benford’s law tells us that the first digits of numbers in legitimate records often follow this distribution:
What type of probability distribution (discrete or continuous) is represented here in this table?
Solution: A Discrete Probability Distribution
Consider these events and calculate their probabilities of occurring using the table above:
A = {The first digit is a 5.}
B = {The first digit is a 3 or less.}
C = {The first digit is greater than 7.}
Solutions
i. P(A) = .079
ii. P(B) =.301+.176+.125 =.602
iii. P(C) =.051+.046 = .097
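These answers can be cross-checked against the usual formula for Benford's law, P(first digit = d) = log10(1 + 1/d). The sketch below assumes the lesson's table matches this formula rounded to three decimal places, which is consistent with the values used in the solutions; the variable names are hypothetical.

```python
import math

# Benford's law: probability that the first digit is d, for d = 1, ..., 9.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

p_a = benford[5]                            # A: the first digit is a 5
p_b = sum(benford[d] for d in (1, 2, 3))    # B: the first digit is 3 or less
p_c = sum(benford[d] for d in (8, 9))       # C: the first digit is greater than 7

print(round(p_a, 3), round(p_b, 3), round(p_c, 3))

# A discrete distribution must sum to 1 over all possible first digits.
print(round(sum(benford.values()), 6))
```

Zero is excluded because a first digit cannot be 0, so the nine probabilities account for all of the probability.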
Make a graph of this distribution.
Solution
The table below gives a probability distribution of the number of significant volcanic eruptions worldwide between 2010 and 2023.
Is this a discrete or continuous probability distribution?
Explain why you made this choice.
Graph this distribution.
Table 4: The Number of Significant Volcanic Eruptions in the World from 2010-2022
Solutions
Some students may say discrete, and some may say continuous. This means students may display a bar graph (Figure 22) or a histogram (Figure 23).
Typically I would say this is continuous because one variable, year, is a unit of time. I would use a histogram to plot this data, as seen in Figure 23.
Figure 22
Figure 23
Conclusion:
Have the students summarize what they learned today. Make sure to do some re-voicing of what students are saying. Also, ask students to explain what one of their peers said.
If you have time, give the students an Exit Ticket and include some questions that check for understanding. You may ask students to tell you what they are confident they understand and then list a topic or concept they are still grappling with to understand. This will help you plan for any necessary review at the beginning of the next class.