## Probability


normalpdf(x, μ, σ)

**Command Syntax**

invNorm(*percentile*[, *μ*, *σ*])

**Menu Location**

Press:

- 2ND DISTR to access the distribution menu
- 3 to select invNorm(, or use arrows.

**Example:** Suppose X represents the scores on an exam and is normally distributed with mean 85 and standard deviation 10.

We want to find the value of x such that 90% of the distribution is below it. This means we are looking for the 90th percentile.

1) Press 2ND DISTR 3. This brings up invNorm(

2) Type .90 comma 85 comma 10

3) Press the ) button.

4) Press ENTER

The value you get, about 97.82, is the 90th percentile for X: roughly 90% of exam scores fall below it.
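You can double-check the calculator on a computer with a minimal Python sketch. It swaps in the standard library's `statistics.NormalDist` (Python 3.8 or later), whose `inv_cdf` method plays the role of `invNorm(`. It is only an illustration and is not part of the TI workflow.

```python
from statistics import NormalDist

# Exam scores: normal with mean 85 and standard deviation 10.
scores = NormalDist(mu=85, sigma=10)

# inv_cdf(p) returns the value with area p to its LEFT,
# the same quantity invNorm(p, mean, sd) returns on the TI-83/84.
p90 = scores.inv_cdf(0.90)

print(round(p90, 2))  # 97.82
```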

**Menu Location**

Press:

- 2ND DISTR to access the distribution menu
- 2 to select normalcdf(, or use arrows.

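normalcdf(*start*, *end*[, *mean*, *std_dev*])

If *mean* and *std_dev* are left out, the calculator uses the standard normal distribution (mean 0, standard deviation 1).

**For the standard normal distribution:** normalcdf(-1,1) returns about 0.6827

**For the normal distribution with mean 10 and std. dev. 2.5:** normalcdf(5,15,10,2.5) returns about 0.9545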
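The area that `normalcdf(` reports is the cumulative area up to *end* minus the cumulative area up to *start*. The sketch below writes that out in Python. The helper name `normalcdf` and its default arguments are hypothetical and only mimic the calculator command. The actual work is done by the standard library's `statistics.NormalDist.cdf`.

```python
from statistics import NormalDist

def normalcdf(start, end, mean=0.0, std_dev=1.0):
    """Hypothetical helper: area under a normal curve between start and end."""
    dist = NormalDist(mu=mean, sigma=std_dev)
    # Area left of `end` minus area left of `start` = area in between.
    return dist.cdf(end) - dist.cdf(start)

print(round(normalcdf(-1, 1), 4))            # 0.6827
print(round(normalcdf(5, 15, 10, 2.5), 4))   # 0.9545
```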

Disclaimer: I did not create nor do I own these videos. I have simply embedded them, courtesy of YouTube.


Command: invNorm(.75,46,2.7)
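This command can be reached in two equivalent ways. One is to ask for the 75th percentile directly. The other is to find the z-value for 0.75 on the standard normal curve and then undo the z-score formula with x = mean + z × sd. Both routes appear in the Python sketch below, which uses the standard library's `statistics.NormalDist` as a stand-in for the calculator.

```python
from statistics import NormalDist

mean, sd = 46, 2.7

# Direct route, like invNorm(.75,46,2.7) on the calculator.
x_direct = NormalDist(mean, sd).inv_cdf(0.75)

# Two-step route: z for the standard normal (about 0.6745),
# then un-standardize with x = mean + z * sd.
z = NormalDist().inv_cdf(0.75)
x_two_step = mean + z * sd

print(round(x_direct, 2), round(x_two_step, 2))  # 47.82 47.82
```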


**Capital "SIGMA" (Σ) means the SUM:**

**The "sum of what?" The sum of a set of values:**

EXAMPLE:

\[ \sum_{i=1}^{5} i = 1 + 2 + 3 + 4 + 5 = 15 \]
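The same idea takes only a few lines of Python. This is an illustrative sketch, not part of the original guide. The loop spells out what Σ asks you to do, and the built-in `sum` does the same thing in one call.

```python
values = [1, 2, 3, 4, 5]

# Spelled-out version of Σ: start at 0 and add each value in turn.
total = 0
for x in values:
    total += x

print(total, sum(values))  # 15 15
```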

\[ \sigma \]

**This symbol, the lowercase "sigma," represents the POPULATION STANDARD DEVIATION.**

**The formula to calculate the population standard deviation is:**

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]

**The Greek lowercase letter for "M," mu (μ, pictured above on the right), is pronounced "mew."**

**This symbol represents the mean of a population (the POPULATION MEAN).**

**X Bar (pictured below under Sample Mean) is simply the mean of a given set of sample values.**

**As you will notice, "X Bar" is calculated exactly the same way as the POPULATION MEAN. It just gets a different symbol because it comes from a sample rather than from the whole population.**

**Read Sample Mean vs. Population Mean for more information.**


The distribution of a statistic is officially called the sampling distribution of the statistic.

Broken down a little bit further, the sampling distribution of a statistic lists all the possible values the statistic can take across every possible sample of a given size, along with how often each value occurs. Try not to get too crazed by all the fancy lingo when first starting out in a Stat course. Check out our section on What's with the Greek? for more definitions broken down.

**OK, so "population" doesn't exactly merit a "wordy definition" on its own. But when we think of "population" we often think of the U.S. population, such as is recorded by the U.S. Census.**

**This is not too far off the mark. According to Wikipedia: "A population can be defined as including all people or items with the characteristic one wishes to understand."**

**More simply put, a statistical POPULATION is the POOL from which a SAMPLE can be drawn.**

**POPULATIONS can often be large, making studies overly complex, time-consuming and expensive. This is why we draw a SAMPLE and go to great lengths to find a SAMPLE that is REPRESENTATIVE of the POPULATION. Studying a SAMPLE instead of the entire POPULATION makes studies far more time-efficient.**

**A SAMPLE is a subset of the POPULATION.**

**A SAMPLE is drawn to represent the population, removing the need to conduct an extensive census.**

**An example of a sample would be:**

You decide you want to take a survey of the student body at your school. Without a team of helpers, it will be nearly impossible to survey EVERYONE in a short period of time. So instead, you decide to draw a SIMPLE RANDOM SAMPLE, which you determine is representative of the population.
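In a simple random sample, every possible group of students of the chosen size has the same chance of being picked. The Python sketch below shows one way to draw one. Python's standard `random.sample` draws without replacement in exactly that way. The roster of 500 students and the sample size of 50 are hypothetical numbers chosen for illustration.

```python
import random

# Hypothetical roster of 500 students, identified by number.
student_body = [f"student_{i:03d}" for i in range(1, 501)]

rng = random.Random(42)  # fixed seed so the draw is reproducible

# random.sample picks without replacement, every student equally likely:
# a simple random sample of size 50.
srs = rng.sample(student_body, k=50)

print(len(srs), len(set(srs)))  # 50 50
```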

**Studying and drawing CONCLUSIONS from a sample would be a heck of a lot easier than trying to survey every person (and study every person) in the population.**

**So... while I was cruising the net for more learning resources, I found this on a Wiki, which defined the MEAN as:**

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

**Yeeeeah. In a technical sense, this is correct, but if I saw it in a book without knowing what the squigglies were, I would most certainly freak out.**

**THIS is exactly the same thing, and a whole lot easier to conceptualize:**

**If you are given 3 numbers, add them up and divide by 3:**

\[ \bar{x} = \frac{x_1 + x_2 + x_3}{3} \]

**If you are given 4 numbers, add them up and divide by 4:**

\[ \bar{x} = \frac{x_1 + x_2 + x_3 + x_4}{4} \]

**And so on. The MEAN is the TOTAL SUM of all values you are given, divided by the NUMBER of values you are given.**

**So this:**

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

**Technically means: 1/n (one divided by the number of values) times the SUM of all the values you are given, which is the same as the SUM divided by the number of values. (Somehow it's easier to think about in English. Check out What's with the Greek? later.)**
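In code, the formula boils down to "add them up, divide by how many." Here is a minimal Python sketch. The helper name `mean` is just an illustrative choice.

```python
def mean(values):
    """Arithmetic mean: the total sum divided by how many values there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

print(mean([25, 50, 75]))        # 50.0
print(mean([25, 50, 75, 100]))   # 62.5
```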

**The MEDIAN is the number BANG in the middle of a data set once the values are sorted from smallest to largest.**

**If you have:**

**25, 50, 75**

**The MEDIAN is 50.**

**With an ODD number of items, the MEDIAN will always be the number directly in the center.**

**If you have:**

**25, 50, 75, 100**

**With an EVEN number of items, the MEDIAN will always be the AVERAGE of the two central numbers.**

**Here, the MEDIAN is (50+75)/2 = 62.5**
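Here are both rules (odd count and even count) as a small Python sketch. The helper name `median` is illustrative. The standard library's `statistics.median` follows the same rules.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # odd: the single center value
    return (ordered[mid - 1] + ordered[mid]) / 2     # even: average of the two centers

print(median([25, 50, 75]))        # 50
print(median([25, 50, 75, 100]))   # 62.5
```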

**The MODE of a data set is simply the number that appears the most often.**

**For example, in this set: [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17] - The mode is 6. This is a UNIMODAL set, and looks like this:**

In the set: [1, 1, 2, 4, 4] - There are TWO modes (1 and 4), making this set BIMODAL, which looks like this:

**For sets where there are more than TWO modes, the set is called MULTIMODAL.**
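Python's standard library can find modes directly. `statistics.multimode` (Python 3.8 or later) returns every value tied for the highest count, so it reports both modes of a bimodal set. Here it is checked against the two example sets above:

```python
from statistics import multimode

unimodal = [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]
bimodal = [1, 1, 2, 4, 4]

# multimode returns every value tied for the highest count.
print(multimode(unimodal))  # [6]
print(multimode(bimodal))   # [1, 4]
```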

Both the mean and the median are measures of center.

If you have a symmetrical set of data (for example, when the numbers in the set are EVENLY SPACED), the mean and the median will be EXACTLY THE SAME.

**Here is WHY:**

If you have a data set: 25, 50, 75

MEAN = (25 + 50 + 75) / 3 = 150 / 3 = 50

**MEDIAN = 50 (the number bang in the center)**

**Both values are the same.**

When dealing with skewed data sets (when the values are NOT spread symmetrically around the center), it is better to use the median to express the center, because the median is RESISTANT to extreme values.

**Here is WHY:**

If you have a data set: 20, 50, 100

MEAN = (20 + 50 + 100) / 3 = 170 / 3 ≈ 56.67

**MEDIAN = 50**

If we make this set even more extreme: 10, 50, 150

MEAN = (10 + 50 + 150) / 3 = 210 / 3 = 70

**MEDIAN = 50**

**No matter how far we push the outer values, as long as the middle number is 50, the MEDIAN will be 50. ALWAYS.**

**The mean is SENSITIVE to changes in every value, so it is best used to describe the center when the data are roughly symmetric (for example, approximately normally distributed) without extreme outliers.**
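To see the median's resistance in numbers, this sketch runs the three data sets above through the standard library's `statistics.mean` and `statistics.median`. The mean prints as 50.00, 56.67 and 70.00, while the median stays at 50 every time.

```python
from statistics import mean, median

data_sets = {
    "symmetric": [25, 50, 75],
    "skewed": [20, 50, 100],
    "more skewed": [10, 50, 150],
}

for label, data in data_sets.items():
    # The median stays put; the mean is pulled toward the extreme value.
    print(f"{label:12} mean = {mean(data):6.2f}  median = {median(data)}")
```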

I always remembered this by memorizing that we are all "__sensitive__ to __mean__ [people]" - but whatever works for you!

The SAMPLE MEAN is the mean of a sample.

The POPULATION MEAN is the mean of the population.

Different symbols make the distinction between these two, although the formulas are exactly the same.

**The ONLY difference between the two is that the SAMPLE MEAN is referred to as "x bar" whereas the POPULATION MEAN is referred to as "mu" (μ, pronounced "mew"). BOTH are found by calculating the SUM of all values you are given and dividing by the total number of values (written N for a population and n for a sample).**

We can often use x bar, the SAMPLE MEAN, to draw conclusions about the mean of the entire population.

The sample standard deviation, s, uses almost the same formula as lowercase sigma (the population standard deviation). The differences are that it measures distances from x bar instead of μ, and that the denominator calls for n - 1 (one less than the number of values given) instead of N:

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
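Here is a small Python sketch that computes both versions by hand and compares them with the standard library's `statistics.pstdev` (divides by N) and `statistics.stdev` (divides by n - 1). The helper name `sd` is hypothetical. The data are the five basketball heights from the sampling-distribution example in this guide.

```python
import math
from statistics import pstdev, stdev

heights = [76, 78, 79, 81, 86]

def sd(values, sample=False):
    """Hypothetical helper: standard deviation, dividing by N (population) or n - 1 (sample)."""
    n = len(values)
    center = sum(values) / n                           # mu or x bar: same arithmetic
    squared_devs = sum((x - center) ** 2 for x in values)
    denominator = n - 1 if sample else n
    variance = squared_devs / denominator              # the variance is SD squared
    return math.sqrt(variance)

print(round(sd(heights), 4), round(pstdev(heights), 4))               # 3.4059 3.4059
print(round(sd(heights, sample=True), 4), round(stdev(heights), 4))   # 3.8079 3.8079
```

Dividing by n - 1 makes the sample version slightly larger. This is the usual correction for measuring spread around x bar, which is itself estimated from the same sample.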

Question:

What's the difference between sample standard deviation and population standard deviation? When do we use N-1 and when N in the denominator?

The Standard Deviation is the SQUARE ROOT of the VARIANCE.

The official formula for the POPULATION Standard Deviation is:

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]

Notice this:

**The SAMPLE MEAN and the POPULATION MEAN are found using exactly the same formula.**

**The ONLY difference between the two is that the SAMPLE MEAN is referred to as "x bar" whereas the POPULATION MEAN is referred to as "mu" (μ, pronounced "mew"). BOTH are found by calculating the SUM of all values you are given and dividing by "N," the number of values you are given.**

"Variance" seems like a term unrelated to standard deviation, but in fact, the variance is just the SD^2 (the Standard Deviation, squared).

The reverse is therefore true. The Standard Deviation is the SQUARE ROOT of the variance.


The EMPIRICAL RULE, otherwise known as the 68-95-99.7 RULE, says the following about data that are approximately normally distributed (bell-shaped):

**1) About 68.27% of all observed data values will fall within ONE standard deviation to the LEFT or RIGHT of the mean.**

**2) About 95.45% of all observed data values will fall within TWO standard deviations to the LEFT or RIGHT of the mean.**

**3) About 99.73% of all observed data values will fall within THREE standard deviations to the LEFT or RIGHT of the mean.**

This is what the illustrated version of the Empirical Rule looks like:

EXAMPLE:

**If we are told that our data are approximately normal with a mean of 100 and a standard deviation of 10, then we know the following:**

**1) About 68.27% of our data will fall between 90 and 110.**

**2) About 95.45% of our data will fall between 80 and 120.**

**3) About 99.73% of our data will fall between 70 and 130.**
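The Empirical Rule percentages are simply areas under the normal curve. This Python sketch recovers them for the example above (mean 100, SD 10) using the standard library's `statistics.NormalDist`. It prints the intervals 90 to 110, 80 to 120 and 70 to 130 with shares of 68.27%, 95.45% and 99.73%.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=10)

for k in (1, 2, 3):
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    # Area between low and high = share of the data within k SDs of the mean.
    share = dist.cdf(high) - dist.cdf(low)
    print(f"within {k} SD: {low:.0f} to {high:.0f} holds {share:.2%} of the data")
```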

It is very important that you understand what a Z-score represents as well as how to obtain a Z-score manually, by hand.

HOWEVER, you should also know how to get around your TI-83 or 84 series calculator. Use it to find a Z-Score and the AREA under a curve.

And here is another, more comprehensive overview of Z-Scores:

**Sometimes we need a standardized scale to measure a value's distance from the center.**

**A Z-score indicates how many STANDARD DEVIATIONS a value is from the mean.**

**The official formula is:**

\[ z = \frac{x - \mu}{\sigma} \]

**So let's say the MEAN is 100 and the Standard Deviation is 15.**

**If you are given a value of 132, you just plug that into the formula above.**

**132 - 100 = 32**

**32 / 15 = 2.133**

**VOILA: your Z-Score is 2.133, so 132 sits about 2.13 standard deviations above the mean.**
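Here is the same two-step arithmetic (subtract the mean, divide by the standard deviation) as a tiny Python sketch. The helper name `z_score` is illustrative. A negative result means the value lies below the mean.

```python
def z_score(x, mean, std_dev):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

print(round(z_score(132, 100, 15), 3))  # 2.133
print(z_score(85, 100, 15))             # -1.0
```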

This is the standard type of table you will see in most Statistics Textbooks.

If you are allowed to use a calculator for calculating Z-scores and areas under the curve, I suggest you glance at this to get familiar with what it is, and MOVE ON.

**If you are NOT allowed to use a calculator, it would be a good idea to get friendly with this table - and FAST. During an exam, the last thing you want to be worrying about is figuring out how to find your way around this thing!**
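Each entry in a z-table is just an area under the standard normal curve. The Python sketch below rebuilds a few rows using the standard library's `statistics.NormalDist().cdf`. It assumes a table that lists the area to the LEFT of z. Some textbooks instead list the area between 0 and z, so check which kind yours uses. Rows step by 0.1 and columns add 0.00 through 0.09, and the 1.0 row starts with 0.8413.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # area to the LEFT of z under the standard normal curve

# A few rows of a left-tail z-table: row value plus a column offset of 0.00 to 0.09.
for row in (0.0, 1.0, 2.0):
    cells = [f"{phi(row + col / 100):.4f}" for col in range(10)]
    print(f"{row:.1f} | " + " ".join(cells))
```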

This is a great video because it gives walk-throughs of z-score calculations from homework problems. You may not have these exact problems, but the same concepts can be applied to your own work!

These examples rely on the Z-Score Formula:

\[ z = \frac{x - \mu}{\sigma} \]

MEMORIZE this formula and make sure you know it COLD!

If you do not know what the "m-like" symbol (μ) or the "o" with a tail (σ) are, check out What's with the Greek?

**NOTE: If a sample size is greater than 30, it is USUALLY (though not always) large enough for the Central Limit Theorem to apply, meaning the sampling distribution of the sample mean will be approximately normal.**
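Here is a rough Python simulation of what the Central Limit Theorem predicts. The population is an assumption chosen for illustration: an exponential distribution with mean 1 and standard deviation 1, which is strongly right-skewed. The sketch draws 5,000 samples of size 30 and collects their x bar values. The exact printed numbers depend on the random seed, but they should land close to 1 and to 1/√30 ≈ 0.18.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is reproducible

def draw_sample(n):
    """One sample of size n from an exponential population (mean 1, SD 1)."""
    return [rng.expovariate(1.0) for _ in range(n)]

n = 30
repetitions = 5000
sample_means = [statistics.fmean(draw_sample(n)) for _ in range(repetitions)]

# CLT prediction: the x bar values center on the population mean (1)
# with a standard deviation of about 1 / sqrt(30), roughly 0.18.
print(round(statistics.fmean(sample_means), 3))
print(round(statistics.stdev(sample_means), 3))
```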


**I admit: "sampling distribution of the sample mean" sounds a little creepy, not only because the term is too long-winded for its own good, but also because it feels like you're running in an endless loop.**

The best way to explain this one is to give an example:

**Suppose you have a population of 5 basketball players:**

**A, B, C, D and E.**

**Let us suppose that their respective heights are:**

**76, 78, 79, 81 and 86**

**If we had a sample size of 2, then we would be able to derive the following combinations of these players and their heights:**

| SAMPLE (size 2) | HEIGHTS | X-Bar Values |
|---|---|---|
| A, B | 76, 78 | 77.0 |
| A, C | 76, 79 | 77.5 |
| A, D | 76, 81 | 78.5 |
| A, E | 76, 86 | 81.0 |
| B, C | 78, 79 | 78.5 |
| B, D | 78, 81 | 79.5 |
| B, E | 78, 86 | 82.0 |
| C, D | 79, 81 | 80.0 |
| C, E | 79, 86 | 82.5 |
| D, E | 81, 86 | 83.5 |

**The X-bar column values represent the Sampling Distribution of the Sample Mean, because they are the MEAN of the values for each SAMPLE.**

**Now let's try a different sample size. Let's try a sample size of 4.**

| SAMPLE (size 4) | HEIGHTS | X-Bar Values |
|---|---|---|
| A, B, C, D | 76, 78, 79, 81 | 78.50 |
| A, B, C, E | 76, 78, 79, 86 | 79.75 |
| A, B, D, E | 76, 78, 81, 86 | 80.25 |
| A, C, D, E | 76, 79, 81, 86 | 80.50 |
| B, C, D, E | 78, 79, 81, 86 | 81.00 |

**The X-bar column values represent the Sampling Distribution of the Sample Mean, because they are the MEAN of the values for each SAMPLE.**


**And that's all it is!**
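Both tables can be generated instead of written out by hand. This Python sketch uses the standard library's `itertools.combinations` to list every possible sample of a given size (drawn without replacement) and computes x bar for each one. The helper name `sampling_distribution` is hypothetical. As a bonus, the average of all the x bar values comes out to 80, which is exactly the population mean (76 + 78 + 79 + 81 + 86) / 5.

```python
from itertools import combinations
from statistics import fmean

heights = {"A": 76, "B": 78, "C": 79, "D": 81, "E": 86}

def sampling_distribution(size):
    """Hypothetical helper: x bar for every possible sample of the given size."""
    rows = []
    for players in combinations(heights, size):    # iterates over the player letters
        values = [heights[p] for p in players]
        rows.append((", ".join(players), fmean(values)))
    return rows

for size in (2, 4):
    rows = sampling_distribution(size)
    for players, xbar in rows:
        print(f"{players:12} {xbar:.2f}")
    # The average of all the x bar values equals the population mean.
    print("mean of x-bars:", fmean(xbar for _, xbar in rows))
```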

Suppose we are given a certain quantity of an ideal gas at some fixed temperature, and we want to know what sort of distribution of velocities to associate with this gas.

That is, given a range of velocities, \[\Delta v = v_\beta - v_\alpha,\]

what is the number of molecules, \[\Delta n,\] with velocities in the region of velocity space \[\Delta^3 v = \Delta v_x \, \Delta v_y \, \Delta v_z \, ?\]