Friday, 22 January 2010 23:54

Mean vs. Median - Which is Better?



Both the mean and the median are measures of center. 

If you have a symmetrical set of data -- FOR EXAMPLE, IF THE NUMBERS IN THE SET ARE EVENLY SPACED -- the mean and the median will be EXACTLY THE SAME. 

Here is WHY:

If you have a data set: 25, 50, 75

MEAN = (25 + 50 + 75) / 3 = 150 / 3 = 50

MEDIAN = 50 (the number bang in the center)

Both values are the same. 

When dealing with skewed data sets (when the numbers stretch out farther on one side of the center than on the other), it is better to use the median to express the center.  It is RESISTANT to extreme values. 


Here is WHY:

If you have a data set: 20, 50, 100

MEAN = (20 + 50 + 100) / 3 = 170 / 3 = 56.67 (rounded)



If we make this set even more extreme: 10, 50, 150

MEAN = (10 + 50 + 150) / 3 = 210 / 3 = 70



No matter how we change the values in this set, if the middle number is 50, the MEDIAN will be 50.  ALWAYS.  
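As a quick check of the three data sets above, here is a minimal sketch using Python's built-in statistics module. The variable names (data_sets, summary) are my own, picked just for illustration.

```python
from statistics import mean, median

# The three example data sets from this post; each has 50 in the middle.
data_sets = {
    "evenly spaced": [25, 50, 75],
    "skewed": [20, 50, 100],
    "more skewed": [10, 50, 150],
}

# Store (mean, median) for each set.
summary = {
    name: (mean(values), median(values))
    for name, values in data_sets.items()
}

for name, (avg, mid) in summary.items():
    print(f"{name}: mean = {avg:.2f}, median = {mid}")
```

The median column stays at 50 in every row, while the mean moves whenever the outer values move.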

The mean is SENSITIVE to change by every value, and therefore works best where the data is roughly symmetric (for example, normally distributed) and has no extreme values. 

I always remembered this by memorizing that we are all "sensitive to mean [people]" - but whatever works for you!

Friday, 22 January 2010 00:00

Mu = Mean



The Greek lowercase letter for "M", written μ (pictured above on the right), is pronounced as "mew."

This symbol represents the mean of a POPULATION. 

Friday, 22 January 2010 23:22

68-95-99.7 (Empirical) Rule



The EMPIRICAL RULE, otherwise known as the 68.26-95.44-99.74 RULE, says the following about NORMALLY DISTRIBUTED (bell-shaped) data:


1) About 68.26% of all observed data values will fall within ONE standard deviation of the mean (to the RIGHT or LEFT). 

2) About 95.44% of all observed data values will fall within TWO standard deviations of the mean (to the RIGHT or LEFT). 

3) About 99.74% of all observed data values will fall within THREE standard deviations of the mean (to the RIGHT or LEFT). 

This is what the illustrated version of the Empirical Rule looks like:


If we are told that the mean of our data is 100, and the standard deviation is 10, then we know the following:

1) 68.26% of our data will fall between 90 and 110. 

2) 95.44% of our data will fall between 80 and 120. 

3) 99.74% of our data will fall between 70 and 130. 
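If you want to check these ranges yourself, here is a small sketch using NormalDist from Python's statistics module (Python 3.8 or later), assuming the data really is normally distributed. The helper name share_within is my own. The exact normal-curve values come out near 68.27%, 95.45% and 99.73%, which differ in the last digit from the figures above because those are built from rounded z-table entries.

```python
from statistics import NormalDist

# Values from the example above: mean 100, standard deviation 10.
dist = NormalDist(mu=100, sigma=10)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    low, high = 100 - 10 * k, 100 + 10 * k
    print(f"between {low} and {high}: {share_within(k):.2%}")
```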


Disclaimer: I did not create nor do I own these videos.  I have simply embedded them, courtesy of YouTube.

Friday, 22 January 2010 23:12

What is the Central Limit Theorem?



NOTE: If a sample size is greater than 30, it is USUALLY (though not always) large enough for the Central Limit Theorem to apply, meaning the means of samples that size will be approximately normally distributed. 
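One way to see the Central Limit Theorem at work is to simulate it. The sketch below is only an illustration under my own assumptions: the population is a right-skewed exponential distribution (mean 1, standard deviation 1), the sample size is 40, and 2000 samples are drawn. None of those choices come from the videos.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 40    # a bit above the usual "greater than 30" guideline
NUM_SAMPLES = 2000  # how many separate samples to draw

def draw_sample_mean():
    # expovariate(1.0) is strongly right-skewed, with mean 1 and standard deviation 1
    return mean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])

sample_means = [draw_sample_mean() for _ in range(NUM_SAMPLES)]

center = mean(sample_means)   # should land near the population mean, 1
spread = stdev(sample_means)  # should land near 1 / sqrt(40)

# Share of sample means within one standard deviation of their center;
# for a roughly normal shape this should be somewhere near 68%.
within_one_sd = sum(abs(m - center) <= spread for m in sample_means) / NUM_SAMPLES

print(f"center of sample means: {center:.3f}")
print(f"spread of sample means: {spread:.3f}")
print(f"share within one SD:    {within_one_sd:.1%}")
```

Even though the individual values are badly skewed, the sample means pile up around the population mean in a roughly bell-shaped way.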

Friday, 22 January 2010 23:00

What is a Sample?


A SAMPLE is a sub-set of the POPULATION.

A SAMPLE is drawn to represent the population, negating the need to conduct an extensive census. 

An example of a sample would be:

You decide you want to take a survey of the student body at your school.  Without a team of helpers, it will be nearly impossible to survey EVERYONE in a short period of time.  So instead, you decide to draw a SIMPLE RANDOM SAMPLE, which you determine is representative of the population. 

Studying and drawing CONCLUSIONS from a sample would be a heck of a lot easier than trying to survey (and study) every person in the POPULATION.
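Here is a rough sketch of drawing a simple random sample in Python. The roster of 500 student ID numbers and the sample size of 50 are made up for illustration; random.sample picks without replacement, so every student has the same chance and nobody is picked twice.

```python
import random

# Hypothetical roster: 500 student ID numbers standing in for the whole student body.
population = list(range(1, 501))

rng = random.Random(2010)  # seeded only so the example is repeatable

# Simple random sample of 50 students, drawn without replacement.
sample = rng.sample(population, k=50)

print(f"surveying {len(sample)} of {len(population)} students")
print(sorted(sample)[:10])  # first few selected IDs
```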

Friday, 22 January 2010 21:50





OK, so "population" doesn't exactly merit a "wordy definition" on its own.  But when we think of "population" we often think of the U.S. population - such as is recorded by the U.S. Census. 



This is not too far off-the-mark.  According to Wikipedia:  "A population can be defined as including all people or items with the characteristic one wishes to understand."



More simply put, a statistical POPULATION is the POOL from which a SAMPLE can be drawn. 


POPULATIONS can often be large, making studies overly complex, time-consuming and expensive.  This is why we draw a SAMPLE and go to great lengths to find a SAMPLE that is REPRESENTATIVE of the POPULATION.  This yields more time-efficient studies conducted on a SAMPLE instead of the entire POPULATION. 

Friday, 22 January 2010 21:36

Z-Score Example Problems


Disclaimer: I did not create nor do I own these videos. I have simply embedded them, courtesy of YouTube.

This is a great video because it gives walk-throughs of z-score calculations from homework problems. You may not have these exact problems, but the same concepts can be applied to your own work!


These examples rely on the Z-Score Formula:

z = (x - μ) / σ

MEMORIZE this formula, make sure you know it COLD!

If you do not know what the "m-like" symbol or the "o" with a tail are, check out What's with the Greek?

Friday, 22 January 2010 20:51

Calculating a Z-Score


Sometimes we need a standardized scale to measure a value's distance from the center. 

A Z-score indicates how many STANDARD DEVIATIONS a value is from the mean. 


The official formula is:

z = (x - μ) / σ

(x is your value, μ is the mean, and σ is the standard deviation.)


So let's say the MEAN is 100 and the Standard Deviation is 15. 


If you are given a value of 132, you just plug that into the formula above. 

132 - 100 = 32

32 / 15 = 2.133 


VOILA - Your Z-Score is 2.133
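The same calculation as a short Python sketch, assuming the usual z-score formula of (value - mean) / standard deviation. The function name z_score and its parameter names are my own.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev

# The worked example from this post: value 132, mean 100, standard deviation 15.
z = z_score(132, mean=100, std_dev=15)
print(round(z, 3))  # 2.133
```

A value below the mean gives a negative z-score, for example z_score(85, mean=100, std_dev=15) is -1.0.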


Friday, 22 January 2010 20:22

X Bar = Sample Mean



X Bar (pictured below under Sample Mean) is simply the mean of a given set of sample values.

As you will notice, "X Bar" is calculated the same way as the POPULATION MEAN, but it uses only the values in the SAMPLE instead of every value in the POPULATION. 

Read Sample Mean vs. Population Mean for more information. 
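A tiny sketch of the difference, using made-up numbers: the ten scores below are a hypothetical population, and the four-score sample is hand-picked just to keep the arithmetic easy to follow.

```python
from statistics import mean

# Hypothetical population: test scores for all ten students in a small class.
population = [62, 70, 75, 78, 80, 84, 85, 90, 93, 98]

# Hypothetical sample: four of those scores.
sample = [70, 80, 85, 93]

population_mean = mean(population)  # mu: uses every value in the population
sample_mean = mean(sample)          # x bar: uses only the sampled values

print(f"population mean (mu): {population_mean}")
print(f"sample mean (x bar):  {sample_mean}")
```

The two results are close but not identical, which is normal: X Bar is an estimate of the population mean.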


Friday, 22 January 2010 20:11

What is the Mode?



The MODE of a data set is simply the number that appears the most often. 

For example, in this set:  [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17] - The mode is 6.  This is a UNIMODAL set, and looks like this:


In the set: [1, 1, 2, 4, 4] - There are TWO modes (1 and 4), making this set BIMODAL, which looks like this:


For sets where there are more than TWO modes, the set is called MULTIMODAL.
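To find modes in Python, one option is multimode from the statistics module (Python 3.8 or later), which returns every value tied for most frequent. The describe helper and its labels below are my own sketch. Note that it would label a set where every value appears once as multimodal, so treat it as a rough illustration only.

```python
from statistics import multimode

unimodal = [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]
bimodal = [1, 1, 2, 4, 4]

def describe(data):
    """Return the list of modes and a label based on how many modes there are."""
    modes = multimode(data)
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label

print(describe(unimodal))  # ([6], 'unimodal')
print(describe(bimodal))   # ([1, 4], 'bimodal')
```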