## Statistics

Suppose we are given a certain quantity of an ideal gas at some fixed temperature, and we want to know what sort of distribution of velocities to associate with this gas.

That is, given a range of velocities,

\[\Delta v = v_\beta - v_\alpha,\]

what is the number of molecules, \(\Delta n\), with velocities in the region of phase space

\[\Delta v = \Delta v_x \Delta v_y \Delta v_z?\]

Question:

What's the difference between sample standard deviation and population standard deviation? When do we use N-1 and when N in the denominator?

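The sample standard deviation uses almost the same formula as the population standard deviation (lowercase sigma, σ), except that the denominator is N - 1 (one less than the number of values given) instead of N. Use N when your data covers the entire population, and N - 1 when your data is a sample used to estimate the population's standard deviation.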
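To see the two denominators side by side, here is a minimal sketch using Python's standard-library `statistics` module. The five values are illustrative data only; `pstdev` divides by N and `stdev` divides by N - 1.

```python
import statistics

# Illustrative data only (five made-up measurements).
values = [76, 78, 79, 81, 86]

# Population standard deviation: sum of squared deviations divided by N.
population_sd = statistics.pstdev(values)

# Sample standard deviation: sum of squared deviations divided by N - 1.
sample_sd = statistics.stdev(values)

print(f"population SD (divide by N):   {population_sd:.4f}")
print(f"sample SD (divide by N - 1):   {sample_sd:.4f}")
```

Because N - 1 is smaller than N, the sample version always comes out a little larger for the same numbers.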

**I admit: "sampling distribution of the sample mean" sounds a little creepy, not only because the term is too long-winded for its own good, but also because it feels like you're running in an endless loop.**

The best way to explain this one is to give an example:

**Suppose you have a population of 5 basketball players:**

**A, B, C, D and E.**

**Let us suppose that their respective heights are:**

**76, 78, 79, 81 and 86**

**If we had a sample size of 2, then we would be able to derive the following combinations of these players and their heights:**

| SAMPLE (size 2) | HEIGHTS | X-Bar Values |
| --- | --- | --- |
| A, B | 76, 78 | 77.0 |
| A, C | 76, 79 | 77.5 |
| A, D | 76, 81 | 78.5 |
| A, E | 76, 86 | 81.0 |
| B, C | 78, 79 | 78.5 |
| B, D | 78, 81 | 79.5 |
| B, E | 78, 86 | 82.0 |
| C, D | 79, 81 | 80.0 |
| C, E | 79, 86 | 82.5 |
| D, E | 81, 86 | 83.5 |

**The X-bar column values represent the Sampling Distribution of the Sample Mean, because they are the MEAN of the values for each SAMPLE.**

**Now let's try a different sample size. Let's try a sample size of 4.**

| SAMPLE (size 4) | HEIGHTS | X-Bar Values |
| --- | --- | --- |
| A, B, C, D | 76, 78, 79, 81 | 78.50 |
| A, B, C, E | 76, 78, 79, 86 | 79.75 |
| A, B, D, E | 76, 78, 81, 86 | 80.25 |
| A, C, D, E | 76, 79, 81, 86 | 80.50 |
| B, C, D, E | 78, 79, 81, 86 | 81.00 |

**The X-bar column values represent the Sampling Distribution of the Sample Mean, because they are the MEAN of the values for each SAMPLE.**

**And that's all it is!**
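If you would rather not list every sample by hand, the same tables can be generated with a few lines of code. This is only a sketch: the function name `sampling_distribution` is made up for illustration, and the players and heights are the ones from the example above.

```python
from itertools import combinations

# Population from the example: player -> height.
heights = {"A": 76, "B": 78, "C": 79, "D": 81, "E": 86}


def sampling_distribution(population, size):
    """Return (players, X-bar) for every possible sample of the given size."""
    rows = []
    for players in combinations(population, size):
        sample = [population[p] for p in players]
        x_bar = sum(sample) / len(sample)  # mean of this one sample
        rows.append((players, x_bar))
    return rows


for size in (2, 4):
    print(f"SAMPLE (size {size}) | HEIGHTS | X-Bar")
    for players, x_bar in sampling_distribution(heights, size):
        sample_heights = ", ".join(str(heights[p]) for p in players)
        print(f"{', '.join(players)} | {sample_heights} | {x_bar:.2f}")
```

Changing the `size` values in the loop lets you try any other sample size from 1 to 5.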

The distribution of a statistic is officially called the sampling distribution of the statistic.

Broken down a little bit further, the distribution of a statistic is all possible values of the statistic for samples of a given size. Try not to get too crazed by all the fancy lingo when first starting out in a Stat course. Check out our section on What's with the Greek? for more definitions broken down.

The Z-table (standard normal table) is the standard type of table you will see in most Statistics Textbooks.

If you are allowed to use a calculator for calculating Z-scores and areas under the curve, I suggest you glance at this to get familiar with what it is, and MOVE ON.

**If you are NOT allowed to use a calculator, it would be a good idea to get friendly with this table - and FAST. During an exam, the last thing you want to be worrying about is figuring out how to find your way around this thing!**
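For those who are allowed a calculator or software, here is a rough sketch of what the table lookup does, using Python's standard-library `statistics.NormalDist`. The mean, standard deviation, and observed value are made-up numbers chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical values, chosen only for illustration.
population_mean = 100.0
population_sd = 15.0
x = 130.0

# Z-score: how many standard deviations x lies from the mean.
z = (x - population_mean) / population_sd

# Area under the standard normal curve to the left of z,
# which is the number a left-tail Z-table reports.
area_left = NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"area to the left of z = {area_left:.4f}")
```

Note that printed tables differ in which area they report (left tail, or the area between 0 and z), so check the picture at the top of your table before comparing numbers.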

Both the mean and the median are measures of center.

If you have a symmetrical set of data (for example, IF THE NUMBERS IN THE SET ARE EVENLY SPACED), the mean and the median will be EXACTLY THE SAME.

**Here is WHY:**

If you have a data set: 25, 50, 75

MEAN = (25 + 50 + 75) / 3 = 150 / 3 = 50

**MEDIAN = 50 (the number bang in the center)**

**Both values are the same.**

When dealing with skewed data sets (when the numbers are NOT evenly spaced), it is better to use the median to express the center. It is RESISTANT to extreme values.

**Here is WHY:**

If you have a data set: 20, 50, 100

MEAN = (20 + 50 + 100) / 3 = 170 / 3 ≈ 56.67

**MEDIAN = 50**

If we make this set even more extreme: 10, 50, 150

MEAN = (10 + 50 + 150) / 3 = 210 / 3 = 70

**MEDIAN = 50**

**No matter how we change the values in this set, if the middle number is 50, the MEDIAN will be 50. ALWAYS.**

**The mean is SENSITIVE to every value, and therefore is best used when the data is roughly symmetric, as in a normal distribution.**

I always remembered this by memorizing that we are all "__sensitive__ to __mean__ [people]" - but whatever works for you!
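To check the three data sets above for yourself, here is a minimal sketch using Python's standard-library `statistics` module. The labels in the dictionary are just descriptive names made up for this example.

```python
from statistics import mean, median

# The three data sets from the examples above.
data_sets = {
    "evenly spaced": [25, 50, 75],
    "skewed": [20, 50, 100],
    "more extreme": [10, 50, 150],
}

# Compute (mean, median) for each data set.
summary = {
    label: (mean(values), median(values))
    for label, values in data_sets.items()
}

for label, (m, med) in summary.items():
    print(f"{label}: mean = {m:.2f}, median = {med}")
```

The median stays put at 50 in all three cases, while the mean moves as the outer values change.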