It is your seventh week in statistics class. Have you ever asked
yourself why you are here? Amongst the frustration of calculating
probabilities, etc. you may have wondered what really is the point.
Why do statistics?
We use statistics to make inferences about a population. We
use statistics to be able to support our claims, and to refute
claims that we want to challenge.
Recall that a parameter tells us something about a population [e.g. mu, sigma] while a statistic tells us something about a sample [e.g. x-bar, s].
We have already learned how to calculate a mean,
a standard deviation and we have discussed what these descriptive
statistics tell us about our data. Recently we have been learning
about probability. When we talk about different types of statistical
inference, these concepts come together to inform our procedures.
Probability allows us to take chance variation into account and
so we can substantiate our conclusions by doing probability calculations.
The test statistics that we use (such as z, and later t, and
F) have unique sampling distributions which allow us to compare
the probability of the occurrence of our sample data with what
is believed to be true about the population. (Sampling distributions
for these test statistics are listed in tables in the back of the
Moore and McCabe book.) Keep in mind that distributions are really
a way of summarizing a bunch of different probabilities of occurrence
for a specific variable, as we saw last week in the homework assignment.
The two most common types of statistical inference are significance testing and confidence intervals. These two methods of making inferences allow us to make claims or dispute claims based on collected data. These two types of statistical inferences have different goals:
Significance testing- used to assess the evidence provided by our data so that we can make some claim about a population.
Confidence intervals-used to estimate a population parameter
Assumptions we will hold for these procedures: the data come from a simple random sample, the population is normally distributed (or the sample is large enough for the central limit theorem to apply), and the population standard deviation sigma is known.
Significance Testing (Z-statistic)
We are interested in investigating whether the IQs of Yale students
are higher than those of the general American population. We take a random
sample of 35 Yale students, measure their IQs, and find that
the sample mean = 107. We know that IQ has a normal distribution with
mean (mu) = 100 and sigma = 15.
Procedure for carrying out significance test:
1.) State the hypotheses
Null hypothesis (H0) ("H-naught") - the statistical hypothesis that we test. We are assessing the strength of the evidence our data provide against the null hypothesis.
H0: mu = 100
Alternative hypothesis (Ha) or (H1) - a statement that specifies
what we think is true; an expression that is incompatible with
the null hypothesis.
Ha: mu > 100 (Yale students' mean IQ is higher)
In this example we have a one-sided alternative hypothesis. Usually it is appropriate to have a one-sided hypothesis when we have some strong a priori reason for making a one-sided claim. (We know that Yale students have above average SAT scores so their IQ's are probably higher as well).
If we do not have a good a priori reason for making a one-sided hypothesis it is best to go with a two-sided hypothesis.
Ha: mu ≠ 100 (there is a difference, but we are not sure in which direction)
(Note: with a one-sided hypothesis we do a one-tailed test, which
generally has more power (ability to detect the desired effect) than
a two-tailed test. More on this later.)
2.) Calculate the test statistic
Z = (x-bar - mu)/(sigma/square root of n) = (107 - 100)/(15/square root of 35) = 2.76
We know its distribution (see Table A). It allows us to measure
the compatibility between our data and the null hypothesis.
(In this example we know the population standard deviation, but
in cases where it is unknown, for large samples (n > 25 or 30)
we can use the sample standard deviation as an estimate.)
3.) Find the P-value
Look up the probability of getting a Z-score of 2.76 in Table A: .9971.
Here we are doing a 1-tailed test in the > direction,
so we know which half of the distribution we are looking at.
P-value is 1 - .9971 = .0029. Z-crit for alpha = .05 is 1.64.
.0029 signifies that only 29 times out of 10,000 would we observe a sample mean of 107 or higher if mu were really 100. In other words, our observed outcome is highly unlikely under the supposition that the null hypothesis is true; it is far from what we would expect if the null hypothesis were true.
So we can conclude then that our observed data are more probable in terms of the alternative hypothesis being true.
(Important Note: Because we are always testing the null hypothesis,
it is never correct to state that we have proved the alternative
hypothesis. We have only rejected the null hypothesis.)
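The steps above can be checked numerically. Here is a quick sketch in Python (not part of the course materials; it uses only the standard library's NormalDist as a stand-in for Table A):

```python
from math import sqrt
from statistics import NormalDist

# One-sided z-test for the IQ example: H0: mu = 100 vs. Ha: mu > 100
mu0, sigma = 100, 15          # hypothesized mean, known population sd
xbar, n = 107, 35             # sample mean, sample size

z = (xbar - mu0) / (sigma / sqrt(n))   # test statistic
p_value = 1 - NormalDist().cdf(z)      # one-tailed P-value

print(round(z, 2))            # 2.76, matching the hand calculation
print(round(p_value, 4))      # 0.0029
```

Since .0029 is below alpha = .05, the code reaches the same conclusion as the table lookup: reject the null hypothesis.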
The smaller the P-value the stronger the evidence against the null hypothesis.
So how much evidence against the null hypothesis do we need before
we can reject it?
Our evidence relates to a significance level, alpha, that
has been predetermined before we begin analyzing our data. This
is customarily .05, but it depends on circumstances and the degree
of certainty we desire.
The alpha level is the probability of a Type I error. If our alpha
level is .05, then 5 times out of 100 we will reject the null hypothesis
when we shouldn't have.
Back to our example then: we compare our calculated P-value, .0029, and see that it is less than our predetermined alpha level of .05, so we can reject the null hypothesis and conclude that our result is significant at the .05 level.
Also our calculated Z-score (2.76) is > the Z-crit (1.64) which
confirms as well that we should reject the null hypothesis.
4.) State a conclusion
What does "significant" mean (Moore & McCabe, p. 459)?
It means signifying something, namely that Yalies' IQs are significantly
greater than the general population's at the .05 level. We have
rejected the null hypothesis.
Example 2: Two-sided hypothesis/ two-tailed test
An educational researcher is interested in investigating whether
third-graders in the Hartford school district perform differently
on the Degree of Reading Power (DRP) test than the national average
for third-graders, which is 32. DRP
scores are recorded for a random sample of 44 third-grade students,
and the mean score is 35.1. DRP scores are approximately normal,
and the standard deviation for this school district is known to
be 11. The researcher will work with an alpha level of .05.
Z = (35.1 - 32)/(11/square root of 44) = 1.87. Here, because it is a
two-sided test, we must take the one-tailed probability and multiply
it by 2. So our P-value is 2(1 - .9693) = 2 × .0307 = .0614.
.0614 > .05, so we fail to reject the null hypothesis.
Here the Z-crit scores are + or - 1.96
If the researcher had reason to believe that the students' DRP
scores were higher than the national mean, then we would do a one-tailed
test. The Z-crit would be 1.64.
Our calculated Z of 1.87 is > 1.64, and our P-value
no longer needs to be multiplied by 2, so it is .0307, which is
< .05. In this case we can reject the null hypothesis
and conclude that children in this district have a mean score
that is higher than the national mean.
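The contrast between the one-tailed and two-tailed versions of this test can be seen in a few lines of Python (an illustration, not course code; only the standard library is used):

```python
from math import sqrt
from statistics import NormalDist

# DRP example: same test statistic, one- vs. two-tailed P-values
mu0, sigma = 32, 11
xbar, n = 35.1, 44

z = (xbar - mu0) / (sigma / sqrt(n))   # ~1.87
p_one = 1 - NormalDist().cdf(z)        # one-tailed, Ha: mu > 32  (~.031)
p_two = 2 * p_one                      # two-tailed, Ha: mu != 32 (~.062)

print(round(z, 2), round(p_one, 4), round(p_two, 4))
```

The two-tailed P-value exceeds .05 (fail to reject) while the one-tailed P-value is below .05 (reject), exactly as in the worked example.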
Let's switch over now to talk about confidence intervals, keeping
this last example in mind.
We use confidence intervals to make estimates
about population parameters. So far in the examples we have looked
at there has been a known mean for the population. If the mean
for a population is unknown we can make an educated guess with
a certain amount of confidence based on our sample data.
When thinking about confidence intervals,
it is important to keep in mind the 68-95-99.7 rule. On
the most basic level this rule signifies that, for a normally
distributed variable, 68% of our data will fall within 1 standard
deviation of the mean, 95% will fall within 2 standard deviations
of the mean, and 99.7% will fall within 3 standard deviations of the mean.
Think of the IQ example. IQ has a population mean of 100 and
a standard deviation of 15, so the range of IQs from 85 to 115
accounts for 68% of all IQ scores.
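The rule's percentages can be verified directly for the IQ distribution (a quick Python check; note the 95% figure is really 95.45%, which the rule rounds off):

```python
from statistics import NormalDist

# Coverage of mu +/- k*sigma for IQ ~ Normal(100, 15)
iq = NormalDist(mu=100, sigma=15)

for k in (1, 2, 3):
    coverage = iq.cdf(100 + k * 15) - iq.cdf(100 - k * 15)
    print(k, coverage)   # roughly 0.68, 0.95, and 0.997
```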
This rule is easily applied to confidence intervals. If we don't
know a population mean but we have a sample mean, we can use a
confidence interval to say something about where the true population
mean may fall. For a normally distributed variable, there is a
probability of .95 that our sample mean will fall within about
2 standard errors (sigma/square root of n) of the population mean.
So the form is: estimate + or - a margin of error.
The margin of error designates how accurate we believe our guess to be based on the variability of the estimate.
When using the Z-statistic the margin of error = the Z-value *
sigma/ square root of n
Let's use our example of third-graders' DRP scores to see how this works.
Imagine that we don't know the national mean for DRP scores, but we can predict with a certain amount of confidence what it might be based on our sample data.
Remember that the sample mean = 35.1 and sigma = 11.
Let's calculate a 95% confidence interval for the mean reading score.
For a 95% confidence interval my Z-values will be + or - 1.96
CI: mean + or - Z * sigma/square root of n
CI: 35.1 + or - 1.96 * 11/square root of 44 = 35.1 + or - 3.25
CI: (31.85, 38.35) We conclude that we are 95% confident that
the true mean of the population falls between 31.85 and 38.35.
(As we know from the information we were given by the national
database, the population mean is actually 32, which does fall in this interval.)
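The interval arithmetic is easy to reproduce in Python (a sketch; inv_cdf(0.975) supplies the 1.96 critical value):

```python
from math import sqrt
from statistics import NormalDist

# 95% confidence interval for the mean DRP score
xbar, sigma, n = 35.1, 11, 44
z = NormalDist().inv_cdf(0.975)        # ~1.96 for 95% confidence

margin = z * sigma / sqrt(n)           # ~3.25
lo, hi = xbar - margin, xbar + margin
print(round(lo, 2), round(hi, 2))      # the interval contains the true mean, 32
```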
As Moore and McCabe try to explain (p.445) as well as Pollard, there is a hair-splitting distinction that is made when trying to understand what confidence intervals mean.
-When we say that we are 95% confident that the mean DRP score
lies between 31.85 and 38.35, this does not signify that there is
a 95% probability that the true mean lies within this particular
range; rather, it signifies that this method produces an interval
that contains the true mean in 95% of all possible samples.
Still confused? Try thinking about it this way:
(Sticking with the reading scores example) Say the
researcher took another sample of 44 children from the district
and found their mean DRP score to be 18. If we constructed a
95% confidence interval for the population mean based on this
sample data, it would come out to be 18 + or - 3.25, or
(14.75, 21.25). We would say that the true population mean lies
within this interval, but really we know that the true mean is 32.
So this would be one of the 5% of samples for which the confidence
interval estimate does not contain the true mean.
Some properties of confidence intervals to keep in mind:
-as the confidence level (C) decreases, the margin of error decreases
-as sample size (n) increases, the margin of error decreases (for a fixed confidence level)
-as the population standard deviation decreases, the margin of error decreases
You can also determine how large your sample size
should be to construct a confidence interval for a specified margin
of error for a normal mean:
n = (Z * sigma / m)^2, where m is the desired margin of error
Example 3: How much corn do I need?
Crop researchers are interested in estimating the average yield, in bushels per acre, of a new variety of corn they are planting. Cost is an important factor, so they want to know how many plots they need to plant to be able to estimate the mean yield to within 4 bushels per acre with 90% confidence. Assume that sigma is 10.
n = (1.645 * 10 / 4)^2 = 16.91
So they need 17 plots of corn to estimate the mean yield within
4 bushels per acre with 90% confidence (we always round the sample size up).
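The sample-size calculation, with the round-up step, looks like this in Python (illustration only):

```python
from math import ceil

# n = (z * sigma / m)^2 for 90% confidence (z = 1.645), sigma = 10, margin m = 4
z, sigma, m = 1.645, 10, 4

n = (z * sigma / m) ** 2
print(n, ceil(n))        # ~16.9, rounded up to 17 plots
```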
Confidence intervals are useful because they are concerned with both the level of confidence and the margin of error. The significance level like the confidence level says how reliable a method is in repeated use.
But having high confidence (say 99%) is not very
valuable if the interval is so wide that it includes most values
of the parameter. Similarly, a test with a small alpha level
(say .01) is not very useful if it almost never rejects the null
hypothesis. What we need to be concerned with then is power.
Power is the probability,
for a fixed alpha level, that the significance test will reject
the null hypothesis in favor of a particular alternative value
of the parameter.
Power is directly related to Type II error, which,
if you recall, is failing to reject the null hypothesis when
we should have rejected it.
Mathematically, power = 1 - P(Type II error)
High power is what we want. The standard for power
is usually .80, or 80% power. (Note: so desirable levels of Type
II error will be no more than .20, or 20%.)
How do we calculate power?
An SRS of 500 Connecticut high school students'
SAT scores is taken. A teacher believes that the mean will be
no more than 450, because that is the mean score for the
Northeastern US. If the population standard deviation is 100 and the
test rejects the null hypothesis at the 1% level of significance,
determine whether this test is sufficiently sensitive (has enough
power) to detect an increase of 10 points in the population mean.
Steps for calculating power:
1.) State the alternative of interest: u = 460, at the 1% significance level.
2.) Find the values of x-bar that would lead us to reject the null
hypothesis. Use the Z-statistic, Z = (x-bar - u)/(sigma/sq. root of n),
and substitute the Z-score for the appropriate alpha (2.326 for
alpha = .01): we reject when x-bar > 450 + 2.326(100/sq. root of 500) = 460.4.
3.) Calculate the probability of rejecting when the alternative u = 460 is true:
P(x-bar > 460.4 when u = 460) = P(Z > (460.4 - 460)/4.47) = P(Z > .09) = 1 - .5359 = .4641
Here we have a power of about 46%. This test is not
very sensitive to a 10-point increase in the mean score. (Really
this isn't surprising, since the standard deviation is 100.)
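The power calculation can be reproduced step by step in Python (a sketch; inv_cdf supplies the 2.326 cutoff for alpha = .01):

```python
from math import sqrt
from statistics import NormalDist

# Power of the one-sided z-test (H0: mu = 450) against the alternative mu = 460
mu0, mu_alt, sigma, n, alpha = 450, 460, 100, 500, 0.01

se = sigma / sqrt(n)                                   # ~4.47
x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se    # reject when xbar > ~460.4
power = 1 - NormalDist().cdf((x_crit - mu_alt) / se)

print(round(x_crit, 1), round(power, 3))   # ~460.4 and ~0.464
```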
So we have a power that we are not happy with; how do we increase it?
There are several ways to increase power: increase the sample size,
increase the alpha level, decrease sigma (for example, by measuring
more precisely), or look for a larger effect (an alternative value
farther from the null).
(Note: Now that you have the formula for calculating power, you can actually decide a priori how much power you want and solve for the sample size you will need to achieve that level of power. This is very economical, because adding subjects can add a lot of expense to research.)
One last thing: The Relationship Between Confidence
Intervals and Significance Testing
Suppose that for college students in the Ivy League
I want to know how many hours of TV per day (on average) a person
watches. I have no idea what the population mean might be or
the standard deviation and I want to construct a 99% confidence
interval for the population mean.
I take a sample of 105 Yale students and find a mean
of 3.2 hours per day and a standard deviation of .8 hours per day.
The 99% CI is 3.2 + or - .201, so (3.0, 3.4).
Say your roommate comes along and agrees with your
claim, but tells you that one of her professors did a study and
found that the mean was 3.5. She definitely thinks that the mean
hours watched are different than this value.
Here the H0: u = 3.5
Ha: u ≠ 3.5
Because the hypothesized value falls outside the
confidence interval we just computed, we can reject
H0 at the 1% significance level (alpha = .01).
Your other friend comes along and claims that she did a similar study and found that her sample mean was 3.1.
Here the H0: u = 3.1
We cannot reject the null hypothesis here (for alpha
level =.01) because 3.1 lies within the 99% confidence interval.
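This duality between the 99% interval and two-sided tests at alpha = .01 can be sketched in Python (the sample standard deviation, .8, stands in for sigma here, as the notes allow for large samples):

```python
from math import sqrt
from statistics import NormalDist

# 99% CI for mean TV hours, and the two-sided tests it implies
xbar, s, n = 3.2, 0.8, 105
z = NormalDist().inv_cdf(0.995)          # ~2.576 for 99% confidence

margin = z * s / sqrt(n)                 # ~0.201
lo, hi = xbar - margin, xbar + margin    # ~(3.0, 3.4)

for mu0 in (3.5, 3.1):
    reject = not (lo <= mu0 <= hi)       # reject H0: mu = mu0 iff mu0 is outside the CI
    print(mu0, "reject H0" if reject else "fail to reject H0")
```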