Before introducing the mean, there is another term that you should know. A set of data is called a distribution. In the remainder of these notes, the term, distribution, will be used to describe a set of data or numbers that have been collected and that represent different quantified observations of phenomena.

- Add up all of the values (e.g., test scores)
- Divide the sum in Step 1 by the number of values (e.g., number of tests)

The mathematical formula looks like this: $\bar{X} = \frac{\sum X}{n}$

The first step, adding all the numbers, happens quite frequently in statistical calculations, so statisticians use the uppercase Greek letter sigma, ∑, to represent summation. Just remember the S in Sigma and the S in Summation to link the symbol with the process. Also, n represents the number of values that we're adding. The symbols used for the mean include M and X̄ (pronounced X-bar).

Now let's use the formula with a small distribution of test scores. Even though this example will only involve a few scores, the procedure for calculating the mean is the same no matter how many scores there are.

Scores: 7, 5, 8, 9, 7, 6, 9, 8, 7, 4

Step 1. Add the values. ∑X = 7 + 5 + 8 + 9 + 7 + 6 + 9 + 8 + 7 + 4 = 70

Step 2. Divide by the number of values. X̄ = 70/10 = 7

If you have Excel, you can use it to calculate the mean. First, enter the numbers in a column - one number in each cell. Let's say that the numbers are in cells A1 through A10. Then, enter the following formula in cell A11 (or any other empty cell): = average(A1:A10)
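For readers who prefer a programming language to a spreadsheet, the same two steps can be sketched in Python. This is only an illustration using the ten test scores from the example; the variable names are arbitrary choices, not part of the notes.

```python
# The ten test scores from the worked example.
scores = [7, 5, 8, 9, 7, 6, 9, 8, 7, 4]

total = sum(scores)   # Step 1: add up all of the values
n = len(scores)       # the number of values
mean = total / n      # Step 2: divide the sum by the number of values

print(total, n, mean)  # 70 10 7.0
```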

That should be fairly straightforward. So, why isn't the mean sufficient for describing central tendency for all distributions? There are two reasons. The first reason is that the first step in the process, namely adding the values, doesn't always make sense. Think of a variable that represents ethnicity. Different ethnic classifications are arbitrarily assigned different numerical values. African American may be assigned a 1, while Asian is assigned a 3, and Caucasian is assigned a 5. Because these assignments are arbitrary, the actual values, as quantities, mean nothing. So adding them together would make no sense. Calculating a mean score for ethnicity would result in a meaningless number. Means make no sense for variables measured at a nominal level.

The second reason involves the mean's sensitivity to extreme values. What does this refer to? Recall the bricks and board example (or think of a teeter-totter). Placing a new brick near the middle (balancing point) has only a small effect on the balance, but if you placed a new brick at one of the far ends, it would have a great effect on the balance. This effect is larger when there is a small number of bricks. The brick on the far end is called an outlier. The presence of outliers makes the mean inaccurate as a measure of the center. An alternative measure of central tendency is needed in situations with outliers.

- Sort the values from smallest to largest
- If there is an odd number of values, choose the middle value. If there is an even number of values, choose the middle two and average them - add them and divide by 2.

Step 1. Sort the values. 4, 5, 6, 7, 7, 7, 8, 8, 9, 9

Step 2. Because there are 10 scores, pick the middle two scores, which are 7 and 7. Add them, 7 + 7 = 14, and then divide by 2, 14/2 = 7.

If you have Excel, you can use it to calculate the median. First, enter the numbers in a column - one number in each cell. Let's say that the numbers are in cells A1 through A10. Then, enter the following formula in cell A11 (or any other empty cell): = median(A1:A10)
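The sort-then-pick-the-middle procedure can also be written out as a short Python sketch. The function name `median_of` is a hypothetical helper introduced here for illustration; it handles both the odd and the even case described in the steps.

```python
def median_of(values):
    """Return the median by sorting and taking the middle value(s)."""
    ordered = sorted(values)          # Step 1: sort from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

scores = [7, 5, 8, 9, 7, 6, 9, 8, 7, 4]
print(sorted(scores))     # [4, 5, 6, 7, 7, 7, 8, 8, 9, 9]
print(median_of(scores))  # 7.0
```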

The median is insensitive to extreme values. In the example, you could replace one of the 9s with 100 and the median would not change. Likewise, you could replace the 4 with a 0 and the median would not change either. This illustrates the insensitivity of the median to extreme values. There are situations when neither the mean nor the median is appropriate to use. Because the first step in determining the median requires that the values be sorted, the measurement level of the variable must be at the ordinal level or higher. Like the case with the mean, reporting a median score for a nominal variable is meaningless.
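The claim that the median is insensitive to extreme values while the mean is not can be checked directly. The sketch below makes the two replacements described above (one 9 becomes 100; the 4 becomes 0) and compares both statistics. It uses `median` from Python's standard `statistics` module; the small `mean` helper is defined here only for illustration.

```python
from statistics import median

def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

original = [7, 5, 8, 9, 7, 6, 9, 8, 7, 4]
high_outlier = [7, 5, 8, 100, 7, 6, 9, 8, 7, 4]  # one 9 replaced with 100
low_outlier = [7, 5, 8, 9, 7, 6, 9, 8, 7, 0]     # the 4 replaced with 0

for label, data in [("original", original),
                    ("9 -> 100", high_outlier),
                    ("4 -> 0", low_outlier)]:
    print(f"{label:>9}: mean = {mean(data):.1f}, median = {median(data):.1f}")
```

The mean moves from 7.0 to 16.1 and 6.6 in the two altered lists, while the median stays at 7.0 in all three.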

- List the unique values that occur.
- Tally the number of occurrences for each value - the one with the largest tally is the mode.

Step 1. List the unique values that occur.

| Values |
| --- |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |

Step 2. Tally the occurrences

| Values | Tally (frequency) |
| --- | --- |
| 4 | 1 |
| 5 | 1 |
| 6 | 1 |
| 7 | 3 |
| 8 | 2 |
| 9 | 2 |

So, what is the mode? Is it 7 or 3? The mode is the most frequently occurring value, which is 7. The number of times that the modal value occurs is 3. So, in this example, the mode is 7. Notice that for this distribution the mean, median, and mode are all the same. This happens in certain situations but in most cases these three descriptive statistics will be different. Notice that in this distribution there is only one mode. Sometimes there are two modes or even more. In the context of education, you may have heard a group of students referred to as being bimodal, which would indicate that there are two distinct groups - perhaps good readers and struggling readers, for example.

If you have Excel, you can use it to calculate the mode. First, enter the numbers in a column - one number in each cell. Let's say that the numbers are in cells A1 through A10. Then, enter the following formula in cell A11 (or any other empty cell): = mode(A1:A10)
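The tallying procedure can be sketched in Python with `collections.Counter` from the standard library. Because a distribution can have more than one mode, this sketch collects every value that shares the largest tally rather than assuming a single answer.

```python
from collections import Counter

scores = [7, 5, 8, 9, 7, 6, 9, 8, 7, 4]

# Steps 1 and 2: list the unique values and tally their occurrences.
tally = Counter(scores)
for value in sorted(tally):
    print(value, tally[value])

# The mode is the value (or values) with the largest tally.
highest = max(tally.values())
modes = [value for value in sorted(tally) if tally[value] == highest]

print(modes, highest)  # [7] 3
```

Note the distinction the text draws: the mode is 7 (the value), while 3 is only the number of times it occurs.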

- There are three main measures of central tendency: mean, median, and mode.
- Each measure attempts to provide a summary for a distribution, namely where the center occurs.
- Which measure to use depends on the specific characteristics of the distribution.
  - The only appropriate statistic for a nominal variable is the mode.
  - Medians are used for ordinal variables and interval or ratio variables with outliers.
  - The mean is used with interval or ratio variables without outliers.
- Each measure has a particular process for determining its value.
  - The mode is determined from a tally of values.
  - The median is determined from a sorted list of values.
  - The mean is calculated by adding the values and dividing by the number of values.

The next section addresses the need to describe the spread of scores in addition to their central tendency. With these two descriptive statistics, more summary information can be provided.

- Locate the maximum value (Max) and the minimum value (Min).
- Subtract the minimum value from the maximum value. Range = Max - Min.

Step 1. Max = 9 and Min = 4

Step 2. Range = 9 - 4 = 5

If you have Excel, you can use it to calculate the range. First, enter the numbers in a column - one number in each cell. Let's say that the numbers are in cells A1 through A10. Then, enter the following formula in cell A11 (or any other empty cell): = max(A1:A10) - min(A1:A10)
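The same calculation as a brief Python sketch, using the built-in `max` and `min` functions on the example scores (the variable names are illustrative only):

```python
scores = [7, 5, 8, 9, 7, 6, 9, 8, 7, 4]

maximum = max(scores)            # Step 1: locate Max ...
minimum = min(scores)            # ... and Min
value_range = maximum - minimum  # Step 2: Range = Max - Min

print(maximum, minimum, value_range)  # 9 4 5
```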

First, a bit of terminology. What is a deviation and what might make it standard? A deviation is the distance between a value in a distribution and the mean. Every value in the distribution has a deviation associated with it. Because averaging the deviations always results in a value of 0 (the positive and negative deviations cancel each other), a simple average of the deviations is not useful as a measure of spread. Instead, statisticians use a process of squaring the deviations, averaging them, and then taking the square root of the result to generate a standard deviation. [Squaring is multiplying a number by itself - 3 squared is 3 × 3, or 9. Taking the square root is the opposite operation - the square root of 9 is 3. Perfect squares such as 4, 9, 16, 25, 36, 49, 64, 81, and 100 (among infinitely many others) have whole-number square roots; for all other numbers, it is handy to have a spreadsheet program or a calculator.] Here are the steps and the formula for the standard deviation. Refer back to these steps to understand the example presented after the formula.

- Calculate the mean, X̄.
- Subtract the mean from each value - these are the deviations.
- Square the deviations.
- Sum the squared deviations.
- Divide the sum of squared deviations by n-1 - this is called the variance.
- Calculate the square root of the variance - this is the standard deviation.

The mathematical formula looks like this: $\text{standard deviation} = \sqrt{\frac{\sum (X - \bar{X})^{2}}{n - 1}}$

If you have Excel, you can use it to calculate the standard deviation. First, enter the numbers in a column - one number in each cell. Let's say that the numbers are in cells A1 through A10. Then, enter the following formula in cell A11 (or any other empty cell): = stdev(A1:A10)

Scores: 7, 5, 8, 9, 7, 6, 9, 8, 7, 4

Step 1. The mean that we calculated earlier is X̄ = 7.

Step 2. The deviations are shown in the second column below.

Step 3. The squared deviations are shown in the third column below.

Step 4. Summing the squared deviations gives 24, as shown in the third to last cell of the third column.

Step 5. Dividing 24 by n-1 (10-1=9) gives 2.67, as shown in the second to last cell of the third column.

Step 6. Calculating the square root of 2.67 gives 1.63, as shown in the last cell of the third column.

| X | X - X̄ | (X - X̄)^{2} |
| --- | --- | --- |
| 7 | 7 - 7 = 0 | 0 × 0 = 0 |
| 5 | 5 - 7 = -2 | (-2) × (-2) = 4 |
| 8 | 8 - 7 = 1 | 1 × 1 = 1 |
| 9 | 9 - 7 = 2 | 2 × 2 = 4 |
| 7 | 7 - 7 = 0 | 0 × 0 = 0 |
| 6 | 6 - 7 = -1 | (-1) × (-1) = 1 |
| 9 | 9 - 7 = 2 | 2 × 2 = 4 |
| 8 | 8 - 7 = 1 | 1 × 1 = 1 |
| 7 | 7 - 7 = 0 | 0 × 0 = 0 |
| 4 | 4 - 7 = -3 | (-3) × (-3) = 9 |
| sum of squared deviations | | 24 |
| sum divided by n - 1 (9) | | 2.67 |
| square root (standard deviation) | | 1.63 |
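The six steps can be followed one at a time in a short Python sketch. This is an illustration of the procedure on the example scores, not a replacement for a spreadsheet or statistics package; the variable names are arbitrary.

```python
import math

scores = [7, 5, 8, 9, 7, 6, 9, 8, 7, 4]
n = len(scores)

mean = sum(scores) / n                       # Step 1: the mean
deviations = [x - mean for x in scores]      # Step 2: value minus mean
squared = [d ** 2 for d in deviations]       # Step 3: square each deviation
sum_of_squares = sum(squared)                # Step 4: sum the squares
variance = sum_of_squares / (n - 1)          # Step 5: divide by n - 1
std_dev = math.sqrt(variance)                # Step 6: take the square root

print(sum(deviations))        # 0.0 (the deviations always sum to zero)
print(sum_of_squares)         # 24.0
print(round(variance, 2))     # 2.67
print(round(std_dev, 2))      # 1.63
```

The first printed line shows why the deviations cannot simply be averaged: they cancel out.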

Just to reiterate, no one calculates the standard deviation by hand following the steps listed here; however, understanding that the standard deviation is a measure of the amount of variation (or spread) in a distribution is very important. For example, if you have two standard deviations for similar sets of data and one is quite a bit larger than the other, the means that describe the centers of the two distributions are not equally accurate. The mean with the smaller standard deviation is a better summary of its distribution than the mean with the larger standard deviation. Think of the standard deviation as a quality-control measure for the mean. In fact, means should never be reported without accompanying standard deviations.

Here is one way to remember what the standard deviation tells you. Suppose you are attending a conference and are offered your choice of two dorm rooms for your housing. The only difference between the two rooms has to do with the plumbing. Because it is farther from the water heater, the shower water temperature in one room fluctuates more than in the other. Let's say that the hot water in Room 652 has a mean temperature of 100 degrees and a standard deviation of 6 degrees. Room 247 also has a mean temperature of 100 degrees but its standard deviation is only 2 degrees. Assuming that the distribution of water temperatures resembles a bell-shaped curve (much more about this in the weeks ahead), the person showering in Room 247 will generally experience hot water temperatures between 96 and 104 degrees 95% of the time - generally tolerable. Consider the person showering in Room 652. The hot water temperature in that shower will fluctuate between 88 and 112 degrees 95% of the time. [The range of temperatures reflects plus or minus two standard deviations from the mean of 100 degrees.] Which room would you pick?
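The temperature ranges quoted for the two rooms come from taking the mean plus or minus two standard deviations. A minimal Python sketch of that arithmetic follows; the function name `two_sd_interval` is a hypothetical helper, and the "95%" reading assumes, as the text does, a roughly bell-shaped distribution.

```python
def two_sd_interval(mean, sd):
    """Return (low, high) for mean plus or minus two standard deviations."""
    return (mean - 2 * sd, mean + 2 * sd)

# (mean, standard deviation) of hot-water temperature for each room
rooms = {"Room 247": (100, 2), "Room 652": (100, 6)}

for room, (mean, sd) in rooms.items():
    low, high = two_sd_interval(mean, sd)
    print(f"{room}: about 95% of the time between {low} and {high} degrees")
```

Both rooms have the same mean, so only the standard deviation distinguishes them.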

How does this apply to educational settings? Would you prefer to teach in a classroom where the standard deviation of students' reading scores is large or small? Of course, this is not a math question. Students with different skill levels can benefit from interaction with more and less capable peers. Large fluctuations in prerequisite skill levels can serve to frustrate the teacher as well as the students, however. If you are an administrator and you don't really understand standard deviations, you might create classrooms where the mean score levels of students are equal but where some classrooms have more variation than others. As we are focusing on raising mean scores of students, leaving no students behind, should we set goals to increase standard deviations as well or should these be decreasing?

The formula for calculating r is one of the most complex that you will see. Correlation coefficients can be calculated by hand, but most people use a spreadsheet or statistics program. Two important aspects of this formula are that both an X distribution and a Y distribution are involved in the formula and that in addition to squaring and taking the square root of quantities, X and Y are multiplied together in the numerator (upper half) of the formula. The result of multiplying two numbers is called the product, which is why "product" is part of the full name of the correlation coefficient. Visit this site: http://allpsych.com/stats/unit2/14.html if you want to learn about evaluating the formula.

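The formula for r looks like this: $r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^{2} - (\sum X)^{2}][n\sum Y^{2} - (\sum Y)^{2}]}}$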

If you have Excel, you can use it to calculate r, the Pearson correlation coefficient. First, enter the numbers in two columns - one column for Xs and one column for Ys. Let's say that the numbers are in cells A1 through A10 and B1 through B10. Then, enter the following formula in cell A11 (or any other empty cell): = correl(A1:A10,B1:B10)
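For comparison with the spreadsheet approach, here is a sketch of the calculation in Python. It uses the common computational form of the product-moment formula (sums of X, Y, XY, X squared, and Y squared), which may differ in layout from the formula shown in the original notes. The five X and Y values are made-up numbers for illustration, not data from this course.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation, computational form."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # the "product" part
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) *
                            (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired observations (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(round(pearson_r(x, y), 2))  # 0.77
```

The function fails with a division by zero if every X (or every Y) is identical, because r is undefined when one variable does not vary.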

Before you can interpret the numerical value of r, you need to determine if it was appropriate to calculate r to begin with. The Pearson correlation coefficient, r, describes the strength and direction of a linear relationship between two variables. The first step in exploring a linear relationship is to generate the scatterplot, which you learned about in the previous section. By inspecting the scatterplot, you can assess whether a line describes the pattern or if some other curve might provide a better description. Similar to the mean, the accuracy of a correlation coefficient can also be compromised by outliers in either distribution. Visually inspecting the scatterplot can identify these cases.

As mentioned, Pearson r's describe both strength and direction of a relationship. The sign of r, either + or -, indicates the direction of the relationship. A positive r indicates a direct relationship: as X increases so does Y, and as X decreases so does Y. A negative r indicates an inverse relationship: as X increases, Y decreases, and as X decreases, Y increases. The absolute value of r indicates the strength of the relationship. Pearson r's range from -1 to +1. Values close to -1 or +1 indicate a strong linear relationship - the associated scatterplot displays the pattern of dots in a nearly straight line. A positive Pearson r isn't necessarily better (i.e., stronger) than a negative r - you need to compare the values while ignoring the signs.

Here are some websites where you can explore the meaning of the strength and direction of the correlation coefficient: http://www.stattucino.com/berrie/dsl/correlation.html, http://www.stat.berkeley.edu/~stark/Java/Html/Correlation.htm, and http://www.stat.uiuc.edu/courses/stat100/java/GCApplet/GCAppletFrame.html.

Returning to our previous example involving age and years of work experience, here is the scatterplot that was generated earlier.

The Pearson correlation coefficient associated with these two variables is shown in the following SPSS output.

This output is an example of the simplest form of a correlation matrix. This matrix has two rows and two columns, resulting in four cells. The upper left cell contains the correlation of AGE with AGE, which is always 1. The lower right cell is similar. The upper right and lower left cells are copies of each other - both contain the correlation of AGE with EXPERIENCE. The Pearson r is reported first - in this case, r = .834, which is based on 50 pairs of numbers (N). You will learn the meaning of the middle number later in the course.

What meaning can be associated with r = .834? First, because r is positive (greater than 0), older ages are associated with more years of work experience and younger ages are associated with fewer years of work experience. Second, because .834 is close to 1, the linear pattern between AGE and EXPERIENCE is strong, which means that the points fall fairly close to a straight line. Statisticians like to be precise with their interpretations, so levels of strength have been proposed. Ignoring the plus or minus sign, r's have been assigned the following strengths:

| r value | Strength |
| --- | --- |
| 0-.2 | very weak |
| .2-.4 | weak |
| .4-.6 | moderate |
| .6-.8 | strong |
| .8-1 | very strong |

An even more precise measure of strength is the coefficient of determination, r^{2}, which is the square of the correlation coefficient.
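As a worked example, the coefficient of determination is obtained by squaring r. Applying that to the age and experience correlation of r = .834 gives the value below. It is commonly read as the proportion of variation in one variable that is accounted for by its linear relationship with the other, so roughly 70% here; that interpretation is standard but is an addition to what these notes state.

```latex
r^{2} = (.834)^{2} \approx .696
```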

Often when studying correlations between two variables, you might begin to attribute a cause-and-effect relationship between the two variables. The existence of a meaningful correlation coefficient does not indicate causation. Among children, for example, foot size and reading ability are correlated, but neither causes the other; both increase with age.

The Pearson correlation coefficient applies to pairs of interval or ratio-level variables. Here are some other types of correlations for different measurement levels.

| X | Y | Correlation |
| --- | --- | --- |
| nominal | nominal | Phi coefficient |
| nominal | ordinal | Rank biserial coefficient (r_{rb}) |
| nominal | interval/ratio | Point biserial coefficient (r_{pb}) |
| ordinal | ordinal | Spearman rank-order coefficient (rho) |
| interval/ratio | interval/ratio | Pearson r |

It is quite appropriate for these topics to be addressed just after correlations have been explained, because most of the ways in which researchers estimate the reliability and validity of measurement instruments, procedures, or the use of their results involve the use of correlations. Furthermore, embedded in the process of quantitative research is the process of converting educational constructs into numerical values (i.e., operational definitions), which must be continually scrutinized for its legitimacy.

Whenever you hear or read the word, reliability, think of the synonym, consistency. Don't confuse reliability with validity. I'm sure that you know someone who is very reliable but always late or always wrong. Their response or behavior is always the same, but that doesn't mean it is appropriate. Reliability is necessary, but not sufficient, for validity.

| Type of Reliability | Application |
| --- | --- |
| Test-retest | Use this type of reliability estimate whenever you are measuring a trait over a period of time. Example: teacher job satisfaction during the school year |
| Parallel forms | Use this type of reliability estimate whenever you need different forms of the same test to measure the same trait. Example: multiple forms of the SAT |
| Internal consistency | Use this type of reliability estimate whenever you need to summarize scores on individual items by an overall score. Example: combining the 20 items on a statistics test to represent level of knowledge about a particular aspect of statistics |
| Interrater | Use this type of reliability estimate whenever you involve multiple raters in scoring tests. Example: AP essay test grading |

Validity is described as a unitary concept or property that can have multiple forms of evidence. Whether a test is valid or invalid does not depend on the test itself; rather, validity depends on how the test results are used. For example, a statistics quiz may be a valid way to measure understanding of a statistical concept, but the same quiz is not valid if you intend to assign grades in English composition based on it. This situation may seem quite far-fetched, but educators and politicians who do not carefully investigate the properties of a quantitative measurement may be making serious mistakes when basing decisions on the data. Here are some of the forms of evidence of validity that are presented in the text - there are others that you may encounter.

| Type of Validity | Application |
| --- | --- |
| Content | Use this type of validity estimate whenever you are comparing test items with a larger domain of knowledge. Example: assessing the breadth of a comprehensive final exam |
| Criterion | Use this type of validity estimate whenever you are comparing a new test with an established standard. Example: developing a test to predict genius (predictive), or developing a new test comparable to the CAHSEE (high school exit exam) |
| Construct | Use this type of validity estimate whenever you are comparing a test with the elements of a theoretical definition of a trait. Example: developing a new test for musical intelligence - distinguishing between musical and other types of intelligence |

With the exception of interrater reliability and content validity, all other estimates usually involve the calculation of a type of correlation. Multiple estimates may be used to evaluate the properties of an instrument in certain situations. One form of validity, namely predictive validity, relates a test score to some future event or condition. In the following section, you'll see how prediction and correlation can be related to help make decisions or analyze patterns.

The process of applying linear regression techniques assumes that there is a basis of historically observed data on which to base future predictions. One example of this situation is the school admissions process, where data on applicants are compared with data from previously successful and unsuccessful students. Another example is actuarial work (e.g., life insurance rate-setting), where historical data about longevity is used to predict lifespan. Regression models for these examples are much more complicated than the straight line model that is used in linear regression. Here is the regression line for the age and experience example.

Many lines can be drawn through the scatterplot of points, but one line provides the best fit, where best is defined as having a minimum of error. What is error? Whenever you use a theoretical model to make a prediction, and then check that prediction against historical data, the predicted results can either match the data or not. In the scatterplot shown above, the prediction matches the data where the line goes right through a point. Whenever the line misses a point, error occurs. The amount of error could be small (i.e., the line is near the point), or it could be large (i.e., the line is far from the point). Notice how the three lower points to the right generate quite a bit of error. One method for determining the line's location is called the method of least squares, where the vertical distance between a potential line and each point is squared and these squares are summed, and the line with the least sum of squares is selected as the best fit. The mathematics involves calculus and is beyond the scope of this introduction. The average amount by which the line misses the points is called the standard error of the estimate, which you can think of as similar to a standard deviation.

When conducting linear regression analysis using SPSS or Excel, the programs will generate the two numbers (b and a) needed to describe the line. The equation for a line is the following:

Y' = bX + a,

where Y' represents the prediction,

X represents the independent variable,

b is called the slope, and

a is called the y-intercept.

Excel uses a function named LINEST (for linear regression estimate). Please read the help notes in Excel about how to use this function.
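To make the method of least squares more concrete, here is a Python sketch that computes the slope (b) and y-intercept (a) from the standard closed-form formulas, which are the result of the calculus mentioned above. It then compares the sum of squared vertical misses for the fitted line against a slightly different line. The data are made-up numbers for illustration, not the age and experience data, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Return (b, a): least-squares slope and y-intercept for Y' = bX + a."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)   # the line passes through the two means
    return b, a

def sum_squared_error(xs, ys, b, a):
    """Sum of squared vertical distances between each point and the line."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b, a = fit_line(x, y)
print(round(b, 3), round(a, 3))                         # 0.6 2.2
print(round(sum_squared_error(x, y, b, a), 3))          # error for the best-fit line
print(round(sum_squared_error(x, y, b + 0.1, a), 3))    # larger for a steeper line
```

Any other choice of slope or intercept gives a larger sum of squares, which is what "best fit" means in this method.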

SPSS provides the following table, among others that we'll see later.

Unfortunately, the values are not labeled consistently between statistical programs and statistics texts. The value for the y-intercept (a in the equation of the line) is found in the B column of the SPSS output within the (Constant) row. The value for the slope (b in the equation of the line) is also found in the B column within the AGE row. Remember that the slope (b) is paired with X in the equation, and, because X is the AGE variable, the slope is in the AGE row. The resulting equation is the following.

Y' = .627 X + (-13.374)

What does this equation tell us? It provides a way to estimate the likely years of work experience for a person if you already know the person's age. For example, a 42-year-old would be predicted to have almost 13 years of work experience (.627 × 42 - 13.374 = 12.96). [Do the multiplication first and then the subtraction.] Another way to interpret the equation is that for every 10 years of age, the years of work experience increases by about 6.3 years.

One word of caution about predicting these values - notice that the range of age values started at 22 years old and ended at 66. The equation is only valid to use where there were data points in the original set of data for the independent variable, age. In other words, the prediction equation generates meaningless numbers for people younger than 22 or older than 66. Try calculating the predicted years of work experience for a 15-year-old. When employing prediction equations, the range of legitimate values for the independent variable must be known.
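The prediction equation and the caution about its valid range can be combined in a small Python sketch. The slope, intercept, and age limits come from the example above; the function name and the choice to raise an error outside the observed range are illustrative assumptions, not something SPSS or Excel does for you.

```python
SLOPE = 0.627        # b, from the AGE row of the SPSS output
INTERCEPT = -13.374  # a, from the (Constant) row
MIN_AGE, MAX_AGE = 22, 66   # range of ages in the original data

def predict_experience(age):
    """Predict years of work experience, refusing to extrapolate."""
    if not MIN_AGE <= age <= MAX_AGE:
        raise ValueError(
            f"age {age} is outside the observed range {MIN_AGE}-{MAX_AGE}"
        )
    return SLOPE * age + INTERCEPT

print(round(predict_experience(42), 2))   # 12.96

# Applying the equation blindly to a 15-year-old gives a negative
# number of years, which is meaningless.
raw_15 = SLOPE * 15 + INTERCEPT
print(round(raw_15, 2))                   # -3.97
```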

Here's another example to consider. Let's say you are determining grade-level reading ability based on reading test scores. The test publisher includes a regression equation for calculating the reading ability levels. The regression equation was derived from the data of third grade students who were moderately good readers. You have a student who tested very, very high and the regression formula places her reading ability at that of a sophomore in college. Because the regression equation was not developed with high-scoring readers, the predicted reading ability is not valid to use.

To learn more about linear regression by working with some interactive applets, visit the following web pages:

http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.html

http://bcs.whfreeman.com/ips4e/cat_010/applets/CorrelationRegression.html

http://www.stattucino.com/berrie/dsl/regression/regression.html

http://www.stat.sc.edu/~west/javahtml/Regression.html