There are three commonly used measures of central tendency: the mean, the median, and the mode. Each has strengths and weaknesses, and none of them is the ideal measure of central tendency for every dataset. To illustrate how to calculate each of them, let’s use the following example dataset: X = \big\{5, 3, 18, 6, 3\big\}

The Mean

The mean (or average) is the sum of all of the scores divided by the number of scores. The formula for the population mean is:

\mu = \frac {\sum {X}}{N}

N is the number of scores in the population. The sample mean is:

\overline{X} = \frac {\sum {X}}{n}

\overline{X} is the mean of the scores, \sum {X} is the sum of the scores, and n is the number of scores in the sample. Using the formula to calculate the mean of our sample:

\overline{X} = \frac {\sum {X}}{n} = \frac {5+3+18+6+3} {5} = 7
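The same calculation can be sketched in a few lines of Python. The function name sample_mean is just an illustrative choice, and the empty-dataset check is an added assumption rather than part of the formula above.

```python
scores = [5, 3, 18, 6, 3]

def sample_mean(values):
    """Sum of the scores divided by the number of scores."""
    if not values:
        # Assumption: treat an empty dataset as an error rather than return 0.
        raise ValueError("the mean is undefined for an empty dataset")
    return sum(values) / len(values)

mean = sample_mean(scores)
print(mean)  # 7.0
```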

Advantages of the mean as a measure of central tendency:

  • It can be expressed as a simple mathematical formula, in contrast with the other major measures of central tendency
  • It is affected by all scores in the dataset

Disadvantages include:

  • It can’t be used with non-quantitative data. For example, if your dataset is {Red, Blue, Blue, Green}, then the calculation of a meaningful mean is impossible.
  • It can be heavily affected by extreme scores. In the example dataset, 18 can be considered an extreme score. The mean is pulled toward the extreme score and away from the other scores, making it less representative of the center of the dataset.
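To see how much a single extreme score moves the mean, here is a small Python sketch that compares the mean of the example dataset with and without the 18. Dropping the score is only for illustration; it is not a recommendation to delete data.

```python
scores = [5, 3, 18, 6, 3]

# Same dataset with the extreme score (18) left out, for comparison only.
without_extreme = [x for x in scores if x != 18]

mean_all = sum(scores) / len(scores)                          # 35 / 5
mean_without = sum(without_extreme) / len(without_extreme)    # 17 / 4

print(mean_all)      # 7.0
print(mean_without)  # 4.25

# Four of the five scores sit below the mean of the full dataset.
below_mean = [x for x in scores if x < mean_all]
print(len(below_mean))  # 4
```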

The Median

The median (or 50th percentile) is the value that splits the dataset in half: half of the datapoints fall below it and half fall above it. Unlike the mean, there is no easy formula for the median. Instead, we use a procedure:

  1. Sort your datapoints from lowest to highest.
  2. If you have an even number of datapoints, the median is defined as the mean of the two datapoints closest to the middle of the dataset.
  3. If you have an odd number of datapoints, the median is defined as the middle datapoint.

To illustrate, let’s calculate the median of the example dataset. Sorted from lowest to highest, our dataset is \big\{3, 3, 5, 6, 18\big\}. Since we have an odd number of datapoints (5), we follow Step 3. The datapoint in the middle of that ordered list is 5, which is the median.

Now imagine that we are to calculate the median of this dataset: \big\{3, 5, 6, 18\big\}. We have an even number of datapoints in this case, so we follow Step 2. We take the mean of the two datapoints closest to the middle of the list to get the median: \frac {5+6} {2} = 5.5. Note that in this case any number that is greater than 5 and less than 6 would split the dataset in half. Half of the datapoints are less than 5.0000001, and half are greater than 5.0000001, for example. It is the standard convention to define the middle of that range as the median.
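The three-step procedure can be written out as a short Python sketch. The function name median is an illustrative choice, and the empty-dataset check is an added assumption.

```python
def median(values):
    """Median via the sort-and-pick-the-middle procedure."""
    if not values:
        # Assumption: an empty dataset has no median.
        raise ValueError("the median is undefined for an empty dataset")
    ordered = sorted(values)          # Step 1: sort lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        # Step 2: even count, so average the two middle datapoints
        return (ordered[mid - 1] + ordered[mid]) / 2
    # Step 3: odd count, so take the middle datapoint
    return ordered[mid]

print(median([5, 3, 18, 6, 3]))  # 5
print(median([3, 5, 6, 18]))     # 5.5
```

Python's standard library also provides statistics.median, which follows the same convention for even-sized datasets.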

Advantages of the median as a measure of central tendency:

  • It is less affected by extreme scores. We calculated a median of 5 in our example dataset, which does a better job of representing the center of most of the points in our dataset.

Disadvantages include:

  • Like the mean, it can’t be used with non-quantitative (nominal) data, because the values have no order to sort by. For example, if your dataset is {Red, Blue, Blue, Green}, then the calculation of a meaningful median is impossible.
  • It is more difficult to calculate than the mean, especially for large datasets.

The Mode

The mode is the most commonly observed value in your dataset. There is no easy formula for this either. Instead, you just tally up the number of times you observe each value in your dataset, and the mode is defined as the most frequently observed value. In our example dataset, we observed the value of 3 twice, while 5, 6, and 18 were each observed once. Therefore, 3 is the mode of our dataset.
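The tallying procedure might look like this in Python. Returning every value that ties for the highest count is an assumption made here for illustration; the text above only covers the case of a single most frequent value.

```python
from collections import Counter

def modes(values):
    """Tally each value and return the most frequently observed one(s)."""
    counts = Counter(values)          # e.g. {5: 1, 3: 2, 18: 1, 6: 1}
    if not counts:
        raise ValueError("the mode is undefined for an empty dataset")
    highest = max(counts.values())
    # Assumption: if several values tie for the highest count, return all of them.
    return [value for value, count in counts.items() if count == highest]

print(modes([5, 3, 18, 6, 3]))  # [3]
```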

Advantages of the mode as a measure of central tendency:

  • It is usually not at all affected by extreme scores.
  • It can be calculated for a non-quantitative dataset. For our {Red, Blue, Blue, Green} dataset, the mode is Blue.

Disadvantages include:

  • It is more difficult to calculate than the mean, especially for large datasets.
  • It is not always a good representation of the center of the distribution. For example, if your dataset was {1, 1, 18, 18, 18}, then the mode is 18, which lies at the extreme right of the distribution.
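Both points about the mode can be checked with Python's standard statistics module. This is a small sketch using the two example datasets from the lists above; the variable names are illustrative.

```python
import statistics

colors = ["Red", "Blue", "Blue", "Green"]
lopsided = [1, 1, 18, 18, 18]

# The mode works on non-quantitative data because it only counts occurrences.
color_mode = statistics.mode(colors)
print(color_mode)  # Blue

# Here the mode is also the largest value, so it is not a central value.
lopsided_mode = statistics.mode(lopsided)
print(lopsided_mode)                   # 18
print(lopsided_mode == max(lopsided))  # True
```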

When to Use Each of Them

There is no measure of central tendency that is ideal in all situations. However, the mean is by far the most common measure of central tendency used in statistics, and it is usually the best choice, with two exceptions. If you are worried about the influence of extreme scores, the median can be a better choice. If you have non-quantitative (nominal) data, the mode is the only measure available to you.
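As a rough sketch, this rule of thumb could be encoded in Python as follows. The function name pick_measure and the worried_about_extremes flag are hypothetical, and deciding whether extreme scores are a concern remains a judgment call that the code does not make for you.

```python
import statistics

def pick_measure(values, worried_about_extremes=False):
    """Return (name, value) for the measure suggested by the rule of thumb."""
    if not values:
        raise ValueError("dataset is empty")
    # Assumption: data counts as quantitative only if every value is an int or float.
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if not is_numeric:
        return "mode", statistics.mode(values)      # nominal data: mode only
    if worried_about_extremes:
        return "median", statistics.median(values)  # less affected by extremes
    return "mean", statistics.mean(values)          # the usual default

print(pick_measure([5, 3, 18, 6, 3]))
print(pick_measure([5, 3, 18, 6, 3], worried_about_extremes=True))
print(pick_measure(["Red", "Blue", "Blue", "Green"]))
```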

Central Tendency Functions in Microsoft Excel 2010+

For all of the following, I am assuming that your dataset resides in cells A2 through A10 of your Excel spreadsheet. You will need to replace that range with the actual location of your dataset in order for the formulas to work properly in your spreadsheet.

Mean

=AVERAGE(A2:A10)

If you want to feel clever you can do this instead:

=SUM(A2:A10)/COUNT(A2:A10)

Median

=MEDIAN(A2:A10)

Mode

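=MODE(A2:A10)

In Excel 2010 and later, MODE is kept for compatibility with earlier versions. The newer equivalent is:

=MODE.SNGL(A2:A10)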

