The goal of a descriptive statistic is to provide a convenient and meaningful summary of a sample or population of data. Of course, any time you summarize a dataset you discard information. The aim is to preserve the dataset's important characteristics while discarding the less important details. However, deciding which characteristics count as ‘important’ is a subjective process, and the answer is likely to vary from dataset to dataset. For example, consider this one:
The dataset consists of hundreds of data points, and it is much easier to talk about if we can summarize it in some way. The difficulty lies in identifying which characteristics of the dataset a good summary must include.

One obvious characteristic is the center of the distribution: which point is most representative of the dataset as a whole? You might argue that 10 lies at the center of the distribution, or that the peaks at 6 and 14 are more representative.

A second characteristic is the spread of the distribution, or how much the data vary around the central point. A distribution whose points cluster tightly around its center is less variable than one whose data points diverge considerably from the center.

Finally, we can describe the shape of the distribution. In our example, the distribution appears to be at least somewhat bell-shaped, with a peak in the middle that tapers off to either side, but other distributions can just as easily take other shapes. Common descriptive statistics of shape include skewness, the degree of asymmetry of a distribution around its mean, and kurtosis, traditionally described as the peakedness of a distribution but more precisely a measure of how heavy its tails are.
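As a rough illustration of these three kinds of summary, the Python sketch below computes measures of center (mean, median, modes), spread (sample standard deviation), and shape (skewness and excess kurtosis) for a list of numbers. The function name `describe` and the eight-value dataset are hypothetical, chosen only for illustration. They are not the dataset discussed above. Skewness and kurtosis here use the simple, uncorrected moment-based formulas, which is only one of several conventions. Software that applies small-sample bias corrections will report somewhat different values.

```python
import statistics


def describe(data):
    """Summarize a sample by its center, spread, and shape.

    Hypothetical helper for illustration; skewness and kurtosis use the
    simple (uncorrected) moment-based formulas.
    """
    n = len(data)
    mean = statistics.mean(data)
    # Population standard deviation (divides by n), used to standardize the moments.
    sd = statistics.pstdev(data)
    if sd == 0:
        raise ValueError("shape measures are undefined when all values are equal")
    # Third and fourth central moments.
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        # Center
        "mean": mean,
        "median": statistics.median(data),
        "modes": statistics.multimode(data),
        # Spread: sample standard deviation (divides by n - 1)
        "stdev": statistics.stdev(data),
        # Shape: 0 for a perfectly symmetric distribution
        "skewness": m3 / sd ** 3,
        # Shape: kurtosis minus 3, so a normal distribution scores 0
        "excess_kurtosis": m4 / sd ** 4 - 3,
    }


# Hypothetical example data, not the dataset described in the text.
summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
print(summary["mean"], summary["median"], summary["modes"])  # 5 4.5 [4]
print(summary["skewness"])  # 0.65625
print(summary["excess_kurtosis"])  # -0.21875
```

A positive skewness indicates a longer tail to the right (here, toward 9). A negative excess kurtosis indicates tails lighter than those of a normal distribution. Note that the mean and median already disagree slightly (5 versus 4.5), which is one reason a summary often reports more than one measure of center.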