Measures of Central Tendency and Dispersion:
The following are the main measures of central tendency and dispersion.
Central tendency means that the values in a dataset tend to cluster around some central point. Three important measures give an overview of the central tendency of a dataset: mean, mode and median.
Mean: The first and most commonly used measure is the mean. Wikipedia defines it as “A mean is a numeric quantity representing the center of a collection of numbers and is intermediate to the extreme values of a set of numbers.” It is the average score of the dataset. To calculate the mean, we add all values (observations) of an attribute and divide the sum by the number of values. For example, suppose we have a school dataset recording the heights and weights of all 60 students. To calculate the mean height, we add the heights of all students and divide the sum by the number of students, that is, 60. The mean is simple to calculate and usually gives a good picture of the central tendency of a dataset, but sometimes it fails to do so. This happens when the data points are widely and unevenly dispersed. Outliers in particular distort the mean, because every value, however extreme, contributes to the sum.
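The calculation described above, and the effect of an outlier on it, can be sketched in a few lines of Python. The heights below are made-up values in centimeters, and the function name `mean` is chosen here for illustration only.

```python
def mean(values):
    """Arithmetic mean: the sum of the observations divided by their count."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Hypothetical heights in centimeters (illustrative data, not from a real school).
heights = [150, 152, 155, 158, 160]
typical = mean(heights)

# The same data with one mis-recorded height acting as an outlier.
with_outlier = heights + [1600]
skewed = mean(with_outlier)

print(typical)  # mean of the clean data
print(skewed)   # mean pulled far above every genuine height
```

With these numbers the mean of the clean data is 155.0, while the single outlier raises it to roughly 395.8, a value larger than every genuine height in the list.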
Mode: It is the score that occurs most often in the dataset. In the school example above, if we look at the measured heights we may see that one particular height occurs more often than the others. One way to find the mode is to sort the dataset in ascending order and then count how many times each value occurs. The value that occurs most often is the mode of the dataset. A dataset may have a single most frequent value, or two or more values that are tied for the highest frequency. When only one value occurs most often, the dataset is called unimodal. When there are two such values it is called bimodal, and when there are three or more it is called multimodal.
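The counting step can be sketched with Python's standard `collections.Counter`. The function name `modes` and the sample heights are assumptions made for this example. It returns every value tied for the highest count, so the same function covers the unimodal, bimodal and multimodal cases.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency, in ascending order."""
    if not values:
        raise ValueError("the mode of an empty dataset is undefined")
    counts = Counter(values)          # value -> number of occurrences
    highest = max(counts.values())    # the largest frequency found
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical heights in centimeters.
unimodal = [150, 152, 152, 155, 158]
bimodal = [150, 152, 152, 155, 155, 158]

print(modes(unimodal))  # one value occurs most often
print(modes(bimodal))   # two values are tied
```

A counter makes the explicit sorting step unnecessary, although sorting first and counting runs of equal values gives the same answer. Note that if every value occurs exactly once, this sketch returns all of them, which usually means the mode is not informative for that dataset.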
Median: It is yet another way of quantifying the center of a distribution. It is the middle value when the values are arranged in ascending order, so that it splits the observations into two equal halves. To find it we need to know the number of observations, n. If n is odd, the median is the value at position (n+1)/2 in the ordered list. If n is even, there is no single middle value, so we take the average of the two middle values, those at positions n/2 and n/2 + 1.
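The odd and even cases can be sketched as follows. The function name `median` and the sample data are illustrative assumptions. Note that Python lists are indexed from 0, so position (n+1)/2 in the 1-based counting used above corresponds to index n // 2.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)   # ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: average the values at 1-based positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical heights in centimeters.
odd_count = [158, 150, 160, 152, 155]
even_count = [150, 152, 155, 158]

print(median(odd_count))
print(median(even_count))
```

For the five heights the median is 155, the third value after sorting. For the four heights it is 153.5, the average of 152 and 155.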
Measures of dispersion: The study of dispersion in a dataset is as important as the study of its central tendency. Two measures give an idea of the amount of dispersion present in a dataset: variance and standard deviation.
Variance: It is built from the sum of squared errors, where an error is the difference between an observation and the mean of all observations. This sum is called the Sum of Squares (SS). The variance is the SS divided by the number of observations, or by one less than that number when the data are a sample from a larger population, so it is the average squared error. The population variance is denoted by the lowercase Greek letter sigma, squared. Variance is awkward to interpret directly because it is expressed in squared units: if we find the variance of lengths measured in meters, the result is in square meters, which is a unit of area, not length. The measure of dispersion that is easier to interpret is therefore the standard deviation, S.D. for short, denoted by lowercase sigma.
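In symbols, these quantities can be written as below. The notation is the conventional one and is an assumption of this sketch rather than something fixed by the text: x_i is the i-th observation, \bar{x} the mean, n the number of observations, and s^2 the variance estimated from a sample.

```latex
SS = \sum_{i=1}^{n} (x_i - \bar{x})^2
\qquad
\sigma^2 = \frac{SS}{n} \quad \text{(population)}
\qquad
s^2 = \frac{SS}{n - 1} \quad \text{(sample)}
```

Because each error x_i - \bar{x} carries the unit of the data and is then squared, both forms of the variance carry the squared unit.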
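Standard Deviation: It is simply the square root of the variance. It is free from the problem of units, because taking the square root returns the measure to the original unit of the data. It is also known as the root mean square deviation, since it is the square root of the mean of the squared deviations from the mean.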
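Both dispersion measures can be sketched together in Python. The function names, the `sample` flag and the data are assumptions made for this example. Dividing by n - 1 for sample data is the usual convention, and dividing by n treats the data as the whole population.

```python
import math


def sum_of_squares(values):
    """Sum of squared differences between each observation and the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)


def variance(values, sample=True):
    """Average squared error: SS / (n - 1) for a sample, SS / n for a population."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough observations to compute the variance")
    divisor = n - 1 if sample else n
    return sum_of_squares(values) / divisor


def standard_deviation(values, sample=True):
    """Square root of the variance, expressed in the original unit of the data."""
    return math.sqrt(variance(values, sample))


# Hypothetical lengths in meters.
lengths = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(variance(lengths, sample=False))            # in square meters
print(standard_deviation(lengths, sample=False))  # in meters
```

For these lengths the mean is 5.0 and the sum of squares is 32.0, so the population variance is 4.0 square meters and the population standard deviation is 2.0 meters. Python's standard `statistics` module provides `pvariance`, `variance`, `pstdev` and `stdev` for the same calculations.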
Larger values of variance and standard deviation mean that the observations are more widely dispersed around the mean.
For more details, see: Discovering Statistics Using IBM SPSS Statistics by Andy Field.