National Measurement System

Chemical and Biological Metrology Website

Robust statistics

The other sections describe the use of outlier detection and rejection for dealing with extreme values in data sets. The aim of rejecting outliers is to obtain a more reliable estimate of the mean and standard deviation of a data set. An alternative to outlier rejection is robust statistics. Robust statistics provide a more reliable estimate of the mean and standard deviation in the presence of extreme values (and also provide a sound estimate when no outliers are present). A simple approach to calculating the robust mean and standard deviation is described below. An Excel Add-in capable of calculating more sophisticated estimates is available from the RSC Analytical Methods Committee.

Robust estimate of the population mean


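The median can be used to provide a robust estimate of the population mean. The median is obtained by arranging a data set in ascending order and finding the middle value. The median of n ordered values $x_1, \ldots, x_n$ is given by:

$$
\tilde{x} =
\begin{cases}
x_{(n+1)/2} & \text{if } n \text{ is odd} \\
\dfrac{x_{n/2} + x_{n/2+1}}{2} & \text{if } n \text{ is even}
\end{cases}
$$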

Compared to the arithmetic mean, the median is influenced less by the presence of outliers.
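
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `median` and the two small data sets are assumptions made up for this example and do not come from the original page.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the single middle value
    # even n: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data, invented for illustration only.
clean = [12.7, 12.8, 12.9, 13.0, 13.1]
with_outlier = [3.5, 12.8, 12.9, 13.0, 13.1]   # lowest value replaced by an extreme one

mean_clean = sum(clean) / len(clean)
mean_with_outlier = sum(with_outlier) / len(with_outlier)

print("mean:  ", round(mean_clean, 2), "->", round(mean_with_outlier, 2))
print("median:", median(clean), "->", median(with_outlier))
```

In this sketch the extreme value pulls the mean down by almost two units, while the median is unchanged.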

Robust estimate of the standard deviation


The median absolute deviation (MAD) is an easily calculated robust estimate of the standard deviation. The median absolute deviation is obtained as follows:

  • calculate the median of the data set
  • calculate the absolute difference (deviation) of each data point from the median value
  • calculate the median of the absolute deviations.

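For n values $x_1, \ldots, x_n$, the median absolute deviation is therefore calculated from:

$$
\mathrm{MAD} = \operatorname{median}\left( \left| x_i - \tilde{x} \right| \right), \quad i = 1, \ldots, n
$$

where $\tilde{x}$ represents the median of the data set.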
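
The three steps listed above map directly onto code. The sketch below uses the `median` function from Python's standard `statistics` module; the function name `median_absolute_deviation` and the sample values are assumptions chosen for illustration.

```python
from statistics import median


def median_absolute_deviation(values):
    """Return the (unscaled) median absolute deviation of a data set."""
    centre = median(values)                          # step 1: median of the data
    deviations = [abs(x - centre) for x in values]   # step 2: absolute deviations
    return median(deviations)                        # step 3: median of the deviations


# Hypothetical data with one extreme value, invented for illustration.
sample = [1, 2, 3, 4, 100]
print(median_absolute_deviation(sample))
```

For this sample the median is 3, the absolute deviations are 2, 1, 0, 1 and 97, and their median is 1, so the extreme value has no effect on the result.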

For a normal distribution, MAD ≈ 0.674σ.
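
The factor 0.674 can be checked numerically. For a normal distribution, half of the values lie within one MAD of the median, so MAD/σ equals the 75th percentile of the standard normal distribution. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` is available; the variable name `k` is an arbitrary choice.

```python
from statistics import NormalDist

# 75th percentile of the standard normal distribution (mean 0, sigma 1).
# Half of a normal population lies within this distance of the median.
k = NormalDist().inv_cdf(0.75)

print(round(k, 4))
```

The value agrees with the 0.674 used in this section to three decimal places.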

To provide a robust estimate that is directly comparable with the standard deviation of a normal distribution, the MAD value is divided by 0.674. The resulting value is usually referred to as ‘MADE’ (pronounced ‘mad e’):

$$
\mathrm{MAD_E} = \frac{\mathrm{MAD}}{0.674}
$$
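
As a rough sketch, the scaling step can be wrapped in a small helper. The names `made` and `NORMAL_SCALE` are assumptions made for this example; the divisor 0.674 is the value quoted in the text.

```python
from statistics import median

NORMAL_SCALE = 0.674  # MAD / sigma for a normal distribution, as quoted in the text


def made(values, scale=NORMAL_SCALE):
    """Return the MAD divided by `scale`, a robust estimate of the standard deviation."""
    values = list(values)
    centre = median(values)
    mad = median(abs(x - centre) for x in values)
    return mad / scale


# Hypothetical data, invented for illustration: the MAD is 1, so the result is 1/0.674.
print(made([1, 2, 3, 4, 100]))
```

Keeping the scale factor as a parameter makes it straightforward to substitute a more precise constant if one is preferred.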




Example

The data shown below (listed in ascending order) are from a round of a proficiency testing scheme. Calculate the mean, standard deviation, robust mean and robust standard deviation.

3.5, 4.0, 12.3, 12.6, 12.7, 12.8, 12.8, 12.8, 12.8, 12.9, 12.94, 12.99, 13.0, 13.05, 13.1, 13.1, 13.2

mean                  11.8
standard deviation    3.04
median                12.8
MADE                  0.297

The median is the middle value when the data set is arranged in order of magnitude. With n = 17 values, this is the ninth value, 12.8.

The MAD value is calculated by determining the median of the absolute deviations from the median value, as shown below:

Data     Absolute deviation     Deviations arranged
         from the median        in ascending order

3.5      9.3                    0
4.0      8.8                    0
12.3     0.5                    0
12.6     0.2                    0
12.7     0.1                    0.1
12.8     0                      0.1
12.8     0                      0.14
12.8     0                      0.19
12.8     0                      0.2
12.9     0.1                    0.2
12.94    0.14                   0.25
12.99    0.19                   0.3
13.0     0.2                    0.3
13.05    0.25                   0.4
13.1     0.3                    0.5
13.1     0.3                    8.8
13.2     0.4                    9.3

Median (MAD)                    0.2

MADE = MAD/0.674 = 0.2/0.674 = 0.297.
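
The worked example can be reproduced with Python's standard `statistics` module. This is a sketch for checking the arithmetic, not part of the original page; the variable names are assumptions, and `stdev` is the sample standard deviation (divisor n - 1), which is what reproduces the value quoted in the summary above.

```python
from statistics import mean, median, stdev

# Proficiency testing data from the example, in ascending order.
data = [3.5, 4.0, 12.3, 12.6, 12.7, 12.8, 12.8, 12.8, 12.8,
        12.9, 12.94, 12.99, 13.0, 13.05, 13.1, 13.1, 13.2]

# Classical estimates, both strongly affected by the two low values.
classical_mean = mean(data)
classical_sd = stdev(data)            # sample standard deviation (n - 1)

# Robust estimates.
centre = median(data)                                  # robust mean
deviations = sorted(abs(x - centre) for x in data)     # absolute deviations, ascending
mad = median(deviations)                               # median absolute deviation
robust_sd = mad / 0.674                                # scaled MAD

# Results are rounded for display because floating-point subtraction
# leaves tiny errors (for example 13.0 - 12.8 is not exactly 0.2).
print("mean              ", round(classical_mean, 1))
print("standard deviation", round(classical_sd, 2))
print("median            ", centre)
print("MAD               ", round(mad, 3))
print("scaled MAD        ", round(robust_sd, 3))
```

The two low results inflate the classical standard deviation to roughly ten times the robust estimate, while the median stays within the main cluster of results.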

Last modified on 18 August 2009.