Method performance parameters
Method validation involves studying a number of aspects of method performance. The key parameters are described in the following sections.
Specificity and selectivity
Precision
Bias
Accuracy
Detection capability
Linearity and working range
Ruggedness
Specificity and selectivity
It is important to establish during method validation that the test method is measuring only what it is intended to measure. In other words, the method must be free from interferences which could lead to an incorrect result. The specificity of a method refers to the extent to which it can unambiguously detect and determine a particular analyte in a mixture, without interference from the other components in the mixture. In some fields of measurement, selectivity is used as an alternative term for specificity, i.e. selectivity is the ability of a method to selectively measure an analyte in the presence of other substances.
Specificity is studied by analysing a range of samples, from pure measurement standards spiked with potential interferents to known mixtures that match the composition of real samples. Serious interferences need to be eliminated, but minor effects can be tolerated and included in the estimation of method bias.
Precision
Precision is defined in the standard ISO 3534-2 as, “The closeness of agreement between independent test/measurement results obtained under stipulated conditions.” Precision is therefore a measure of the spread of repeated measurement results and depends only on the distribution of random errors – it gives no indication of how close those results are to the true value. It is evaluated by making repeated independent measurements on identical samples. The precision is usually expressed as the standard deviation (or relative standard deviation) of the results. The conditions under which the measurements are made will determine the type of precision estimate obtained.
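As a minimal sketch, the standard deviation and relative standard deviation (RSD) of a set of replicate results can be computed with Python's standard `statistics` module. The function name `precision_summary` and the replicate values are hypothetical, chosen only for illustration.

```python
import statistics

def precision_summary(results):
    """Return the mean, standard deviation and relative standard deviation (%) of replicates."""
    mean = statistics.mean(results)
    s = statistics.stdev(results)      # sample standard deviation (n - 1 in the denominator)
    rsd = 100 * s / mean               # relative standard deviation, %
    return mean, s, rsd

# Hypothetical replicate results (mg/kg) on one sample
replicates = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
mean, s, rsd = precision_summary(replicates)
print(f"mean = {mean:.3f} mg/kg, s = {s:.3f} mg/kg, RSD = {rsd:.1f}%")
```

The sample standard deviation (n - 1 denominator) is used because the replicates are a sample from the distribution of possible results, not the whole population.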
Repeatability is the precision estimate obtained when measurement results are produced in one laboratory, by a single analyst using the same equipment over a short timescale. Repeatability gives an indication of the short-term variation in measurement results and is typically used to estimate the likely difference between replicate measurement results obtained in a single batch of analysis.
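The likely difference between replicate results is often summarised as a repeatability limit. The sketch below assumes the convention used in ISO 5725: the difference between two independent results has a standard deviation of √2·s_r, so with about 95% probability their absolute difference should not exceed 1.96 × √2 × s_r (roughly 2.8 s_r). The function name and the values are hypothetical.

```python
import math

def repeatability_limit(s_r, z=1.96):
    """Approximate 95% limit for the absolute difference between two replicate results.

    The difference of two independent results has standard deviation sqrt(2) * s_r,
    so the limit is z * sqrt(2) * s_r (about 2.8 * s_r for z = 1.96).
    """
    return z * math.sqrt(2) * s_r

s_r = 0.19            # hypothetical repeatability standard deviation, mg/kg
r = repeatability_limit(s_r)
pair = (10.1, 10.5)   # hypothetical duplicate results from one batch, mg/kg
difference = abs(pair[0] - pair[1])
within_limit = difference <= r
print(f"r = {r:.2f} mg/kg, duplicate difference = {difference:.2f} mg/kg, within limit: {within_limit}")
```

A duplicate pair whose difference exceeds the limit would suggest that something other than normal short-term variation affected the batch.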
Reproducibility is the precision estimate obtained when measurement results are produced by different laboratories (and therefore by different analysts using different pieces of equipment). Reproducibility therefore has to be evaluated by carrying out a collaborative study.
If a single laboratory is validating a method for its own use, a study of repeatability is likely to underestimate the real variation in results when the method is used routinely. Laboratories should therefore consider evaluating the intermediate precision (also known as within-laboratory reproducibility). This involves making replicate measurements on different days, under conditions which mirror, as far as possible, the conditions of routine use of the method (e.g. measurements made by different analysts using different sets of equipment).
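One common way to analyse such a study (an assumption here; the text does not prescribe a statistical method) is a one-way analysis of variance (ANOVA) on results grouped by day. The sketch assumes a balanced design: the same number of replicates each day, replicates within a day obtained under repeatability conditions, and conditions (analyst, equipment) varied between days. The function name and data are hypothetical.

```python
import statistics

def intermediate_precision(groups):
    """Estimate repeatability and intermediate precision standard deviations.

    groups: one list of replicate results per day, each list the same length.
    Uses a one-way ANOVA for a balanced design.
    """
    p = len(groups)                       # number of days
    n = len(groups[0])                    # replicates per day (assumed equal)
    day_means = [statistics.mean(g) for g in groups]
    grand_mean = statistics.mean(day_means)
    # Within-day mean square: pooled scatter of replicates about their day mean
    ms_within = sum(sum((x - m) ** 2 for x in g)
                    for g, m in zip(groups, day_means)) / (p * (n - 1))
    # Between-day mean square: scatter of day means about the grand mean
    ms_between = n * sum((m - grand_mean) ** 2 for m in day_means) / (p - 1)
    s_r2 = ms_within                                      # repeatability variance
    s_between2 = max((ms_between - ms_within) / n, 0.0)   # negative estimates set to zero
    return s_r2 ** 0.5, (s_r2 + s_between2) ** 0.5

# Hypothetical duplicate results (mg/kg) on four days
days = [[10.1, 10.0], [10.4, 10.3], [9.8, 9.9], [10.2, 10.2]]
s_r, s_I = intermediate_precision(days)
print(f"repeatability s_r = {s_r:.3f} mg/kg, intermediate precision s_I = {s_I:.3f} mg/kg")
```

In this made-up data set the day-to-day scatter is much larger than the within-day scatter, so s_I is several times s_r, which is exactly the underestimate the paragraph above warns about.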
If the test method is to be used for the analysis of a range of sample types (e.g. different analyte concentrations or sample matrices), then the precision will need to be evaluated for a representative range of samples. For example, it is common for the precision of a test method to deteriorate as the concentration of the analyte decreases.
Bias
Trueness is defined in ISO 3534-2 as, “The closeness of agreement between the expectation of a test result or a measurement result and a true value”, with a note that the measure of trueness is usually expressed in terms of bias. Bias is defined in the same standard as the difference between the expectation of a test result or measurement result and a true value. In practice, the true value is replaced by an accepted reference value (e.g. the concentration of the analyte in a certified reference material). Bias represents the total systematic error.
Evaluating bias generally involves carrying out repeat analyses of a suitable material containing a known amount of the analyte (this is the reference value). The bias is simply the difference between the average of the test results and the reference value. Bias is also frequently expressed as a percentage of the reference value, or as the ratio of the average result to the reference value (usually referred to as ‘recovery’).
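The calculations described above can be sketched as follows. The certified value and replicate results are hypothetical, and `bias_summary` is an illustrative name, not a standard function.

```python
import statistics

def bias_summary(results, reference_value):
    """Return bias, percentage bias and recovery (as a ratio) relative to a reference value."""
    mean = statistics.mean(results)
    bias = mean - reference_value                 # same units as the results
    percent_bias = 100 * bias / reference_value   # bias as a percentage of the reference value
    recovery = mean / reference_value             # ratio; multiply by 100 for % recovery
    return bias, percent_bias, recovery

# Hypothetical CRM certified value and replicate results, mg/kg
certified_value = 50.0
crm_results = [48.9, 49.5, 48.7, 49.3, 49.1]
bias, percent_bias, recovery = bias_summary(crm_results, certified_value)
print(f"bias = {bias:+.2f} mg/kg ({percent_bias:+.1f}%), recovery = {100 * recovery:.1f}%")
```

A negative bias corresponds to a recovery below 100%: here the average result is 0.9 mg/kg (1.8%) below the certified value, a recovery of 98.2%.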
One of the problems facing an analyst when planning a study of method bias is the selection of a suitable reference value. There are a number of options:
Certified reference material (CRM)
Spiked test samples
Reference method
A certified reference material is a material that has been produced and characterised to high standards and that is accompanied by a certificate stating the value of the property of interest (e.g. the concentration of a particular analyte) and the uncertainty associated with the value. If a suitable CRM is available (i.e. one that is similar to test samples in terms of sample form, matrix composition and analyte concentration) then it should be the first choice material when planning a bias study. Unfortunately, compared to the very large number of possible analyte/matrix combinations, the number of CRMs available is relatively limited so a suitable material may not be available. An alternative is to prepare a reference sample in the laboratory by spiking a previously analysed sample with an appropriate amount of the analyte of interest. With this type of study, care must be taken to ensure that the spike is in equilibrium with the sample matrix before any measurements are made.
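For a spiked test sample, the reference value is the amount of analyte added, so recovery is commonly calculated from the increase in the measured result (this formula is a widely used convention rather than one stated above). A minimal sketch, assuming the unspiked sample has already been analysed; the function name and values are hypothetical.

```python
def spike_recovery(spiked_result, unspiked_result, amount_added):
    """Fraction of the added analyte that is found (1.0 corresponds to 100% recovery)."""
    return (spiked_result - unspiked_result) / amount_added

# Hypothetical results (mg/kg): sample analysed before and after adding 10.0 mg/kg of analyte
recovery = spike_recovery(spiked_result=22.1, unspiked_result=12.4, amount_added=10.0)
print(f"recovery = {100 * recovery:.1f}%")
```

Note that a spike that has not equilibrated with the matrix may be extracted more easily than the native analyte, so a good spike recovery does not guarantee good recovery of the analyte already present in real samples.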
Bias can also be evaluated by comparing results obtained from a reference method with those obtained using the method being validated. If this approach is taken, the evaluation can be carried out using test samples rather than a CRM or spiked sample.
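One way to evaluate such a comparison (an assumption here, not a procedure stated above) is a paired t-test on the differences between the two methods' results for the same test samples. The sketch computes the mean difference and the t statistic; the critical value 2.776 is the two-sided 95% value of Student's t for 4 degrees of freedom and applies only to this five-sample example. All data are hypothetical.

```python
import math
import statistics

def paired_t(test_results, reference_results):
    """Mean paired difference, t statistic and degrees of freedom for results on the same samples."""
    diffs = [a - b for a, b in zip(test_results, reference_results)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)
    t_stat = mean_d / (s_d / math.sqrt(n))
    return mean_d, t_stat, n - 1

# Hypothetical results on five test samples: method being validated vs reference method
candidate = [5.2, 8.1, 12.3, 15.0, 20.4]
reference = [5.0, 8.0, 12.0, 14.6, 20.1]
mean_d, t_stat, dof = paired_t(candidate, reference)
T_CRIT = 2.776  # two-sided 95% critical value of Student's t for 4 degrees of freedom
print(f"mean difference = {mean_d:+.2f}, t = {t_stat:.2f} ({dof} df), significant: {abs(t_stat) > T_CRIT}")
```

Working with paired differences removes the large variation between samples, so a small but consistent bias can still be detected.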
Accuracy
Accuracy is defined as, “The closeness of agreement between a measurement result and the true value.” Accuracy is a property of a single result and is influenced by both random and systematic errors. It describes how close a result is to the true value and therefore includes the effect of both precision and bias.
Detection capability
In many situations it is useful to know the lower ‘operating limit’ of the method, for example, the minimum concentration of the analyte that can be detected and/or quantified with a reasonable degree of certainty.
The limit of detection (LoD) is the minimum concentration of the analyte that can be detected with a specified level of confidence. It can be evaluated by determining the standard deviation, s, of the results from replicate analysis of a blank sample (containing none of the analyte of interest) or of a sample containing only a small amount of the analyte. This standard deviation is multiplied by a suitable factor; a factor of 3 (i.e. LoD = 3s) is frequently used. The multiplying factor is based on statistical reasoning and is chosen so that the risks of false positives (wrongly declaring the analyte to be present) and false negatives (wrongly declaring the analyte to be absent) are kept to an acceptable level (a 5% probability is usually specified for both types of error). The LoD determined during method validation should be taken only as an indicative value; the approach described is adequate if results for test samples are expected to be well above the LoD. If sample concentrations are expected to be close to the limit of detection, then the LoD should be monitored after validation and a more statistically rigorous approach may be required.
The limit of quantitation (LoQ) is the lowest concentration of analyte that can be determined with an acceptable level of uncertainty. A value of 10s is typically used (where s is the standard deviation of the results from replicate measurements of a blank or low concentration sample).
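The 3s and 10s rules above can be sketched as below. The sketch follows the text in multiplying s directly; some guidance documents also add the mean blank result, which this sketch does not do. The function name and blank results are hypothetical.

```python
import statistics

def detection_limits(low_level_results, lod_factor=3.0, loq_factor=10.0):
    """Return (LoD, LoQ) as multiples of the standard deviation of replicate low-level results."""
    s = statistics.stdev(low_level_results)
    return lod_factor * s, loq_factor * s

# Hypothetical replicate results for a blank sample (mg/kg)
blank_results = [0.12, 0.08, 0.15, 0.10, 0.09, 0.11, 0.13]
lod, loq = detection_limits(blank_results)
print(f"LoD = {lod:.3f} mg/kg, LoQ = {loq:.3f} mg/kg")
```

Because both limits are scaled from the same s, the LoQ is always 10/3 times the LoD under these conventions.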
Linearity and working range
The working range of a method is the concentration range over which the method has been demonstrated to produce results that are fit for purpose. The lower end of the working range is defined by the LoD and LoQ. The upper end is usually signified by a change in sensitivity, for example a ‘tailing-off’ or ‘plateauing’ in the response.
The determination of linearity and working range at the method validation stage is important, because it allows the suitability of the method over the range required by the analytical specification to be established. It also helps in the design of the calibration strategy for the method.
In order to assess the working range, and confirm its fitness for purpose, standards should be studied whose concentrations span the expected concentration range, extended by about 10 to 20% at each end. The standards should be evenly spaced across the concentration range. Establishing linearity during method validation will normally require more standards (and more replication at each concentration) than is typical for calibration of a validated method in regular use.
Many chemical test methods require the test sample to be treated in some way so as to get the analyte into a form suitable for measurement. During method validation it is sensible to carry out an initial study to evaluate the response of the instrument to the analyte across the required concentration range. This can be done by analysing standards containing known concentrations of the analyte in a suitable solvent. This study will enable the calibration function for the instrument to be established. Once the instrument performance has been demonstrated to be satisfactory the linearity of the whole method should be studied. This requires the analysis of CRMs, spiked samples or matrix-matched standard solutions (i.e. solutions containing a known amount of the analyte plus the sample matrix). If the instrument response has been demonstrated to be linear then any non-linearity observed in the second study may indicate problems such as the presence of interfering compounds or incomplete extraction of the analyte from the sample matrix.
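The calibration function for the instrument is typically obtained by ordinary least-squares regression of response on concentration. The sketch below returns the gradient (slope), intercept and correlation coefficient r, plus the residuals; a systematic pattern in the residuals (for example, positive at both ends and negative in the middle) can reveal curvature even when r is very close to 1. The function name and calibration data are hypothetical.

```python
import math

def linear_fit(x, y):
    """Least-squares fit y = intercept + slope * x; returns slope, intercept, r and residuals."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    r = sxy / math.sqrt(sxx * syy)          # Pearson correlation coefficient
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return slope, intercept, r, residuals

# Hypothetical standards in solvent: concentration (mg/L) and instrument response
conc = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
response = [0.02, 0.41, 0.80, 1.22, 1.59, 2.01]
slope, intercept, r, residuals = linear_fit(conc, response)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, r = {r:.5f}")
print("residuals:", [round(e, 3) for e in residuals])
```

The same function can be applied to the second study (CRMs, spiked samples or matrix-matched standards) so that the two sets of regression results can be compared directly.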
Linearity should be assessed initially by constructing a plot of response versus concentration. The data can then be evaluated by carrying out linear regression to establish the correlation coefficient (r) and the gradient and intercept of the line.
Ruggedness
Ruggedness testing evaluates how small changes in the method conditions affect the measurement result, e.g. small changes in temperature, pH, flow rate, composition of extraction solvents etc. The aim is to identify and, if necessary, better control method conditions that might otherwise lead to variation in measurement results, when measurements are carried out at different times or in different laboratories. It can also be used to identify factors which need to be addressed to improve precision and bias.
Ruggedness testing can be carried out by considering each effect separately: measurements are repeated after varying a particular parameter by a small amount (say 10%) while the other conditions are controlled appropriately. However, this can be labour intensive, as a large number of effects may need to be considered. Since for a well-developed method most of the effects can be expected to be small, it is possible to vary several parameters at the same time. Any stable and homogeneous sample within the scope of the method can be used for ruggedness testing experiments. Experimental designs are available that allow several independent factors to be examined simultaneously. A common experimental design used in ruggedness testing is the Plackett-Burman design, which allows the effects of seven parameters to be studied by carrying out eight experiments. Factors identified as having a significant effect on measurement results will require further study. If it is not possible to reduce the impact of these parameters by employing tighter control limits, then further method development will be required.
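The eight-experiment design can be sketched as follows, assuming the standard Plackett-Burman construction for eight runs: seven cyclic shifts of the generator row (+ + + − + − −) plus a final row with every factor at its low level. Each factor's effect is the mean result at its high level minus the mean result at its low level. The results here are synthetic, generated so that only factors 1 and 4 matter; in a real study they would be measured values, and effects would be judged against the method's precision.

```python
# Generator row for the 8-run Plackett-Burman design (+1 = high level, -1 = low level)
GENERATOR = [1, 1, 1, -1, 1, -1, -1]

def plackett_burman_8():
    """Return the 8 x 7 design: seven cyclic shifts of the generator plus a row of all -1."""
    rows = []
    row = GENERATOR[:]
    for _ in range(7):
        rows.append(row)
        row = [row[-1]] + row[:-1]   # cyclic shift one place to the right (new list)
    rows.append([-1] * 7)
    return rows

def main_effects(design, results):
    """Effect of each factor = mean result at high level - mean result at low level."""
    effects = []
    for j in range(len(design[0])):
        high = [y for row, y in zip(design, results) if row[j] == 1]
        low = [y for row, y in zip(design, results) if row[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects

design = plackett_burman_8()
# Synthetic results: only factors 1 and 4 influence the measurement
results = [100 + 2.0 * row[0] + 0.5 * row[3] for row in design]
effects = main_effects(design, results)
for j, e in enumerate(effects, start=1):
    print(f"factor {j}: effect = {e:+.2f}")
```

Because every pair of columns in the design is balanced (orthogonal), each effect is estimated free of the main effects of the other six factors, which is what allows seven factors to be screened in only eight experiments.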