ISO 13528: homogeneity and stability in proficiency testing schemes

ISO 13528 is a key standard for proficiency testing by interlaboratory comparison. Its purpose is to provide detailed statistical methods for designing proficiency testing schemes, analysing the data obtained and supporting the interpretation of results by both participants and accreditation bodies. The standard applies to quantitative and qualitative results and is developed as a technical complement to ISO/IEC 17043, particularly in relation to statistical design, item validation, review of results and the presentation of summary statistics.

Within this general framework, one of the most relevant sections from a practical point of view is Annex B, devoted to the homogeneity and stability of proficiency test items. And that makes perfect sense: before calculating assigned values, z scores or any other statistic, it is necessary to ensure that the material sent to participants does not introduce variability that has nothing to do with the laboratory’s performance. Because if the item is flawed from the start, statistics can only describe the disaster more precisely.

What ISO 13528 regulates in proficiency testing

ISO 13528 does not simply define a formula or an evaluation criterion. The standard broadly addresses the statistical elements that support a PT scheme: statistical design, the initial review of items and results, the determination of the assigned value, performance evaluation criteria, the calculation of performance statistics and several graphical methods for describing and reviewing results. It also includes specific guidance for qualitative schemes.

Among its general principles, the standard establishes that the statistical methods used must be fit for purpose and statistically valid, and that the statistical assumptions on which they rely must be described and justified. In addition, the statistical design and analysis techniques must be consistent with the stated objectives of the proficiency testing scheme.

This matters because in PT it is not enough simply to “have data”. The data must be obtained under a technically sound design, with suitable items and with evaluation criteria that are consistent with the purpose of the exercise.

Why ISO 13528 requires homogeneity and stability

The initial review of test items appears in Clause 6 of the standard, and begins precisely with the homogeneity and stability of PT items. ISO 13528 states that the proficiency testing provider must ensure that the batches of items are sufficiently homogeneous and stable for the purposes of the scheme. It also indicates that the assessment of homogeneity and stability must be carried out using criteria that ensure that any lack of homogeneity or instability does not adversely affect the evaluation of participants’ performance.

That detail is fundamental. Homogeneity and stability are not assessed as an abstract or decorative requirement. They are assessed because, if they fail, the laboratory’s result no longer reflects only its technical capability and begins to be contaminated by a problem with the material itself.

The standard considers three approaches for this evaluation:

  • experimental studies, such as those described in Annex B itself;
  • previous experience with very similar items in earlier rounds;
  • and review of participant data in the current round in order to detect consistency or evidence of change, unexpected dispersion or problems attributable to inhomogeneity or instability.

Homogeneity of items: what it means and how it is checked

Annex B develops a general procedure for checking the homogeneity of a bulk preparation of PT items. The logic of the procedure is clear: select an appropriate property or measurand for the study, choose a laboratory and a method with sufficiently small repeatability, prepare and package the items, randomly select a number of units from the final batch and measure them under repeatability conditions.

The standard recommends that the measurement method should have a repeatability standard deviation low enough for significant inhomogeneity to be detectable. More specifically, it indicates as a guideline that the ratio between the repeatability standard deviation of the method and the standard deviation for proficiency assessment, σpt, should be below 0.5, although it recognises that this is not always achievable and that, in such cases, the provider should use more replicates.

From that point on, the evaluation is based on comparing between-sample variability with σpt. The items may be regarded as adequately homogeneous if the between-sample standard deviation satisfies:

  • ss ≤ 0.3 σpt
  • or, where evaluation is based on a maximum permissible error δE, ss ≤ 0.1 δE

The justification for this factor is not arbitrary: since 0.3² = 0.09, a between-sample standard deviation at the limit contributes less than 10 % of the variance used in the performance evaluation. In other words, the influence of inhomogeneity on the statistical evaluation becomes of limited relevance.
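As an illustration only (the function and variable names, and the duplicate-measurement design, are assumptions for the sketch, not text taken from the standard), the Annex B-style calculation for g items each measured in duplicate can be written in Python: estimate the between-sample standard deviation ss from the spread of the item means, corrected for the repeatability seen in the duplicate differences, and compare it against 0.3 σpt.

```python
import math

def homogeneity_check(duplicates, sigma_pt):
    """Sketch of an Annex B-style homogeneity check.

    duplicates: list of (x1, x2) result pairs, one pair per item,
    measured under repeatability conditions.
    Returns (ss, passes), where ss estimates the between-sample
    standard deviation and passes is True if ss <= 0.3 * sigma_pt.
    """
    g = len(duplicates)
    means = [(a + b) / 2 for a, b in duplicates]
    diffs = [a - b for a, b in duplicates]
    grand_mean = sum(means) / g
    # variance of the item means (between-sample plus half the repeatability)
    sx2 = sum((m - grand_mean) ** 2 for m in means) / (g - 1)
    # repeatability variance estimated from the duplicate differences
    sw2 = sum(d ** 2 for d in diffs) / (2 * g)
    # between-sample variance; truncated at zero if repeatability dominates
    ss = math.sqrt(max(sx2 - sw2 / 2, 0.0))
    return ss, ss <= 0.3 * sigma_pt
```

With ten or more items measured in duplicate this gives a quick pass/fail screen; a provider would normally document the chosen measurand, laboratory and method alongside the numbers.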

What to do if homogeneity is not sufficient

The standard does not simply say that a batch “does not work” and stop there. It also proposes several options if the homogeneity criteria are not met. These include:

  • incorporating the between-sample standard deviation into the standard deviation for proficiency assessment;
  • including that component in the uncertainty of the assigned value and using statistics such as z’;
  • or, in some cases, accepting the risk and verifying acceptability later when σpt is derived from participant results.
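The z′ statistic mentioned above widens the denominator of the ordinary z score with the standard uncertainty of the assigned value, which is where an inhomogeneity component can be absorbed. A minimal sketch, with hypothetical variable names:

```python
import math

def z_score(x, x_pt, sigma_pt):
    """Ordinary z score: participant result x against assigned value x_pt."""
    return (x - x_pt) / sigma_pt

def z_prime(x, x_pt, sigma_pt, u_xpt):
    """z' score: the denominator is expanded with the standard
    uncertainty u_xpt of the assigned value, so imperfections in the
    item or reference value weigh less on the participant."""
    return (x - x_pt) / math.sqrt(sigma_pt ** 2 + u_xpt ** 2)
```

When u_xpt is negligible the two statistics coincide; as u_xpt grows, z′ becomes progressively more forgiving than z for the same deviation.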

If none of these options is suitable, the standard is quite direct: the item must be discarded and the preparation repeated after correcting the cause of the inhomogeneity. Which is a fairly sensible position. Continuing with a defective item just to save time is one of those ideas that sounds practical until it becomes a documented problem.

Stability during the round and in transport

Annex B also develops procedures for checking the stability of items during the round of testing and under transport conditions. The standard indicates that, when there is no solid justification based on previous experience or earlier studies, it is advisable to perform specific checks, either before distribution or during the development of the round itself.

For a basic check of stability during the round, the standard proposes comparing one group of items measured before distribution with another group kept aside and measured after the close of the round, using the same laboratory, method and number of replicates. The items may be considered sufficiently stable if the difference between the means of both groups satisfies:

  • |y1 – y2| ≤ 0.3 σpt
  • or, where evaluation is based on a maximum permissible error δE, |y1 – y2| ≤ 0.1 δE

If the intermediate precision of the method limits the ability to detect changes, the standard allows the acceptance criterion to be expanded to include the uncertainty of the difference. Other options are isochronous studies, increasing the uncertainty of the assigned value or, if the problem is serious enough, not evaluating participant performance.
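The basic before/after comparison described above can be sketched as follows (function and variable names are illustrative, not taken from the standard):

```python
def stability_check(pre_round, post_round, sigma_pt):
    """Sketch of the basic stability check: compare the mean of items
    measured before distribution with the mean of items kept aside and
    measured after the round closes, using the same laboratory, method
    and number of replicates.

    Returns (diff, passes), with passes True if
    |y1 - y2| <= 0.3 * sigma_pt.
    """
    y1 = sum(pre_round) / len(pre_round)
    y2 = sum(post_round) / len(post_round)
    diff = abs(y1 - y2)
    return diff, diff <= 0.3 * sigma_pt
```

If the check fails, the expanded criterion mentioned above would add the uncertainty of the difference to the right-hand side before declaring instability.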

As for transport, the standard indicates that its effects should be checked at least in the early stages of the scheme, by comparing items subjected to transport with others retained under controlled conditions or by using equivalent designs. Any known transport effect must be taken into account in the evaluation of performance, and any significant increase in uncertainty due to transport must be included in the uncertainty of the assigned value.

Why Annex B is critical in ISO 13528

From the perspective of a proficiency testing provider, Annex B is not a minor appendix. It is a key operational section to ensure that the scheme is truly assessing the participants and not the defects of the material being distributed.

In practice, this implies several important ideas:

  • item validation cannot be left to a general assumption that “it has always worked before”;
  • previous experience may be useful, but it must be justified and reviewed periodically;
  • the selection of the measurand for checking homogeneity or stability must be sensitive to possible sources of variation;
  • and the design of the study must be aligned with the real purpose of the PT scheme.

Put simply: good statistical analysis begins long before the report. It begins with how the item is prepared, checked and preserved.

2026 agenda: current programmes

As part of SHAPYPRO’s planning for 2026, several proficiency testing exercises are already scheduled. To consult the full calendar, the 2026 agenda can be downloaded at the following link: https://shapypro.com/2026-agenda/

The programmes currently in progress are:

  • EN 1656 (P. aeruginosa)
  • EN 1657 (C. albicans)
  • EN 1276 (S. aureus)
  • EN 1650 (A. brasiliensis)
  • EN 13704 (B. cereus)

Integrating this type of calendar into the PT participation strategy is important because participation is not only about choosing the right scheme in theory, but also about planning in time, allocating resources and ensuring that the laboratory can generate, review and use the results within its quality cycle.

How SHAPYPRO can help

In this context, SHAPYPRO can help laboratories and organisations better understand the technical framework of proficiency testing and review critical aspects of the design and interpretation of PT schemes. This includes not only the reading of results or the evaluation of performance, but also prior issues such as the suitability of the scheme, the preparation of the material and the technical robustness of the approach used to ensure item homogeneity and stability.

Conclusion

ISO 13528 provides the statistical support needed to design and evaluate proficiency testing schemes with solid and technically justified criteria. Within that set, Annex B stands out for addressing a basic but decisive issue: that PT items should be sufficiently homogeneous and stable for the evaluation of performance to be valid.

In a well-designed PT scheme, statistics should not be used to disguise problems with the material, but to interpret correctly the results obtained on reliable items. And that requires doing the work properly from the beginning, which is exactly the part people find hardest when they are in a hurry.

 
