Bland–Altman plot

A Bland–Altman plot (difference plot) in analytical chemistry or biomedicine is a method of data plotting used in analyzing the agreement between two different assays. It is identical to a Tukey mean-difference plot,^[1] the name by which it is known in other fields, but was popularised in medical statistics by J. Martin Bland and Douglas G. Altman.^[2]^[3]

Construction

Consider a set of $n$ samples (for example, objects of unknown volume). Both assays (for example, different methods of volume measurement) are performed on each sample, resulting in $2n$ data points. Each of the $n$ samples is then represented on the graph by assigning the mean of the two measurements as the $x$-value and the difference between the two measurements as the $y$-value.

The Cartesian coordinates of a given sample $S$ with values of $S_{1}$ and $S_{2}$ determined by the two assays are

$$S(x,y)=\left(\frac{S_{1}+S_{2}}{2},\,S_{1}-S_{2}\right).$$
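As a minimal sketch of this construction, the following Python snippet computes the plotted coordinates from paired measurements. The function name `bland_altman_points` and the sample values are illustrative assumptions, not taken from the cited sources.

```python
def bland_altman_points(s1, s2):
    """Return (mean, difference) pairs for paired measurements s1 and s2."""
    if len(s1) != len(s2):
        raise ValueError("both assays must be performed on every sample")
    # x = mean of the two measurements, y = first minus second
    return [((a + b) / 2, a - b) for a, b in zip(s1, s2)]


# Hypothetical measurements of three samples by two assays
assay_1 = [10.0, 12.0, 15.0]
assay_2 = [9.0, 13.0, 14.0]
points = bland_altman_points(assay_1, assay_2)
```

Each tuple in `points` is one point of the plot, with the mean on the horizontal axis and the difference on the vertical axis.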

To compare the two sets of measurements independently of their mean values, it is more appropriate to look at the ratio of each pair of measurements.^[4] A log transformation (base 2) of the measurements before the analysis allows the standard approach to be used, so the plotted coordinates become:

$$S(x,y)=\left(\frac{\log_{2}S_{1}+\log_{2}S_{2}}{2},\,\log_{2}S_{1}-\log_{2}S_{2}\right).$$

This version of the plot is used in the MA plot.
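The log-transformed coordinates can be sketched in the same way. This is an illustrative example only; the function name `log2_points` and the input values are assumptions, and the measurements must be strictly positive for the logarithm to be defined.

```python
import math


def log2_points(s1, s2):
    """Return (mean of log2 values, difference of log2 values) per pair."""
    points = []
    for a, b in zip(s1, s2):
        if a <= 0 or b <= 0:
            raise ValueError("log transformation requires positive values")
        la, lb = math.log2(a), math.log2(b)
        # The y-value equals log2(a / b), i.e. the log of the ratio
        points.append(((la + lb) / 2, la - lb))
    return points


# Hypothetical measurements: the first pair differs by a factor of 4
log_points = log2_points([8.0, 4.0], [2.0, 4.0])
```

A $y$-value of 2 corresponds to a fourfold ratio between the two measurements, and a $y$-value of 0 to identical measurements.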

Interpretation

Interpretation of a Bland–Altman plot is contingent on the construction of the plot and the data at hand. Variations on the default plot have been introduced over the years, and each should be interpreted accordingly.^[5]

Original Construction

The original plot displays a scatter plot of the differences between paired measurements against their means. The differences should be computed as the new system minus the gold standard.^[3] The mean of the differences is plotted as a horizontal line, with the limits of agreement plotted parallel to it. The limits of agreement define an interval within which most of the differences between the systems are expected to lie; by default they are placed at the mean difference plus and minus 1.96 standard deviations of the differences, which covers about 95% of the differences when they are normally distributed.^[3] The mean difference represents a general bias between the two systems: a positive mean difference indicates that the new system generally produces larger values than the gold standard, and a negative mean difference indicates that it generally produces lower values.^[3] A mean difference close to 0 indicates agreement on average between the two systems, though the limits of agreement provide more nuance.
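A minimal sketch of the bias and limits-of-agreement calculation is given below. It assumes the conventional choice of the mean difference plus and minus 1.96 sample standard deviations; the function name `limits_of_agreement` and the example differences are hypothetical.

```python
import math
import statistics


def limits_of_agreement(differences, z=1.96):
    """Return (lower limit, mean difference, upper limit).

    `differences` holds new-system-minus-gold-standard values.
    z=1.96 is the assumed multiplier for approximately 95% limits.
    """
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)  # sample SD (n - 1 denominator)
    return mean_diff - z * sd_diff, mean_diff, mean_diff + z * sd_diff


# Hypothetical differences for five samples
example_differences = [1.0, -1.0, 2.0, 0.0, 3.0]
lower, bias, upper = limits_of_agreement(example_differences)
```

Here `bias` is the height of the horizontal mean-difference line, and `lower` and `upper` are the heights of the two parallel limit lines.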

Limits of Agreement

Since the limits of agreement are by default derived from the standard deviation of the differences, the differences are assumed to follow a normal distribution. If the distribution of the differences is not normal, limits of agreement that do not depend on normality may be used instead. Bland and Altman's follow-up paper on the topic explains that percentiles of the differences are a suitable replacement in such cases.^[4]
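One way to sketch percentile-based limits is with the standard library's `statistics.quantiles`. The choice of the 2.5th and 97.5th percentiles and of the `inclusive` interpolation method are assumptions made for illustration; the cited paper may define the percentiles differently.

```python
import statistics


def percentile_limits(differences):
    """Return the 2.5th and 97.5th percentiles of the differences.

    With n=40, quantiles() returns 39 cut points spaced 2.5% apart,
    so the first and last cut points are the desired percentiles.
    """
    cut_points = statistics.quantiles(differences, n=40, method="inclusive")
    return cut_points[0], cut_points[-1]


# Hypothetical differences: 41 evenly spaced values from -20 to 20
skew_free_differences = [float(d) for d in range(-20, 21)]
pct_lower, pct_upper = percentile_limits(skew_free_differences)
```

Because the limits are read directly from the ordered differences, no assumption about the shape of their distribution is needed, although a reasonably large sample is required for the extreme percentiles to be stable.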

In any case, the limits of agreement illustrate the agreement between systems more accurately than the mean difference alone. A new system is said to be an appropriate substitute for a gold standard system if the limits of agreement are within a predetermined threshold. The threshold depends heavily on the magnitude of the data, the nature of the systems, and the contexts in which they are to be used.^[6]

The 95% limits of agreement can be unreliable estimates of the population parameters, especially for small sample sizes, so when comparing methods or assessing repeatability it is important to calculate confidence intervals for the 95% limits of agreement. This can be done by Bland and Altman's approximate method^[3] or by more precise methods.^[7]
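The approximate method can be sketched as follows, using the approximation that the standard error of each limit is about $\sqrt{3s^{2}/n}$, where $s$ is the standard deviation of the differences. The function name is hypothetical, and the critical value of the $t$ distribution with $n-1$ degrees of freedom is passed in by the caller (an assumption made here because the Python standard library has no $t$ quantile function).

```python
import math
import statistics


def loa_confidence_intervals(differences, t_crit, z=1.96):
    """Approximate confidence intervals for both limits of agreement.

    t_crit: two-sided critical t value for n - 1 degrees of freedom,
            looked up by the caller (e.g. 2.776 for 4 degrees of freedom).
    """
    n = len(differences)
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)
    se_limit = math.sqrt(3 * sd_diff ** 2 / n)  # approximate SE of a limit
    half_width = t_crit * se_limit
    lower = mean_diff - z * sd_diff
    upper = mean_diff + z * sd_diff
    return {
        "se_limit": se_limit,
        "lower_ci": (lower - half_width, lower + half_width),
        "upper_ci": (upper - half_width, upper + half_width),
    }


# Hypothetical differences for five samples (4 degrees of freedom)
ci_result = loa_confidence_intervals([1.0, -1.0, 2.0, 0.0, 3.0], t_crit=2.776)
```

With only five samples the intervals are wide, which illustrates why the limits themselves can be unreliable for small sample sizes.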

Visualization Variations

If the differences grow in proportion to the magnitude of the data, the data are said to have a 'proportional bias'. There are many ways to adapt the plot and the subsequent analysis to accommodate it.^[8]

Firstly, a linear regression of the differences on the means can illustrate any relevant trend. If the spread of the differences is equal at all points around the regression line, the data are said to be homoscedastic and the trend is a simple proportional bias. Conversely, if the spread of the differences varies with the magnitude of the data, the differences are said to be heteroscedastic, which has further implications. Statistical tests such as the Breusch–Pagan test or the White test can provide statistical indicators of heteroscedasticity.
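As an illustrative sketch, an ordinary least-squares line can be fitted to the differences as a function of the means; a slope clearly different from zero suggests a proportional bias. The function name `fit_line` and the data are assumptions, and the sketch does not itself test for heteroscedasticity.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data in which the difference is 10% of the mean
pair_means = [10.0, 20.0, 30.0, 40.0]
pair_differences = [1.0, 2.0, 3.0, 4.0]
slope, intercept = fit_line(pair_means, pair_differences)
```

In this constructed example the fitted slope is 0.1 and the intercept is 0, reflecting a difference that grows in direct proportion to the magnitude of the measurement.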

One typical example of a plot with heteroscedastic data is one in which the variation of the differences grows in proportion to the magnitude of the data, visualized as an expanding 'v' shape.^[8] In such cases, it may be suitable to plot the ratio of the measurements from the two systems instead of the raw differences.^[9] Similarly, the differences could be plotted on a logarithmic scale.^[8] In either case, the relationship between the two systems is multiplicative rather than linear. This also indicates that the accuracy of the systems varies with the magnitude of the data.
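A ratio-based variant of the plot coordinates might be sketched as below. The function name `ratio_points` and the values are hypothetical, and plotting the ratio against the arithmetic mean is one assumed convention among several.

```python
def ratio_points(s1, s2):
    """Return (mean, ratio) pairs, with the ratio taken as s1 / s2."""
    points = []
    for a, b in zip(s1, s2):
        if b == 0:
            raise ValueError("ratio is undefined when the divisor is zero")
        points.append(((a + b) / 2, a / b))
    return points


# Hypothetical measurements: the first pair differs by a factor of 2
ratios = ratio_points([10.0, 20.0], [5.0, 20.0])
```

On such a plot, perfect agreement corresponds to a ratio of 1 rather than a difference of 0.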

Application

One primary application of the Bland–Altman plot is to compare two clinical measurements that produce continuous output.^[10] It can be used to compare a new system, technique, or method with a verified gold standard, though a gold standard is not necessarily free of error.^[4]

For the plot to be used to verify a new system, a threshold within which the limits of agreement must fall is typically predetermined. The value of the threshold depends on the contexts in which the systems and the data are used.^[6]
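The acceptance check itself is simple and can be sketched as follows. The symmetric threshold around zero, the function name, and the numbers are all assumptions for illustration; in practice the threshold is chosen from the clinical or practical context before the data are examined.

```python
def limits_within_threshold(lower_limit, upper_limit, threshold):
    """Return True if both limits of agreement lie within +/- threshold."""
    return -threshold <= lower_limit and upper_limit <= threshold


# Hypothetical limits of agreement checked against two candidate thresholds
accepted_at_5 = limits_within_threshold(-2.1, 4.1, threshold=5.0)
accepted_at_3 = limits_within_threshold(-2.1, 4.1, threshold=3.0)
```

The same limits can pass under one threshold and fail under another, which is why the threshold needs to be fixed in advance.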

The ability to verify a new system gives the plot broad applicability across many fields. Over the years, it has gained prominence in optometry, nutritional science, radiology, environmental sciences, surgery, medicine, veterinary medicine, engineering, and psychology, to name a few.^[6]^[11]^[12]^[13]^[14]^[15] Many recommendations and scholarly articles have also been published in an effort to refine the technique, its underlying statistical construction, and the validity of the plot.^[16]^[17]

See Analyse-it, MedCalc, NCSS, GraphPad Prism, R, StatsDirect, or JASP for software providing Bland–Altman plots.

Notes

A similar method was proposed in 1981 by Eksborg.^[18] This method was based on Deming regression—a method introduced by Adcock in 1878.

Bland and Altman's Lancet paper^[3] was number 29 in a list of the top 100 most-cited papers of all time, with over 23,000 citations.^[19]

References

[1] Cleveland WS (1993). Visualizing data. Murray Hill, N.J.: AT&T Bell Laboratories. pp. 22–23. ISBN 978-0963488404. OCLC 29456028.
[2] Altman DG, Bland JM (1983). "Measurement in medicine: the analysis of method comparison studies". The Statistician. 32 (3): 307–317. doi:10.2307/2987937. JSTOR 2987937.
[3] Bland JM, Altman DG (1986). "Statistical methods for assessing agreement between two methods of clinical measurement" (PDF). Lancet. 327 (8476): 307–10. CiteSeerX 10.1.1.587.8931. doi:10.1016/S0140-6736(86)90837-8. PMID 2868172. S2CID 2844897.
[4] Bland JM, Altman DG (1999). "Measuring agreement in method comparison studies". Statistical Methods in Medical Research. 8 (2): 135–60. doi:10.1177/096228029900800204. PMID 10501650. S2CID 9851097.
[5] Ludbrook, John (2010-01-19). "Confidence in Altman–Bland plots: A critical review of the method of differences". Clinical and Experimental Pharmacology and Physiology. 37 (2): 143–149. doi:10.1111/j.1440-1681.2009.05288.x. ISSN 0305-1870.
[6] Zaki, Rafdzah; Bulgiba, Awang; Ismail, Roshidi; Ismail, Noor Azina (2012-05-25). "Statistical Methods Used to Test for Agreement of Medical Instruments Measuring Continuous Variables in Method Comparison Studies: A Systematic Review". PLOS ONE. 7 (5): e37908. Bibcode:2012PLoSO...737908Z. doi:10.1371/journal.pone.0037908. ISSN 1932-6203.
[7] Carkeet A (2015). "Exact parametric confidence intervals for Bland–Altman Limits of Agreement" (PDF). Optometry and Vision Science. 92 (3): e71–e80. doi:10.1097/OPX.0000000000000513. PMID 25650900. S2CID 11643889.
[8] Ludbrook, John (2010-01-19). "Confidence in Altman–Bland plots: A critical review of the method of differences". Clinical and Experimental Pharmacology and Physiology. 37 (2): 143–149. doi:10.1111/j.1440-1681.2009.05288.x. ISSN 0305-1870.
[9] Giavarina, Davide (2015). "Understanding Bland Altman analysis". Biochemia Medica. 25 (2): 141–151. doi:10.11613/bm.2015.015. ISSN 1846-7482. PMC 4470095. PMID 26110027.
[10] Hanneman SK (2008). "Design, analysis, and interpretation of method-comparison studies". AACN Advanced Critical Care. 19 (2): 223–234. doi:10.1097/01.AACN.0000318125.41512.a3. PMC 2944826. PMID 18560291.
[11] Carkeet, Andrew (January 2020). "A Review of the Use of Confidence Intervals for Bland-Altman Limits of Agreement in Optometry and Vision Science". Optometry and Vision Science. 97 (1): 3–8. doi:10.1097/opx.0000000000001465. ISSN 1538-9235. PMID 31895271.
[12] Moore, A Russell (2023-08-24). "A review of Bland–Altman difference plot analysis in the veterinary clinical pathology laboratory". Veterinary Clinical Pathology. 53 (S1): 75–85. doi:10.1111/vcp.13293. ISSN 0275-6382. PMID 37620637.
[13] "Figure 2: Bland-Altman plots for the inter-rater reliability of neck circumference measurement". doi:10.7717/peerj.16816/fig-2.
[14] Nieuwenhuijsen, Mark (June 2015). "A71 Variability in and agreement between modelled and personal continuously measured black carbon levels using novel smartphone and sensor technologies". Journal of Transport & Health. 2 (2): S42. Bibcode:2015JTHea...2S..42N. doi:10.1016/j.jth.2015.04.559. ISSN 2214-1405.
[15] Haghayegh, Shahab; Kang, Hyeon-Ah; Khoshnevis, Sepideh; Smolensky, Michael H.; Diller, Kenneth R. (2020). "A comprehensive guideline for Bland–Altman and intra class correlation calculations to properly compare two methods of measurement and interpret findings". Physiological Measurement. 41 (5). Bibcode:2020PhyM...41e5012H. doi:10.1088/1361-6579/ab86d6. PMID 32252039.
[16] Olofsen, Erik; Dahan, Albert; Borsboom, Gerard; Drummond, Gordon (2015-02-01). "Improvements in the application and reporting of advanced Bland–Altman methods of comparison". Journal of Clinical Monitoring and Computing. 29 (1): 127–139. doi:10.1007/s10877-014-9577-3. ISSN 1573-2614.
[17] Oke, Gerke (May 2020). "Reporting Standards for a Bland–Altman Agreement Analysis: A Review of Methodological Reviews". Diagnostics. 10 (5). doi:10.3390/diagno (inactive 1 May 2025). ISSN 2075-4418. Archived from the original on 2025-03-20.
[18] Eksborg S (1981). "Evaluation of method-comparison data". Clinical Chemistry. 27: 1311–1312.
[19] Van Noorden R, Maher B, Nuzzo R (2014). "The top 100 papers". Nature. 514 (7524): 550–553. Bibcode:2014Natur.514..550V. doi:10.1038/514550a. ISSN 0028-0836. PMID 25355343.
