Reliability of thermal quantitative sensory testing: a systematic review.
Article Type: Report
Subject: Somatosensory system (Testing)
Neurologic examination (Research)
Neurologic examination (Quality management)
Authors: Moloney, Niamh A.
Hall, Toby M.
Doody, Catherine M.
Pub Date: 02/01/2012
Publication: Name: Journal of Rehabilitation Research & Development Publisher: Department of Veterans Affairs Audience: Academic Format: Magazine/Journal Subject: Health Copyright: COPYRIGHT 2012 Department of Veterans Affairs ISSN: 0748-7711
Issue: Date: Feb, 2012 Source Volume: 49 Source Issue: 2
Accession Number: 288538326
INTRODUCTION

In recent years, understanding pain mechanisms among patient populations has become a key focus of many clinical and research groups. In conjunction with this, quantitative sensory testing (QST) has seen increasing use in areas such as musculoskeletal and neuropathic pain for profiling somatosensory phenotypes [1-5] and as an outcome measure in intervention studies [6-7]. Profiling patients using QST involves analyzing multiple parameters of sensory testing to determine whether patients demonstrate dominant features of sensory deficit or sensory hyperexcitability [5,8]. It is thought that this will further the understanding of pain mechanisms and the development or application of more appropriate interventions [8].

QST is a psychophysical means of assessing the function of small and large diameter nerve fibers and their respective pathways [9]. A number of different modalities can be assessed using QST, including vibration, pressure pain thresholds, and thermal thresholds. Thermal thresholds include cold detection threshold (CDT), warm detection threshold (WDT), cold pain threshold (CPT), and heat pain threshold (HPT) [9]. As a psychophysical test, QST is not objective, and consistency in QST data relies heavily on environmental factors, such as ambient temperature and noise; methodological factors, such as test protocol, test application, and test instructions; and the cooperation and attention of the individual being tested [10]. The two primary methods employed in the assessment of thermal QST are the method of limits (MLI) and the method of levels (MLE). The MLI is a reaction-time inclusive method, whereby the applied stimulus increases gradually at a preset rate from the baseline temperature. Participants are then asked to depress a switch when they (1) perceive a change in temperature for detection thresholds or (2) perceive the sensation as painful for pain thresholds. The MLE, sometimes referred to as the forced-choice method, is a reaction-time exclusive method. A set temperature is applied, and the participant is requested to give a "yes" or "no" response on whether or not he or she perceived the sensation. If the participant answers yes, then the temperature is reduced; if he or she answers no, the temperature is increased. This procedure is repeated until the threshold is identified. The staircase method is a variation of the MLI [11].
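The staircase logic of the MLE can be made concrete with a short simulation. The sketch below is illustrative only and is not drawn from any protocol in the reviewed studies; the function name, starting temperature, and step sizes are assumptions. It models a participant with a fixed (hypothetical) warm detection threshold and bisects toward it by halving the temperature step at each reversal of the yes/no response:

```python
def method_of_levels(perceives, start_temp, step, min_step=0.1):
    """Forced-choice threshold search (illustrative sketch): raise the
    temperature after a "no" response, lower it after a "yes," and halve
    the step at each reversal until it falls below min_step (degrees C)."""
    temp = start_temp
    last_response = None
    while step >= min_step:
        felt = perceives(temp)                   # simulated yes/no response
        if last_response is not None and felt != last_response:
            step /= 2.0                          # reversal: halve the step
        temp += -step if felt else step          # move toward the threshold
        last_response = felt
    return temp

# Hypothetical participant whose warm detection threshold is 33.5 degrees C
wdt = method_of_levels(lambda t: t >= 33.5, start_temp=32.0, step=2.0)
```

The final step size bounds the error of the estimate, which is one reason finer convergence criteria make the MLE slower than the reaction-time inclusive MLI.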

For any measure to be clinically useful or sufficiently robust for research purposes, it must be reliable [12]. Reliability refers to the consistency of a measurement across time, patients, or observers and the degree to which measurements are free from error [12]. Adequate reliability of a measurement is imperative for clinical decision-making [13]. Reliability of QST also has important consequences for accurate patient profiling. However, a previous literature review of reliability in QST found notable variability in methodology, statistical analyses, and results among the reviewed studies [10]. The use of QST and the body of work in relation to reliability of thermal QST has grown substantially since Chong and Cros' 2004 review [10]. Therefore, the aim of this article is to systematically review the literature (from January 1990 to May 2010) to determine the level of reliability in thermal QST.

METHODS

Search and Selection

We developed an electronic search strategy through author consensus with a medical librarian and performed it within the following databases: PubMed, Embase, CINAHL (Cumulative Index to Nursing and Allied Health Literature), Web of Science, Science Direct, and Cochrane Library Reviews (covering the period from January 1990 to May 2010). Where possible, we used key words to identify relevant MeSH (medical subject headings) that we then exploded. To gain a list of potentially relevant papers, we combined the QST key words using "or." We repeated this strategy for the reliability key words. To identify papers on reliability in thermal QST, we combined the two groups of key words using "and." Table 1 displays the search strategies used for PubMed and Embase. We adapted the search for the other databases using combinations of the search terms outlined in Table 1. We subsequently hand-searched reference lists from retrieved articles for supplementary studies. Articles were eligible for the review if they fulfilled the criteria outlined in Figure 1.

Reviewer 1 (Ms. Moloney) initially screened titles of articles and article abstracts and only included those that mentioned reliability and/or reproducibility of thermal QST in the review. Two reviewers (Ms. Moloney and Dr. Doody) independently reviewed titles and abstracts to ensure that articles met the inclusion criteria. Where uncertainty arose regarding the eligibility of an article from its abstract, we retrieved the full-text version of the article and evaluated it against the inclusion criteria. We discussed disagreements and achieved consensus for all articles to be included. We then retrieved full-text versions of the studies to be included in the review for quality assessment and data extraction. The reviewers consisted of two physiotherapists (one PhD student and one college lecturer), each with at least 12 years of postgraduate experience.

Quality Assessment

We used a recently devised data extraction form, the Quality Appraisal for Reliability Studies (QAREL) [14], to extract and record data. We then completed a QAREL checklist to facilitate a quality appraisal of the studies using the guidelines suggested by Lucas et al. [14] (Figure 2 and Table 2). Using the standard of Van Trijffel et al. [15], we considered studies to be of high quality if they received a yes score on at least 50 percent of relevant checklist items. As there are 11 items on the checklist, we required a minimum of six yes answers for the study to achieve high quality status. Alternatively, if we considered some checklist items inapplicable for that type of study, we required a yes score on at least 50 percent of the remaining relevant items. The two primary reviewers conducted quality assessment independently using the QAREL checklist. They discussed disagreements, and in all cases, reached a consensus. We obtained statistical advice from a biostatistician regarding the appropriateness of the last item on the QAREL checklist, i.e., statistical measures and their interpretation.

Data Extraction

The two reviewers independently extracted data from the original studies using the QAREL data extraction form, which includes publication details, type of study, subject and observer characteristics, inclusion and exclusion criteria, blinding, randomization, considerations of stability of measure, interpretation of data, and statistical analysis methods [14]. In addition, we extracted and assessed specific data pertaining to the methodological issues, e.g., control of environmental factors.

QST output, measured in degrees Celsius, is continuous in nature; thus, intraclass correlation coefficients (ICCs) and coefficients of variation (CVs) are the most commonly used estimates of reliability. In the absence of an accepted standard for the qualitative interpretation of ICC values, we used the interpretation of ICC values by Shrout and Fleiss [16], whereby <0.4 is considered poor agreement, 0.40 to 0.59 is fair, 0.60 to 0.75 is good, and >0.75 is excellent. Note that reliability estimates such as the ICC can be difficult to interpret in the context of an individual score, and as such, an estimate of precision (e.g., standard error of measurement) is important for judging the degree to which measurements vary for an individual [14,17-18].
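To make the relation between the ICC and the standard error of measurement concrete, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single measures) from a subjects-by-raters score matrix and derives the SEM as SD x sqrt(1 - ICC). This is a generic illustration, not the analysis of any reviewed study; the example matrix is the classic Shrout and Fleiss [16] demonstration data, not thermal thresholds.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measures.
    `scores` is an (n subjects) x (k raters) matrix."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()    # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()    # between raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

def sem_from_icc(scores, icc):
    """SEM = SD * sqrt(1 - ICC); the SD is pooled over all observations here,
    which is one common convention [17-18]."""
    return np.asarray(scores, dtype=float).std(ddof=1) * np.sqrt(1.0 - icc)

# Demonstration data from Shrout and Fleiss (6 subjects, 4 raters)
data = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
        [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
icc = icc_2_1(data)   # approx 0.29 for this data set
```

A high ICC with a large SEM can still mean that an individual retest score differs substantially from baseline, which is why both quantities are needed.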

RESULTS

Search Strategy Yield

The initial search yielded 2,214 references, of which reviewer 1 removed 2,124 irrelevant and duplicate articles (Figure 3). Both reviewers reviewed the titles and abstracts of the remaining 90 articles. Of these, we dismissed 63 articles based on the inclusion and exclusion criteria. We retrieved a further 8 articles after hand-searching the remaining 27 articles. Both reviewers reviewed a total of 35 articles in full. Following the review of the complete articles, we excluded a further 14 articles because they did not meet the inclusion criteria. Consequently, we included 21 total articles in the review.

Quality Assessment and Data Extraction

Of the 21 studies included in the review, we deemed that only 5 studies illustrated high quality using the criteria outlined [19-23]. On closer examination of the QAREL checklist results (Table 2), it is clear that the majority of studies investigated asymptomatic cohorts and few studies described the examiners, which limits the external validity of these studies. Furthermore, recruitment strategies for both study subjects and examiners were poorly outlined. With respect to internal validity, details concerning the blinding of examiners and randomization of the test procedures were the main weaknesses, with the majority of scores interpreted as "unclear." Of the five studies that we deemed high quality, one study presented statistics that we did not deem comprehensive [19], i.e., ICC values only, without measures of precision or sufficient raw data [14].

Narrative Analysis

For the purpose of this review, we present a narrative analysis of the results. Meta-analysis was not possible because of the variation in study quality and statistical methods used across studies. We reviewed the various aspects of the studies under four sections: (1) type of study, sample, and raters; (2) methodological issues; (3) statistical analysis; and (4) results. Table 3 presents details of methods, statistical analyses, and results.

Type of Study, Sample, and Raters

Eight studies investigated intrarater reliability [11,21-22,24-28]. Of those, three also assessed interrater reliability [22,24,26]. Six studies investigated test-retest reliability [23,29-33], but the remaining seven studies did not specify which type of reliability they intended to assess [25,28,34-38]. Of the 21 studies we reviewed, 11 provided details of the raters. In most studies, details of the raters were limited to "the authors of the papers" [23-24,27,31] or brief information such as "a single technician or one of a number of observers" [11,28,32-33]. In the three remaining studies, the raters were (1) individuals trained and certified by the Central Reading and Coordinating Center, Department of Neurology, University of Pennsylvania (Philadelphia, Pennsylvania) [34]; (2) neurologists [19]; and (3) authors trained by the German Research Network on Neuropathic Pain (DFNS) [22].

Of the 21 articles we reviewed, 14 investigated nondisabled populations, 7 investigated people with diabetes with and without associated neuropathy [19,28,30,33-36], 2 investigated people with spinal cord injuries [29,37] (one with neuropathic pain [29]), and 3 investigated people with diffuse pain syndromes and/or musculoskeletal disorders [21,23,38].

[FIGURE 3 OMITTED]

Methodological Issues

Modality assessed. CDT and WDT were the most commonly investigated modalities, and all but two of the articles reviewed examined the reliability of thermal detection thresholds. Seven studies included the assessment of thermal pain thresholds [20,22,26,29-30,37-38], while Yarnitsky et al. [32] assessed HPT alone and Wasner and Brock [31] reported on reliability of HPT and CPT. The MLI was the more commonly studied assessment method: nine studies examined the MLI alone [20,22-25,29,31,37-38], while five assessed or compared the MLI and MLE [11,21,25,27,32]. The MLE was used in seven studies [19,26,28,30,33-35].

Interval. We found the time between successive testing sessions to vary between studies. Testing intervals ranged from 2 days to 1 month, with many studies allowing some variation within this, e.g., "within a 7 day period" [24] or "3 weeks or longer" [31].

Environmental factors and instructions. Environmental factors reported in the studies we reviewed included standardizing room temperature, controlling noise and distractions, and recording skin temperature. Eight of the studies described how they controlled environmental factors [11,21,24,26,29,31-33], while two studies stated that they used the protocol described by the DFNS [29-30]. The remainder provided insufficient or no details about environmental factors. Regarding instructions, 10 studies either described their instructions or stated that they used standardized instructions [20-22,24,26-27,30-32,38]; the remaining 11 did not specify.

Blinding. Blinding may incorporate blinding of testers to their previous results, to the results of other testers, and to clinical information that may influence their testing. Surprisingly, 18 of the 21 studies did not specify any details relating to blinding within their design. Of the remaining studies, some aspects of blinding were reported [19,22], but the study by Pigg et al. [22] was the only article we deemed to have reported appropriate blinding.

Randomization. The testing order was randomized in four studies and fixed in a further four studies that described the order. The remaining studies did not describe whether they controlled the order of testing. The sequence of examiner was randomized in one study and fixed in another. It was not relevant in two studies and not known whether relevant or not in the remainder of the studies, because the type of reliability study being performed was not clearly stated. The side to be tested was randomly assigned in five studies and not discussed in the remainder of the articles.

Statistical Analysis

We found large variation in the statistical methods used to analyze the data in these studies. Lucas et al. have recommended that appropriate statistical analysis should utilize a measure of reliability (e.g., ICC) as well as a measure of precision or stability (e.g., 95% confidence intervals or standard error of measurement) [14]. If studies achieved these criteria, a yes score was given on QAREL checklist item 11. We categorized six studies as using appropriate statistical analysis but with insufficient detail provided to truly determine reliability [19,25,27-28,30-31]; for example, measures of precision were not provided in conjunction with the reliability estimate (e.g., the ICC), or insufficient raw data were provided [19,30-31]. The ICC was the most common estimate of reliability used (9 studies) [19,22-24,29-31,34,37]. The coefficient of reliability was the second most frequently used statistical test, with Bland-Altman limits of agreement, CV, and repeatability r being the other measures used.
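As an illustration of one of the precision-oriented analyses named above, Bland-Altman 95% limits of agreement can be computed from paired session scores as the mean difference (bias) plus or minus 1.96 times the SD of the differences. The sketch below is generic; the paired values are hypothetical warm detection thresholds (degrees Celsius), not data from the reviewed studies.

```python
import numpy as np

def bland_altman_limits(session1, session2):
    """Bias (mean difference) and 95% limits of agreement between two sessions."""
    diffs = np.asarray(session1, dtype=float) - np.asarray(session2, dtype=float)
    bias = diffs.mean()
    sd = diffs.std(ddof=1)                 # SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical WDT (degrees C) from two sessions on the same participants
day0 = [34.1, 35.0, 33.5, 34.6, 35.2]
day7 = [34.0, 34.8, 33.9, 34.4, 35.1]
bias, lower, upper = bland_altman_limits(day0, day7)
```

Limits that are narrow relative to a clinically meaningful change indicate acceptable test-retest agreement, which is the kind of individual-level judgment an ICC alone cannot support.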

Results: Estimates of Reliability

Table 3 presents the actual results and estimates of reliability, which are divided into three sections according to the method used. Given the variability of actual results for all parameters, it is difficult to draw definite conclusions regarding reliability. We found no observable difference in reliability between the MLE and the MLI. Analysis of reliability of the five studies with high methodological quality indicated considerable variability [19-23]. CDT and WDT ranged from poor to excellent (Dyck et al.: excellent [19]; Heldestad et al.: good for CDT and excellent for WDT [20]; Kemler et al.: good at the wrist site and poor elsewhere for CDT and WDT [21]; Pigg et al.: poor to excellent for CDT, poor to good for WDT [22]; Zwart and Sand: fair for both CDT and WDT, fair to excellent for CPT and HPT [23]; Heldestad et al.: excellent for CPT and good for HPT [20]; Pigg et al.: fair to excellent for both CPT and HPT [22]). When we included the remaining studies, the reliability of CDT ranged from poor to excellent, with the majority of studies indicating fair reliability; WDT ranged from poor to excellent, with the majority of studies indicating fair reliability; CPT was divided between fair and good; and finally, HPT was also divided between fair and good, with one study finding poor reliability.

DISCUSSION

Type of Study, Sample, and Raters

Of the 21 studies we included, only 3 provided adequate information about the raters [19-20,34], while 13 either assessed test-retest reliability or did not state which type of reliability they investigated. This affects the external validity of these studies because the study design does not take into account the possible influence of the rater on the test outcome. Indeed, Becser et al. reported some differences between raters and noted that using more than one rater may introduce bias in a measurement [24]. It has also been previously highlighted that standardized methods of assessment and attention to training are important factors in achieving consistent results in QST [34]. Appropriate training in conducting QST is warranted, and indeed, groups such as the DFNS have developed standardized training days for testers. As such, further information on the raters, their relevant training, and where they conducted the study would allow for better appraisal of the broader applicability of reliability studies.

External validity of the studies we reviewed is also limited by the number of reliability studies that were conducted on nondisabled populations. It appears that reliability in nondisabled participants (n = 14) and people with diabetes (n = 7) has been thoroughly investigated to date. However, this is not the case for musculoskeletal pain, which has been less well investigated despite the growing use of QST in profiling patients with these conditions, e.g., whiplash [1-3], patellofemoral pain syndrome [39], and low back pain [4]. Therefore, further studies on the reliability of QST in populations with musculoskeletal pain are warranted. At least two such studies are underway [40-41].

Methodological Issues

It is clear from this review that the majority of studies favored the use of the MLI (n = 14). This may be partly explained by the greater time involved in using the MLE. Despite the evidence for greater accuracy and reliability with the MLE in two studies [21,27], analysis of all the studies included in this review suggests that the MLI and the MLE demonstrate comparable reliability. In fact, studies using the MLE provided inadequate information regarding analysis and results more frequently (4 out of 7 studies [19,28,30,35]) than those using the MLI alone (2 out of 9 studies [24,31]). Estimates of reliability in studies that provided inadequate information must be interpreted with caution.

The reliability of thermal detection thresholds has been explored to a greater extent than that of thermal pain thresholds, and as such, the reliability of thermal pain thresholds is less well established. Environmental factors and standardization of instructions to subjects are important components of reliability in QST [10]. Environmental factors may include standardizing room temperature, controlling noise and distractions, and recording skin temperature. With this in mind, studies that controlled environmental factors and issued standardized instructions would be deemed more reliable than those that did not, although this was not clearly reflected in the actual estimates of reliability. Of all methodological factors of importance in a reliability study, blinding appears to be the one most poorly described in the studies in this review. Only four of the studies included in this review outlined any blinding procedures, and of those, we only deemed one to demonstrate appropriate blinding according to the QAREL checklist [22]. The importance of blinding in a reliability study is highlighted by Lucas et al. in their development of the QAREL checklist, in which 5 of the 11 items pertain to blinding [14].

It is also clear from this review that consideration of the test and examiner sequence varies considerably. Randomization was consistently poorly described throughout the articles reviewed. In this review, only three studies indicate that they randomized their test protocol [20,25,27], while four studies fixed it [21,24,30,37]. The remaining 14 studies did not reference any form of randomization.

We did not find a consistent association between the interval between sessions and the estimate of reliability. Indeed, Wasner and Brock demonstrated better ICC estimates for CPT between days 0 and 1 versus days 0 and 21, but poorer ICC estimates for HPT between days 0 and 1 versus days 0 and 21 [31].

In summary, the main areas of methodology that warrant greater attention in future studies are descriptions of the raters and their training, blinding, and randomization. In addition, standardization of test protocols, environmental factors, and instructions is also important. Regarding the study population, future studies of patient populations, particularly those with painful conditions, are warranted so that subject samples are representative of those who would typically undergo QST in clinical or research settings.

Statistical Analysis and Results

The range of statistical measures used in reliability studies limits the ability to perform meta-analysis on the data. Differences in statistical methods probably reflect changing trends within statistics. It has been suggested that ICCs are the most appropriate measures, but only if they are presented in conjunction with a measure of precision, e.g., standard error of measurement [14,17-18]. However, it can be argued that when the data demonstrate large or small variation, the ICC may over- or underestimate reliability, respectively [12]. It has been suggested that the presentation of sufficient data alongside measures of reliability and precision allows a more accurate analysis of the data by the reader. This has been demonstrated by more recent studies, such as Pigg et al. [22].

As discussed earlier in the "Results" section, no one thermal QST parameter demonstrated consistent estimates of reliability across studies. Therefore, it is difficult to draw definite conclusions about reliability. At the risk of oversimplifying, the summary of the results suggests that reliability is fair for CDT and WDT and fair to good for CPT and HPT.

Limitations

We acknowledge a number of limitations to this review. We only included published studies, and as such, results from this review may overestimate reliability. Furthermore, we only included studies published in English. Finally, we only included studies if they were published within the last two decades.

CONCLUSIONS

The methodological quality of research investigating the reliability of thermal QST would benefit from closer adherence to rigorous guidelines such as those suggested by the QAREL checklist, particularly in relation to incorporating appropriate blinding procedures into the design. Further studies investigating reliability of QST in populations with pain are warranted. The results of this review found that the reliability of thermal QST varied considerably. CDT and WDT were found to have fair reliability. CPT and HPT demonstrated good reliability in high quality studies and varied from fair to good reliability in the other studies. We found no difference in reliability between studies using the MLI and those using the MLE. The reliability of thermal pain thresholds is less well established than thermal detection thresholds.

Abbreviations: CDT = cold detection threshold, CPT = cold pain threshold, CV = coefficient of variation, DFNS = German Research Network on Neuropathic Pain, HPT = heat pain threshold, ICC = intraclass correlation coefficient, MLE = method of levels, MLI = method of limits, QAREL = Quality Appraisal for Reliability Studies, QST = quantitative sensory testing, WDT = warm detection threshold.

JRRD at a Glance

Quantitative sensory testing (QST) is a method of assessing the function of nerves. It is used to assess medical conditions such as diabetes (for screening diabetic neuropathy) and spinal cord injuries (for assessing nerve function). Recently, QST has also been used to assess conditions such as referred pain from the spine and whiplash. It is thought that results from QST will lead to more accurate diagnoses and treatments. The reliability of QST is not fully established. This review evaluates the current level of reliability of thermal QST and highlights areas of QST protocols in need of greater attention.

ACKNOWLEDGMENTS

Author Contributions:

Study concept and design: N. A. Moloney, T. M. Hall, C. M. Doody. Acquisition of data: N. A. Moloney.

Analysis and interpretation of data: N. A. Moloney, C. M. Doody. Drafting of manuscript: N. A. Moloney.

Critical revision of manuscript for important intellectual content:

C. M. Doody, T. M. Hall.

Narrative analysis: N. A. Moloney, C. M. Doody, T. M. Hall. Study supervision: C. M. Doody, T. M. Hall.

Financial Disclosures: The authors have declared that no competing interests exist.

Funding/Support: This material was based on work supported by the Irish Research Council for Science, Engineering and Technology Embark Initiative (grant 1853271).

Additional Contributions: The authors would like to thank Dr. Catherine Blake for her advice regarding the interpretation of statistical methods and results in this review.

REFERENCES

[1.] Chien A, Eliav E, Sterling M. Hypoesthesia occurs in acute whiplash irrespective of pain and disability levels and the presence of sensory hypersensitivity. Clin J Pain. 2008; 24(9):759-66. [PMID: 18936593] http://dx.doi.org/10.1097/AJP.0b013e3181773b95

[2.] Chien A, Eliav E, Sterling M. Whiplash (grade II) and cervical radiculopathy share a similar sensory presentation: An investigation using quantitative sensory testing. Clin J Pain. 2008;24(7):595-603. [PMID: 18716498] http://dx.doi.org/10.1097/AJP.0b013e31816ed4fc

[3.] Chien A, Eliav E, Sterling M. Hypoaesthesia occurs with sensory hypersensitivity in chronic whiplash--Further evidence of a neuropathic condition. Man Ther. 2009;14(2): 138-46. [PMID: 18294899] http://dx.doi.org/10.1016/j.math.2007.12.004

[4.] Freynhagen R, Rolke R, Baron R, Tolle TR, Rutjes AK, Schu S, Treede RD. Pseudoradicular and radicular low-back pain--A disease continuum rather than different entities? Answers from quantitative sensory testing. Pain. 2008; 135(1-2):65-74. [PMID: 17570589] http://dx.doi.org/10.1016/j.pain.2007.05.004

[5.] Maier C, Baron R, Tolle TR, Binder A, Birbaumer N, Birklein F, Gierthmuhlen J, Flor H, Geber C, Huge V, Krumova EK, Landwehrmeyer GB, Magerl W, Maihofner C, Richter H, Rolke R, Scherens A, Schwarz A, Sommer C, Tronnier V, Uceyler N, Valet M, Wasner G, Treede RD. Quantitative sensory testing in the German Research Network on Neuropathic Pain (DFNS): Somatosensory abnormalities in 1236 patients with different neuropathic pain syndromes. Pain. 2010;150(3):439-50. [PMID: 20627413] http://dx.doi.org/10.1016/j.pain.2010.05.002

[6.] George SZ, Bishop MD, Bialosky JE, Zeppieri G Jr, Robinson ME. Immediate effects of spinal manipulation on thermal pain sensitivity: An experimental study. BMC Musculoskelet Disord. 2006;7:68. [PMID: 16911795] http://dx.doi.org/10.1186/1471-2474-7-68

[7.] Sterling M, Pedler A, Chan C, Puglisi M, Vuvan V, Vicenzino B. Cervical lateral glide increases nociceptive flexion reflex threshold but not pressure or thermal pain thresholds in chronic whiplash associated disorders: A pilot randomised controlled trial. Man Ther. 2010;15(2):149-53. [PMID: 19884037] http://dx.doi.org/10.1016/j.math.2009.09.004

[8.] Rolke R, Baron R, Maier C, Tolle TR, Treede RD, Beyer A, Binder A, Birbaumer N, Birklein F, Botefur IC, Braune S, Flor H, Huge V, Klug R, Landwehrmeyer GB, Magerl W, Maihofner C, Rolko C, Schaub C, Scherens A, Sprenger T, Valet M, Wasserka B. Quantitative sensory testing in the German Research Network on Neuropathic Pain (DFNS): Standardized protocol and reference values. Pain. 2006;123(3): 231-43. [PMID: 16697110] http://dx.doi.org/10.1016/j.pain.2006.01.041

[9.] Hansson P, Backonja M, Bouhassira D. Usefulness and limitations of quantitative sensory testing: Clinical and research application in neuropathic pain states. Pain. 2007;129(3): 256-59. [PMID: 17451879] http://dx.doi.org/10.1016/j.pain.2007.03.030

[10.] Chong PS, Cros DP. Technology literature review: Quantitative sensory testing. Muscle Nerve. 2004;29(5):734-47. [PMID: 15116380] http://dx.doi.org/10.1002/mus.20053

[11.] Yarnitsky D, Sprecher E. Thermal testing: Normative data and repeatability for various test algorithms. J Neurol Sci. 1994;125(1):39-45. [PMID: 7964887] http://dx.doi.org/10.1016/0022-510X(94)90239-9

[12.] Portney LG, Watkins MP. Foundations of clinical research: Applications to practice. 2nd ed. Upper Saddle River (NJ): Prentice Hall Health; 2000.

[13.] Costa LO, Maher CG, Latimer J, Smeets RJ. Reproducibility of rehabilitative ultrasound imaging for the measurement of abdominal muscle activity: A systematic review. Phys Ther. 2009;89(8):756-69. [PMID: 19520732] http://dx.doi.org/10.2522/ptj.20080331

[14.] Lucas NP, Macaskill P, Irwig L, Bogduk N. The development of a quality appraisal tool for studies of diagnostic reliability (QAREL). J Clin Epidemiol. 2010;63(8):854-61. [PMID: 20056381] http://dx.doi.org/10.1016/j.jclinepi.2009.10.002

[15.] Van Trijffel E, Anderegg Q, Bossuyt PM, Lucas C. Inter-examiner reliability of passive assessment of intervertebral motion in the cervical and lumbar spine: A systematic review. Man Ther. 2005;10(4):256-69. [PMID: 15994114] http://dx.doi.org/10.1016/j.math.2005.04.008

[16.] Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater reliability. Psychol Bull. 1979;86(2):420-28. [PMID: 18839484] http://dx.doi.org/10.1037/0033-2909.86.2.420

[17.] Atkinson G, Nevill AM. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Med. 1998;26(4):217-38. [PMID: 9820922] http://dx.doi.org/10.2165/00007256-199826040-00002

[18.] Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res. 2005;19(1):231-40. [PMID: 15705040]

[19.] Dyck PJ, Kratz KM, Lehman KA, Karnes JL, Melton LJ 3rd, O'Brien PC, Litchy WJ, Windebank AJ, Smith BE, Low PA, et al. The Rochester Diabetic Neuropathy Study: Design, criteria for types of neuropathy, selection bias, and reproducibility of neuropathic tests. Neurology. 1991;41(6):799-807. [PMID: 2046920]

[20.] Heldestad V, Linder J, Sellersjo L, Nordh E. Reproducibility and influence of test modality order on thermal perception and thermal pain thresholds in quantitative sensory testing. Clin Neurophysiol. 2010;121(11):1878-85. [PMID: 20478739] http://dx.doi.org/10.1016/j.clinph.2010.03.055

[21.] Kemler MA, Reulen JP, Van Kleef M, Barendse GA, Van den Wildenberg FA, Spaans F. Thermal thresholds in complex regional pain syndrome type I: Sensitivity and repeatability of the methods of limits and levels. Clin Neurophysiol. 2000; 111(9):1561-68. [PMID: 10964065] http://dx.doi.org/10.1016/S1388-2457(00)00358-8

[22.] Pigg M, Baad-Hansen L, Svensson P, Drangsholt M, List T. Reliability of intraoral quantitative sensory testing (QST). Pain. 2010;148(2):220-26. [PMID: 20022428] http://dx.doi.org/10.1016/j.pain.2009.10.024

[23.] Zwart JA, Sand T. Repeatability of dermatomal warm and cold sensory thresholds in patients with sciatica. Eur Spine J. 2002;11(5):441-46. [PMID: 12384751] http://dx.doi.org/10.1007/s005860100333

[24.] Becser N, Sand T, Zwart JA. Reliability of cephalic thermal thresholds in healthy subjects. Cephalalgia. 1998;18(8): 574-82. [PMID: 9827251] http://dx.doi.org/10.1046/j.1468-2982.1998.1808574.x

[25.] Claus D, Hilz MJ, Neundorfer B. Thermal discrimination thresholds: A comparison of different methods. Acta Neurol Scand. 1990;81(6):533-40. [PMID: 2220312] http://dx.doi.org/10.1111/j.1600-0404.1990.tb01015.x

[26.] Gelber DA, Pfeifer MA, Broadstone VL, Munster EW, Peterson M, Arezzo JC, Shamoon H, Zeidler A, Clements R, Green DA, Porte D Jr, Laudadio C, Bril V. Components of variance for vibratory and thermal threshold testing in normal and diabetic subjects. J Diabetes Complications. 1995;9(3):170-76. [PMID: 7548981] http://dx.doi.org/10.1016/1056-8727(94)00042-M

[27.] Moravcova E, Bednarik J, Svobodnik A, Dusek L. Reproducibility of thermal threshold assessment in small-fibre neuropathy patients. Scripta Medica (BRNO). 2005;78(3):177-84.

[28.] Valensi P, Attali JR, Gagant S. Reproducibility of parameters for assessment of diabetic neuropathy. The French Group for Research and Study of Diabetic Neuropathy. Diabet Med. 1993;10(10):933-39. [PMID: 8306589] http://dx.doi.org/10.1111/j.1464-5491.1993.tb00009.x

[29.] Felix ER, Widerstrom-Noga EG. Reliability and validity of quantitative sensory testing in persons with spinal cord injury and neuropathic pain. J Rehabil Res Dev. 2009;46(1): 69-84. [PMID: 19533521] http://dx.doi.org/10.1682/JRRD.2008.04.0058

[30.] Peltier A, Smith AG, Russell JW, Sheikh K, Bixby B, Howard J, Goldstein J, Song Y, Wang L, Feldman EL, Singleton JR. Reliability of quantitative sudomotor axon reflex testing and quantitative sensory testing in neuropathy of impaired glucose regulation. Muscle Nerve. 2009;39(4): 529-35. [PMID: 19260066] http://dx.doi.org/10.1002/mus.21210

[31.] Wasner GL, Brock JA. Determinants of thermal pain thresholds in normal subjects. Clin Neurophysiol. 2008;119(10): 2389-95. [PMID: 18778969] http://dx.doi.org/10.1016/j.clinph.2008.07.223

[32.] Yarnitsky D, Sprecher E, Zaslansky R, Hemli JA. Heat pain thresholds: Normative data and repeatability. Pain. 1995; 60(3):329-32. [PMID: 7596629] http://dx.doi.org/10.1016/0304-3959(94)00132-X

[33.] De Neeling JN, Beks PJ, Bertelsmann FW, Heine RJ, Bouter LM. Sensory thresholds in older adults: Reproducibility and reference values. Muscle Nerve. 1994;17(4): 454-61. [PMID: 8170493] http://dx.doi.org/10.1002/mus.880170414

[34.] Bird SJ, Brown MJ, Spino C, Watling S, Foyt HL. Value of repeated measures of nerve conduction and quantitative sensory testing in a diabetic neuropathy trial. Muscle Nerve. 2006;34(2):214-24. [PMID: 16708368] http://dx.doi.org/10.1002/mus.20577

[35.] Bravenboer B, Van Dam PS, Hop J, vd Steenhoven J, Erkelens DW. Thermal threshold testing for the assessment of small fibre dysfunction: Normal values and reproducibility. Diabet Med. 1992;9(6):546-49. [PMID: 1643803] http://dx.doi.org/10.1111/j.1464-5491.1992.tb01836.x

[36.] Claus D, Mustafa C, Vogel W, Herz M, Neundorfer B. Assessment of diabetic neuropathy: Definition of norm and discrimination of abnormal nerve function. Muscle Nerve. 1993;16(7):757-68. [PMID: 8505932] http://dx.doi.org/10.1002/mus.880160711

[37.] Krassioukov A, Wolfe DL, Hsieh JT, Hayes KC, Durham CE. Quantitative sensory testing in patients with incomplete spinal cord injury. Arch Phys Med Rehabil. 1999;80(10): 1258-63. [PMID: 10527084] http://dx.doi.org/10.1016/S0003-9993(99)90026-6

[38.] Agostinho CM, Scherens A, Richter H, Schaub C, Rolke R, Treede RD, Maier C. Habituation and short-term repeatability of thermal testing in healthy human subjects and patients with chronic non-neuropathic pain. Eur J Pain. 2009;13(8):779-85. [PMID: 19019713] http://dx.doi.org/10.1016/j.ejpain.2008.10.002

[39.] Jensen R, Hystad T, Kvale A, Baerheim A. Quantitative sensory testing of patients with long lasting patellofemoral pain syndrome. Eur J Pain. 2007;11(6):665-76. [PMID: 16268247] http://dx.doi.org/10.1016/j.ejpain.2006.10.007

[40.] El-Masry Ivanovic S, Faris H, Baad-Hansen L, Svensson P, List T, Drangsholt M. Preliminary report of reliability of quantitative sensory testing (QST) in patients with persistent idiopathic dentoalveolar pain. 13th World Congress on Pain; 2010 Aug 29-Sep 2; Montreal, Canada. Seattle (WA): IASP; 2010.

[41.] Geber C, Klein T, Azad S, Birklein F, Gierthmuhlen J, Huge V, Lauchart M, Nitzsche D, Stengel M, Valet M, Baron R, Maier C, Tolle T, Treede R. Test-retest and interobserver reliability of quantitative sensory testing according to the protocol of the German Research Network on Neuropathic Pain (DFNS): A multi-centre study. Pain. 2011;152(3):548-56. [PMID: 21237569] http://dx.doi.org/10.1016/j.pain.2010.11.013

Submitted for publication March 15, 2011. Accepted in revised form July 18, 2011.

This article and any complementary material should be cited as follows:

Moloney NA, Hall TM, Doody CM. Reliability of thermal quantitative sensory testing: A systematic review. J Rehabil Res Dev. 2012;49(2):191-208. http://dx.doi.org/10.1682/JRRD.2011.03.0044

Niamh A. Moloney, MSc; (1) * Toby M. Hall, PhD; (2) Catherine M. Doody, PhD (1)

(1) School of Public Health, Physiotherapy, and Population Science, University College Dublin, Belfield, Dublin, Ireland; (2) School of Physiotherapy, Curtin Health Innovation Research Institute, Curtin University, Perth, Australia

* Address all correspondence to Niamh A. Moloney, MSc; School of Public Health, Physiotherapy, and Population Science, University College Dublin, Belfield, Dublin 4, Ireland; +353-86-8106336; fax: +353-1-296-0647.

Email: n_moloney@yahoo.com

http://dx.doi.org/10.1682/JRRD.2011.03.0044
Figure 1. Article selection criteria. QST = quantitative sensory testing.

Inclusion criteria--

* Experimental studies that assessed thermal QST for intra-rater,
inter-rater, and test-retest reliability.

* Experimental studies that compared thermal QST with other methods
of assessment but with reliability of thermal QST as stated aim
and/or objective.

* Experimental studies involving at least two separate test
sessions.

* Studies investigating nondisabled and/or patient
populations.

* Articles published in English between 1990 and 2010.

* Studies with adult participants (>18 yr).

Exclusion criteria--

* Studies that did not have reliability as stated primary or
secondary aim or objective.

* Studies not involving at least two separate test sessions.

* Studies that described methods and statistical analysis
insufficiently to allow adequate analysis.

* Studies that involved manipulation of test scenario, e.g.,
simulation of results.

* Letters, editorials, or comments.

Figure 2. Quality Appraisal for Diagnostic Reliability checklist.

1. Was test evaluated in sample of subjects who were representative
of those to whom authors intended results to be applied?

2. Was test performed by raters who were representative of those to
whom authors intended results to be applied?

3. Were raters blinded to findings or other raters during study?

4. Were raters blinded to their own prior findings of test under
evaluation?

5. Were raters blinded to results of accepted reference standard or
disease status for target disorder (or variable) being evaluated?

6. Were raters blinded to clinical information that was not
intended to be provided as part of testing procedure or study
design?

7. Were raters blinded to additional cues that were not part of
test?

8. Was order of examination varied?

9. Was stability (or theoretical stability) of variable being
measured taken into account when determining suitability of
time-interval between repeated measures?

10. Was test applied correctly and interpreted appropriately?

11. Were appropriate statistical measures of agreement used?


Table 1.
Search strategy for PubMed and Embase.

Phase                            MeSH Terms

1. Specific Search Terms     Reliability
for Reliability Studies.     Reproducibility

2. Specific Search Terms     Electrophysiology
for Thermal QST.             Neurophysiology
                             Sensory threshold
                             Pain threshold
                             Pain receptors
                             Pain assessment
                             Hypesthesia
                             Reduced/impaired sensation
                             Thermal hypesthesia
3. Combination of Phases 1   --
and 2.

Phase                            Emtree Terms

1. Specific Search Terms     Reliability
for Reliability Studies.     Intratester reliability
                             Intertester reliability
                             Test-retest reliability
                             Reproducibility
2. Specific Search Terms     Electroneurology
for Thermal QST.             Sensory system electrophysiology
                             Perceptive threshold
                             Pain threshold
                             Thermal stimulation

3. Combination of Phases 1   --
and 2.

Phase                        Additional General Terms

1. Specific Search Terms     Method reliability
for Reliability Studies.

2. Specific Search Terms     QST
for Thermal QST.             Thermal QST
                             Thermal pain thresholds
                             Thermal detection thresholds
                             Psychophysical testing
                             Sensory testing
                             Thermal detection
                             Thermal pain
                             Pain detection
3. Combination of Phases 1   --
and 2.

MeSH = medical subject heading, QST = quantitative sensory testing.
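The three-phase combination in Table 1 is a straightforward Boolean construction: terms are OR-combined within each phase, and Phase 3 ANDs the two blocks together. A minimal sketch in Python (term lists abbreviated from the table; the exact quoting and operator syntax of any given database is an assumption):

```python
# Sketch of the Table 1 search combination: Phase 1 (reliability) terms are
# OR-combined, Phase 2 (thermal QST) terms are OR-combined, and Phase 3 is
# the AND of the two blocks. Term lists are abbreviated from the table.

phase1 = ["reliability", "reproducibility", "method reliability"]
phase2 = ["sensory threshold", "pain threshold", "thermal stimulation", "QST"]

def or_block(terms):
    """Join terms into a quoted, parenthesized OR block."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# Phase 3: combine the two phases with AND
query = or_block(phase1) + " AND " + or_block(phase2)
print(query)
```

The same pattern extends to the Emtree and general-term columns by appending those terms to the two lists before combining.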

Table 2.
Results from Quality Appraisal for Reliability Studies (QAREL)
checklist.

                                      QAREL Item

Study                         1       2        3 *         4

Agostinho et al. (2009)      Yes   Unclear   Unclear/   Unclear
[1]                                             NA

Becser et al. (1998) [2]     No    Unclear   Unclear    Unclear

Bird et al. (2006) [3]       Yes     Yes     Unclear    Unclear

Bravenboer et al.            Yes   Unclear   Unclear/   Unclear
(1992) [4]                                      NA

Claus et al. (1990) [5]      No    Unclear   Unclear/   Unclear
                                                NA

Claus et al. (1993) [6]      Yes   Unclear   Unclear/   Unclear
                                                NA

De Neeling et al. (1994)     Yes   Unclear   Unclear/   Unclear
[7]                                             NA

Dyck et al. (1991) [8]       Yes     Yes       Yes        Yes

Felix and Widerstrom-        Yes   Unclear   Unclear/   Unclear
Noga (2009) [9]                                 NA

Gelber et al. (1995) [10]    No    Unclear   Unclear/   Unclear
                                                NA

Heldestad et al. (2010)      No    Unclear   Unclear/   Unclear
[11]                                            NA

Kemler et al. (2000)         Yes     Yes        NA      Unclear
[12]

Krassioukov et al.           Yes   Unclear   Unclear/   Unclear
(1999) [13]                                     NA

Moravcova et al. (2005)      Yes   Unclear      NA      Unclear
[14]

Peltier et al. (2009) [15]   Yes   Unclear   Unclear/   Unclear
                                                NA

Pigg et al. (2010) [16]      No    Unclear     Yes        Yes

Valensi et al. (1993)        Yes   Unclear   Unclear    Unclear
[17]

Wasner and Brock             No    Unclear      NA      Unclear
(2008) [18]

Yarnitsky and Sprecher       No    Unclear      NA      Unclear
(1994) [19]

Yarnitsky et al. (1995)      No    Unclear      NA      Unclear
[20]

Zwart and Sand (2002)        Yes     Yes        NA      Unclear
[21]

Study                        QAREL Item

                                5         6         7         8

Agostinho et al. (2009)      Unclear   Unclear   Unclear   Unclear
[1]

Becser et al. (1998) [2]       NA        NA        NA        No
                                                           (fixed)

Bird et al. (2006) [3]       Unclear   Unclear   Unclear   Unclear

Bravenboer et al.            Unclear   Unclear   Unclear   Unclear
(1992) [4]

Claus et al. (1990) [5]        NA        NA        NA        Yes

Claus et al. (1993) [6]      Unclear   Unclear   Unclear   Unclear

De Neeling et al. (1994)     Unclear   Unclear   Unclear   Unclear
[7]

Dyck et al. (1991) [8]       Unclear     Yes     Unclear   Unclear

Felix and Widerstrom-        Unclear   Unclear   Unclear   Unclear
Noga (2009) [9]

Gelber et al. (1995) [10]      NA        NA        NA      Unclear

Heldestad et al. (2010)        NA        NA        NA        Yes
[11]

Kemler et al. (2000)         Unclear   Unclear   Unclear     No
[12]                                                       (fixed)

Krassioukov et al.           Unclear   Unclear   Unclear     No
(1999) [13]                                                (fixed)

Moravcova et al. (2005)      Unclear   Unclear   Unclear     Yes
[14]

Peltier et al. (2009) [15]   Unclear   Unclear   Unclear     No
                                                           (fixed)

Pigg et al. (2010) [16]        NA        NA        NA      Unclear

Valensi et al. (1993)        Unclear   Unclear   Unclear   Unclear
[17]

Wasner and Brock               NA        NA        NA      Unclear
(2008) [18]

Yarnitsky and Sprecher         NA        NA        NA      Unclear
(1994) [19]

Yarnitsky et al. (1995)        NA        NA        NA      Unclear
[20]

Zwart and Sand (2002)        Unclear     Yes     Unclear   Unclear
[21]

                                      QAREL Item

Study                           9        10              11

Agostinho et al. (2009)        Yes       Yes            Yes
[1]

Becser et al. (1998) [2]       Yes     Unclear          Yes

Bird et al. (2006) [3]         Yes       Yes            Yes

Bravenboer et al.              Yes       Yes            Yes
(1992) [4]

Claus et al. (1990) [5]        Yes       Yes     No [double dagger]

Claus et al. (1993) [6]        Yes       Yes            Yes

De Neeling et al. (1994)       Yes       Yes            Yes
[7]

Dyck et al. (1991) [8]         Yes       Yes     No [double dagger]

Felix and Widerstrom-        Unclear     Yes            Yes
Noga (2009) [9]

Gelber et al. (1995) [10]      Yes       Yes            Yes

Heldestad et al. (2010)        Yes       Yes            Yes
[11]

Kemler et al. (2000)           Yes       Yes            Yes
[12]

Krassioukov et al.             Yes       Yes            Yes
(1999) [13]

Moravcova et al. (2005)        Yes       Yes     No [double dagger]
[14]

Peltier et al. (2009) [15]   Unclear     Yes     No [double dagger]

Pigg et al. (2010) [16]        Yes       Yes            Yes

Valensi et al. (1993)        Unclear   Unclear   No [double dagger]
[17]

Wasner and Brock               Yes       Yes     No [double dagger]
(2008) [18]

Yarnitsky and Sprecher         Yes       Yes            Yes
(1994) [19]

Yarnitsky et al. (1995)        Yes       Yes            Yes
[20]

Zwart and Sand (2002)          Yes       Yes            Yes
[21]

                              High
                              Quality
Study                        [dagger]

Agostinho et al. (2009)         No
[1]

Becser et al. (1998) [2]        No

Bird et al. (2006) [3]          No

Bravenboer et al.               No
(1992) [4]

Claus et al. (1990) [5]         No

Claus et al. (1993) [6]         No

De Neeling et al. (1994)        No
[7]

Dyck et al. (1991) [8]         Yes

Felix and Widerstrom-           No
Noga (2009) [9]

Gelber et al. (1995) [10]       No

Heldestad et al. (2010)        Yes
[11]

Kemler et al. (2000)           Yes
[12]

Krassioukov et al.              No
(1999) [13]

Moravcova et al. (2005)         No
[14]

Peltier et al. (2009) [15]      No

Pigg et al. (2010) [16]        Yes

Valensi et al. (1993)           No
[17]

Wasner and Brock                No
(2008) [18]

Yarnitsky and Sprecher          No
(1994) [19]

Yarnitsky et al. (1995)         No
[20]

Zwart and Sand (2002)          Yes
[21]


* In many studies, the type of reliability study was not specified,
so it was unclear whether it was applicable for testers to be
blinded to the findings of other testers (noted as "Unclear/NA").

[dagger] High-quality requirement: 50 percent of checklist items
achieving "yes" score. With the 11-item checklist, a minimum of six
"yes" scores was required; where checklist items were not relevant,
"yes" scores on 50 percent of relevant items were required.

[double dagger] A "No [double dagger]" entry indicates that the
measure of statistical analysis was appropriate but the study lacked
a measure of precision or sufficient raw data.
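The high-quality rule in the [dagger] footnote can be expressed as a short calculation. The sketch below assumes the scoring scheme described there (at least 50 percent of applicable items scoring "yes"); the function and variable names are illustrative only:

```python
# Sketch of the QAREL high-quality rule from the table footnote: a study is
# rated high quality when at least 50 percent of its applicable (non-NA)
# checklist items score "yes". The threshold interpretation is an assumption.

def is_high_quality(item_scores):
    """item_scores: 'yes' / 'no' / 'unclear' / 'na' for the 11 QAREL items."""
    applicable = [s for s in item_scores if s != "na"]
    if not applicable:
        return False
    yes_count = sum(1 for s in applicable if s == "yes")
    return yes_count >= 0.5 * len(applicable)

# With all 11 items applicable, six "yes" scores (6 >= 5.5) meets the bar
scores = ["yes"] * 6 + ["unclear"] * 4 + ["no"]
print(is_high_quality(scores))  # True
```

Under this reading, "unclear" and "no" both count against a study, while "NA" items shrink the denominator, matching the footnote's "50 percent of relevant items" clause.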

[1.] Agostinho CM, Scherens A, Richter H, Schaub C, Rolke R, Treede
RD, Maier C. Habituation and short-term repeatability of thermal
testing in healthy human subjects and patients with chronic
non-neuropathic pain. Eur J Pain. 2009;13(8):779-85.

[2.] Becser N, Sand T, Zwart JA. Reliability of cephalic thermal
thresholds in healthy subjects. Cephalalgia. 1998;18(8):574-82.

[3.] Bird SJ, Brown MJ, Spino C, Watling S, Foyt HL. Value of repeated
measures of nerve conduction and quantitative sensory testing in a
diabetic neuropathy trial. Muscle Nerve. 2006;34(2):214-24.

[4.] Bravenboer B, Van Dam PS, Hop J, vd Steenhoven J, Erkelens DW.
Thermal threshold testing for the assessment of small fibre
dysfunction: Normal values and reproducibility. Diabet Med. 1992;9(6):
546-49.

[5.] Claus D, Hilz MJ, Neundorfer B. Thermal discrimination
thresholds: A comparison of different methods. Acta Neurol Scand.
1990;81(6):533-40.

[6.] Claus D, Mustafa C, Vogel W, Herz M, Neundorfer B. Assessment of
diabetic neuropathy: Definition of norm and discrimination of abnormal
nerve function. Muscle Nerve. 1993;16(7):757-68.

[7.] De Neeling JN, Beks PJ, Bertelsmann FW, Heine RJ, Bouter LM.
Sensory thresholds in older adults: Reproducibility and reference
values. Muscle Nerve. 1994;17(4):454-61.

[8.] Dyck PJ, Kratz KM, Lehman KA, Karnes JL, Melton LJ 3rd, O'Brien
PC, Litchy WJ, Windebank AJ, Smith BE, Low PA, et al. The Rochester
Diabetic Neuropathy Study: Design, criteria for types of neuropathy,
selection bias, and reproducibility of neuropathic tests. Neurology.
1991;41(6):799-807.

[9.] Felix ER, Widerstrom-Noga EG. Reliability and validity of
quantitative sensory testing in persons with spinal cord injury and
neuropathic pain. J Rehabil Res Dev. 2009;46(1):69-84.

[10.] Gelber DA, Pfeifer MA, Broadstone VL, Munster EW, Peterson M,
Arezzo JC, Shamoon H, Zeidler A, Clements R, Green DA, Porte D Jr,
Laudadio C, Bril V. Components of variance for vibratory and thermal
threshold testing in normal and diabetic subjects. J Diabetes
Complications. 1995;9(3):170-76.

[11.] Heldestad V, Linder J, Sellersjo L, Nordh E. Reproducibility and
influence of test modality order on thermal perception and thermal
pain thresholds in quantitative sensory testing. Clin Neurophysiol.
2010; 121(11):1878-85.

[12.] Kemler MA, Reulen JP, Van Kleef M, Barendse GA, Van den
Wildenberg FA, Spaans F. Thermal thresholds in complex regional pain
syndrome type I: Sensitivity and repeatability of the methods of
limits and levels. Clin Neurophysiol. 2000;111(9):1561-68.

[13.] Krassioukov A, Wolfe DL, Hsieh JT, Hayes KC, Durham CE.
Quantitative sensory testing in patients with incomplete spinal cord
injury. Arch Phys Med Rehabil. 1999;80(10):1258-63.

[14.] Moravcova E, Bednarik J, Svobodnik A, Dusek L. Reproducibility of
thermal threshold assessment in small-fibre neuropathy patients.
Scripta Medica (BRNO). 2005;78(3):177-84.

[15.] Peltier A, Smith AG, Russell JW, Sheikh K, Bixby B, Howard J,
Goldstein J, Song Y, Wang L, Feldman EL, Singleton JR. Reliability of
quantitative sudomotor axon reflex testing and quantitative sensory
testing in neuropathy of impaired glucose regulation. Muscle Nerve.
2009;39(4):529-35.

[16.] Pigg M, Baad-Hansen L, Svensson P, Drangsholt M, List T.
Reliability of intraoral quantitative sensory testing (QST). Pain.
2010;148(2):220-26.

[17.] Valensi P, Attali JR, Gagant S. Reproducibility of parameters
for assessment of diabetic neuropathy. The French Group for Research
and Study of Diabetic Neuropathy. Diabet Med. 1993;10(10):933-39.

[18.] Wasner GL, Brock JA. Determinants of thermal pain thresholds in
normal subjects. Clin Neurophysiol. 2008;119(10):2389-95.

[19.] Yarnitsky D, Sprecher E. Thermal testing: Normative data and
repeatability for various test algorithms. J Neurol Sci.
1994;125(1):39-45.

[20.] Yarnitsky D, Sprecher E, Zaslansky R, Hemli JA. Heat pain
thresholds: Normative data and repeatability. Pain. 1995;60(3):329-32.

[21.] Zwart JA, Sand T. Repeatability of dermatomal warm and cold
sensory thresholds in patients with sciatica. Eur Spine J.
2002;11(5):441-46.

NA = not applicable.

Table 3.
Testing methods, statistical analysis, and results found in electronic
search results.

                            Methods of Limits

                        Type of
                     Reliability/           Subjects/
Study                  Interval              Testers

Agostinho       Test-retest [dagger]/     36 nondis-
et al. (2009)   2 days                    abled;
[1]                                       36 non-
                                          neuropathic
                                          pain/Not
                                          specified

Becser et al.   Intrarater                20 nondis-
(1998) [2]      and inter-                abled/
                rater/Within              Becser and
                7 day limit               Zwart (no
                                          other details)

Claus et al.    Test-retest/              30 nondis-
(1993) [3]      2 conse-                  abled; 12
                cutive days               diabetes/
                within                    Not
                1 week                    specified

Felix and       Test-retest/              10 SCI and
Widerstrom-     1 week                    neuropathic
Noga(2009)                                pain; 10 non-
[4]                                       disabled/Not
                                          specified

Heldestad       Test-retest/              38 nondis-
et al.          Between                   abled/Not
(2010) [5]      1-8 days                  specified

Kras-           Test-retest/              21 SCI; 14
sioukov et      SCI: 3                    nondis-
al. (1999)      weeks;                    abled/Not
[6]             Nondis-                   specified
                abled
                controls:
                1 week

Pigg et al.     Intra- and                21 nondis-
(2010) [7]      interrater/               abled/2
                Twice day 1               authors
                by 2 exam-                trained by
                iners, again              DFNS
                1-2 weeks
                later by 1
                examiner

Wasner and      Test-retest/              20 nondis-
Brock (2008)    3 times over              abled/
[8]             3 weeks                   Wasner
                (days 0, 1,
                and 21)

Zwart and       Test-retest/              19 lumbo-
Sand (2002)     Tested twice              sacral radicu-
[9]             1 to 2 hours              lopathy/
                between tests             Zwart

Mixed Method of Limits and Method of Levels

Claus et al.    Test-retest [dagger]/3    55 nondis-
(1990) [10]     consecutive               abled/Not
                days                      specified

Kemler et al.   Intrarater/               53 CRPS/
(2000) [11]     1 month                   Not
                                          specified

Moravcova       Intrarater/               58 small-
et al. (2005)   Twice over 1              fiber neuro-
[12]            week                      pathy;
                                          30 nondisab
                                          led/Morav-
                                          cova

Yarnitsky       Intrarater/2              106 nondis-
and Spre-       weeks                     abled/Not
cher (1994)                               specified
[13]                                      ("single
                                          technician")

Yarnitsky et    Test-retest/              72 nondis-
al. (1995)      2 weeks                   abled/Not
[14]                                      specified
                                          ("single
                                          technician")

Methods of Levels

Bird et al.     Test-retest [dagger]/3    1,100 clini-
(2006) [15]     separate days             cally stable
                within 4-                 diabetes with
                week period               mild neurop-
                                          athy/CRCC-
                                          trained neu-
                                          rologists and
                                          technologists

Bravenboer      Test-retest [dagger]/     39 diabetes
et al. (1992)   2 weeks                   without
[16]                                      known neur-
                                          opathy/Not
                                          specified

De Neeling      Test-retest/              19 nondis-
et al. (1994)   13-24 days                abled; 20
[17]                                      with
                                          non-insulin
                                          dependent
                                          diabetes/Not
                                          specified
                                          ("one of
                                          three observ-
                                          ers")

Dyck et al.     Intra- and                20 diabetes
(1991) [18]     interrater/               with and
                3-5 days                  without
                                          neuropathy/
                                          3 neurolo-
                                          gists

Gelber et       Intra- and                10 nondis-
al. (1995)      interrater/               abled for
[19]            3 test ses-               intratester
                sions on                  reliability;
                3 days (days              compared
                1, 2, and 7               140 nondis-
                for n = 29);              abled at 6
                1 test ses-               centers/Not
                sion on                   specified
                3 days (days
                1, 2, and 7
                for n = 9)

Peltier et      Test-retest/              19 impaired
al. (2009)      Twice over                glucose
[20]            30 days                   regulation
                                          and periph-
                                          eral neurop-
                                          athy/Not
                                          specified

Valensi et      Intrarater                132 diabetes
al. (1993)      (inter-                   with periph-
[21]            center)/                  eral neurop-
                4 weeks                   athy/1
                                          neurophysi-
                                          ologist in
                                          each center

                               Method of Limits

                                        Environmental
                     Modality/             Factors/
Study                Equipment           Instructions

Agostinho       CDT, WDT,              No details on
et al. (2009)   CPT, HPT/              environmen-
[1]             TSA-II                 tal factors/
                Neuro                  Standardized
                Sensory                protocol
                Analyzer               developed by
                [double                DFNS that
                dagger]                includes
                                       verbal instruc-
                                       tions

Becser et al.   CDT, WDT/              Quiet room
(1998) [2]      Somedic                22[degrees]C-
                thermotest             23[degrees]C/
                equipment              Brief details
                [section]              on instruc-
                                       tions
                                       provided

Claus et al.    CDT, WDT/              Not
(1993) [3]      Modified               specified
                Marstock
                thermode **

Felix and       CDT, WDT,              Quiet room
Widerstrom-     CPT, HPT/              with tempera-
Noga(2009)      TSA-II                 ture con-
[4]             Neuro                  trolled; skin
                Sensory                temperature
                Analyzer               recorded/No
                [double                details on
                dagger]                instructions

Heldestad       CDT, WDT,              Quiet room;
et al.          CPT, HPT/              no other
(2010) [5]      Somedic                details pro-
                thermotest             vided/Stand-
                equipment              ard instruc-
                [section]              tions used

Kras-           CDT, WDT,
sioukov et      CPT/TSA-
al. (1999)      II Neuro-
[6]             Sensory
                Analyzer
                [double
                dagger]

Pigg et al.     CDT, WDT,              Followed
(2010) [7]      CPT, HPT/              DFNS proto-
                MSA                    col (DFNS
                Thermal                provides
                Stimulator             standardized
                [section]              instructions
                                       but no infor-
                                       mation on
                                       environmen-
                                       tal factors)

Wasner and      CPT, HPT/              Room held at
Brock (2008)    TSA-II                 22[degrees]C-
[8]             Neuro                  23[degrees]C
                Sensory                with relative
                Analyzer               humidity of
                [dagger]               50%-60%/
                                       Used DFNS
                                       protocol
                                       instructions; no
                                       information on
                                       noise control

Zwart and       CDT, WDT/              Not specified
Sand (2002)     Somedic
[9]             thermotest
                equipment
                [section]

Mixed Method of Limits and Method of Levels

Claus et al. (1990) [10]. Modalities/equipment: CDT, WDT / modified Marstock thermode ** (MLI, MLE). Environment/instructions: not specified.

Kemler et al. (2000) [11]. Modalities/equipment: CDT, WDT / TSA-II Neuro Sensory Analyzer [double dagger] (MLE vs MLI). Environment/instructions: temperature-controlled laboratory (22°C-24°C); no visual access to computer; no visual/auditory cues / instructions described.

Moravcova et al. (2005) [12]. Modalities/equipment: CDT, WDT / Nicolet Viking IV electrodiagnostic unit [dagger][dagger] and TSA-II Neuro Sensory Analyzer [double dagger] (MLI [random and nonrandom], MLE). Environment/instructions: protocol description for thermal QST very brief; authors state that "conditions were standardized" but detail insufficient / standardized instructions used.

Yarnitsky and Sprecher (1994) [13]. Modalities/equipment: CDT, WDT / TSA-II Neuro Sensory Analyzer [double dagger] (MLI, MLE, SC). Environment/instructions: soundproof air-conditioned room with distractions minimized / standard instructions used.

Yarnitsky et al. (1995) [14]. Modalities/equipment: HPT / TSA-II Neuro Sensory Analyzer [double dagger] (MLI, MLE, SC). Environment/instructions: soundproof air-conditioned room with distractions minimized / standard instructions used.

Method of Levels

Bird et al. (2006) [15]. Modalities/equipment: CDT / CASE IV system [double dagger][double dagger] (MLE: 4-2-1 stepping algorithm). Environment/instructions: temperature-controlled room / no details on instructions or noise.

Bravenboer et al. (1992) [16]. Modalities/equipment: CDT, WDT / Triple T Thermal Threshold Tester 2 [section][section]. Environment/instructions: not specified.

De Neeling et al. (1994) [17]. Modalities/equipment: TDT (combination of CDT and WDT). Environment/instructions: quiet ambience with constant room temperature of 18°C-22°C / no details on instructions.

Dyck et al. (1991) [18]. Modalities/equipment: CDT, WDT / CASE III/IV systems [double dagger][double dagger]. Environment/instructions: not specified.

Gelber et al. (1995) [19]. Modalities/equipment: CDT / thermal sensitivity tester [paragraph][paragraph]. Environment/instructions: quiet room free from visual distractions; skin temperature recorded / standardized instructions used.

Peltier et al. (2009) [20]. Modalities/equipment: CDT / CASE IV system [double dagger][double dagger] ("previously published methodology"). Environment/instructions: "conditions of the testing were standardized" / standardized instructions used.

Valensi et al. (1993) [21]. Modalities/equipment: CDT, WDT / thermal testing system (no additional information). Environment/instructions: no information provided.

Method of Limits (Study, Measure of Repeatability, Measure of Precision/Stability)

Agostinho et al. (2009) [1]. Repeatability: ANOVA. Precision/stability: analysis of absolute variables / Bland-Altman analysis.

Becser et al. (1998) [2]. Repeatability: CR (Bland-Altman analysis, presented as °C); ICC. Precision/stability: 95% reference limits presented as "retest as percent of first test."

Claus et al. (1993) [3]. Repeatability: linear correlation coefficient. Precision/stability: CV, 90th percentile day-to-day differences of absolute values / magnitude of day-to-day differences; % mean value.

Felix and Widerstrom-Noga (2009) [4]. Repeatability: ICC (one-way random effects model). Precision/stability: 95% CI.

Heldestad et al. (2010) [5]. Repeatability: repeatability between days and test sessions; CR (Bland-Altman analysis); correlations between data at repeated testing on different days and in different sessions within the same day (Spearman rank correlation); variations in thresholds between days and test sessions (CV). Precision/stability: inter- and intrasubject reproducibility; analyses of inter- and intraindividual differences.

Krassioukov et al. (1999) [6]. Repeatability: ICC. Precision/stability: SD; CV.

Pigg et al. (2010) [7]. Repeatability: ICC. Precision/stability: mean ± SD / MID.

Wasner and Brock (2008) [8]. Repeatability: ICC. Precision/stability: none.

Zwart and Sand (2002) [9]. Repeatability: CR; ICC for between variation; repeated-measures ANOVA. Precision/stability: not applicable, as statistical analysis sufficient.

Mixed Method of Limits and Method of Levels

Claus et al. (1990) [10]. Repeatability: reliability coefficient (Rtt). Precision/stability: confidence limits mentioned but values not provided.

Kemler et al. (2000) [11]. Repeatability: CR. Precision/stability: Bland-Altman analysis.

Moravcova et al. (2005) [12]. Repeatability: CR. Precision/stability: none.

Yarnitsky and Sprecher (1994) [13]. Repeatability: repeatability r. Precision/stability: MISD.

Yarnitsky et al. (1995) [14]. Repeatability: repeatability r. Precision/stability: MISD.

Method of Levels

Bird et al. (2006) [15]. Repeatability: total variance (SD); ICC. Precision/stability: CV.

Bravenboer et al. (1992) [16]. Repeatability: CR (Bland-Altman analysis). Precision/stability: none.

De Neeling et al. (1994) [17]. Repeatability: reliability coefficient. Precision/stability: 95% CI, SD diff, CV.

Dyck et al. (1991) [18]. Repeatability: ICC. Precision/stability: CI on graph but specific measures not provided.

Gelber et al. (1995) [19]. Repeatability: ANOVA linear regression. Precision/stability: CV.

Peltier et al. (2009) [20]. Repeatability: ICC. Precision/stability: none.

Valensi et al. (1993) [21]. Repeatability: CV; percentages of total variance. Precision/stability: none.

Method of Limits (Study, Results)

Agostinho et al. (2009) [1]. ANOVA: no significant systematic difference between days for WDT and HPT; systematic difference between days for CDT and CPT. Bland-Altman analysis: no significant difference from baseline for CDT and CPT. r-values, all participants: CDT = 0.54, WDT = 0.44, CPT = 0.61, HPT = 0.52; nondisabled: CDT = 0.43, WDT = 0.49, CPT = 0.62, HPT = 0.51; patients: CDT = 0.62, WDT = 0.41, CPT = 0.60, HPT = 0.55.

Becser et al. (1998) [2]. Intrarater reliability: ICC average: CDT = 0.63, WDT = 0.66; CR average (°C): CDT = 1.04, WDT = 1.56; 95% reference limits (lower): CDT = 39-56, WDT = 35-68; 95% reference limits (upper): CDT = 174-244, WDT = 137-220. Interrater reliability: ICC average: CDT = 0.53, WDT = 0.60; CR average (°C): CDT = 1.17, WDT = 1.45.

Claus et al. (1993) [3]. r-value: CDT = 0.66, WDT = 0.77. CV (%): CDT = 65, WDT = 32. 90th percentile difference: CDT = 3.1, WDT = 2.3. Magnitude of day-to-day difference: CDT = 19.0, WDT = 2.0.

Felix and Widerstrom-Noga (2009) [4]. ICC (95% CI), SCI: CDT = 0.90 (0.83-0.94), WDT = 0.95 (0.91-0.95), CPT = 0.50 (0.28-0.67), HPT = 0.50 (0.28-0.66); nondisabled: CDT = 0.68 (0.54-0.78), WDT = 0.70 (0.57-0.80), CPT = 0.49 (0.31-0.64), HPT = 0.68 (0.55-0.79).

Heldestad et al. (2010) [5]. No difference between repeated testing on days 1, 2, and 7. Repeatability between measurements within participants, mean CR (°C), first test: CDT = 1.00, WDT = 1.06, CPT = 6.50, HPT = 5.99; after thermal pain assessment: CDT = 2.92, WDT = 2.08. Intraindividual variation for first test (CV, °C), absolute values (mean): CDT = 3.10, WDT = 1.80, CPT = 0.63, HPT = 1.60; Δ (%): CDT = 65.80, WDT = 35.30, CPT = 0.29, HPT = 5.80. Within-days repeatability (CV%), absolute values: CDT = 0.89-6.07 (first test), WDT = 0.33-4.10 (first test), CPT = 0.46-1.70, HPT = 0.63-8.10.

Krassioukov et al. (1999) [6]. Nondisabled: ICC: CDT = 0.75-0.90, WDT = 0.36-0.84, CPT = 0.91-0.95; CV%: CDT = 4.00-10.80, WDT = 1.20-8.30, CPT = 56.30-100.30. SCI: ICC: CDT = 0.45-0.81, WDT = 0.23-0.69, CPT = 0.65-0.89; CV%: CDT = 42.60-75.50, WDT = 7.10-12.00, CPT = 72.70-139.30.

Pigg et al. (2010) [7]. Intrarater reliability: ICC: CDT = 0.45-0.77, WDT = 0.23-0.67, CPT = 0.55-0.87, HPT = 0.64-0.80; MID: CDT = 0.40-4.70, WDT = 0.50-2.40, CPT = 2.20-4.20, HPT = 1.20-2.30. Interrater reliability: ICC: CDT = 0.21-0.61, WDT = 0.13-0.65, CPT = 0.44-0.91, HPT = 0.58-0.87; MID: CDT = 1.20-5.70, WDT = 0.90-2.40, CPT = 2.00-4.60, HPT = 1.40-1.70.

Wasner and Brock (2008) [8]. ICC (r-value), day 0 vs day 1: CPT = 0.948, HPT = 0.648; day 0 vs day 21: CPT = 0.781, HPT = 0.887.

Zwart and Sand (2002) [9]. Symptomatic side: CR (%): CDT = 42-51, WDT = 39-57; ICC: CDT = 0.40-0.83, WDT = 0.35-0.67. Asymptomatic side: CR (%): CDT = 34-52, WDT = 40-65; ICC: CDT = 0.27-0.86, WDT = 0.43-0.82.

Mixed Method of Limits and Method of Levels

Claus et al. (1990) [10]. MLI: Rtt 1/2: WDT = 0.73, CDT = 0.71; Rtt 2/3: WDT = 0.83, CDT = 0.71. MLE: Rtt 1/2: WDT = 0.82, CDT = 0.82; Rtt 2/3: WDT = 0.78, CDT = 0.78.

Kemler et al. (2000) [11]. CR, MLE: CDT unaffected wrist = 0.8, CDT affected wrist = 0.7, CDT unaffected foot = 4.1, CDT affected foot = 5.8, WDT unaffected wrist = 1.0, WDT affected wrist = 2.0, WDT unaffected foot = 5.4, WDT affected foot = 4.0. CR, MLI: CDT unaffected wrist = 2.3, CDT affected wrist = 3.7, CDT unaffected foot = 5.3, CDT affected foot = 3.4, WDT unaffected wrist = 1.7, WDT affected wrist = 5.0, WDT unaffected foot = 2.9, WDT affected foot = 4.4.

Moravcova et al. (2005) [12]. Thenar cold, nondisabled: MLI nonrandom = 1.06, MLI random = 0.71, MLE = 0.48; patients: MLI nonrandom = 2.18, MLI random = 1.40, MLE = 1.22. Thenar warm, nondisabled: MLI nonrandom = 0.76, MLI random = 0.72, MLE = 0.54; patients: MLI nonrandom = 1.38, MLI random = 1.56, MLE = 1.24.

Yarnitsky and Sprecher (1994) [13]. r-value, thenar: MLE CDT = 1.040, MLE WDT = 0.572, MLI CDT = 1.964, MLI WDT = 1.587, SC CDT = 1.144, SC WDT = 0.720; foot: MLE CDT = 3.016, MLE WDT = 3.758, MLI CDT = 3.778, MLI WDT = 4.298. Mean intersession difference, thenar: MLE CDT = -0.086, MLE WDT = -0.006, MLI CDT = 0.419, MLI WDT = 0.249, SC CDT = 0.013, SC WDT = -0.013; foot: MLE CDT = -0.044, MLE WDT = 0.352, MLI CDT = 0.197, MLI WDT = -0.115.

Yarnitsky et al. (1995) [14]. Thenar eminence: r = 5.85; foot: r = 4.47.

Method of Levels

Bird et al. (2006) [15]. Variance: total = 20.88; due to site = 1.15 (6%); due to patient = 14.41 (69%); random error = 5.58 (27%). ICC range = 0.68-0.73. CV = 30.22%.

Bravenboer et al. (1992) [16]. Correlation of reliability, normal: warm hand = 0.19, cold hand = 0.17, warm foot = 4.34, cold foot = 0.60; abnormal: warm hand = 1.17, cold hand = 1.01, cold foot = 4.69.

De Neeling et al. (1994) [17]. r (95% CI) = 0.54 (0.26-0.73); SD diff (95% CI) = 0.49 (0.39-0.61); CV = 0.72.

Dyck et al. (1991) [18]. ICC: CDT > 0.9, WDT > 0.8. 95% CI: CDT = 0.95-0.99 (approx), WDT = 0.55-0.90 (approx).

Gelber et al. (1995) [19]. CV (%), finger: day-to-day = 41, technician-to-technician = 60, within day (same technician) = 80, center-to-center = 47; toe: day-to-day = 95, technician-to-technician = 145, within day (same technician) = 114, center-to-center = 87.

Peltier et al. (2009) [20]. ICC: test 1 vs test 4 = 0.80; trial 1 vs trial 2 = 0.83.

Valensi et al. (1993) [21]. Total CV (%): WDT = 64.5, CDT = 116.6. Intercenter variability (%): WDT = 3.9, CDT = 12.5. Intersubject variability (%): WDT = 39.4, CDT = 85.8. Intrasubject variability (%): WDT = 21.2, CDT = 18.3.

Method of Limits (Study, Conclusions *)

Agostinho et al. (2009) [1]. No significant differences between r-values for patients or nondisabled controls. Systematic difference between days for CDT and CPT, but differences in absolute values are small. Conclusions: CDT = good r but systematic difference between days indicates fair reliability; WDT = fair reliability; CPT = good r but systematic difference between days indicates fair reliability; HPT = fair reliability.

Becser et al. (1998) [2]. Intrarater reliability: CDT and WDT ICC values good; CR moderate. Interrater reliability: ICC values good; small but significant difference for CDT. Conclusions: intrarater: CDT and WDT = fair reliability; interrater: CDT = fair reliability, WDT = good reliability but limited information. [paragraph]

Claus et al. (1993) [3]. CDT demonstrates good correlation coefficient but high variability and day-to-day differences. WDT demonstrates excellent correlation coefficient, moderate difference, and small day-to-day difference. Conclusions: CDT = poor reliability; WDT = good reliability.

Felix and Widerstrom-Noga (2009) [4]. Conclusions: CDT and WDT = excellent reliability in participants with SCI and good reliability in nondisabled participants; CPT and HPT = fair reliability in participants with SCI and fair to good reliability in nondisabled participants.

Heldestad et al. (2010) [5]. High degree of reproducibility for all measures. Detection thresholds more reliable when assessed before pain thresholds. Conclusions: CDT = good reliability; WDT = excellent reliability; CPT = excellent reliability; HPT = good reliability.

Krassioukov et al. (1999) [6]. No systematic differences across repeated days. Very large SD/CV for CDT and CPT in participants with SCI and for CPT in nondisabled participants. Conclusions: CDT = excellent reliability in nondisabled participants, fair reliability in participants with SCI; WDT = fair to excellent reliability in nondisabled participants, fair to good reliability in participants with SCI; CPT = high CV scores indicate poor to fair reliability in both groups.

Pigg et al. (2010) [7]. Reliability varied, although the authors suggest that reliability is suitable for clinical use. MID values low for most measures with low ICC values. Conclusions: intrarater: CDT = fair to excellent, WDT = poor to good, CPT = fair to excellent, HPT = good to excellent; interrater: CDT = poor to good, WDT = poor to good, CPT = fair to excellent, HPT = fair to excellent.

Wasner and Brock (2008) [8]. Conclusions: CPT and HPT = good to excellent reliability but limited information. [paragraph]

Zwart and Sand (2002) [9]. Mixed results, with ICC values varying from poor to excellent. Coefficients of repeatability were high throughout. Conclusions: CDT = fair reliability; WDT = fair reliability.

Mixed Method of Limits and Method of Levels

Claus et al. (1990) [10]. Rtt values indicate good reliability, but analysis of retest values as a percentage of the initial assessment reveals marked variability between tests. Conclusions: CDT and WDT = fair reliability but limited information. [paragraph]

Kemler et al. (2000) [11]. Conclusions: all measures demonstrated poor reliability at the foot; MLE CDT and WDT = good reliability at the wrist; MLI CDT and WDT = poor reliability at all sites except the unaffected wrist.

Moravcova et al. (2005) [12]. Reliability better for MLE than MLI. Reliability better for the patient group than for nondisabled participants. Conclusions: CDT and WDT = good reliability but limited information. [paragraph]

Yarnitsky and Sprecher (1994) [13]. Intersession bias found for MLI complicates reliability study. Higher r for lower limb correlates with higher threshold values. Conclusions: MLI, thenar area: CDT and WDT = poor reliability; MLE and SC, thenar area: CDT = fair reliability, WDT = good reliability; MLI and MLE, foot area: CDT = fair reliability, WDT = fair reliability.

Yarnitsky et al. (1995) [14]. Large coefficients of repeatability. MISD data not presented, but authors report "intersession bias" for heat pain at the thenar eminence. No specific data provided for MLI, MLE, or SC. Conclusions: HPT = poor reliability but limited information. [paragraph]

Method of Levels

Bird et al. (2006) [15]. Low variance between sites; high intrasubject variation. ICC values good. CV = 30.22% (moderate variance). Main methodological limitation: not primarily designed as a reliability study. Conclusions: CDT = fair reliability.

Bravenboer et al. (1992) [16]. Conclusions: CDT and WDT = fair reliability in the hand and poor reliability in the foot but limited information. [paragraph]

De Neeling et al. (1994) [17]. Fair estimate of reliability with large variance. Conclusions: TDT = poor reliability.

Dyck et al. (1991) [18]. Results very briefly described. Conclusions: CDT and WDT = excellent reliability but limited information. [paragraph]

Gelber et al. (1995) [19]. Statistical methods briefly described. CV high for all measures. Conclusions: CDT = poor reliability.

Peltier et al. (2009) [20]. No measure of precision provided. Mean values appear similar across tests, but SD appears large. Range of CDT values looks large on graph, but actual values not presented. Conclusions: CDT = excellent reliability but limited information. [paragraph]

Valensi et al. (1993) [21]. Large variance for both WDT and CDT. Large intersubject variability; small intrasubject and intercenter variability. Conclusions: CDT and WDT = fair reliability but limited information. [paragraph]

Note: Intra- and interrater reliability for all continuous variables. An ICC of <0.40 is considered poor; 0.40 to 0.59, fair; 0.60 to 0.75, good; and >0.75, excellent agreement [22].

* In the absence of consensus on the interpretation of reliability scores and/or measures of precision, the conclusions outlined are based on the authors' interpretation of the statistical analysis.

[dagger] Method of reliability not stated, therefore assumed to be test-retest.

[double dagger] Medoc Advanced Medical Systems; Ramat Yishai, Israel.

[section] Somedic AB; Horby, Sweden.

[paragraph] Limited information: absence of sufficient data or additional measures of precision limits interpretation of estimates of reliability.

** Marstock nervtest; Schriesheim, Germany.

[dagger][dagger] Nicolet Biomedical; Madison, Wisconsin.

[double dagger][double dagger] W. R. Medical Electronics; Stillwater, Minnesota.

[section][section] Medelec, Ltd; Old Woking, United Kingdom.

[paragraph][paragraph] Sensortek, Inc; Clifton, New Jersey.
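The two reliability metrics that recur throughout the table can be made concrete with a short sketch (Python; the function names and the sample readings are illustrative assumptions, not taken from the review): `icc_category` encodes the Shrout and Fleiss ICC bands quoted in the note above, and `coefficient_of_variation` computes the CV (%) that several of the included studies report as a precision measure.

```python
import statistics

def icc_category(icc):
    """Classify an intraclass correlation coefficient (ICC) using the
    Shrout and Fleiss bands quoted in the table note:
    <0.40 poor; 0.40-0.59 fair; 0.60-0.75 good; >0.75 excellent."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc <= 0.75:
        return "good"
    return "excellent"

def coefficient_of_variation(values):
    """CV (%) = sample SD / mean * 100, the relative-variability
    measure used as a precision estimate by several studies."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Felix and Widerstrom-Noga [4] report CDT ICC = 0.90 for the SCI group
# and CPT ICC = 0.50 for the same group:
print(icc_category(0.90))  # excellent
print(icc_category(0.50))  # fair
```

For example, the CDT ICC of 0.90 reported for the SCI group by Felix and Widerstrom-Noga [4] falls in the "excellent" band, while the CPT ICC of 0.50 falls in "fair"; the hypothetical repeated readings passed to `coefficient_of_variation` would be the same participant's thresholds across sessions.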

[1.] Agostinho CM, Scherens A, Richter H, Schaub C, Rolke R, Treede
RD, Maier C. Habituation and short-term repeatability of thermal
testing in healthy human subjects and patients with chronic
non-neuropathic pain. Eur J Pain. 2009;13(8):779-85.

[2.] Becser N, Sand T, Zwart JA. Reliability of cephalic thermal
thresholds in healthy subjects. Cephalalgia. 1998;18(8):574-82.

[3.] Claus D, Mustafa C, Vogel W, Herz M, Neundorfer B. Assessment of
diabetic neuropathy: Definition of norm and discrimination of abnormal
nerve function. Muscle Nerve. 1993;16(7):757-68.

[4.] Felix ER, Widerstrom-Noga EG. Reliability and validity of quantitative sensory testing in persons with spinal cord injury and neuropathic pain. J Rehabil Res Dev. 2009;46(1):69-84.

[5.] Heldestad V, Linder J, Sellersjo L, Nordh E. Reproducibility and
influence of test modality order on thermal perception and thermal
pain thresholds in quantitative sensory testing. Clin Neurophysiol.
2010;121(11):1878-85.

[6.] Krassioukov A, Wolfe DL, Hsieh JT, Hayes KC, Durham CE.
Quantitative sensory testing in patients with incomplete spinal cord
injury. Arch Phys Med Rehabil. 1999;80(10):1258-63.

[7.] Pigg M, Baad-Hansen L, Svensson P, Drangsholt M, List T.
Reliability of intraoral quantitative sensory testing (QST). Pain.
2010;148(2):220-26.

[8.] Wasner GL, Brock JA. Determinants of thermal pain thresholds in
normal subjects. Clin Neurophysiol. 2008;119(10):2389-95.

[9.] Zwart JA, Sand T. Repeatability of dermatomal warm and cold
sensory thresholds in patients with sciatica. Eur Spine J.
2002;11(5):441-46.

[10.] Claus D, Hilz MJ, Neundorfer B. Thermal discrimination
thresholds: A comparison of different methods. Acta Neurol Scand.
1990;81(6):533-40.

[11.] Kemler MA, Reulen JP, Van Kleef M, Barendse GA, Van den
Wildenberg FA, Spaans F. Thermal thresholds in complex regional pain
syndrome type I: Sensitivity and repeatability of the methods of
limits and levels. Clin Neurophysiol. 2000;111(9):1561-68.

[12.] Moravcova E, Bednarik J, Svobodnik A, Dusek L. Reproducibility of thermal threshold assessment in small-fibre neuropathy patients. Scripta Medica (BRNO). 2005;78(3):177-84.

[13.] Yarnitsky D, Sprecher E. Thermal testing: Normative data and
repeatability for various test algorithms. J Neurol Sci.
1994;125(1):39-45.

[14.] Yarnitsky D, Sprecher E, Zaslansky R, Hemli JA. Heat pain
thresholds: Normative data and repeatability. Pain. 1995;60(3):329-32.

[15.] Bird SJ, Brown MJ, Spino C, Watling S, Foyt HL. Value of
repeated measures of nerve conduction and quantitative sensory testing
in a diabetic neuropathy trial. Muscle Nerve. 2006;34(2):214-24.

[16.] Bravenboer B, Van Dam PS, Hop J, vd Steenhoven J, Erkelens DW.
Thermal threshold testing for the assessment of small fibre
dysfunction: Normal values and reproducibility. Diabet Med.
1992;9(6):546-49.

[17.] De Neeling JN, Beks PJ, Bertelsmann FW, Heine RJ, Bouter LM.
Sensory thresholds in older adults: Reproducibility and reference
values. Muscle Nerve. 1994;17(4):454-61.

[18.] Dyck PJ, Kratz KM, Lehman KA, Karnes JL, Melton LJ 3rd, O'Brien
PC, Litchy WJ, Windebank AJ, Smith BE, Low PA, et al. The Rochester
Diabetic Neuropathy Study: Design, criteria for types of neuropathy,
selection bias, and reproducibility of neuropathic tests. Neurology.
1991;41(6):799-807.

[19.] Gelber DA, Pfeifer MA, Broadstone VL, Munster EW, Peterson M,
Arezzo JC, Shamoon H, Zeidler A, Clements R, Green DA, Porte D Jr,
Laudadio C, Bril V. Components of variance for vibratory and thermal
threshold testing in normal and diabetic subjects. J Diabetes
Complications. 1995;9(3):170-76.

[20.] Peltier A, Smith AG, Russell JW, Sheikh K, Bixby B, Howard J,
Goldstein J, Song Y, Wang L, Feldman EL, Singleton JR. Reliability of
quantitative sudomotor axon reflex testing and quantitative sensory
testing in neuropathy of impaired glucose regulation. Muscle Nerve.
2009;39(4):529-35.

[21.] Valensi P, Attali JR, Gagant S. Reproducibility of parameters
for assessment of diabetic neuropathy. The French Group for Research
and Study of Diabetic Neuropathy. Diabet Med. 1993;10(10):933-39.

[22.] Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing
rater reliability. Psychol Bull. 1979;86(2):420-28.

ANOVA = analysis of variance, approx = approximately, CDT = cold detection threshold, CI = confidence interval, CPT = cold pain threshold, CR = coefficient of repeatability, CRCC = Central Reading and Coordinating Center, CRPS = complex regional pain syndrome, CV = coefficient of variation, DFNS = German Research Network on Neuropathic Pain, HPT = heat pain threshold, ICC = intraclass correlation coefficient, MID = mean intraindividual difference, MISD = mean intersession difference, MLE = method of levels, MLI = method of limits, QST = quantitative sensory testing, Rtt = reliability coefficient, SC = staircase method, SCI = spinal cord injury, SD = standard deviation, TDT = thermal detection threshold, WDT = warm detection threshold.