An external validity study of the Visual Memory subtest of the Test of Visual Perceptual Skills, Third Edition.
Abstract: Purpose: The purpose of this study was to compare scores on the Visual Memory subtest of the Test of Visual Perceptual Skills, Third Edition, with those on a Draw-from-Memory task.

Method: The external validity of the Visual Memory subtest was measured by comparing the subtest scores with performance on a Draw-from-Memory activity. Participants were 80 primary school age children, including 28 identified as learning disabled.

Results: A moderate correlation (r = 0.411) was found, supporting the hypothesis that the two measures are related. However, the learning disabled children accounted for much of the covariance found, possibly due to the mix of verbal and non-verbal disabilities.

Conclusion: The correlation with a related activity provided only weak support for the validity of the standardised test.

Key words: External validity, visual perception, children, assessment.
Author: Cote, Carol A.
Publication: British Journal of Occupational Therapy, 74(10), October 2011. Publisher: College of Occupational Therapists Ltd. ISSN 0308-0226.
Introduction

Tests of visual perception are commonly used in paediatric occupational therapy practice (Stewart 2010). Because important decisions regarding a therapy programme may be based at least in part on scores from standardised tests, Brown (2009) advised: 'Practitioners must be confident that the tests and measures that they use to evaluate clients are assessing what they purport they do in a rigorous and robust manner' (p519). This confidence is based on evidence from various types of validity studies. The focus of this study is on the validity of one specific subtest, Visual Memory (VM), from the Test of Visual Perceptual Skills, Third Edition (TVPS-3, Martin 2006), a commonly used standardised assessment.

Several approaches to measuring validity are used by researchers; these can be broadly divided into measures of internal and external test validity. Internal validity looks at the test structure, how items are related and whether the response patterns are consistent with the constructs the test was designed to assess, whereas external validity looks to supporting evidence from sources outside the test (Slaney et al 2009). Information on validity is usually provided in the manual by the test's authors, but outside researchers also investigate the psychometric rigour of a published test. In the occupational therapy literature, studies of earlier versions of the test applying measures of internal validity, such as item analysis and internal consistency, have found inconsistent but generally positive results supporting validity (Klein et al 2002, Brown and Gaboury 2006, Brown and Rodger 2009).

Occupational therapists use standardised assessments of visual perception to identify possible underlying reasons for a child's observed performance on school-related tasks (Stewart 2010). These underlying factors may be revealed by the subtests of assessments, such as the TVPS-3, which are purported to reflect the components of visual perception (for example, visual discrimination or figure-ground perception) (Martin 2006). In addition to knowing that tests are psychometrically rigorous, occupational therapists would be concerned that a score obtained on a subtest was providing meaningful information about a child's skill in that specific area. Corcoran (2007) emphasised the importance of defining constructs common in occupational therapy practice, and determining whether a task used as a measurement of a construct is similar to that which is observed in a naturalistic practice setting. In the case of tests of visual perception, would a score on one component truly represent the construct of that component in the children tested? Would it correlate with performance on an actual classroom task? Comparison of the test score with another related naturalistic task is an example of external test validity (Slaney et al 2009).

An important part of a test's development is determining which items best represent the domain under question, referred to as content validity (Beckstead 2009). Authors provide evidence that the test items have been chosen based on analysis of accepted theory or by a survey of experts (Stewart 2010). In the original Test of Visual-Perceptual Skills (Gardner 1982), the author explained: 'Content validity was established by determining the important factors of visual perception and designing items to represent the task demands of each factor' (p15). No further information was provided as to how the stimulus items had been chosen, nor was evidence given to support their representation of the constructs implied by the subtest names. The more recent TVPS-3 provides some background theory on visual perception in general, but does not address how the test discriminates specific abilities or provide a rationale in terms of how the test results can be applied to therapeutic or educational purposes (Ackerman 2010).

The format of many standardised visual perception test items is a black line abstract figure, which is to be matched, disembedded or otherwise recognised from among similar choices. This design is useful in that it constrains the myriad variables inherent in any real life visual activity so that derived scores may be reliably evaluated according to group norms. However, such tasks are unlikely to be found in a child's typical daily experience. The main question addressed in this study is whether a test using an abstract figure with a multiple-choice response provides an indication of how a child would perform on a more ecologically relevant visual memory activity.

In this study, the subtest VM from the TVPS-3 was selected to test for evidence of external validity. Of the seven subtest areas in this assessment, visual memory can be most readily observed in a common task. It is assumed to be the construct involved when acting on a visual task after the stimulus figure has been removed. Other subtest areas (for example, spatial relations) are likely to overlap with other components within a complex task and would be difficult to observe specifically in isolation.

In designing an appropriate comparison task, several considerations were made. One is that visual memory is not a single, isolated ability. Klein et al (2002) found a low relationship between the subtests VM and Visual Sequential Memory, possibly due to the Visual Sequential Memory tasks enlisting verbal strategies for recall. Therefore, the comparison task should be mainly visual, not easily encoded verbally and not presented sequentially. The task should also have characteristics of common elementary school activities that involve visual memory. An example might be visual samples to be followed in an early geometry lesson or art class. Classroom tasks often require output such as drawing or construction, and output can provide detailed evidence of what was remembered. Visual samples are also likely to involve more than one feature, with some features having more salience (that is, being more meaningful to the child) than others, along with some features providing context or structure.

Although it is difficult to imagine a classroom task that would be restricted to memory, that is, giving no opportunity to check back on details, visual memory is required for many tasks simply because one cannot hold the stimulus figure and the construction of the figure in visual awareness at the same time. By making the comparison task a memory-only task with the stimulus unavailable after viewing, the visual memory component can be evaluated separately from other strategies a child might employ to do the task. Finally, the comparison task must be scorable.

Based on these assumptions about a typical visual memory activity, the Draw-from-Memory (DM) task was devised for this study. The two pictures, shown in Fig. 1, contain several elements, some with high saliency (for example, an almost-smiley face and common geometric forms) and some with lesser saliency or structural elements (for example, dividing lines). These pictures were inspired by the Rey-Osterrieth Complex Figure Test (ROCF, Lezak et al 2004), a test that is commonly used to assess visual memory in adults. The ROCF has many elements and involves a direct copy of the figure prior to the memory drawing. The DM pictures are simpler, having been designed to be a reasonable memory-only drawing task for young children. The DM pictures were field tested on children and were found to be sufficiently challenging in that they elicited a range of performance levels, but were not overwhelming; some older children were able to reconstruct an entire picture correctly.

The purpose of this study was to compare scores on the VM subtest with performance on the DM. Because both measures are presumed to reflect the underlying construct of visual memory, it was expected that the correlation between them would be strong and thus provide a source of external validity evidence. Children of a wide range of ages, including some identified as learning disabled, were sought for this study, because this diverse group might reveal a relationship between the measures that would not be apparent in a more homogeneous group.

Method

Participants

A total of 90 children participated in this study; 10 did not complete both study tasks and thus were not included in the analysis. The final pool included a convenience sample of 52 children considered to be typically developing (ages 5.0 to 11.5 years, mean = 7.3, SD 1.95, 25 male/27 female) who were recruited from four local publicly funded after-school programmes. All children attending the programme were invited to participate through a letter sent home to the parent. The response rate was approximately 50%, but varied by setting. An additional 28 participants were recruited from a private school for children with learning disabilities (ages 8.0 to 16.0 years, mean = 11.3, SD 2.15, 16 male/12 female). All children in the school were invited through a letter sent home; the response rate was 95%. All settings were in northeast Pennsylvania.

Ethics

The Institutional Review Board of the University of Scranton approved this study (IRB #14-08C), and parental written consent, as well as participant verbal assent, was obtained.

Procedure

Participants were seen individually by the author or by a graduate assistant. The activity took place at a table in a quiet area of the general purpose room of each school or centre, with the examiner seated opposite the participant. After a greeting, the participant was told that he or she would be shown a picture for just a few seconds, then the picture would be taken away and he or she was to draw it from memory. The first DM picture was presented; even-numbered participants were shown picture A and odd-numbered participants were shown picture B. Pictures were computer-generated black line drawings, placed inside a clear page-protector within a black binder. Viewing was allowed for about 5 seconds (counted by the examiner, not timed), then the binder was closed. The examiner next provided an 8 1/2 x 5 1/2 inch (21.6 x 14 cm) piece of white paper and a thin marker pen for the participant to draw the remembered picture. After an intervening task (a design copy task not reported on here), the second drawing was presented in the same manner.

Lastly, the VM subtest of the TVPS-3 was administered according to the standard instructions provided in the manual. The participant viewed a stimulus figure for 4 to 5 seconds and then chose the exact match from among four choices given on the next page.

Measures

The TVPS-3 is a nationally (American) standardised assessment, which reports high reliability and provides evidence of validity (Martin 2006); however, it has been criticised for lacking sufficient reliability information and validity measures, particularly at the subtest level (Ackerman 2010). The VM subtest includes two trial items and 16 test items. The raw score is the number of correct responses given before the ceiling of three out of five errors is reached.
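As a rough illustration, the ceiling rule can be sketched in code. This is not taken from the TVPS-3 manual: the function name, the response encoding and the reading of 'three out of five errors' as three errors within any five consecutive items are assumptions made for the sketch.

```python
def vm_raw_score(responses):
    """Count correct responses until the ceiling is reached.

    `responses` is a list of booleans (True = correct), one per item in
    administration order.  Here the ceiling is interpreted as three
    errors within any five consecutive items; items after the ceiling
    do not contribute to the raw score.
    """
    score = 0
    window = []  # correctness of the most recent (up to) five items
    for correct in responses:
        window.append(correct)
        if len(window) > 5:
            window.pop(0)
        if correct:
            score += 1
        if window.count(False) >= 3:  # ceiling reached: stop testing
            break
    return score
```

A perfect run of all 16 items would score 16, while three consecutive errors would end the subtest immediately.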

Each DM picture has seven elements, as shown in Fig. 1. Each element was given 2 points if it was well formed and in the correct location, 1 point if the element was distorted but recognisable or in the wrong location, and 0.5 point if it was both distorted and in the wrong location. The overall aim of the accuracy score was to determine how much was remembered, not to judge precision, and therefore leniency was the rule to be certain that credit was given for everything remembered.
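The rubric above can be expressed as a small scoring helper. This is a minimal sketch for illustration only: the function names, the tuple encoding and the assumption that an element not recalled at all scores 0 are mine, not part of the study's protocol.

```python
def dm_element_score(well_formed, correct_location):
    """Score one recalled element of a DM picture.

    2.0 -> well formed and in the correct location
    1.0 -> distorted but recognisable, OR in the wrong location
    0.5 -> both distorted and in the wrong location
    """
    if well_formed and correct_location:
        return 2.0
    if well_formed or correct_location:
        return 1.0
    return 0.5


def dm_picture_score(elements):
    """Sum element scores for one drawing.

    `elements` is a list of (recalled, well_formed, correct_location)
    tuples, one per picture element; elements not recalled at all are
    assumed to score 0.
    """
    return sum(
        dm_element_score(wf, loc) for recalled, wf, loc in elements if recalled
    )
```

Under this rubric a perfectly reproduced seven-element picture scores 14 points.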

The drawings were scored by graduate assistants involved in the project. Another group of graduate students, who were not involved in the project, rescored 20 protocols (40 drawings) using the scoring procedures described above. Agreement was high, r = 0.95.

Data analysis

No significant differences between male and female participants were found on either measure, and the order of presentation of the pictures was not a significant factor; these variables were therefore excluded from further analysis. Scores for picture A (mean = 7.68, SD 3.67) were significantly lower than scores for picture B (mean = 8.45, SD 3.54): paired samples t(79) = -2.204, p = 0.03, but the two measures were highly correlated (r = 0.626, p<0.01). The two scores were therefore combined into an overall average score. The raw score of the VM subtest and the final average score for the DM task were analysed using the PASW Statistics 17 package. Participant age and group membership (learning disabled or typical) were entered as additional variables.
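The picture comparison step above can be sketched with standard SciPy routines. The scores below are randomly generated stand-ins, not the study's data, and the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Illustrative, randomly generated scores -- not the study's data.
rng = np.random.default_rng(1)
picture_a = rng.normal(7.7, 3.7, 80).clip(0, 14)
picture_b = (picture_a + rng.normal(0.8, 2.9, 80)).clip(0, 14)

t, p_t = stats.ttest_rel(picture_a, picture_b)    # paired-samples t-test
r_ab, p_r = stats.pearsonr(picture_a, picture_b)  # picture A / B correlation
overall = (picture_a + picture_b) / 2             # combined average DM score
```

A paired test is appropriate here because each child contributed a score for both pictures.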

Results

Scores from both DM and VM showed a significant and moderate correlation with age: DM (n = 80), r = 0.617, p<0.001; VM (n = 80), r = 0.535, p<0.001. For the calculation of the correlation between DM and VM, age was partialled out. A significant and moderate correlation was found: (n = 80) r = 0.411, p<0.001. Separating the group of learning disabled participants from the typically developing participants showed that the relationship between the two tests was stronger for the learning disabled children: typical children (n = 52) r = 0.279, p = 0.048; learning disabled children (n = 28) r = 0.515, p = 0.006.
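Partialling age out of the DM-VM correlation amounts to correlating the residuals left after regressing each score on age. A minimal sketch follows, using simulated scores rather than the study's data; note that the p-value from `pearsonr` does not deduct a degree of freedom for the covariate, so it is slightly liberal.

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, z):
    """Pearson correlation between x and y with z partialled out:
    correlate the residuals of regressing x on z and y on z.
    (pearsonr's p-value ignores the covariate's degree of freedom --
    close enough for a sketch.)"""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    design = np.column_stack([np.ones_like(z), z])  # intercept + age
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return stats.pearsonr(rx, ry)

# Simulated scores in which both measures improve with age.
rng = np.random.default_rng(0)
age = rng.uniform(5, 16, 80)
dm = 0.6 * age + rng.normal(0, 2, 80)
vm = 0.5 * age + rng.normal(0, 2, 80)
r, p = partial_corr(dm, vm, age)
```

Because both measures correlate with age, the zero-order DM-VM correlation would be inflated by age alone; the residual-based partial correlation isolates the relationship beyond maturation.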

Discussion

The purpose of this study was to seek evidence for the validity of a commonly used measure in school-based occupational therapy practice. Having confidence that measures are rigorous and represent the constructs they purport to measure is important to the assessment process (Corcoran 2007, Brown 2009). In this study, the validity of the VM was supported by a significant correlation with a naturalistic task, the DM. However, among the typically developing children the correlation was low. The stronger relationship found with the learning disabled children might be explained by the fact that children with language-based disabilities and children with non-verbal learning disabilities were included but not specifically identified. It has been found that children with specific language-based disabilities generally do not score poorly on visual memory tasks (Baird et al 2010), whereas non-verbal disabilities are associated with poor scores on visual memory tests (Alloway et al 2009). Thus the stronger correlation found with this group may reflect differences related to the type of disability rather than the relationship between the two measures.

The results do not provide clear and strong support for the claim that the VM tasks represent the same construct as the DM task; therefore, evidence for the external validity of the VM is inconclusive. Yet, both activities fit the assumed definition of visual memory, suggesting that visual memory may not be a singular psychological process, but may vary from task to task. This difference was also evidenced by the poor relationship found between the VM and Visual Sequential Memory subtests in a previous study (Klein et al 2002).

A limitation of this study is that the DM tasks were constructed for this study and thus have no reliability or validity data. It is possible that the lack of a strong relationship with VM is due to inconsistencies in this less constrained task. However, the purpose here was to look for relevant applications for the scores on the standardised VM and not to compare these scores to another highly constrained standardised measure.

For future research, it would be important to know whether other visual perception tests (for example, Visual Discrimination) would be more or less correlated with the DM or other memory task. Such information could help determine whether VM uniquely discriminates the construct of visual memory, or if general mechanisms of visual perception are involved in complex tasks. Additionally, research is needed to investigate the external validity of other standardised measures of the various constructs involved in visual perception to provide an additional source of confidence for practitioners in their interpretation of test results.

Conclusion

The results provide evidence for some shared underlying capacity in the VM and DM tasks, but the measures are not correlated strongly enough to add support to the validity of VM as a test of the construct of visual memory when applied to a different task. Visual memory may be complex and varied, and perhaps a single test does not capture this ability in all memory situations.

Key findings

* A low to moderate correlation between the Visual Memory subtest of the Test of Visual Perceptual Skills, Third Edition, and a related visual memory activity was found.

What the study has added

External validity measures can provide practitioners with information about how a standardised test is related to relevant activities within the same domain.

Acknowledgements

The author wishes gratefully to acknowledge the following former University of Scranton students for their assistance on this project: Lea Bono, Jessica Daw, Nicole Delia, Kaitlyn Mimnaugh and Katie Rhoads. Conflict of interest: None declared.

References

Ackerman PL (2010) Review of the Test of Visual Perceptual Skills, 3rd Edition. In: RA Spies, JF Carlson, KF Geisinger, eds. The eighteenth mental measurements yearbook. Lincoln, NE: Buros Institute of Mental Measurements, 658-63.

Alloway TP, Rajendran G, Archibald L (2009) Working memory in children with developmental disorders. Journal of Learning Disabilities, 42(4), 372-82.

Baird G, Dworzynski K, Slonims V, Simonoff E (2010) Memory impairment in children with language impairment. Developmental Medicine and Child Neurology, 52(6), 535-40.

Beckstead JW (2009) Content validity is naught. International Journal of Nursing Studies, 46(9), 1274-83.

Brown T (2009) Assessing occupation: the importance of using valid tests and measures. British Journal of Occupational Therapy, 72(12), 519.

Brown GT, Gaboury I (2006) The measurement properties and factor structure of the Test of Visual-Perceptual Skills--Revised: implications for occupational therapy assessment and practice. American Journal of Occupational Therapy, 60(2), 182-93.

Brown T, Rodger S (2009) An evaluation of the validity of the Test of Visual Perceptual Skills--Revised (TVPS-R) using the Rasch Measurement Model. British Journal of Occupational Therapy, 72(2), 65-78.

Corcoran M (2007) Defining and measuring constructs. American Journal of Occupational Therapy, 61(1), 7-8.

Gardner MF (1982) Test of Visual-Perceptual Skills (Non-Motor). Burlingame, CA: Psychological and Educational Publications.

Klein S, Sollereder P, Gierl M (2002) Examining the factor structure and psychometric properties of the Test of Visual-Perceptual Skills. OTJR: Occupation, Participation, and Health, 22(1), 16-24.

Lezak MD, Howieson DB, Loring DW (2004) Neuropsychological assessment. 4th ed. New York: Oxford University Press.

Martin NA (2006) Test of Visual Perceptual Skills, 3rd Edition. Novato, CA: Academic Therapy Publications.

Slaney KL, Tkatchouk M, Gabriel SM, Maraun MD (2009) Psychometric assessment and reporting practices: incongruence between theory and practice. Journal of Psychoeducational Assessment, 27(6), 465-76.

Stewart KB (2010) Purposes, processes, and methods of evaluation. In: J Case-Smith, JC O'Brien, eds. Occupational therapy for children. 6th ed. Maryland Heights, MO: Mosby.

Correspondence to:

Dr Carol A Cote, Assistant Professor, Department of Occupational Therapy, University of Scranton, Linden Street, Scranton, Pennsylvania 18510, USA.


Reference: Cote CA (2011) An external validity study of the Visual Memory subtest of the Test of Visual Perceptual Skills, Third Edition. British Journal of Occupational Therapy, 74(10), 484-488.

DOI: 10.4276/030802211X13182481841985

Submitted: 4 January 2011.

Accepted: 11 August 2011.