The effect of coaching on the simulated malingering of memory impairment

Background: Detecting malingering or exaggeration of impairments in brain function after traumatic brain injury is of increasing importance in neuropsychological assessment. Lawyers involved in brain injury litigation cases routinely coach their clients on how to approach neuropsychological testing to their advantage. Thus, it is important to know how robust assessment methods are with respect to symptom malingering or exaggeration.
Methods: The influence of different coaching methods on the simulated malingering of memory impairments was investigated in neurologically healthy participants using the Short-Term Memory Test from the Bremer Symptom-Validierung (STM-BSV). Cut-offs were derived from patients with mild to severe traumatic brain injury. For comparison purposes, the German adaptation of the Rey Auditory Verbal Learning Test (AVLT) and the Rey 15 Items Test (FIT) were additionally administered. Four groups of neurologically healthy subjects were instructed to (1) perform as well as they could, (2) simulate brain injury, (3) simulate brain injury after receiving additional information about the sequelae of head trauma, or (4) simulate brain injury after receiving additional information on how to avoid detection. Furthermore, a group of patients with mild to severe closed head injury performed the tests with best effort.
Results: The naïve simulator and the symptom coached groups were the easiest to detect, whereas the symptom plus test coached group was the hardest to detect. The AVLT and the FIT were not suited to detect simulators (sensitivities from 0% to 50.8% at 75% specificity), whereas the STM-BSV detected simulators with 67% to 88% sensitivity at a specificity of 73%. However, the STM-BSV was not robust to coaching.
Conclusion: The present investigation shows that symptom validity testing as implemented in the STM-BSV is one clinically useful element in the detection of memory malingering. However, clinicians have to be aware that coaching influences performance in the test.


Background
The detection of malingering or exaggeration of impairments in brain function is of increasing importance in clinical neuropsychological assessment. In a forensic setting, an estimated 70% or more of patients assessed by clinical neuropsychologists are suspected of altering their presentations [1,2]. Memory impairment is one common symptom of brain injury that is well known to laypersons. For example, 82% of the general public are aware that a concussion often results in memory problems [3]. Individuals who attempt to malinger head trauma symptoms often report a variety of memory difficulties [4] and perform poorly on memory tests [5]. Patients with brain injury also often complain of difficulty remembering things [6], and their performance on memory tests is impaired [7].
It is considered standard practice for neuropsychologists in North America to use measures for malingering detection routinely [8-10]. In contrast, effort testing has had limited impact on clinical practice in European countries. Notable exceptions are the Amsterdam Short-Term Memory Test [11], adaptations of Green's Word Memory Test [12] to several European languages, and the "Testbatterie zur Forensischen Neuropsychologie" (TBFN; [13]). The TBFN contains 23 tests specifically designed to detect malingering (a computerized version of Rey's 15 Item Test, FIT [14]; an auditory analog version of Rey's 15 Item Test; two tests for the assessment of memory in everyday life; the Bremer Symptom-Validierung, BSV: 19 symptom validity tests to assess perceptual and mnestic impairments). The present study uses an analog design to evaluate the usefulness of the BSV short-term memory subtest, the FIT, and the VLMT (Verbaler Lern- und Merkfähigkeitstest [15], the German adaptation of Rey's Auditory Verbal Learning Test) to detect malingering of memory impairment. Furthermore, the effects of different coaching procedures on classification rates are investigated.

Approaches for the detection of memory malingering
Three basic approaches for malingering detection have been proposed: looking for inconsistencies in test results [16], the use of tests specifically designed to detect incomplete effort, and the application of cut-off values derived from standard neuropsychological tests. The most widely used groups of tests specifically designed to detect incomplete effort are (a) tests that appear to be more difficult than is actually the case (e.g., FIT), and (b) the symptom validity technique.

Tests appearing more difficult than they actually are
The FIT is introduced as a very difficult memory test because it requires the examinee to remember 15 different items in a short time. In fact, the test is very simple because of the redundancy among the items, and patients with significant brain impairment can perform the test without much difficulty. The rationale of the test assumes that malingerers are unaware of this fact and reason that, in order to be categorized as memory impaired, they must recall only a few items. Thus, patients with brain impairment will do well on the FIT, whereas malingerers perform poorly and can thus be identified [17-19].

Symptom validity testing
In the symptom validity technique, each item has a 50% probability of obtaining a correct response when guessing. Theoretically, a person scoring significantly below chance is most likely malingering, because even pure guessing should yield about half of the items correct. Prominent examples of this technique are the Test of Memory Malingering (TOMM; [20]) and the Portland Digit Recognition Test [21].
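The below-chance criterion can be made concrete with a binomial calculation. The following is a minimal sketch, not part of any published scoring procedure: the function name and the example score of 38 correct out of 100 items are hypothetical and chosen only for illustration.

```python
from math import comb


def prob_at_or_below(k: int, n: int, p: float = 0.5) -> float:
    """Probability of obtaining k or fewer correct responses out of n
    items when every item is answered by guessing with success
    probability p (binomial cumulative distribution)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical examinee: 38 correct out of 100 two-alternative items.
# A small probability means the score is unlikely to arise from guessing
# alone, which suggests that correct answers were deliberately avoided.
p_guessing = prob_at_or_below(38, 100)
print(f"P(38 or fewer correct by guessing) = {p_guessing:.4f}")
```

A score close to 50 of 100 would give a large probability and would therefore not be informative under this criterion, even though it may still fall below an empirically derived cut-off.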
Symptom validity tests require that participants believe that they have to perform a difficult task. If the malingerer does not realize that the task is easy, he will perform poorly. However, if the patient notices that the task is easy, he might recognize the attempt to detect malingering and, thus, perform normally on the task. In this context, it is interesting that 48% of US lawyers believe that they should provide information about psychological tests to their clients [22], and that lawyers involved in brain injury litigation cases indeed do this regularly [23]. Furthermore, the internet provides an easily accessible source of information about tests of malingering detection that can be used by patients to prepare themselves for a neuropsychological assessment [24]. Thus, litigants may well be aware of the rationale of symptom validity testing.

Standard memory tests
One approach to overcoming these criticisms of symptom validity tests is the use of measures derived from standard neuropsychological procedures. Studies on the usefulness of cut-offs derived from standard neuropsychological tests have yielded mixed results. For example, the Rey Auditory Verbal Learning Test (AVLT) has been used to detect poor effort in personal injury litigants. Various measures derived from the test have been studied. Classification rates ranged from 13% to 76% at specificities of 90% or above. Note that only two studies report one measure each with a sensitivity above 70% (data taken from [25]). It seems that bona fide patients also perform poorly on this test of memory function, making the discrimination between real and malingered deficits difficult. However, Barrash et al. [26] proposed an extended version of the AVLT (additional recognition trial after 60 minutes) that yielded better results. We included the German version of the AVLT (i.e., the VLMT) in our assessment as it is a commonly used memory test that can be administered in a reasonable amount of time.
Other measures derived from a variety of standard neuropsychological tests yielded more promising results (for a recent review, see [26]).

The effects of coaching on the detection of memory malingering
As stated above, coaching affects the validity of tests of memory malingering. Thus, it is important to study effects of coaching on different single or combined measures used to detect feigned memory performance. In previous research employing analog designs, healthy participants were instructed to feign memory impairment and were provided with different amounts of information on the sequelae of brain injury and on the procedures involved in neuropsychological testing (for notable exceptions including a group of brain-injured patients, see [27,28]). For most of the studied measures, naïve malingerers and malingerers who received information about the most common symptoms of brain injury (symptom coached simulators) were relatively easy to detect. In contrast, symptom-plus test-coached simulators (subjects receiving information about symptoms, a warning that neuropsychological testing includes tests designed to detect memory malingering, and some hints on how to perform on neuropsychological tests to avoid detection) were quite successful in passing the tests (for a recent review, see [29]). However, these studies only investigated measures derived from single tests [30].

The present study
In the present study, the usefulness of the STM-BSV for the detection of memory malingering was investigated in an analog design using German participants. Furthermore, the sensitivity of measures derived from a standard test of memory function (VLMT) and from a test appearing more difficult than it actually is (FIT) was assessed for comparison purposes. Finally, the influence of coaching on test performance was explored. For this purpose, four groups of healthy subjects received different instructions one week prior to testing (best effort, BE; naïve simulators, NS; symptom coached simulators, SS; symptom plus test coached simulators, TS). Furthermore, a group of inpatients with mild to severe closed head injury performed the tests with best effort.

Methods

Participants
A total of 123 undergraduate students or young professionals (n = 7; all holding a university degree) were randomly assigned to one of four groups (n = 33 best effort group; n = 29 naïve simulation group; n = 30 symptom coached group; n = 31 symptom plus test-coached group; for group description see below; see table 1 for demographic characteristics). All subjects were free of neurological diseases (past or present), had normal or corrected-to-normal visual acuity, and were right-handed. An additional group of 33 inpatients of a neurological rehabilitation clinic in Magdeburg performed the tests with their best effort (PAT). Patients received rehabilitation after mild to severe closed head injury (mild: 3, moderate: 5, severe: 25; mean duration of coma: 27.3 days, range 0 to 240 days; mean duration of retrograde amnesia: 29.3 days, range 0 to 500 days; mean duration of anterograde amnesia: 10.7 days, range 0 to 45 days; for demographic information see table 1). Time from injury to assessment ranged from 1 month to 145 months (1-3 months: 10 patients, 4-12: 10, 12-24: 6, 24-48: 2, 48-120: 2, > 120: 3). None of the patients was currently involved in litigation. Patients were encouraged to perform the tests with their best effort and were assured that the results would be used only for therapy planning in the clinic and, in an anonymous form, for a scientific study (no further explanation was given concerning the purpose of the study). The students received course credit for their participation, whereas the young professionals were not compensated. Patients performed the tests as part of routine neuropsychological assessment in the rehabilitation clinic.
Due to missing information on formal school education for the patients, we cannot provide statistical information on possible differences. However, all healthy participants had at least 13 years of schooling, whereas most of the patients had jobs requiring 10 years of schooling plus at least three years of additional vocational training. Thus, the patient group most likely had less school education compared to the healthy subjects.
The study protocol was approved by the ethics committee of Magdeburg University. All participants gave written informed consent prior to psychometric testing.

Procedure
One week prior to testing, all participants received a sealed envelope containing instructions that differed according to group assignment. The best effort group (BE) was instructed to perform the given cognitive tests as well as they could. The naïve simulation group (NS) received the following scenario: "Imagine that you were involved in a car accident in which another driver hit your car. You were knocked unconscious and woke up in a hospital. You were kept overnight for observation and the doctors told you that you experienced a concussion. Imagine that after the accident, you are involved in a lawsuit against the driver of the other car. If you are found to have experienced significant injuries as a result of the accident, you are likely to receive a bigger settlement. You have decided to fake symptoms of a brain injury in order to increase the settlement you will receive. As part of the lawsuit, you are required to undergo cognitive testing to determine whether or not you have experienced a brain injury. If you can successfully convince the examiner that you have experienced significant brain damage, you are likely to get a better settlement. In the tests that you will have to undergo, I would like you to simulate brain damage, but in a believable way, such that your examiner cannot tell that you are attempting to fake a brain injury" (presented in German; after [5]). The symptom coached group (SS) received additional information about the typical sequelae of brain injury (such as concentration and memory problems, headache, sleep disturbances, etc.). The symptom plus test coached group (TS) furthermore received the following information on how to approach testing:
- Tests that appear to be easy most likely are easy and can be solved by people with brain injuries.
- Performance of people with brain injury is consistent, i.e., try to perform equally well or equally badly in all tests that you will have to complete.
- Try not to perform too badly, as most people with brain injury can answer at least some items in the tests that will follow.
Furthermore, participants were instructed not to talk to the examiner about their group assignment. After testing, the examiner debriefed the subject and the envelope containing the instructions was returned to the examiner. Furthermore, a postexperimental questionnaire was given to the subject asking how they approached the task and checking for compliance with the instructions. None of the subjects had forgotten the instructions given one week earlier.
The patient group was tested in the rehabilitation clinic with the instruction to perform the tests as well as they could.
In addition to the three tests reported here, the Test d2, the BIS, and subtests of the TAP were administered. These are commonly used standard neuropsychological tests in Germany and were included to create a test situation that resembles standard cognitive testing. The results of these tests are not reported further in this paper.

Test description and measures used to detect malingering
Only the three reported tests (VLMT, FIT, and Short-term memory form A from the BSV) are described in detail.
VLMT [15]
The VLMT is a German adaptation of the Rey Auditory Verbal Learning Test and consists of 15 words that are read one at a time by the examiner at a pace of one word per second. The examinee has to recall all the words that he can remember. This procedure is repeated five times. Then, a second list of 15 words is read and has to be recalled (interference list). In the 6th trial, the original list must be recalled once again but is not read by the examiner. After 30 minutes, a delayed recall trial (trial 7) is performed, followed by a recognition task. The recognition list contains the 15 original words, the 15 words from the interference list, and 20 semantically or phonologically related new words.
The following measures were derived from the VLMT: supraspan (number of correctly recalled items in trial 1); number of correctly recalled items in trial 5; total number of correctly recalled items in trials 1 to 5; number of correctly recalled items of the interference list; number of correctly recalled items in trial 6 (after the interference list); number of correctly recalled items after 30 minutes (delayed recall, trial 7); loss due to interference (trial 6 - trial 5); loss due to forgetting over time (trial 7 - trial 5); number of correctly recognized words; corrected recognition (correctly recognized items - recognition errors); number of items recalled at least three times in trials 1 to 5 but not recognized; number of times the first word was recalled in trials 1 to 5; and number of times the last word was recalled in trials 1 to 5.
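The arithmetic behind the score-based measures can be sketched as follows. This is an illustration under stated assumptions, not the scoring software used in the study: the function name, the dictionary keys, and the example scores are all hypothetical.

```python
def vlmt_measures(trials, interference, recognition_hits, recognition_errors):
    """Derive score-based VLMT measures.

    trials: seven integers, the number of correctly recalled words in
            trials 1-7 (trial 6 = recall after interference, trial 7 =
            delayed recall after 30 minutes).
    interference: number of correctly recalled interference-list words.
    recognition_hits / recognition_errors: recognition trial results.
    """
    t1, t5, t6, t7 = trials[0], trials[4], trials[5], trials[6]
    return {
        "supraspan": t1,
        "trial_5": t5,
        "total_1_to_5": sum(trials[:5]),
        "interference_list": interference,
        "trial_6": t6,
        "delayed_recall": t7,
        "loss_interference": t6 - t5,   # trial 6 - trial 5
        "loss_forgetting": t7 - t5,     # trial 7 - trial 5
        "recognition": recognition_hits,
        "corrected_recognition": recognition_hits - recognition_errors,
    }


# Hypothetical examinee, for illustration only.
scores = vlmt_measures([6, 9, 11, 12, 13, 11, 10], interference=5,
                       recognition_hits=14, recognition_errors=2)
print(scores)
```

The item-level indices (items recalled at least three times but not recognized, first-word and last-word recall) need per-word records rather than trial totals and are therefore not covered by this sketch.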

Short-term memory form A from the BSV (STM-BSV)
This computerized test encompasses 100 trials consisting of two pictures each. The first picture contains a simple line drawing, whereas the second picture contains two complex line drawings. The object shown in the first picture is embedded in one of these two drawings. In a two-alternative forced-choice procedure, the participant has to decide which of the two drawings contains the object presented in the first picture. Response times and accuracy are recorded. The test material consists of 20 different pictures that are presented in a randomized order. Each stimulus is repeated five times for a total of 100 trials. The following measures are analyzed: total correct responses and response time for correct responses.

Rey 15 Items Test
The FIT was performed according to standard instructions. Furthermore, we included the recognition trial developed by Boone and colleagues [14]. A sheet of paper containing 30 items (15 targets and 15 distracters that are similar to the original stimuli, e.g., the letter d) is given to the subject. The items that were presented in the learning phase have to be marked. The following indices were derived from the FIT: number of correctly recalled items, number of correctly recognized items, and a combination score (number of correctly recalled items + number of correctly recognized items).

Data analysis
To determine which variables could discriminate between the five groups, we computed one-way ANOVAs with the factor GROUP (best effort, naïve simulators, symptom coached, symptom + test coached, patients) and Scheffé-contrasts for all variables.
Sensitivities and specificities were then computed at different cut-off values for all variables [34]. Cut-off scores were determined on the basis of patients' performance.
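The cut-off logic can be sketched as follows. This is a minimal illustration, not the study's analysis code: it assumes that lower scores indicate worse performance (as for the number of correct responses, but not for response times or difference scores), that a score below the cut-off counts as a failed test, and all function names and example scores are hypothetical.

```python
def specificity(non_simulator_scores, cutoff):
    """Share of non-simulators scoring at or above the cut-off,
    i.e. correctly classified as non-simulators."""
    passed = sum(1 for s in non_simulator_scores if s >= cutoff)
    return passed / len(non_simulator_scores)


def sensitivity(simulator_scores, cutoff):
    """Share of simulators scoring below the cut-off,
    i.e. correctly classified as simulators."""
    flagged = sum(1 for s in simulator_scores if s < cutoff)
    return flagged / len(simulator_scores)


def cutoff_for_specificity(patient_scores, target):
    """Largest observed score that can serve as a cut-off while keeping
    the specificity in the patient sample at or above the target."""
    best = min(patient_scores)  # nobody scores below the minimum
    for c in sorted(set(patient_scores)):
        if specificity(patient_scores, c) >= target:
            best = c
    return best


# Hypothetical scores (e.g. number of correct responses).
patients = [100, 99, 98, 98, 97, 96, 95, 90]
simulators = [70, 85, 92, 95, 97, 99]

cutoff = cutoff_for_specificity(patients, target=0.75)
print(cutoff, specificity(patients, cutoff), sensitivity(simulators, cutoff))
```

Deriving the cut-off from patient data fixes the specificity first; the sensitivity for each simulator group then follows from how many of its members fall below that value.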
Parametric statistics were chosen for our analyses. As not all variables were normally distributed, we also conducted the respective nonparametric analyses. However, since the results of the two procedures were similar in magnitude and direction, we report the parametric results, which we consider more user-friendly for clinicians.

Results
Table 2 shows the means, standard errors, and ANOVA results for all variables. In general, the BE group performed best, followed by TS and PAT, which both performed better than NS and SS. All one-way ANOVAs with the between-subjects factor GROUP yielded a main effect of GROUP, indicating that these variables could, in principle, discriminate between the different instruction conditions (see table 2).

Group comparisons
For most variables of the STM-BSV, VLMT and FIT, Scheffé-contrasts showed that (1) BE and PAT differed reliably from the simulation groups (exception: VLMT trial 6), (2) NS and SS groups performed worse than the TS group (exceptions: BSV-STM RT, VLMT trial 1, VLMT interference list, VLMT trial 6, VLMT trial 7, VLMT trial 6-5, VLMT trial 7-5), and (3) NS and SS groups did not differ (exception: VLMT corrected recognition). Thus, NS and SS groups were combined to form a new group of 59 subjects hereafter termed NSS (naïve and symptom coached simulators). Tables 3 to 5 show the sensitivities and the specificity for each group and each variable of the VLMT, the FIT, and the STM-BSV. For these computations, data from the patient group was used to define the cut-off values. Thus, the column "specificity" lists the percentage of patients correctly classified as non-simulators and the percentage of subjects in the best effort group correctly classified as non-simulators at the respective cut-off value. The column "sensitivity" lists the percentage of subjects correctly classified as simulators in case of the NSS and TS groups. Overall, sensitivities were best for the STM-BSV-variables, whereas FIT and VLMT did not yield acceptable sensitivities.

Sensitivity of the tests to detect memory malingering
For the VLMT, at a specificity of 72-75%, the variables trial 7 (delayed recall), the corrected recognition score, the recognition score, and the total of trials 1 to 5 yielded the best classification rates (47.5% to 66.1% for the NSS group). For the FIT, the combination score (recall + recognition) provided the best results. For the STM-BSV, both variables (total correct responses, RT correct responses) yielded good sensitivities. For all variables, sensitivity for the NSS group was greater than for the TS group. Furthermore, the BE group participants were correctly categorized in at least 94% of cases by all variables.
The VLMT yields 11 scores. Only three of these scores showed sensitivities above 45% at a specificity above 70% for the simulator groups. We computed a combination score out of the three best VLMT variables. This score (VLMT1 = VLMT trial 7 + VLMT trials 1-5 + VLMT corrected recognition score) classified 52.5% of the NSS and 26.7% of the TS participants correctly at a specificity of 75% (see table 6).
VLMT indices that require keeping track of previous responses (number of items recalled at least three times but not recognized) or knowledge about concepts of memory functioning (i.e., serial position effects; number of times the first and the last word are recalled in trials 1-5, respectively) were not superior to standard VLMT variables in the detection of malingering.
Apart from the empirically derived cut-off values, the STM-BSV classifies the performance of the subjects based on the number of errors and on a probability analysis [13]. 29 of 33 BE participants and 29 of 33 non-litigating, non-simulating patients passed the test (corresponding to a specificity of 87.8%), while 40 of 59 NSS (corresponding to a sensitivity of 67.8%), and 14 of 31 TS (sensitivity 45.2%) participants failed the test. Using the cut-off scores derived from patient performance, 88.1% of the NSS and 67.7% of the TS group participants were correctly classified at a specificity of 73%.
To examine whether a combination score derived from the best measures of the three memory tests employed in the present study is useful for the detection of memory malingering, we developed the following combination score: comb1 = (VLMT1 + FIT comb + STM-BSV total correct responses)/3. At a specificity of 75%, this combination score classified 57.6% of the NSS and 20% of the TS group participants correctly (see table 6).
Positive and negative predictive power (PPP and NPP, respectively) are diagnostic classification statistics that can be helpful in clinical decision making. PPP is the probability that the condition of interest (here, malingering) is present given a positive test finding; NPP is the probability that it is absent given a negative test finding. Details of the computation can be found in [34]. Table 7 shows NPP and PPP for the different cut-offs at a base rate of 57.7%, corresponding to the overall percentage of subjects instructed to simulate brain injury in the present study (90 of 156 participants).
NPP and PPP depend on the base rate of the condition of interest [34]. Unfortunately, reliable data on the base rate of malingering of neurocognitive symptoms in Germany are not available. Given the differences in litigation legislation between Germany and the U.S., we consider it inappropriate to rely on estimates originating in the U.S. Thus, table 8 shows NPP and PPP at three different base rates of malingering (10%, 20% and 30%). Please note that NPP and PPP are computed on the basis of sensitivities and specificities that include coached simulators. Thus, NPP and PPP reflect the fact that the tests used to detect malingering in the present study are all susceptible to coaching.
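The dependence of PPP and NPP on the base rate follows from Bayes' theorem and can be sketched in a few lines. The function names are hypothetical, and the sensitivity of 0.80 used in the example is an assumed value for illustration only; it is not taken from the tables of this study.

```python
def positive_predictive_power(sens, spec, base_rate):
    """P(malingering | positive test finding), by Bayes' theorem."""
    true_pos = sens * base_rate
    false_pos = (1 - spec) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


def negative_predictive_power(sens, spec, base_rate):
    """P(no malingering | negative test finding), by Bayes' theorem."""
    true_neg = spec * (1 - base_rate)
    false_neg = (1 - sens) * base_rate
    return true_neg / (true_neg + false_neg)


# Assumed sensitivity (0.80) and a specificity of 0.73, evaluated at the
# three base rates considered in the text.
for base_rate in (0.10, 0.20, 0.30):
    ppp = positive_predictive_power(0.80, 0.73, base_rate)
    npp = negative_predictive_power(0.80, 0.73, base_rate)
    print(f"base rate {base_rate:.0%}: PPP = {ppp:.3f}, NPP = {npp:.3f}")
```

At low base rates, most positive findings come from the much larger group of non-malingerers, so PPP drops even when sensitivity and specificity stay the same; NPP moves in the opposite direction.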

Discussion
The present study examined the usefulness of the STM-BSV for the detection of feigned memory impairment. Furthermore, the influence of different coaching methods on the accuracy of simulation detection was investigated in an analog design. Four groups of neurologically healthy participants and a group of brain-injured inpatients of a neurological rehabilitation clinic were given the VLMT, the FIT, and the STM-BSV as part of a larger neuropsychological test battery. To reiterate, besides a best effort group, three simulator groups with different levels of prior information were created (naïve, NS; symptom coached, SS; symptom plus test coached, TS). Overall, the NS and the SS groups were the easiest to detect and did not differ in performance, whereas the TS group was the hardest to detect. The scores derived from the symptom validity test used here, the STM-BSV, showed the best sensitivity but were sensitive to coaching. A standard neuropsychological test (the VLMT), the FIT, and the combination scores derived from several tests failed to provide acceptable sensitivities.
This is the first study investigating the usefulness of the STM-BSV for the detection of incomplete effort in an analog design. This symptom validity test yielded a satisfactory specificity (all participants of the best effort group passed the test). However, the STM-BSV was sensitive to coaching: at a cut-off of < 98 correct responses (corresponding to 73% specificity), 88.1% of naïve and symptom coached simulators were detected. In contrast, only 67.7% of the group receiving additional information on how to approach effort testing were categorized correctly. Thus, one has to be aware that the clinical usefulness of the STM-BSV in detecting memory malingering can be diminished if the subject is informed how to approach effort testing.
All other measures used to detect memory malingering were also sensitive to coaching, and this is especially true for the symptom plus test coached group. Thus, it seems that, irrespective of the method used, memory malingerers who have received sophisticated coaching are hard to detect (see also [29]).
In clinical practice, malingering tests should have a specificity of at least 90%. Note that in the current study, cut-offs were derived from the performance of patients with mostly severe closed head injuries. Thus, the cut-offs derived from these data can be seen as quite conservative, and it might be acceptable to lower the specificity required under such conditions. However, the main findings of the present study also hold at a specificity of 90%: (1) the STM-BSV yields the highest sensitivities but is sensitive to coaching (sensitivities: [...]
Previous studies using the FIT and the VLMT for malingering detection have yielded mixed results, but most are generally in line with our observation of sensitivities that are too low for clinical use in memory malingering detection. For example, using the FIT, Boone and coworkers also reported rather low sensitivities, ranging from 5% to 86%, at a specificity above 85% (for references, see tables 1 and 2 in [14]). Given these findings, several attempts have been made to improve the FIT. Boone and colleagues [14] introduced the recognition procedure and showed that a cut-off value of < 20 on a combination score (sum of correctly reproduced and correctly recognized items) improved sensitivity to 71% (at >= 92% specificity). However, for the current data set, this combination score did not enhance sensitivity compared to the recall score (cut-off < 22: from 37.3% to 34% in the present study for the NSS group; from 16.1% to 9.7% at 70% specificity for the TS group; from 59.2% to 71% at 95% specificity in Boone et al. [14]). Thus, we could not replicate the usefulness of adding the recognition trial.
To our knowledge, this is the first study of simulated memory impairment detection using the VLMT. However, as the VLMT is the German adaptation of the AVLT, it should be possible to compare results for the two tests. In the present study, delayed recall, the sum of words recalled in trials 1 to 5, and the corrected recognition score yielded the best classification rates (45.8% to 66.1% for the NSS group, 29% to 51.6% for the TS group at 75% specificity). For comparison, [16] report sensitivities of 40.4% (trial 7) and 21.3% (corrected recognition) in a sample of real-world suspected malingerers at above 90% specificity. Thus, we obtained roughly comparable sensitivities, albeit at a lower specificity, which is most likely caused by using a sample of mostly severely head-injured patients for deriving the cut-off scores in the present investigation.
In contrast to previous work [35], the presence of primacy and recency effects could only detect the most "severe" cases of memory malingering in our study. We tried several operationalizations of the presence of serial position effects (sum of words 1-5, 6-10, and 11-15 recalled in trials 1-5, with a serial position effect counted as present if recall of words 1-5 and recall of words 11-15 are both larger than recall of words 6-10; number of times the first word is recalled in trials 1-5; number of times the last word is recalled in trials 1-5), none of which yielded satisfactory sensitivities. Most patients with moderate to severe head injury were able to recall the first and the last word of the list at least 4 times, but so were most of the subjects instructed to malinger memory impairment. Furthermore, words that were recalled at least three times in trials 1-5 but not recognized have been proposed as an index of memory malingering (termed index 1 by [36]). In the present study, this index was slightly superior to the best standard indices of the VLMT, indicating that more complex measures that require keeping track of previous memory performance are good candidates for the detection of feigned memory impairment. Furthermore, the inclusion of a second delayed recognition trial, as proposed by Barrash and colleagues [26] in the extended AVLT, might improve the usefulness of the VLMT in the detection of memory feigning.
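The serial position operationalizations described above can be sketched as follows. This is an illustration only: the function name, the dictionary keys, and the data layout (five learning trials, each a list of 15 True/False values in list order) are assumptions and do not reflect the scoring software used in the study.

```python
def serial_position_indices(recalled):
    """recalled: five learning trials, each a list of 15 booleans in list
    order (True = the word at that list position was recalled)."""
    primacy = sum(sum(trial[0:5]) for trial in recalled)     # words 1-5
    middle = sum(sum(trial[5:10]) for trial in recalled)     # words 6-10
    recency = sum(sum(trial[10:15]) for trial in recalled)   # words 11-15
    return {
        "primacy": primacy,
        "middle": middle,
        "recency": recency,
        # Effect present if both list ends are recalled more often
        # than the middle of the list.
        "effect_present": primacy > middle and recency > middle,
        "first_word": sum(trial[0] for trial in recalled),
        "last_word": sum(trial[14] for trial in recalled),
    }


# Hypothetical examinee who recalls only the list ends in every trial.
trial = [True] * 5 + [False] * 5 + [True] * 5
print(serial_position_indices([trial] * 5))
```

An examinee who suppresses recall evenly across the list would produce no such effect, which is the pattern these indices are meant to pick up.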
The NPP and PPP values shown in tables 7 and 8 can be used to assess the usefulness of the different variables in clinical decision making. Note that these tables are computed on the basis of sensitivities and specificities derived from the complete sample: specificity is computed from the combined values of the patient and best effort groups, and sensitivity from the combined values of all simulator groups, i.e., including the coached simulators. Thus, the values reflect the difficulties of the tests in detecting coached malingerers. We think that computing NPP and PPP in this way (and not separately for each group) is more appropriate to the situation of the clinician, who does not know whether an examinee was coached prior to the assessment session.

Methodological limitations of the study
Several methodological limitations of this study have to be considered. First, the present findings are limited by the use of a simulated malingering design. More specifically, the use of simulators might decrease the generalizability of the results. However, some evidence in support of the simulation design has been presented, showing that student malingerers perform similarly to mild traumatic brain injury patients [37,38]. In the present study, however, patients performed better in most tests compared to the student simulators. Furthermore, the use of student populations who have no financial incentive to simulate malingering may also limit generalizability. Research shows that financial compensation does affect patients' performance in clinical contexts [39,40]. Thus, the absence of significant financial incentives for the participants in the present study most likely influenced their performance. Furthermore, it has to be considered that a sample of university students with above-average intelligence was employed. Thus, it is an open issue whether the same results would be obtained with a sample of simulators of average or below-average IQ.
The cut-offs used in the present study to compute the sensitivity and the specificity of the tests for the detection of memory malingering were derived from a sample of non-litigating patients with mild to severe closed head injury who were instructed to perform the neurocognitive assessment with their best effort. It has been shown that a considerable percentage of patients in such heterogeneous samples perform below the cut-off suggested by the test developers [41]. Thus, our use of cut-offs derived from such a sample can be considered conservative. Moreover, it increases the clinical utility of the findings.

Conclusion
The present analog study is the first to document the usefulness of the STM-BSV as a test of memory malingering. However, clinicians have to be aware that the STM-BSV is sensitive to coaching. Furthermore, we showed that the FIT and the VLMT are not clinically useful for the detection of memory malingering when cut-offs derived from real-world patients with mild to severe head injury are used.