The Swedish SCOPA-AUT was found to be relevant and easy to understand and use by people with PD, and respondent burden was acceptable. In addition, our psychometric findings largely mirror, but also expand on, those in the RMT-based Spanish study [11], showing generally good fit to the Rasch model. However, limitations of the SCOPA-AUT were also identified: items represent more severe dysautonomia than that reported by the sample, and response categories do not work as intended in a majority of items. Furthermore, there was minor DIF by gender in four items, as well as local dependency that could be resolved by accounting for the subscale structure in the analysis.
Similar to the study by Forjaz et al. [11], the sample did not represent the full range of item locations, with few persons reporting more severe dysautonomia. Our sample also appears to represent people with somewhat milder dysautonomia than those in the study by Forjaz et al. (mean locations -1.434 and -1.068, respectively) [11]. This implies two things. First, both studies have some limitations in the ability to assess the measurement properties of the SCOPA-AUT, particularly towards the upper end of the scale. Future studies of the SCOPA-AUT should therefore attempt to include persons with more severe dysautonomia in order to better evaluate the full instrument. Second, persons with PD who report low levels of dysautonomia are not measured very well by the SCOPA-AUT. While it may be argued that this is of less concern and that it is more important to be able to capture those with more pronounced problems (who arguably also would be the ones primarily targeted by various therapies), it remains a measurement limitation of the instrument.
We found reversed response category thresholds in a majority of SCOPA-AUT items, indicating that the response scale does not work as intended and that the current scoring may be unjustified [23]. The most likely reason for the disordered thresholds is that respondents fail to discriminate between two adjacent categories (“sometimes” and “regularly”). In accordance with our observations, Forjaz et al. [11] also found disordered thresholds for most items of the Spanish SCOPA-AUT. The significance of disordered thresholds is under debate [24,25,26,27,28]. However, since the ordering of categories reflects the respondents' understanding of what it means to have more or less of the property, and categories are assumed to operate as intended, evidence of proper ordering is critical [27]. Since the same response scale is used across all items, the threshold reversal is probably generalizable rather than incidental [23]. Thus, available data presented here and elsewhere [11] suggest that revision of the SCOPA-AUT response categories should be considered, by reducing the number of response categories and/or by rewording the category labels.
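To make the consequence of threshold reversal concrete, the following is a minimal sketch and not the analysis performed in this study. It assumes a polytomous Rasch model in its partial credit parameterization, in which the probability of category k is proportional to the exponentiated sum of (theta - tau_j) over the first k thresholds. The threshold values and the function names `category_probabilities` and `modal_categories` are hypothetical, chosen only for illustration. With ordered thresholds, every response category is the most probable one somewhere along the latent continuum; with reversed thresholds, at least one category never is.

```python
import math

def category_probabilities(theta, thresholds):
    """Category probabilities (0..m) under a polytomous Rasch model.

    theta: person location in logits.
    thresholds: list of m threshold locations (hypothetical values).
    """
    # Category 0 has an empty sum; category k adds (theta - tau_k).
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def modal_categories(thresholds, lo=-6.0, hi=6.0, steps=241):
    """Set of categories that are most probable at some theta on a grid."""
    modal = set()
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        probs = category_probabilities(theta, thresholds)
        modal.add(probs.index(max(probs)))
    return modal

# Hypothetical thresholds for a four-category item (three thresholds).
ordered_thresholds = [-1.5, 0.0, 1.5]
disordered_thresholds = [-1.5, 0.8, 0.2]  # second and third are reversed

print(modal_categories(ordered_thresholds))
print(modal_categories(disordered_thresholds))
```

In this illustration, the third category (scored 2) is never the most likely response under the reversed thresholds, which is the pattern one would expect if respondents do not distinguish it from its neighbour.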
One item (22, “sexual arousal”) exhibited poor model fit. Inspection of item responses relative to its expected ICC revealed a divergent pattern among women, while the responses for men followed what was expected. That is, despite increasing dysautonomia severity, women tended to have unchanged or even decreasing problems with “sexual arousal”. Other studies have reported similar findings, i.e., despite an increase in all SCOPA-AUT items with disease severity, there is no increase in sexual dysfunction among women [4, 7], and another study found no association between sexual dysfunction and disease severity among women, whereas the association was significant among men [6]. Regardless of the explanation for these observations, they suggest that this item should not be merged but kept separate for men and women.
There was no DIF by age, but four items exhibited evidence of DIF by gender. In two items women were more likely to endorse a positive response, and in the other two it was the opposite. This probably contributed to the relatively minor importance of the observed DIF and the fact that there was no DIF when the SCOPA-AUT was analyzed as six subtests. It has been argued that if external information facilitates the understanding and interpretation of real DIF, resolving DIF may threaten content validity [29]. In the case of the SCOPA-AUT, the DIF found in four items can be clinically justified. That is, heat intolerance (item 21) and urinary incontinence (item 9) would be expected to be more common among women due to, e.g., menopause and weakened sphincter muscles, respectively, and weak stream of urine (item 11) would be expected to be more common among men due to prostate enlargement. This has implications for how best to deal with these instances of DIF [30]. If these problems are considered important for the measured variable (dysautonomia), regardless of whether they appear due to autonomic dysfunction or for other reasons (e.g., menopause), adjustment for DIF would compromise measurement validity [30]. Regardless, in this particular case the observed DIF appears to be of minor practical importance, particularly given the lack of DIF when taking the subscale structure into account, but it should be kept in mind in future studies of the SCOPA-AUT.
As in the original item-level analysis, there was good fit between data and the Rasch model when taking the subscale structure into account by creating subtests representing the suggested SCOPA-AUT subscales. However, the dispersion of person locations shrank and reliability dropped considerably, both of which are signs of the local dependency identified in the first analysis [31]. That is, the initial reliability estimates are inflated. This aspect of the measurement properties was not addressed in the previous RMT-based study of the SCOPA-AUT [11]. Instead, the use of subscale scores was questioned due to support for unidimensionality of the full instrument [11]. However, this argument may be challenged because unidimensionality is a relative rather than an absolute concept that, among other things, depends on the conceptualization of the measured variable [32]. This may be illustrated by the metaphor of a rope, where a variable (e.g., dysautonomia) is thought of as a thick rope (cf. the SCOPA-AUT) that is made up of finer ropes (cf. subscales) that in turn consist of even finer threads (cf. items) [25]. Although our findings also partly support unidimensionality of the SCOPA-AUT, the measurement properties of its individual subscales remain to be examined.
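The link between shrinking person dispersion and falling reliability can be illustrated with a small sketch. It assumes the common separation-type definition of reliability, i.e., the proportion of observed variance in person locations that is not attributable to measurement error; the function name `person_separation_index` and all numerical values are hypothetical and are not taken from our data. Holding the standard errors fixed, compressing the spread of person locations lowers the index.

```python
from statistics import variance

def person_separation_index(locations, standard_errors):
    """Separation-type reliability: (observed variance - error variance) / observed variance.

    locations: estimated person locations in logits.
    standard_errors: standard error of each person location.
    """
    observed = variance(locations)  # sample variance of person locations
    error = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    return (observed - error) / observed

# Hypothetical person locations: wide spread (item-level analysis with
# local dependency) versus compressed spread (subtest analysis).
item_level_locations = [-3.0, -2.0, -1.0, 0.0, 1.0]
subtest_locations = [-1.5, -1.0, -0.5, 0.0, 0.5]
standard_errors = [0.5] * 5  # assumed equal for simplicity

print(person_separation_index(item_level_locations, standard_errors))
print(person_separation_index(subtest_locations, standard_errors))
```

With these illustrative numbers the index falls from about 0.9 to about 0.6 when the person distribution is compressed, which mirrors the direction of the change described above.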
The hierarchical subtest order, as determined by their respective locations and taking the respective uncertainties (95% CIs of locations) into account, revealed that sexual and urinary problems were easier than cardiovascular, thermoregulatory and pupillomotor symptoms, which in turn were easier than gastrointestinal problems. This pattern appears to make general sense from a clinical perspective and broadly corresponds with findings from a retrospective clinicopathological cohort study from the UK, where urinary problems were the most common and sweating abnormalities and upper gastrointestinal dysfunction were the least common, with orthostatic hypotension appearing in the mid-range [33].
Our PD sample represents people in relatively early stages of PD, also compared with those reported in a previous SCOPA-AUT evaluation [11]. However, our observations, including the observed item hierarchy, are in general agreement with those reported previously [11]. Thus, while generalizations to the PD population at large should be made with some caution, our findings suggest that the SCOPA-AUT is conceptually stable across Spanish and Swedish cultures/languages and appears appropriate for use in early PD. This is important since dysautonomia occurs and increases throughout the disease trajectory [4, 33, 34], and useful scales need to be applicable in all stages of PD. However, further assessments of the measurement properties of the SCOPA-AUT should aim to include people with more severe PD and, particularly, with more pronounced dysautonomia in order to better elucidate its properties at the more severe end. Our data did not allow for assessment of test–retest stability. However, we did not detect any DIF by time, indicating that items work the same way over time. This often overlooked aspect is considered at least as important as test–retest stability, and a prerequisite for meaningful test–retest evaluation [17]. In addition, autonomic dysfunction in PD could be due to other conditions, which could not be distinguished in this study. However, the SCOPA-AUT is intended to assess dysautonomia and alert clinicians to the potential need for further examinations [4]. Therefore, we do not consider this a major limitation. A further limitation is that, since data were combined from two different studies, no demographic or clinical details were available in common other than those presented in the paper (i.e., age, sex, time since diagnosis, and Hoehn & Yahr stages). Finally, our samples did not undergo any objective autonomic testing, which precludes analyses of the relationship between patient-reported and objectively measured dysautonomia.