We designed the OSAS to measure both the amount and the quality of use of the affected hand in tasks that demand repetitive bimanual use. It was developed primarily to measure treatment effect in research and clinical practice. In the present study, as a first evaluation, intra-rater, inter-rater and test-retest reliability and agreement were determined using the ICC, the standard deviation of measurement differences and the SDD [17, 18].
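For reference, one common way to relate these agreement parameters (an illustrative formulation; the exact computation follows [17, 18]) expresses the standard error of measurement (SEM) and the SDD in terms of the standard deviation of the differences between repeated measurements, $SD_{\text{diff}}$:

$$\mathrm{SEM} = \frac{SD_{\text{diff}}}{\sqrt{2}}, \qquad \mathrm{SDD} = 1.96 \times \sqrt{2} \times \mathrm{SEM} = 1.96 \times SD_{\text{diff}}.$$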
Because the ICC is an index of how well patients can be distinguished within a specific group, its importance for the clinical evaluation of change is limited. However, based on the generally high ICC values in most tasks, it can be concluded that the OSAS has good discriminative capacity in patient groups resembling the study population. For the older children, an exception to this general pattern is the amount of use of both hands, for which the variation is small in all tasks; this explains the low ICCs for test-retest reliability. The mean differences between the measurements for inter- and intra-rater reliability are generally small compared to the width of the scales, which indicates good agreement between measurements.
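This dependence on between-subject variation can be illustrated with one common single-measure formulation of the ICC (shown here for illustration only; it is not necessarily the exact model used in the analyses):

$$\mathrm{ICC} = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{error}}},$$

so when the between-subject variance is small relative to the measurement error, the ICC is low even when absolute agreement is good.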
The largest SDD for the amount of use of both hands was 14.5% in the older age group and 30.8% in the younger children. As the SDD is expressed in the same units as the original measurement, its interpretation for clinical use is straightforward. The high amount of use in the older children, in combination with large SDDs, leads to a ceiling effect, rendering this measurement unsuitable for follow-up. This is not surprising, because the OSAS tasks demand the use of both hands. In the younger age group, in which movement patterns are not yet very stable, there was less use of both hands, more variation and large SDDs. This means that the amount-of-use score of the OSAS in younger children is not suited to evaluating individual changes, but may still be used to compare groups in scientific research.
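As an illustration of this interpretation (a common convention, not a rule stated in this study), a change in an individual child can only be distinguished from measurement error if it exceeds the SDD, i.e. $|x_{t_2} - x_{t_1}| > \mathrm{SDD}$; with an SDD of 30.8% in the younger children, only very large changes in the amount of use can therefore be detected at the individual level.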
The ‘quality of reach’ item has very low variation, which leads to low reliability in both age groups. The lack of variation may be explained by the fact that children with unilateral CP tend not to reach with their affected hand. If they do, it is only briefly and almost always with the same pattern. In the future it would be better to score only whether the affected hand reached, without an attached quality criterion.
The ICC of quality of use is good to excellent for ‘grasp wrist’, ‘hold fingers’ and ‘hold wrist’ in all children and all tasks. This is also the case for ‘quality of release’ in all tasks in the older age group and in the threading beads task in the younger children. The younger children do not yet show very consistent release patterns, which is especially obvious during the Pop-Onz and stacking blocks tasks. The ICC of ‘quality of grasp fingers’ is not good in the Pop-Onz task for the younger children, and in the small and large construction tasks (inter-rater reliability only) for the older children. This means that better coaching of the observers on this criterion will be needed. Generally, reliability is better for the older age group than for the younger children, which may be explained by the fact that older children show more consistent movement patterns. The SDDs of the quality items are generally small, but clearly larger in the stacking blocks task of the younger children. The largest SDDs are found in the ‘quality of grasp’ and ‘hold wrist’ mean scores in the stacking blocks task. Therefore this task is of limited use for measuring change. Apart from ‘reach’, the SDDs of the quality criteria for the OSAS tasks are low compared to the width of their scales, which makes these criteria potentially useful for assessing change in patients.
The OSAS seems to be a useful addition to existing assessments of bimanual functioning for children with unilateral hand function problems, such as the AHA. The AHA measures actual spontaneous use of the affected hand in bimanual performance. With the OSAS, the amount and quality of use of the affected hand can be measured precisely, as a measure of capacity. The tasks are designed to force the child to use the affected hand repeatedly, are appropriate for the age group and are not influenced by visual-spatial or praxis problems. In contrast to the MUUL, the OSAS measures the affected hand as an assisting hand in bimanual functioning. The simplicity and short duration of the tasks make the OSAS easier to administer with young children. A disadvantage is that scoring is time-consuming, taking 20 minutes per task.
In the present study, 32 children between the ages of 2.5 and 16 years were included in the intra- and inter-rater reliability analyses and 26 children in the test-retest reliability analysis. This number is limited. Moreover, children aged 7–11 years were not included in the present study, although part of the OSAS was developed for children aged 7–16 years. Reliability data for this age group will therefore need to be collected.
More agreement data are needed, with an adapted scoring of the reach item in which only the frequency of reaching with the affected hand during the task is scored. The stacking blocks task, which proved to be unreliable, might be removed. Precise coaching of observers is needed, especially for the assessment of ‘quality of grasp fingers’. The next evaluation step is to measure concurrent validity. In children aged twelve years or younger this is possible with the AHA. Because the AHA is not yet available for the older age group, the Jebsen test [19], which measures speed of movement of the affected hand, could be used instead. Concurrent validity will also be determined against the achievement of treatment goals assessed with Goal Attainment Scaling (GAS) [20] and the performance scores of the COPM [21].
In conclusion, the OSAS appears to be a reliable assessment tool, with good agreement between repeated measurements, for measuring the quality of use of the affected assisting hand in forced bimanual task execution in children with CP. Some modifications, as mentioned above, may improve agreement, reliability and ease of scoring. More agreement and reliability data should be gathered, and the responsiveness of the scores also needs to be tested.