The aim of this study was to establish test-retest reliability of the conventional and the GAITRite parameters of the 10MWTpref, the 10MWTmax, and the 6MinWT. Results showed high to very high relative reliability in conventional gait parameters like the time needed to cover 10 m or the distance covered during 6 min. GAITRite parameters recorded during the 6MinWT showed the highest relative and absolute reliability. Second best was the 10MWTpref.
We could not completely confirm our first hypothesis that all three tests would show good relative reliability. Although the lowest ICC values were still moderate, this is not considered sufficient for tests used to reveal improvement in gait. For the 10MWTmax, only 10 out of 23 parameters showed high or very high relative reliability (ICC ≥ 0.80), while all the parameters of the 10MWTpref and the 6MinWT met this threshold. Concerning our second hypothesis, some parameters showed a low absolute reliability (i.e. considerable measurement errors) between the two sessions. The rather low relative and absolute reliability of the 10MWTmax might be partially explained by the way it was analysed. Since we evaluated only the fastest trial, we did not average two or more walks as we did for the other tests. Averaging several trials reduces variability, which could have increased the reliability. This explanation is supported by the observation that the 6MinWT showed the greatest relative and absolute reliability, as a mean of 12.67 (SD 4.86, range 1–22) walks was used for analysis in session 1, and a mean of 12.56 (SD 5.64, range 1–27) walks in session 2.
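The effect of averaging can be illustrated with a minimal sketch. It assumes that the trials within a session are independent and equally variable, which is a simplification, and the function name and the numbers are hypothetical rather than taken from our data.

```python
import math


def sd_of_trial_mean(trial_sd, n_trials):
    """SD of the mean of n_trials independent trials that each have SD trial_sd.

    Assumption: trials are independent and have equal variance.
    """
    return trial_sd / math.sqrt(n_trials)


# Hypothetical within-session SD of walking velocity (m/s) for a single walk.
single_walk_sd = 0.12
for n in (1, 2, 4, 12):
    print(n, "walk(s):", round(sd_of_trial_mean(single_walk_sd, n), 3))
```

Under these assumptions, a value based on a single walk, as in the 10MWTmax, retains the full trial-to-trial variability, whereas a mean over many walks does not.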
The results for the reliability of conventional gait parameters are partially consistent with the findings of other studies. Compared to results obtained in children with CP by Thompson et al., the present study revealed higher ICCs and smaller SEM and SRD for the parameter time during the 10MWTmax. In the study of Thompson et al., a period of up to 4 weeks lay between the time-points of measurement, which could have led to a more variable performance of the participants between the sessions. Although the participants performed the 10MWTmax 1.5 s faster at the second time point (3.1 s faster in a subgroup with GMFCS level III), this change was not statistically significant. For the walking distance (6MinWT), however, relative and especially absolute reliability were reported to be considerably better than in our study [6, 8]. In the study by Maher et al., the interval between the two 6MinWT measurements was only 30 min. Retesting within such a short interval might increase reliability because the measurements are not independent. In children with myelomeningocele, walking distance also appeared more reliable than in our study. These children were retested two weeks later, so the protocol was more comparable to ours. An important difference that might have led to better results in the study by Maher et al. is the higher level of standardisation: the same tester administered the tests at the same time of day. In contrast, we wanted to resemble the clinical situation and deliberately refrained from a high level of standardisation. In children with cystic fibrosis, comparable agreement was observed: the bias (i.e. the average difference between the first and second measurement) was −15.9 m and the limits of agreement were 100.9 m and −132.9 m (Bland-Altman plot).
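For readers less familiar with Bland-Altman statistics, the bias and limits of agreement can be sketched as follows. The sign convention (second minus first measurement), the function name and the distances are assumptions made for illustration. The limits are taken as the bias ± 1.96 SD of the differences, which is the usual definition but not necessarily the exact computation used in the cited study.

```python
import statistics


def bland_altman(session1, session2):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    # Assumed sign convention: difference = second minus first measurement.
    diffs = [second - first for first, second in zip(session1, session2)]
    bias = statistics.mean(diffs)
    # Sample SD of the differences; limits are bias +/- 1.96 SD.
    half_width = 1.96 * statistics.stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical 6MinWT distances (m) of five children in two sessions.
distance_s1 = [310.0, 402.0, 275.0, 350.0, 298.0]
distance_s2 = [295.0, 410.0, 240.0, 362.0, 301.0]
bias, lower, upper = bland_altman(distance_s1, distance_s2)
print(f"bias {bias:.1f} m, limits of agreement {lower:.1f} m to {upper:.1f} m")
```

Limits of agreement are symmetric around the bias by construction, so a negative bias shifts both limits downwards.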
The results for the GAITRite parameters of the present study are comparable to those obtained by Wondra et al., where 80 % of the parameters met the ICC threshold of ≥ 0.80 (single and multiple trials, barefoot and with shoes and orthosis) in children with motor disabilities. With respect to heterogeneity between the participants, their sample and ours are quite comparable. As ICCs depend on the between-subject variance (i.e. a larger between-subject variance leads to greater ICCs), ICCs are usually high in heterogeneous patient groups. This has to be considered when discussing these results. Nevertheless, although Sorsdahl et al. investigated test-retest reliability of gait parameters in a rather homogeneous group of children with CP (GMFCS levels I and II), they also found (very) high relative reliability, except for the parameter step width, which was not evaluated in our study. Morrison et al. investigated children with developmental coordination disorder. The authors concluded that the wide range of ICC values they obtained could be explained by the variable and inconsistent gait pattern of these children, which could have resulted in low ICC scores.
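The dependence of the ICC on between-subject variance can be made explicit with a simplified variance-ratio form of the ICC. This sketch is not the ICC model used in any of the studies discussed here; it only assumes that the ICC can be written as between-subject variance divided by the sum of between-subject and error variance, and all numbers are hypothetical.

```python
def icc_from_variances(between_var, error_var):
    """Simplified ICC: share of total variance due to differences between subjects."""
    return between_var / (between_var + error_var)


# Hypothetical example: the measurement error stays the same (SD 5 cm/s),
# while the sample becomes more heterogeneous (larger between-subject SD).
error_sd = 5.0
for between_sd in (5.0, 10.0, 20.0):
    icc = icc_from_variances(between_sd ** 2, error_sd ** 2)
    print(f"between-subject SD {between_sd:4.1f} -> ICC {icc:.2f}")
```

With identical measurement error, the ICC rises as the sample becomes more heterogeneous, which is why a high ICC alone does not imply small measurement errors.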
As previously stated, our second hypothesis that absolute reliability would show considerable measurement errors could also be confirmed. While SEM values for the parameter cadence were comparable to the results obtained by Wondra et al., the SEM values for the parameter velocity were larger in our study. The time interval between the test sessions might explain this. Wondra et al. performed their tests on a single day, which reduces, for example, the influence of day-to-day variation in the participants' form and might, therefore, lead to more reliable performance.
One important aim of this study was to investigate the reliability of the step time and step length symmetry as possible parameters to quantify the quality of gait. While the average symmetry values were quite comparable between the three tests (Table 2), the ICC values were excellent only for the 10MWTpref and 6MinWT. Sorsdahl et al. also determined the relative reliability of the asymmetry of the step length and found an ICC of 0.82. Their test procedure differed from ours: their participants walked barefoot and without their orthosis at self-selected, slow and fast speed, for a total of eight walks over a 5.88 m GAITRite walkway. Start and end were 1.5 m before and after the walkway. In the current study, ICC values were better for the 6MinWT and 10MWTpref, but poorer for the 10MWTmax. Symmetry indices were also evaluated in adult patients with stroke. In that study, a step length asymmetry ratio was calculated. Despite differences in calculation, the authors reported similar ICC values for step length symmetry (ICC 0.81 for one walk, 0.92 for six walks).
Nevertheless, absolute reliability values proved to be very poor. The SRD% of step length symmetry exceeded 100 % in all three tests, and that of step time symmetry did so in the 10MWTmax. We conclude, therefore, that these symmetry parameters appear promising for quantifying gait quality but, unfortunately, do not appear reliable enough for longitudinal evaluation.
The heterogeneity of the children included in this study reflects the population of children with gait impairments in paediatric neuro-rehabilitation, which is important for the generalizability of our results. If these assessments are used for research purposes, reliability needs to be examined in the specific population, since reliability is described as a varying feature that depends on the tested population.
Interestingly, the participants walked on average substantially faster during the 6MinWT compared to the 10MWTpref. We hypothesise that this observation is caused by the test instructions. While, for the 10MWTpref, the children were instructed to walk at a self-selected comfortable speed, they had to cover as much distance as possible during the 6MinWT. Apparently, the children were able to walk at such a higher-than-comfortable speed for 6 min.
We assume that the quality of gait of the most severely affected youths was overestimated by the GAITRite because, for those with poor walking ability, the walk data required considerable editing with the GAITRite software. By deleting unclear steps, the apparent quality of the walk improved. Editing might also have introduced a higher susceptibility to investigator bias due to unclear decisions on when and how to edit data. Although different people edited the walks, they all did so according to internally formulated guidelines. However, as each walking pattern has specific characteristics that cannot be described in such guidelines, editing remains to a certain extent subjective. This might have affected our results (but also those of other GAITRite studies).
Younger children and those with reduced cognitive abilities, although able to follow test instructions, showed quite large differences between the first and second session. This bias might be largely due to a lack of motivation. Motivational aspects have to be kept in mind since they might strongly affect the reliability of any assessment.
ICCs are largely influenced by between-subject variability, i.e. a high ICC does not necessarily reflect a high absolute reliability. See, for example, Fig. 2b and f, where heterogeneous distributions of absolute step length symmetry indices of the 10MWTpref and 6MinWT result in excellent ICC values despite poor absolute reliability values. In clinical and research practice, it is very difficult to make judgements on improvements in individual patients when only information about relative reliability is available. Therefore, we deemed it necessary to also investigate the absolute reliability, i.e. measurement errors. An SRD value indicates the change a patient has to achieve before it can be considered a true change, i.e. above chance. Compared to other studies (for example Wondra et al.), we chose a relatively conservative way of calculating the SEM and SRD values, because we included systematic bias, which is not always done.
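The difference between an SEM that ignores systematic bias and one that includes it can be sketched as follows. This is one common approach based on the variance components of a two-way ANOVA with two sessions, and it is not necessarily identical to the computation used in this study. The function name, the dictionary keys and the data are hypothetical, and the SRD is assumed to be 1.96 × √2 × SEM, with SRD% expressed relative to the grand mean.

```python
import math
import statistics


def sem_srd(session1, session2):
    """SEM, SRD and SRD% for two sessions, without and with systematic bias."""
    n = len(session1)
    diffs = [second - first for first, second in zip(session1, session2)]
    mean_diff = statistics.mean(diffs)
    # Residual mean square of a two-way ANOVA with two sessions.
    ms_error = statistics.variance(diffs) / 2
    # Mean square for sessions, which captures the systematic shift.
    ms_sessions = n * mean_diff ** 2 / 2
    # Variance component for sessions, truncated at zero.
    var_sessions = max((ms_sessions - ms_error) / n, 0.0)
    grand_mean = statistics.mean(list(session1) + list(session2))

    result = {}
    for label, sem in (
        ("consistency", math.sqrt(ms_error)),                # bias ignored
        ("agreement", math.sqrt(ms_error + var_sessions)),   # bias included
    ):
        srd = 1.96 * math.sqrt(2) * sem
        result["sem_" + label] = sem
        result["srd_" + label] = srd
        result["srd_pct_" + label] = 100 * srd / grand_mean
    return result


# Hypothetical values of one gait parameter in four children, two sessions.
values_s1 = [10.0, 12.0, 14.0, 16.0]
values_s2 = [12.0, 13.0, 17.0, 18.0]
for key, value in sem_srd(values_s1, values_s2).items():
    print(key, round(value, 2))
```

When the second session is systematically higher or lower than the first, the SEM that includes bias is the larger of the two, and so are the resulting SRD and SRD% values. An SRD% above 100 % means that a patient's value would have to change by more than the group mean before the change could be distinguished from measurement error.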
Limitations of the study
Several limitations have to be mentioned. The sample size of our study was rather small, and the sample was very heterogeneous due to the different diagnoses, severity levels and walking abilities. A heterogeneous sample involves large between-subject variance, which results in high relative reliability. Nevertheless, as we wanted to depict reliability in a clinical setting, we decided to keep the sample as heterogeneous as it was. One advantage of a heterogeneous sample is that results can be generalised to a broader population.
The participants wore the same orthoses and used the same walking aids for both measurements. Furthermore, we tried to schedule the measurement time-points at the same time of day. Since our study was conducted in parallel with the rehabilitation programme, this was not always possible. Other factors that we did not control for, such as medication and daily activity, might have influenced reliability to a certain extent.
A few practical issues have to be mentioned when using the GAITRite walkway system. Firstly, small children might not be heavy enough for the walkway. In our experience, GAITRite measurements are difficult in children with a bodyweight of less than 15 kg, since there is not enough pressure on the sensors and the GAITRite does not recognize that a walk is still in progress. The recording of the walk then stops before the walk is finished. Secondly, the editing of difficult walks should be standardised by formulating guidelines, so that all walks are edited in the same way. Thirdly, automatic and manual editing errors occur. Some are obvious (e.g. calculations are done with 7 steps instead of the 8 that are shown in the walk); however, the number of non-recognizable errors is difficult to estimate.
To get a reliable result, several repetitions are recommended, and this can be applied to any test. Nevertheless, the motivational factor and the compliance of the person doing the test must be considered. Compliance decreases with the increasing number of trials, especially in the paediatric field. This might influence reliability to a large extent.
Finally, for clinical purposes, we do not recommend the repeated use of the GAITRite walkway during the 6MinWT because, owing to the large number of walks, the time required for editing and analysing is considerable.