Prediction of poststroke independent walking using machine learning: a retrospective study

Abstract

Background

Accurately predicting the walking independence of stroke patients is important. Our objective was to determine and compare the performance of logistic regression (LR) and three machine learning models (eXtreme Gradient Boosting (XGBoost), Support Vector Machines (SVM), and Random Forest (RF)) in predicting walking independence at discharge in stroke patients, as well as to explore the variables that predict prognosis.

Methods

A total of 778 stroke patients admitted to China Rehabilitation Research Center between February 2020 and January 2023 were retrospectively included and randomly split into a training set (80%) and a test set (20%). The training set was used to train the models. The test set was used to validate and compare the performance of the four models in terms of area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score.

Results

Among the three ML models, the AUC of the XGBoost model was significantly higher than those of the SVM and RF models (P < 0.001 and P = 0.024, respectively). There was no significant difference in AUC between the XGBoost model and the LR model (0.880 vs. 0.891, P = 0.569). Compared with the LR model, the XGBoost model demonstrated superior accuracy (87.82% vs. 86.54%), sensitivity (50.00% vs. 39.39%), PPV (73.68% vs. 73.33%), NPV (89.78% vs. 87.94%), and F1 score (59.57% vs. 51.16%), with only slightly lower specificity (96.09% vs. 96.88%). Together, the XGBoost model and the stepwise LR model identified age, FMA-LE at admission, FAC at admission, and lower limb spasticity as key factors influencing independent walking.

Conclusion

Overall, the XGBoost model performed best in predicting independent walking after stroke. The XGBoost and LR models together confirm that age, admission FMA-LE, admission FAC, and lower extremity spasticity are the key factors influencing independent walking in stroke patients at hospital discharge.

Trial registration

Not applicable.

Background

Stroke is a major problem in China due to its high morbidity, mortality and disability [1]. Even with timely treatment in the acute phase, patients may still be disabled and require rehabilitation, resulting in a high economic burden [2]. A significant portion of the cost is directly attributable to the inability of stroke survivors to walk independently [3]. Forty percent of stroke patients who are initially unable to walk are still either non-ambulatory or require assistance with walking three months after stroke [4]. The ability to walk independently is a key factor in a patient’s daily activities and quality of life, and regaining it is an important goal in the rehabilitation of stroke patients with hemiplegia [5,6,7]. It is critical to accurately predict the subsequent recovery of walking ability in stroke patients who are unable to walk independently at the time of admission to rehabilitation [8]. Accurate prediction allows clinicians and therapists to provide patients with a prognosis and to support goal setting, treatment selection, and discharge planning. It also allows the government or the patient’s family to provide appropriate socioeconomic support and health care resources [9, 10].

In the field of stroke rehabilitation, studies on predictive models for walking recovery have been a hot topic [3, 11, 12]. However, some of the predictive models that have been developed are too complex to be used in a clinical setting. Therefore, there is a need to develop simple, reliable, and feasible models for predicting independent walking that can be applied to stroke patients in inpatient rehabilitation. Logistic regression (LR) has been widely used in prognostic studies of stroke patients. LR measures the relationship between a categorical dependent variable and one or more independent variables by using a probability score as the predictive value of the dependent variable [13]. LR is commonly used in predictive modeling of dichotomous outcomes in health care [14]. However, it has several drawbacks, including easy underfitting, difficulty in handling nonlinear relationships, sensitivity to outliers, and possibly poor classification accuracy [15]. Therefore, there may be limitations in applying LR to predictive modeling of prognosis in stroke patients.

As a mature modeling approach, machine learning (ML) is increasingly used in epidemiological research and medicine [16, 17]. With the increasing complexity and number of available data sets, as well as multi-factor data from a variety of sources, ML is considered to have advantages over traditional regression models [18, 19], including ease of analysis, the ability to consider a large number of variables simultaneously, and the ability to capture complex interactions between variables.

eXtreme Gradient Boosting (XGBoost), Support Vector Machines (SVM), and Random Forest (RF) are among the more mature and widely used ML algorithms. XGBoost solves supervised learning problems within a gradient boosting framework and offers high accuracy, resistance to overfitting, and scalability [20, 21]. It has been increasingly used in healthcare research to predict outcomes or screen for prognostic factors. SVM is one of the most popular supervised learning algorithms for pattern recognition, classification, and regression analysis [22]. RF is an ensemble learning method that generates a collection of decision trees, each branching on randomly selected variables. By taking a majority vote across all trees, RF can make predictions with high accuracy, less overfitting, and strong robustness to noise [23]. However, the optimal model tends to vary across studies, and there is a lack of models that use these ML algorithms to predict independent walking in stroke patients.

Therefore, the aim of this study was to investigate the optimal prediction of independent walking at discharge based on clinical data of stroke patients who were unable to walk independently at admission using classical logistic regression methods and three currently accepted ML models (the XGBoost, SVM, and RF model), and to explore variables related to prognosis.

Methods

Overview

This study protocol was approved by the medical ethics committee of China Rehabilitation Research Center (approval number 2022-141-02). Informed consent was not obtained as this was a retrospective, hospital-based study.

Participants

Between February 2020 and January 2023, a retrospective cohort of inpatients admitted to and discharged from the neurorehabilitation unit of China Rehabilitation Research Center for first-onset stroke was studied. Patients were included if they met the following inclusion criteria: (1) were aged ≥ 18 years; (2) had a first-ever unilateral cerebral stroke; (3) were unable to walk independently at admission and had a Functional Ambulation Category (FAC) score ≤ 3. Patients were excluded according to the following criteria: (1) had other underlying neurological diseases; (2) had a diagnosis of disturbance of consciousness; (3) had unstable vital signs; (4) length of stay (LOS) < 14 days; (5) had incomplete required data.

Data

In this study, a total of 1033 patients were screened and 778 stroke patients who met the inclusion criteria were ultimately included in the analysis. The following data were collected from the 778 stroke patients (21 variables in total): age (years), sex (male or female), medical insurance (yes or no), LOS, time since onset, type of stroke (ischemic or hemorrhagic), side of stroke (left or right), lesion location (cortical, subcortical, or both), lower extremity deep vein thrombosis (yes or no), emotional disorder (yes or no), cognitive disorder (yes or no), sleep disorder (yes or no), dysphagia (yes or no), aphasia (yes or no), lower limb spasticity (yes or no), FAC score at admission, Fugl-Meyer Motor Assessment of the Lower Extremity (FMA-LE) score at admission, Fugl-Meyer Balance Assessment (FMB) score at admission, National Institutes of Health Stroke Scale (NIHSS) score at admission, Barthel Index (BI) score at admission, and FAC score at discharge. The FAC scale has been widely used to assess walking independence in stroke patients and has six levels (0–5). In line with previous reports, stroke patients with an FAC score > 3 at discharge were defined as “independent walking”, and all others as “non-independent walking” [24]. In this study, we used “independent walking at discharge (yes or no)” as the response variable, and the remaining 20 variables were used for prediction.
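The FAC-based inclusion rule and outcome definition described above can be written down as two small functions. This is only an illustrative sketch; the function names are hypothetical and do not come from the study.

```python
def eligible_at_admission(fac_admission: int) -> bool:
    """Inclusion criterion (3): unable to walk independently, FAC score <= 3 at admission."""
    return 0 <= fac_admission <= 3


def walking_outcome(fac_discharge: int) -> str:
    """Label the response variable from the discharge FAC score (six levels, 0-5)."""
    if not 0 <= fac_discharge <= 5:
        raise ValueError("FAC score must be between 0 and 5")
    # FAC > 3 at discharge is defined as "independent walking" in the text.
    return "independent walking" if fac_discharge > 3 else "non-independent walking"


if __name__ == "__main__":
    for fac in range(6):
        print(fac, eligible_at_admission(fac), walking_outcome(fac))
```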

Statistical analysis

The IBM SPSS Statistics software version 25 (IBM Corp, Armonk, USA) was used for data analysis. Categorical variables were presented as frequencies and percentages. For continuous variables, the Kolmogorov-Smirnov test was used to assess data distribution. Continuous variables were expressed as mean ± standard deviation if they fit the normal distribution; otherwise, they were expressed as medians (QL, QU). The χ2 test was used to compare categorical variables, and the Student’s t-test or the Mann-Whitney nonparametric test was used to compare continuous variables.

In this study, all enrolled patients were randomly divided into two data sets with a split ratio of 4:1: 80% of the patients were used for model training and 20% for model testing. Predictive models were constructed using the walking status at discharge (“independent walking” or “non-independent walking”) as the outcome variable. We used the “autoReg”, “XGBoost”, “e1071”, “randomForest” and “caret” packages in R software version 4.2.2 to develop and test the LR, XGBoost, SVM, and RF models. In constructing the classical LR model, we first screened the training cohort for factors associated with “independent walking” using univariate analyses. Factors with P < 0.10 in the univariate analyses were then included in the stepwise binary LR analysis. Because the number of original variables in this study was small and variables of lower importance may still benefit model training, all feature variables were included in the training of the XGBoost, SVM, and RF models. The XGBoost, SVM, and RF models were optimized by either 5-fold cross-validation or hyperparameter tuning. The “pROC” package was used to plot the receiver operating characteristic (ROC) curves and calculate the area under the curve (AUC) [25, 26]. The AUC was used to comprehensively evaluate the models, and the AUCs of the models were compared using the DeLong method [27]. The predictive performance of the models was further evaluated in terms of accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and F1 score. A two-tailed P value < 0.05 was considered statistically significant.
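To make the AUC concrete, the sketch below computes it from its rank-based definition: the proportion of (positive, negative) patient pairs in which the positive case receives the higher predicted score, with ties counted as one half. It is not the pROC implementation, and the labels and scores are hypothetical values chosen only for illustration.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = independent walking at discharge, 0 = not.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.90, 0.60, 0.35, 0.70, 0.30, 0.20, 0.35, 0.10]
toy_auc = auc_from_scores(labels, scores)

if __name__ == "__main__":
    print(f"AUC on the toy data: {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to ranking pairs no better than chance, and 1.0 to perfect separation of the two outcome groups.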

Results

Patient characteristics

A total of 778 stroke patients randomly assigned to the training set (n = 622) and the test set (n = 156) were finally enrolled in this study (Fig. 1). The characteristics of the training and test sets are shown in Table 1. For all the variables analyzed, there was no significant difference between the training and testing sets. Overall, 107 patients (17.20%) in the training set achieved “independent walking” at discharge and 28 patients (17.95%) in the testing set achieved “independent walking” at discharge.
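The set sizes and outcome rates reported above follow directly from the 4:1 split. The sketch below reproduces that arithmetic with a simple random shuffle. The seed is hypothetical (the source does not report one), and the authors performed the split in R, so this is an illustration rather than their procedure.

```python
import random

N_PATIENTS = 778
TRAIN_FRACTION = 0.8
SEED = 2023  # hypothetical seed, not reported in the source

indices = list(range(N_PATIENTS))
random.Random(SEED).shuffle(indices)

n_train = round(N_PATIENTS * TRAIN_FRACTION)  # 622.4 rounds to 622
train_idx, test_idx = indices[:n_train], indices[n_train:]

# Proportion achieving "independent walking" in each set, from the reported counts.
train_rate = 100 * 107 / len(train_idx)
test_rate = 100 * 28 / len(test_idx)

if __name__ == "__main__":
    print(len(train_idx), len(test_idx))
    print(f"{train_rate:.2f}% {test_rate:.2f}%")
```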

Fig. 1 Flow-chart of participants enrolled in this study

Table 1 Comparison of the demographic and clinical characteristics of all patients, and of those in the training and testing sets

Logistic regression model

Univariate analyses performed on the training set showed that patients who achieved “independent walking” at discharge were significantly different from those who did not on the variables of age, lesion location, lower extremity deep vein thrombosis, cognitive disorder, dysphagia, lower limb spasticity, FAC at admission, FMA-LE at admission, FMB at admission, NIHSS at admission and BI at admission (all P < 0.05) (Table 2). Subsequently, based on the results of the univariate analyses, variables with P < 0.10 were included in the stepwise binary LR analysis. As shown in Table 2, four variables (age, lower limb spasticity, FAC at admission, and FMA-LE at admission) were independent determinants of independent walking at discharge for stroke patients who were unable to walk independently at admission. A logistic regression model was constructed from these four factors: logit(P) = −2.15 − 0.03x1 + 1.32x2 + 1.61x3 + 0.13x4, where x1, x2, x3 and x4 represent age, absence of lower limb spasticity, FAC at admission = 3, and FMA-LE at admission, respectively. The Hosmer-Lemeshow goodness of fit test statistic on the training set was 10.211 (P = 0.251) with 8 degrees of freedom, and on the test set it was 6.790 (P = 0.560) with 8 degrees of freedom.
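As a worked example of the fitted equation, the sketch below converts the linear predictor into a predicted probability for one hypothetical patient. The patient values are invented for illustration. The coding of x2 and x3 as 0/1 indicators (1 = no spasticity, 1 = FAC of 3 at admission) is an assumption based on the variable descriptions, and the rounded coefficients are taken as printed.

```python
import math

# Coefficients as printed in the equation (rounded values).
INTERCEPT = -2.15
COEF = {"age": -0.03, "no_spasticity": 1.32, "fac3": 1.61, "fma_le": 0.13}


def predicted_probability(age, no_spasticity, fac3, fma_le):
    """Return (logit, probability) of independent walking at discharge."""
    logit = (
        INTERCEPT
        + COEF["age"] * age
        + COEF["no_spasticity"] * no_spasticity  # assumed 1 = no lower limb spasticity
        + COEF["fac3"] * fac3                    # assumed 1 = FAC of 3 at admission
        + COEF["fma_le"] * fma_le
    )
    return logit, 1.0 / (1.0 + math.exp(-logit))


# Hypothetical patient: 60 years old, no spasticity, FAC = 3, FMA-LE = 20.
logit, prob = predicted_probability(age=60, no_spasticity=1, fac3=1, fma_le=20)

# Exponentiating a coefficient gives the corresponding odds ratio.
odds_ratio_fac3 = math.exp(COEF["fac3"])

if __name__ == "__main__":
    print(f"logit = {logit:.2f}, probability = {prob:.3f}")
    print(f"odds ratio for FAC = 3: {odds_ratio_fac3:.2f}")
```

For this hypothetical patient the linear predictor is −2.15 − 1.80 + 1.32 + 1.61 + 2.60 = 1.58, which corresponds to a predicted probability of roughly 0.83.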

Table 2 Univariate and multivariable logistic regression model of study variables vs. independent walking at discharge in the training set

Comparisons of logistic regression and machine learning models

All baseline variables were used in the development of the three ML models (XGBoost, SVM, and RF) for prediction of “independent walking” at discharge. The test set was used to compare the performance of the models. The ROC curve was used to evaluate the discriminative ability of each prediction model. The AUC of the LR model was 0.891 (95% CI = 0.828–0.954) in the test set. The AUCs of the XGBoost, SVM and RF models were 0.880 (95% CI = 0.818–0.942), 0.659 (95% CI = 0.567–0.751), and 0.713 (95% CI = 0.617–0.808), respectively. Among the three ML models, the AUC of the XGBoost model was significantly higher than those of the SVM and RF models (P < 0.001 and P = 0.024, respectively). Although the LR model had a slightly higher AUC than the XGBoost model in the test set, the difference was not significant (0.891 vs. 0.880, z = 0.570, P = 0.569). ROC curves for all models are shown in Fig. 2. Table 3 shows the number of correct predictive values of all models, from which the accuracy, sensitivity, specificity, PPV, NPV and F1 scores of the LR, XGBoost, SVM, and RF models were calculated. These values together confirmed that the XGBoost model performed best among the three ML models, as shown in Table 4. Compared with the LR model, the XGBoost model had superior accuracy (87.82% vs. 86.54%), sensitivity (50.00% vs. 39.39%), PPV (73.68% vs. 73.33%), NPV (89.78% vs. 87.94%), and F1 score (59.57% vs. 51.16%), and the specificity was only slightly lower (96.09% vs. 96.88%).
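The reported P value for the LR versus XGBoost comparison can be checked from the z statistic alone, since a two-tailed P value is the probability that a standard normal variable exceeds |z| in absolute value. The sketch below performs only that conversion. It does not implement the DeLong variance estimate, which requires the patient-level predictions.

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


p_lr_vs_xgboost = two_tailed_p(0.570)

if __name__ == "__main__":
    print(f"z = 0.570 gives P = {p_lr_vs_xgboost:.3f}")
```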

Fig. 2 Receiver operating characteristic curve for the models

Table 3 Number of correct predictive values of the LR and ML models
Table 4 The performance of the LR and ML models
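The six performance measures are all simple functions of the confusion matrix, as the sketch below illustrates. Because the contents of Table 3 are not reproduced here, the counts used are an assumption: they were back-calculated from the reported percentages together with the 28 independent walkers and 128 non-independent walkers in the test set. Under these assumed counts every reported value is reproduced to two decimals except the LR sensitivity, which comes out at 39.29% rather than 39.39%.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return the six test-set metrics, in percent, from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return {
        "accuracy": 100 * (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": 100 * sensitivity,
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * ppv,
        "npv": 100 * tn / (tn + fn),
        # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
        "f1": 100 * 2 * ppv * sensitivity / (ppv + sensitivity),
    }


# ASSUMED counts, inferred from the reported percentages (not taken from Table 3).
ASSUMED_COUNTS = {
    "XGBoost": dict(tp=14, fp=5, tn=123, fn=14),
    "LR": dict(tp=11, fp=4, tn=124, fn=17),
}

results = {name: classification_metrics(**c) for name, c in ASSUMED_COUNTS.items()}

if __name__ == "__main__":
    for name, metrics in results.items():
        print(name, {k: round(v, 2) for k, v in metrics.items()})
```

With only 28 positive cases in the test set, a difference of three true positives (14 vs. 11 under these assumed counts) accounts for the whole gap in sensitivity between the two models.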

Predictors selection

Stepwise logistic regression analysis showed that age, lower limb spasticity, admission FAC, and admission FMA-LE were independent predictors of independent walking in stroke patients. The XGBoost model was used to rank the importance of the feature variables, and the top ten variables are as follows: FMA-LE at admission, FAC at admission, age, NIHSS at admission, LOS, FMB at admission, BI at admission, lower limb spasticity, type of stroke, lesion location (Fig. 3).

Fig. 3 Features selected using XGBoost and the corresponding variable importance score

Discussion

It is of great importance to accurately predict the walking independence of stroke patients at the time of rehabilitation admission. In this study, we innovatively developed three machine learning algorithm-based models (XGBoost, SVM, and RF) to predict whether stroke patients would be able to walk independently at discharge from the rehabilitation center and compared them with the traditional stepwise LR model. The results show that, overall, the XGBoost model had the best predictive performance.

Most previous studies on related topics have used LR analysis alone to build a single predictive model [28, 29]. However, conventional LR analysis has limitations; for example, it does not readily capture complex nonlinear relationships between variables [30]. Recently, new machine learning techniques have shown higher predictive performance than traditional predictive methods [31]. In this study, three commonly used machine learning algorithms (XGBoost, SVM, and RF) were selected to build models for predicting independent walking in stroke patients, and these were compared with the classic LR model. First, the AUCs of the models were calculated and compared; the higher the AUC, the better the model discriminates between outcomes. Among the three machine learning models, the AUC of the XGBoost model was significantly higher than those of the SVM and RF models, suggesting that the overall performance of the XGBoost model was the best of the three. As a decision tree-based algorithm, XGBoost was voted the best algorithm in a machine learning and prediction competition hosted by Kaggle.com [32, 33]. Because of its accuracy and performance, machine learning based on the XGBoost algorithm has increasingly been regarded as a competitive alternative to regression analysis for predicting clinical outcomes. The AUCs of the XGBoost and LR models exceeded 0.85 in both the training and test sets, indicating good overall predictive performance. Although the AUC of the XGBoost model was slightly lower than that of the LR model in the test set, the DeLong test revealed no significant difference. Previous studies have usually used multiple indicators to evaluate model performance [34, 35]. We therefore further compared the accuracy, sensitivity, specificity, PPV, NPV, and F1 scores of the two models in the test set. Taken together, our results showed that the XGBoost model performed better than the LR model. We therefore recommend the XGBoost model for predicting whether stroke patients who are unable to walk independently at rehabilitation admission will be able to walk independently at discharge. We also suggest that future studies consider using the XGBoost algorithm to predict other functional outcomes in stroke patients.

Stepwise logistic regression analysis showed that age, lower extremity spasticity, FAC at admission and FMA-LE at admission were independently associated with independent walking at discharge in stroke patients. The XGBoost model ranked the importance of the variables, and the top 10 variables were FMA-LE at admission, FAC at admission, age, NIHSS at admission, LOS, FMB at admission, BI at admission, lower limb spasticity, type of stroke, and lesion location. Together, the two models determined that the key variables affecting independent walking in stroke patients at discharge were age, FMA-LE at admission, FAC at admission, and lower extremity spasticity. A review of 15 studies explored which factors predicted independent walking at 3, 6, and 12 months in people who were non-ambulatory within one month of stroke and found that younger age predicted independent walking at 3 months [3]. Similarly, we found that the younger the stroke patient, the more likely they were to walk independently at discharge. The same conclusion was also reached by Kennedy et al. [36] and Hirano et al. [12]. This study also found that the presence of lower extremity spasticity prevented patients from achieving independent walking at discharge. A recent study found that moderate levels of plantar flexor spasticity yielded the highest sensitivity for predicting poor gait speed performance and the highest specificity for predicting good mobility performance in post-stroke patients, which supports our findings to some extent [37]. This study also showed that patients with FAC = 3 at admission were 5.01 times more likely to achieve independent walking at discharge than those who were unable to walk at all, which is consistent with the findings of Louie et al. [38], who found that those with any ability to walk at admission (with or without therapist assistance) were 9.48 times more likely to be discharged home than those who were unable.
In addition, we found that lower limb motor function was an important factor in independent walking. Hiratsuka et al. also found that lower limb motor function was an additional predictor of independent walking in a 30-day poststroke cohort [39]. Notably, the TWIST algorithm proposed by Smith et al. in 2017 incorporated trunk control test scores and hip extension strength to predict whether and when an individual patient walked independently after stroke [40]. They later built on their earlier work to examine other potential predictors, including age, knee extension strength, and Berg Balance Test score [41]. However, the trunk control test and lower limb muscle strength test were not included in the admission assessment records of patients at our hospital, and we will consider including them in future prospective studies. Some studies have also used neurophysiological or neuroimaging measures to predict walking independence in stroke patients [42,43,44], but one study showed that the absence of lower limb motor-evoked potentials did not preclude independent walking [45]. Although this study lacked more types of indicators to predict independent walking, we established a model with good predictive performance by using simple and easily accessible clinical data, which might be more in line with the actual clinical situation and had certain reference significance for clinical practice.

Limitations

Undoubtedly, our study has several limitations. First, this was a retrospective, single-center study, and selection bias was inevitable. In the future, we will conduct prospective studies with larger samples to obtain more accurate results. Second, we did not have a separate data set to externally validate the predictive model established in this study, so the generalizability may not be guaranteed. Further studies using data from other hospitals are needed. Third, our prediction model used only clinical data of rehabilitation admission, whereas other studies may have incorporated imaging features, electrophysiological features, etc. In future prospective studies, we should consider using more types of data to build predictive models. Fourth, we did not follow long-term outcomes of walking function in stroke patients after discharge, and predictors of long-term outcomes in stroke patients may be different from those at discharge. Fifth, we selected only 3 commonly used machine learning algorithms to build the models and compare them, and other algorithms such as AdaBoost and neural networks deserve further investigation. However, in this study, we initially found that the XGBoost model showed better predictive performance than the LR model in predicting independent walking in stroke patients based on clinical data at the time of rehabilitation admission. Our methodology and results will inform future studies.

Conclusions

Overall, the XGBoost model showed the best performance in predicting independent walking after stroke. The XGBoost and LR models together confirm that age, FMA-LE at admission, FAC at admission, and lower extremity spasticity are key factors affecting independent walking in stroke patients at discharge from hospital. Our study suggests that XGBoost can be used to build a predictive model of independent walking in stroke patients at discharge based on clinical data of hospitalized stroke patients, providing guidance for setting rehabilitation goals, selecting treatment plans, and making discharge plans.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

AUC: Area Under the Curve

BI: Barthel Index

FAC: Functional Ambulation Category

FMA-LE: Fugl-Meyer Motor Assessment of the Lower Extremity

FMB: Fugl-Meyer Balance Assessment

LR: Logistic Regression

LOS: Length of Stay

ML: Machine Learning

NIHSS: National Institutes of Health Stroke Scale

NPV: Negative Predictive Value

PPV: Positive Predictive Value

RF: Random Forest

ROC: Receiver Operating Characteristic

SVM: Support Vector Machines

XGBoost: eXtreme Gradient Boosting

References

  1. Zhou M, Wang H, Zeng X, et al. Mortality, morbidity, and risk factors in China and its provinces, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2019;394:1145–58. https://doi.org/10.1016/S0140-6736(19)30427-1.

  2. Rabinstein AA, Albers GW, Brinjikji W, Koch S. Factors that may contribute to poor outcome despite good reperfusion after acute endovascular stroke therapy. Int J Stroke. 2019;14:23–31. https://doi.org/10.1177/1747493018799979.

  3. Preston E, Ada L, Stanton R, et al. Prediction of independent walking in people who are nonambulatory early after stroke: a systematic review. Stroke. 2021;52:3217–24. https://doi.org/10.1161/STROKEAHA.120.032345.

  4. Preston E, Ada L, Dean CM, et al. What is the probability of patients who are nonambulatory after stroke regaining independent walking? A systematic review. Int J Stroke. 2011;6:531–40. https://doi.org/10.1111/j.1747-4949.2011.00668.x.

  5. Harris JE, Eng JJ. Goal priorities identified through client-centred measurement in individuals with chronic stroke. Physiother Can. 2004;56:171–6. https://doi.org/10.2310/6640.2004.00017.

  6. Teasell RW, Bhogal SK, Foley NC, Speechley MR. Gait retraining post stroke. Top Stroke Rehabil. 2003;10:34–65. https://doi.org/10.1310/UDXE-MJFF-53V2-EAP0.

  7. Mayo NE, Wood-Dauphinee S, Côté R, et al. Activity, participation, and quality of life 6 months poststroke. Arch Phys Med Rehabil. 2002;83:1035–42. https://doi.org/10.1053/apmr.2002.33984.

  8. Kwah LK, Herbert RD. Prediction of walking and arm recovery after stroke: a critical review. Brain Sci. 2016;6:53. https://doi.org/10.3390/brainsci6040053.

  9. Craig LE, Wu O, Bernhardt J, Langhorne P. Predictors of poststroke mobility: systematic review. Int J Stroke. 2011;6:321–7. https://doi.org/10.1111/j.1747-4949.2011.00621.x.

  10. Veerbeek JM, Kwakkel G, van Wegen EE, et al. Early prediction of outcome of activities of daily living after stroke: a systematic review. Stroke. 2011;42:1482–8. https://doi.org/10.1161/STROKEAHA.110.604090.

  11. Bland MD, Sturmoski A, Whitson M, et al. Prediction of discharge walking ability from initial assessment in a stroke inpatient rehabilitation facility population. Arch Phys Med Rehabil. 2012;93:1441–7. https://doi.org/10.1016/j.apmr.2012.02.029.

  12. Hirano Y, Hayashi T, Nitta O, et al. Prediction of independent walking ability for severely hemiplegic stroke patients at Discharge from a Rehabilitation Hospital. J Stroke Cerebrovasc Dis. 2016;25:1878–81. https://doi.org/10.1016/j.jstrokecerebrovasdis.2015.12.020.

  13. Lin WY, Chen CH, Tseng YJ, et al. Predicting post-stroke activities of daily living through a machine learning-based approach on initiating rehabilitation. Int J Med Inf. 2018;111:159–64. https://doi.org/10.1016/j.ijmedinf.2018.01.002.

  14. Liang Y, Li Q, Chen P, et al. Comparative study of back Propagation Artificial neural networks and logistic regression model in Predicting Poor Prognosis after Acute ischemic stroke. Open Med (Wars). 2019;14:324–30. https://doi.org/10.1515/med-2019-0030.

  15. Qu S, Zhou M, Jiao S, et al. Optimizing acute stroke outcome prediction models: comparison of generalized regression neural networks and logistic regressions. PLoS ONE. 2022;17:e0267747. https://doi.org/10.1371/journal.pone.0267747.

  16. Liu S, See KC, Ngiam KY, et al. Reinforcement learning for clinical decision support in critical care: Comprehensive Review. J Med Internet Res. 2020;22:e18477. https://doi.org/10.2196/18477.

  17. Campagnini S, Arienti C, Patrini M, et al. Machine learning methods for functional recovery prediction and prognosis in post-stroke rehabilitation: a systematic review. J Neuroeng Rehabil. 2022;19:54. https://doi.org/10.1186/s12984-022-01032-4.

  18. Khan O, Badhiwala JH, Wilson JRF, et al. Predictive modeling of outcomes after traumatic and nontraumatic spinal cord Injury using machine learning: review of current progress and future directions. Neurospine. 2019;16:678–85. https://doi.org/10.14245/ns.1938390.195.

  19. Zhang X, Fei N, Zhang X, et al. Machine learning prediction models for postoperative stroke in Elderly patients: analyses of the MIMIC database. Front Aging Neurosci. 2022;14:897611. https://doi.org/10.3389/fnagi.2022.897611.

  20. Hu Y, Yang T, Zhang J, et al. Dynamic prediction of mechanical thrombectomy outcome for Acute ischemic stroke patients using machine learning. Brain Sci. 2022;12. https://doi.org/10.3390/brainsci12070938.

  21. Chen R, Zhang S, Li J, et al. A study on predicting the length of hospital stay for Chinese patients with ischemic stroke based on the XGBoost algorithm. BMC Med Inf Decis Mak. 2023;23:49. https://doi.org/10.1186/s12911-023-02140-4.

  22. Noble WS. What is a support vector machine? Nat Biotechnol. 2006;24:1565–7. https://doi.org/10.1038/nbt1206-1565.

  23. Luo Y, Li Z, Guo H, et al. Predicting congenital heart defects: a comparison of three data mining methods. PLoS ONE. 2017;12:e177811. https://doi.org/10.1371/journal.pone.

  24. Maeshima S, Okamoto S, Mizuno S, et al. Predicting walking ability in hemiplegic patients with putaminal hemorrhage: an observational study in a rehabilitation hospital. Eur J Phys Rehabil Med. 2021;57:321–6. https://doi.org/10.23736/S1973-9087.20.05823-2.

  25. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S + to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77. https://doi.org/10.1186/1471-2105-12-77.

  26. Heo J, Yoon JG, Park H, et al. Machine learning-based model for prediction of outcomes in Acute Stroke. Stroke. 2019;50:1263–5. https://doi.org/10.1161/STROKEAHA.118.024293.

  27. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837–45.

  28. Yan C, Zheng Y, Zhang X, et al. Development and validation of a nomogram model for predicting unfavorable functional outcomes in ischemic stroke patients after acute phase. Front Aging Neurosci. 2023;15:1161016. https://doi.org/10.3389/fnagi.2023.1161016.

  29. Gianella MG, Gath CF, Bonamico L, et al. Prediction of Gait without Physical Assistance after Inpatient Rehabilitation in severe subacute stroke subjects. J Stroke Cerebrovasc Dis. 2019;28:104367. https://doi.org/10.1016/j.jstrokecerebrovasdis.2019.104367.

  30. Hassanipour S, Ghaem H, Arab-Zozani M, et al. Comparison of artificial neural network and logistic regression models for prediction of outcomes in trauma patients: a systematic review and meta-analysis. Injury. 2019;50:244–50. https://doi.org/10.1016/j.injury.2019.01.007.

  31. Hou N, Li M, He L, et al. Predicting 30-days mortality for MIMIC-III patients with sepsis-3: a machine learning approach using XGboost. J Transl Med. 2020;18:462. https://doi.org/10.1186/s12967-020-02620-5.

  32. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining-KDD 2016, San Francisco, CA, USA; 2016. pp. 785–94.

  33. LANL Earthquake Prediction. 2019. https://www.kaggle.com/c/LANL-Earthquake-Prediction. Accessed 15 Mar 2020.

  34. Kim JK, Choo YJ, Chang MC. Prediction of motor function in stroke patients using machine learning algorithm: development of practical models. J Stroke Cerebrovasc Dis. 2021;30:105856. https://doi.org/10.1016/j.jstrokecerebrovasdis.2021.105856.

  35. Cerasa A, Tartarisco G, Bruschetta R, et al. Predicting Outcome in patients with Brain Injury: differences between machine learning versus Conventional statistics. Biomedicines. 2022;10:2267. https://doi.org/10.3390/biomedicines10092267.

  36. Kennedy C, Bernhardt J, Churilov L, et al. Factors associated with time to independent walking recovery post-stroke. J Neurol Neurosurg Psychiatry. 2021;92:702–8. https://doi.org/10.1136/jnnp-2020-325125.

  37. Freire B, Bochehin do Valle M, Lanferdini FJ, et al. Cut-off score of the modified Ashworth scale corresponding to walking ability and functional mobility in individuals with chronic stroke. Disabil Rehabil. 2023;45:866–70. https://doi.org/10.1080/09638288.2022.2037753.

  38. Louie DR, Simpson LA, Mortenson WB, et al. Prevalence of walking Limitation after Acute Stroke and its impact on discharge to Home. Phys Ther. 2022;102:pzab246. https://doi.org/10.1093/ptj/pzab246.

  39. Hiratsuka K, Tamiya T, Matsuoka S, Kimura K. Stroke impairment, balance, and cognitive status on admission predict walking independence up to 90 days post-stroke but their contributions change over time. Int J Rehabil Res. 2023;46:61–9. https://doi.org/10.1097/MRR.0000000000000561.

  40. Smith MC, Barber PA, Stinear CM. The TWIST Algorithm Predicts Time to walking independently after stroke. Neurorehabil Neural Repair. 2017;31:955–64. https://doi.org/10.1177/1545968317736820.

  41. Smith MC, Barber AP, Scrivener BJ, Stinear CM. The TWIST Tool predicts when patients will recover independent walking after stroke: an observational study. Neurorehabil Neural Repair. 2022;36:461–71. https://doi.org/10.1177/15459683221085287.

  42. Nomoto M, Miyata K, Kohno Y. White matter hyperintensity predicts independent walking function at 6 months after stroke: a retrospective cohort study. NeuroRehabilitation. 2023;53:557–65. https://doi.org/10.3233/NRE-230225.

  43. Soulard J, Huber C, Baillieul S, et al. Motor tract integrity predicts walking recovery: a diffusion MRI study in subacute stroke. Neurology. 2020;94:e583–93. https://doi.org/10.1212/WNL.0000000000008755.

  44. Piron L, Piccione F, Tonin P, Dam M. Clinical correlation between motor evoked potentials and gait recovery in poststroke patients. Arch Phys Med Rehabil. 2005;86:1874–8. https://doi.org/10.1016/j.apmr.2005.03.007.

  45. Smith MC, Scrivener BJ, Stinear CM. Do lower limb motor-evoked potentials predict walking outcomes post-stroke? J Neurol Neurosurg Psychiatry. 2023. https://doi.org/10.1136/jnnp-2023-332018.

Acknowledgements

We acknowledge our colleagues in the medical records Department of the China Rehabilitation Research Center for their assistance in this study.

Funding

This work was supported by the general program of China Rehabilitation Research Center [grant number 2023ZX-14].

Author information

Contributions

ZT: design of work, analysis, interpretation of data, drafted the manuscript. WS: analysis, interpretation of data, drafted and substantively revised the manuscript. TL: design of work, acquisition of data, drafted the manuscript. HL: design of work, substantively revised the manuscript. YL: analysis, interpretation of data. HL: analysis, interpretation of data. KH: analysis, interpretation of data. MM: acquisition of data. JL: acquisition of data. XL: acquisition of data. XZ: substantively revised the manuscript. LS: substantively revised the manuscript. HZ: conception, design of work, drafted and substantively revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Hao Zhang.

Ethics declarations

Ethics approval and consent to participate

This study protocol was approved by the medical ethics committee of China Rehabilitation Research Center (approval number 2022-141-02). Because this was a retrospective, hospital-based study, the committee granted an exemption from obtaining informed consent. All methods and procedures were carried out in accordance with relevant guidelines and regulations.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Tang, Z., Su, W., Liu, T. et al. Prediction of poststroke independent walking using machine learning: a retrospective study. BMC Neurol 24, 332 (2024). https://doi.org/10.1186/s12883-024-03849-z

  • DOI: https://doi.org/10.1186/s12883-024-03849-z

Keywords