The machine learning methods to analyze the using strategy of antiplatelet drugs in ischaemic stroke patients with gastrointestinal haemorrhage

Background For ischaemic stroke patients with gastrointestinal haemorrhage, stopping antiplatelet drugs or reducing the dose of antiplatelet drugs was a conventional clinical therapy method. But not a study to prove which way was better. And the machinery learning methods could help to obtain which way more suit for some patients. Methods Data from consecutive ischaemic stroke patients with gastrointestinal haemorrhage were prospectively collected. The outcome was a recurrent stroke rate, haemorrhage events, mortality and favourable functional outcome (FFO). We analysed the data using conventional logistic regression methods and a supervised machine learning model. We used unsupervised machine learning to group and analyse data characters. Results The patients of stopping antiplatelet drugs had a lower rate of bleeding events (p = 0.125), mortality (p = 0.008), rate of recurrence of stroke (p = 0.161) and distribution of severe patients (mRS 3–6) (p = 0.056). For Logistic regression, stopping antiplatelet drugs (OR = 2.826, p = 0.030) was related to lower mortality. The stopping antiplatelet drugs in the supervised machine learning model related to mortality (AUC = 0.95) and FFO (AUC = 0.82). For group by unsupervised machine learning, the patients of better prognosis had more male (p < 0.001), younger (p < 0.001), had lower NIHSS score (p < 0.001); and had a higher value of serum lipid level (p < 0.001). Conclusions For ischemic stroke patients with gastrointestinal haemorrhage, stopping antiplatelet drugs had a better prognosis. Patients who were younger, male, with lesser NIHSS scores at admission, with the fewest history of a medical, higher value of diastolic blood pressure, platelet, blood lipid and lower INR could have a better prognosis. Supplementary Information The online version contains supplementary material available at 10.1186/s12883-023-03422-0.


Introduction
Antiplatelet drugs were a guideline-recommended secondary prevention method for ischemic stroke patients [1].It also had many sides effect, such as gastrointestinal toxicity and haemorrhage events [2].Gastrointestinal haemorrhage was a common complication for stroke patients with antiplatelet drugs [3].
When stroke patients occurred gastrointestinal haemorrhage, the antiplatelet drugs are discontinued temporarily, or reduced doses of drugs prevent the progression of bleeding.A study suggested that stopping antiplatelet medicines for more than ten days could bring a higher risk of ischemic stroke, for the remnant drugs in the blood could not effectively inhibit platelet circulation [4].The second method, reducing the dose of antiplatelet medicines, may prevent an ischemic stroke attack.Still, the lower dose of drugs did not reduce the risk of haemorrhage events for patients [5].Therefore, there was no best method for ischemic stroke patients with gastrointestinal haemorrhage, especially since there had no definite study evidence to compare the effect of the two ways.
As mentioned above, many studies had different conclusions for the same research question.Machinery learning methods were better for dealing with these conditions [6].Factor analysis and reduction of dimensionality could effectively deal with complicated confounders and collinearity of a great deal of character [7].Unsupervised machine learning could analyse and classify the prospective data [8].And the analysis and classification could be more suit for clinical patients in hospitals [9].
In our study, we observed the relationship between discontinued temporarily or reduced antiplatelet doses of drugs with stroke patients with gastrointestinal haemorrhage.Then we deal with data with a supervised machine learning model.At last, we used unsupervised machine learning methods to group and explore the character of patients.

Patients
The study is a prospective observational cohort design.We consecutively recruited acute ischemic stroke patients who occurred gastrointestinal haemorrhage within 30 days after onset.All patients were from the Neurology Department and Rehabilitation Department of Affiliated Liutie Central Hospital of Guangxi Medical University.The patients were enrolled from January 1, 2017, to March 30, 2022, and followed up until June 30, 2022.All patients' diagnoses met the WHO stroke diagnostic criteria, which included clinical character and neurological image examination.The gastrointestinal haemorrhage had hematemesis, hematochezia and tarry stool.The gastrointestinal haemorrhage was proven through clinical features and laboratory examination by experienced neurologists or gastroenterologists.
The patients who stopped antiplatelet drugs after gastrointestinal haemorrhage were assigned to stopped group.And those who reduced the dose of antiplatelet medicines were assigned to reduced group.The different therapeutic schedule of patients was for their various conditions or evaluated by their attending doctor.The time range of stopping antiplatelet drugs was about 15 days for patients.The dose of aspirin or clopidogrel in reduced group was 25 mg twice or 50 mg once daily.We are calculated the sample size At the α = 5%, 1-β = 0.80 level by SAS based on a previous article [3].Considering a 10% loss to follow-up, we initially recruited more than 200 patients.
The inclusion criteria were as follows: (1) aged 18 years or older; (2) having received an antiplatelet drug and other conventional therapy after admission; (3) gastrointestinal haemorrhage occurred within 30 days after onset.
The exclusion criteria were as follows: (1) patients with a recent history of peptic ulcer or gastrointestinal bleeding before onset; (2) taking anticoagulant drugs; (3) patients with intracerebral haemorrhage, subarachnoid haemorrhage or severe systemic disease.(4) patients withdraw study or cannot provide outcome events.

Data collected and outcome
Baseline data were collected from electronic clinical records and structured questionnaires after admission.The detail of the data was in the baseline character table (Table 1).Due to the narrow inclusion criteria, the type of patients was lacunar infarction (n = 71, 26.49%) and largeartery atherosclerosis ischemic stroke (n = 197,73.51%),the study did not include atrial fibrillation or other patients cardioembolic Stroke patients.
The primary outcome was a recurrent stroke rate within 90 days after onset.All recurrence patience had a conventional therapy in hospital base on their condition.The second outcome included haemorrhage events, mortality within 90 days after onset and favourable functional outcome at 90 days after onset.The favourable functional outcome (FFO) was defined as mRs < = 2 at 90 days after admission.The unfavourable functional outcome (UFO) was defined as mRs > 2 at 90 days after admission.The haemorrhage events included intracerebral haemorrhage and gastrointestinal haemorrhage.The score of clinical scale and outcome events were determined by experienced neurologists blind to the patients' group.

Baseline data
We process the data through SPSS 23.0 for Windows (conventional statistical analysis) and Python 3.80 (machine learning).We used a t-test to analyse continuous variables following a normal distribution.For abnormally distributed continuous variables, we used a non-parametric test.A chi-square test was conducted for categorical data and ranked data.

Outcome events
We compared the events of group differences using the chi-square test.We analysed the relationship between risk factors such as difference group and outcome events using conventionally univariable and multivariable logistic regression methods.We screen the significant risk factors by univariable logistic regression methods (P < = 0.0.05 or factor with clinical significance).These factors would bring into multivariable logistic regression methods and supervised machine learning.

Supervised machine learning
We train all data (including outcomes events) by logistic regression methods, support vector machine (SVM) and decision trees methods, voting classifier and random forest.We compare model performance through precision and area under the curve (AUC) plot of each model.Then we evaluated the importance of variables in the model.

Unsupervised machine learning
We analyzed all baseline data by cluster analysis.We cluster baseline by K-means methods and Hierarchical Clustering.Then we choose the best number of groups by the two methods (the detail of Unsupervised machine learning in Supplemental Material).
We compare outcome events by the above difference group.Most outcome events between groups had a significant difference and were the best choice for grouping.Then we analyzed the different characteristics of factors in the group.

Baseline character
Initially, the study recruited 300 patients.Among them, eight patients were no longer suited for inclusion criteria, 15 were lost to follow-up, and nine withdrew from the study.Finally, we acquired the data from 268 patients (176 in stopped group and 92 in reduced group).Among the 268 patients (mean age: 67.00 ± 13.49 years), 113 (40.3%) were female.Stopped group had a lower rate of patients with a history of antihypertensive drugs than reduced group.The other baseline data between the two groups had not shown a significant difference (Table 1).

Primary outcome events
As shown in Fig. 1a, to compare with reduced group (recurrence patients = 6, ischemic stroke = 5, hemorrhagic stroke = 1), the patients of stopped group (recurrence patients = 5, ischemic stroke = 5) had lower rate of recurrence of stroke (p = 0.161), but the difference had not a statistic significant.
In the univariate logistic regression analysis, we could not acquire the relationship between risk factors and the recurrence of stroke events or bleeding events.

Second outcome events
As shown in Fig. 1a, to compare with reduced group, the patients of stopped group had lower mortality (p = 0.008) and a lower rate of bleeding events (p = 0.125), but the last results had not a statistic significant.As shown in Fig. 1b, to compare with reduced group, the patients of stopped group had a lower rate distribution of severe patients (mRS 3-6) at 90 days (p = 0.056) but the results had not a statistic significant.
By the multivariable logistic regression analysis, we found that the older age (OR = 1.060, p = 0.009), higher NIHSS scores at admission (OR = 2.179, p = 0.008) and patients in reduced group (OR = 2.826, p = 0.030) were still related to the higher mortality within 90 days after admission.The older age (OR = 1.028, p = 0.021), patients in reduced group (OR = 0.621, p = 0.044) and higher NIHSS scores at admission (OR = 1.090, p < 0.001) were still related to the unfavourable functional outcome at 90 days after admission.

Supervised machine learning
We explored the relationship between risk factors and mortality and FFO with a machine learning model.The support vector machine model, including six factors, had the best performance for mortality (the accuracy score = 0.952), and the support vector machine model, including three factors, had the best performance for FFO (the accuracy score = 0.652).
The final support vector machine model for mortality showed that its AUC was 0.92, and prediction accuracy was 0.95 (Fig. 2a).The final support vector machine model for FFO showed that its AUC was 0.820, and prediction accuracy was 0.78 (Fig. 2b).The coefficient of the model is displayed in Table 2.

Unsupervised machine learning
As shown in Fig. 3a, the line chart of Silhouette score showed that the 2 group was the best choice.The hot map showed that the 2 group was better for the Hierarchical Clustering model (Fig. 3b).The two-dimension scatter plot showed that the two groups by k-means had more precise discrimination (Fig. 3c).

Fig. 1 Difference rates of outcome between stopped group and reduced group
We grouped data into km2-1group and km2-2 group by K-means methods.We grouped data into hc2-1 group and hc2-2 group by the hierarchical clustering methods.The outcome event rate had not shown a significant difference between hc2-1 group and hc2-2 group.The km2-1 group had a lower rate of bleeding events (p < 0.001), lower mortality (p < 0.001), a lower rate of recurrence of stroke (p = 0.107) and a higher rate of FFO (P = 0.007) than km2-2 group (Fig. 4).Therefore, the 2 group by K-means method was the best grouping method.
To compare with km2-2 group, the patients of the km2-1 group had lesser female patients, lesser older patients, higher diastolic blood pressure at admission, lower NIHSS score at admission, higher rate of statin used, lower rate of disease history, higher value of platelet at admission and higher value of blood lipid at admission (Table 3).

Discussion
Our study found that for ischemic stroke patients with a gastrointestinal haemorrhage, stopping antiplatelet drugs had a lower rate of bleeding events, mortality, rate of recurrence of stroke and a higher rate of FFO than reducing the dose of antiplatelet medicines.Stopping antiplatelet drugs is related to a higher rate of FFO and lower mortality.The machine learning model also showed that stopping antiplatelet drugs affects patients' mortality and   FFO.The group by unsupervised machine learning methods suggested that many risk factors significantly differed for different outcome events groups.
The stopping antiplatelet drugs had a lower rate of bleeding events and stroke recurrence rate for patients.However, the difference was not statistically significant for the fewer events.Our results suggested that stopping antiplatelet drug had a better effect and safety than reducing the dose of antiplatelet drug.One study showed that discontinuous antiplatelet drugs did not affect patient outcomes [4].Another study suggested that lower doses of antiplatelet drugs did not reduce the risk of bleeding  events for ischemic stroke patients [5].These studies all were consistent with our research.The difference type of antiplatelet drug had not a significant difference in two group and the type had not relate to all outcome events.For mortality and FFO outcomes, it was stopping antiplatelet drugs related to better outcomes through univariable and multivariable logistic regression methods.The coefficient of the machine learning model suggested that stopping antiplatelet drugs affected predicting mortality and FFO for ischemic stroke patients with gastrointestinal haemorrhage.These results proved the significant relationship between stopping antiplatelet drugs and lower mortality and FFO.Some studies also had similar results for ischemic stroke or coronary artery disease patients [10].
When we analysis the difference in the km2-1 and km2-2 group.The km 2-1 group had a better prognosis.We found that younger or lower NIHSS scores had a better outcome, consistent with other studies and guides [1,11,12].Meanwhile, the patients using statin had a better outcome, consistent with many studies, including our research [13][14][15].The km2-1 group had fewer patients who had strokes, and other medical histories also were reasonable for the patients with medical history had a worse condition.Therefore, the km2-1 group had fewer patients with a history of using drugs for less medical history.The female had a worse outcome, consistent with some recent studies and our study [15,16].The results also could explain that the km2-1 group had more patients smoking and drinking.So, we should realize that female patient had a worse prognosis though they had lower morbidity of ischemic stroke.The km2-1 group patients had higher diastolic blood pressure at admission, which was also unusual.The patients with higher diastolic blood pressure may have a better condition than those with lower blood pressure because the mean value of diastolic blood pressure in the km2-2 group is less than normal values (80 mmHg).
To analysis the difference in laboratory data between the km2-1 group and the km2-2 group, we found that the patients of the km2-1 group with a higher value of platelet and a lower value of INR at admission were reasonable for a better outcome.Patients with bleeding events obviously could benefit from better coagulation function.The patients of the km2-1 group with a higher value of glutamic-pyruvic transaminase at admission had a complex reason.The value in the two groups all was in normal values.Therefore, the lower value of glutamicpyruvic transaminase may suggest that patients had poor conditions such as malnutrition.It also could explain that patients of the km2-1 group had a higher value of serum lipid level (TG, TC, LDL).The lipid contradiction also might partly explain the condition [17].Though a higher serum lipid level was a risk factor for an ischemic stroke attack, the ischemic stroke patients with a higher serum lipid level had a better outcome, such as lower mortality and rate of haemorrhage events [18].However, the conclusion needs further study to prove.We may have a caution for lower serum lipid levels for ischemic stroke patients with gastrointestinal haemorrhage.
Our study has several limitations.First, our machinelearning model did not verify by external data.Therefore, the extensionality of the model needs to be further proved.But our study focused on the relationship between factors and outcomes.The extensionality of the model did not affect the statistical power of the relationship.Second, the results of unsupervised machine learning also did not verify by external data.However, the results were confirmed by the outcome events of our data.To further prove these results with more data was reasonable.
In conclusion, for ischemic stroke patients with gastrointestinal haemorrhage, patients stopping antiplatelet drugs had a lower rate of mortality and a higher rate of FFO than patients of reducing the dose of antiplatelet medicines.For grouping by unsupervised machine learning methods, younger, male, with lesser NIHSS scores at admission, with the fewest history of a medical and slightly higher value of diastolic blood pressure, with higher value of platelet, lower value of INR and higher value of blood lipid could have a better prognosis.

Fig. 2 a
Fig. 2 a The AUC plot of the support vector machine model for mortality; b The AUC plot of the support vector machine model for FFO.AUC area under the curve; FFO favourable functional outcome

Fig. 3 a
Fig. 3 a The line chart of Silhouette score for k-means methods; b The hot map for hierarchical clustering methods; c The two-dimension scatter plot of 2 groups by k-means

Table 1
Baseline characteristic P* was calculated by ANOVA, Chi-square test as appropriate.CHD Coronary Heart Disease, INR international normalized ratio, ALT glutamic-pyruvic transaminase, HDL-C high-density lipoprotein cholesterol, LDL-C low-density lipoprotein

Table 2
The coefficient of support vector machine modelThe coefficient was calculated by support vector machine model